queenlua | I Write Like

"Wings Dancing in the Darkness" reads like Margaret Atwood
"Every Little Thing" reads like Chuck Palahniuk
"Delicately, Madly" reads like Charles Dickens
"White Like Bone" reads like Anne Rice
"Pyre" reads like Raymond Chandler
"Dog in the Vineyard" reads like Dan Brown (...yuck)
"Crush" reads like Chuck Palahniuk
annnnd Remnants of Restoration reads like Kurt Vonnegut

Conclusion: either my writing is wildly inconsistent or the website's algorithm is, and I strongly suspected the latter...

...but then I discovered the source code for IWL is available online (eee) so I decided to poke at its innards for a bit and see what's what

in a sort of cute move, they decided to write this in some hipster language i've barely heard of (Racket)

...okay, what kind of programming language doesn't have a simple "make install" command and instead gives me some bullshit GUI and forces me to set my path manually, geez

...oh fuck, i overwrote my path. welcome to n00b mistake of the night. oh fuck, ls and vim aren't working, did i just break bash?

crisis averted (but that was the most terrifying handful of minutes in my life)

uh okay, evidently hitting "Analyze" on my local instance gets me a page that says "not found," which seems sort of useless

mm i love the feeling of adding my first expletive to the code (404 errors are much more attractive as "the fuck")

uh okay, there's a bug somewhere in dispatch-rules, what's that about

bluuuh, this is hard to fix without an actual debugger, but the Racket documentation's pretty vague about how i might use such a thing from the command line

oh interesting, evidently there's a compatibility issue between Racket 5.3.1 (which I was trying to use) and Racket 5.1, which was causing my instance to Not Work ^TM. there's a known compatibility issue between 5.0 and 5.1 but nothing online about this issue; I'll file a bug report and maybe look into it in the morning

Once I had a local instance running, I decided to do some experiments for teh lulz (and perhaps tangentially teh science).

I cleaned out the authors included with the IWL download and used some fanfic authors instead: arbitrarily, I chose myself, amielleon, and mark_asphodel (hello, unwitting volunteers! :D;;; ). I used the three latest fics by each of these authors as training data, then took a few of each author's other works to see how accurately IWL could guess the true author of a work:

Lua's stuff: IWL incorrectly thinks that Mark wrote "White Like Bone," "Dog in the Vineyard," and chapter 1 of Remnants of Restoration. It correctly thinks I wrote "Pyre" and "Crush." (Accuracy: 2/5)

Ammie's stuff: IWL incorrectly thinks I wrote "lucius listens to the rain," "Ghost Stories," and "a visitor at any hour." It thinks Mark wrote "Coin in Palm" and "New World." It correctly thinks Ammie wrote "In Questioning Ghosts." (Accuracy: 1/6)

Mark's stuff: IWL correctly thinks that Mark wrote "Gold for Salt," "Blackout," "The Losing End," and "In Transition." It thinks I wrote "Without Vocation." (Accuracy: 4/5)
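For the record, here's the arithmetic behind those tallies as a quick Python sketch. The counts are copied from the results above; the "chance" line assumes a guesser that picks uniformly among the three authors in the training set, which would be right about one time in three on average.

```python
from fractions import Fraction

# Correct guesses / works tested, per author, copied from the results above.
results = {
    "Lua": (2, 5),
    "Ammie": (1, 6),
    "Mark": (4, 5),
}

# Assumption: a uniformly random pick among three candidate authors
# is right 1 time in 3 on average.
CHANCE = Fraction(1, 3)


def accuracy(correct: int, total: int) -> Fraction:
    """Fraction of test works attributed to the right author."""
    return Fraction(correct, total)


for author, (correct, total) in results.items():
    acc = accuracy(correct, total)
    verdict = "above" if acc > CHANCE else "at or below"
    print(f"{author}: {correct}/{total} = {float(acc):.0%} ({verdict} chance)")

# Pool all test works together for an overall score.
total_correct = sum(c for c, _ in results.values())
total_works = sum(t for _, t in results.values())
overall = accuracy(total_correct, total_works)
print(f"Overall: {total_correct}/{total_works} = {float(overall):.0%}")
```

Run as-is, Ammie's 1/6 lands below that one-in-three baseline, Lua's 2/5 sits a bit above it, Mark's 4/5 is well above it, and the pooled score is 7/16.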

...okay wow, based on that data, IWL seems to suck. Badly. As in, a-random-number-generator-could-do-a-better-job-for-anyone-not-named-Mark¹.

Time to look at the code and see what the methodology at play is...

Analysis seems to be based on two things: "tokens" and "readability."

The readability metric is just the Flesch Reading Ease score, which has been discussed here before as a somewhat problematic and inconsistent metric.
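For anyone who hasn't run into it: Flesch Reading Ease only looks at two averages, words per sentence and syllables per word. Below is a minimal Python sketch of the standard formula. The syllable counter is a crude vowel-group heuristic I'm assuming for illustration, and the sentence splitting is equally naive; I haven't checked how IWL's Racket code does either, so its numbers will differ.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not IWL's method): count runs of
    vowels, and drop a trailing silent 'e' (but keep '-le' endings)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are letter runs, optionally joined by apostrophes ("don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("need at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


print(flesch_reading_ease("The cat sat on the mat."))
```

Higher scores mean "easier" text. Since only those two averages matter, a dialogue-heavy scene and a descriptive passage by the same author can land far apart.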

The "tokens" part is less clear to me on this quick skim, but I'm pretty sure what's going on is this: they basically build a giant table of the words appearing in the text plus their frequencies, and based on that, they calculate a "rating" from how the relative probabilities of those words are distributed (i.e., if A and B both use the words "obnoxious" and "teetotaler" a lot, the algorithm will notice that and assume A and B are more similar).
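If my skim is right, the token side boils down to comparing word-frequency tables. Here's a minimal Python sketch of one standard way to do that, a multinomial naive Bayes classifier with add-one smoothing. This is my guess at the general shape of the technique, not a transcription of IWL's code; the class and method names are made up, and it assumes every author is equally likely up front.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


class WordFrequencyClassifier:
    """Hypothetical illustration: multinomial naive Bayes over word counts,
    with add-one (Laplace) smoothing and equal priors for every author."""

    def __init__(self) -> None:
        self.counts: dict[str, Counter] = {}

    def train(self, author: str, text: str) -> None:
        # Add this text's words to the author's frequency table.
        self.counts.setdefault(author, Counter()).update(tokenize(text))

    def score(self, author: str, text: str, vocab_size: int) -> float:
        # Sum of log P(word | author). Words this author uses a lot push
        # the score up; unseen words get a small smoothed probability.
        table = self.counts[author]
        total = sum(table.values())
        return sum(
            math.log((table[w] + 1) / (total + vocab_size))
            for w in tokenize(text)
        )

    def classify(self, text: str) -> str:
        # Shared vocabulary (all training tables plus the new text) for smoothing.
        vocab = set(tokenize(text))
        for table in self.counts.values():
            vocab.update(table)
        return max(self.counts, key=lambda a: self.score(a, text, len(vocab)))


clf = WordFrequencyClassifier()
clf.train("A", "the obnoxious teetotaler was obnoxious")
clf.train("B", "the rain fell on the quiet harbor")
print(clf.classify("an obnoxious teetotaler"))  # → A
```

The toy example mirrors the "obnoxious"/"teetotaler" case: author A's table contains both words, so A's probabilities for the new text come out higher than B's.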

...so yeah, while the metrics IWL uses are better than a random number generator, they're still pretty unrigorous and underwhelming. That's quite possibly by design: I've seen this website pop up in my friends' circles more than once, and it does make a fun little two-minute time-waster when you first stumble upon it. It doesn't really need to be The Greatest Algorithm Evar ^TM to accomplish that.

¹ It is probably worth noting that the fics used for Ammie's training set might've skewed her results; "Benefits" and "In the City" are perhaps not the most representative samples from her corpus. Whups.

raphiael

There are quite a few images going around the tubes of ridiculous inputs getting matched to famous authors. I'm pretty sure that one "gimme the booty" song came up as Poe, resulting in the image I've used as my icon :B

But I was always curious as to how it worked. The "male/female" test that goes around every once in a while is at least upfront about it. (My writing usually comes up "masculine" -- and the "feminine" words are typically relationship-focused rather than environmental. Did not like that.)

But yeah. It's cool that you were able to rig that up like that! And the results are interesting, if not especially meaningful.

queenlua

I hadn't seen the Poe "gimme the booty" output before. That is pretty fabulous :D

also, that male/female test made me super-happy: when I got curious about how it worked, not only was there a pretty clear methodology, but the dude had posted his master's thesis related to the topic, and then I spent the afternoon trapped in academic CS papers /dork

Lost in Lualand