Jan. 15th, 2013

According to I Write Like...
  • "Wings Dancing in the Darkness" reads like Margaret Atwood
  • "Every Little Thing" reads like Chuck Palahniuk
  • "Delicately, Madly" reads like Charles Dickens
  • "White Like Bone" reads like Anne Rice
  • "Pyre" reads like Raymond Chandler
  • "Dog in the Vineyard" reads like Dan Brown (...yuck)
  • "Crush" reads like Chuck Palahniuk
  • annnnd "Remnants of Restoration" reads like Kurt Vonnegut
Conclusion: either my writing is wildly inconsistent or the website's algorithm is, and I strongly suspected the latter...

...but then I discovered the source code for IWL is available online (eee), so I decided to poke at its innards for a bit and see what's what

Lua sets up a local instance and installs shit: the liveblog! (terribly boring do not read) )

Once I had a local instance running, I decided to do some experiments for teh lulz (and perhaps tangentially teh science).

I cleaned out the authors included with the IWL download and used some fanfic authors instead: arbitrarily I chose myself, [personal profile] amielleon, and [personal profile] mark_asphodel (hello, unwitting volunteers! :D;;; ). I used the three latest fics by these three authors for training data, then took a few of the other works by each author to see how accurately IWL could guess the true author of a work:
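
(For the curious, the test loop amounted to something like this minimal sketch. The classify() call is a hypothetical wrapper around the local IWL instance, since the real interface is just a web form, and the heldout/<author>/ directory layout is made up for illustration.)

    # Minimal sketch of the accuracy check. classify() is a hypothetical
    # wrapper around the local IWL instance, and the heldout/<author>/
    # directory layout is invented for this example.
    import os

    AUTHORS = ["queenlua", "amielleon", "mark_asphodel"]

    def evaluate(classify, test_dir="heldout"):
        correct = total = 0
        for author in AUTHORS:
            author_dir = os.path.join(test_dir, author)
            for fname in os.listdir(author_dir):
                with open(os.path.join(author_dir, fname)) as f:
                    guess = classify(f.read())
                correct += guess == author
                total += 1
        print("accuracy: %d/%d" % (correct, total))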

Data! )

...okay wow, based on that data, IWL seems to suck. Badly. As in, a-random-number-generator-could-do-a-better-job-for-anyone-not-named-Mark[1].

Time to look at the code and see what methodology is at play...
  • Analysis seems to be based on both "tokens" and "readability"

  • The readability metric is just the Flesch Reading Ease score, which has been discussed here before as a somewhat problematic and inconsistent metric (a quick reimplementation follows this list)

  • The token side is less clear to me on this quick skim, but what I'm pretty sure is going on is: they build a giant table of "words appearing in the text plus their frequencies," and based on that, they calculate a "rating" from how the relative probability of those words is distributed (i.e. if A and B both use the words "obnoxious" and "teetotaler" a lot, the algorithm will notice that and assume A and B are more similar; see the second sketch after this list)
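
For reference, the Flesch formula is simple enough to fit in a few lines. This is a generic reimplementation, not IWL's actual code, and the vowel-group syllable counter is exactly the kind of approximation that makes the score wobbly in practice:

    import re

    def flesch_reading_ease(text):
        # Flesch Reading Ease:
        #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
        # Syllables are approximated by counting vowel groups, a crude
        # heuristic that real implementations also tend to lean on.
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        syllables = sum(
            max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
        return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)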
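
And here's my best reconstruction of the token side, sketched as a bare-bones word-frequency model with add-one smoothing. To be clear, this is what I *think* is going on from a quick skim, not IWL's actual code:

    import math
    from collections import Counter

    def train(texts_by_author):
        # The "giant table of words plus their frequencies," one per author.
        return {author: Counter(w.lower() for t in texts for w in t.split())
                for author, texts in texts_by_author.items()}

    def most_similar_author(model, text):
        # Rate each author by the summed log-probability of the text's
        # words under that author's frequency table, with add-one
        # smoothing so unseen words don't zero everything out.
        words = [w.lower() for w in text.split()]
        ratings = {}
        for author, freqs in model.items():
            total = sum(freqs.values())
            vocab = len(freqs) or 1
            ratings[author] = sum(
                math.log((freqs[w] + 1) / (total + vocab)) for w in words)
        return max(ratings, key=ratings.get)

Under a model like this, two authors who both lean hard on "obnoxious" and "teetotaler" will score each other's texts highly, which matches the behavior described above.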
...so yeah, while the metrics IWL uses are better than a random number generator, they're still pretty unrigorous/underwhelming (quite possibly by design—I know I've seen this website pop up in my friends' circles more than once, and it does make a fun little two-minute time-waster when you first stumble upon it—it doesn't really need to be The Greatest Algorithm Evar TM to accomplish that).

Footnote )
