Lua ([personal profile] queenlua) wrote 2024-08-13 01:09 pm

playing around with text-to-speech

whenever i'm editing a piece that i'm being somewhat-to-very tryhard about, i usually make an effort to read the piece aloud to myself. ideally i'd read the whole thing, slowly, audiobook-style, but more often i'm doing some mix of that + "just muttering passages quietly to myself." it's pretty good for catching the sorts of errors that the brain's too good at "filtering out" while reading (e.g. repeating a word, an awkward dialogue tag, etc).

but, i got curious the other night about the state of text-to-speech software, because hey, that's one of the few domains where "just throw more GPUs at it" does seem pretty useful, and i ran out of podcasts for this week's commute, and yeah i'm absolutely vain enough to make a computer audiobookify my own shit haha.

so, lo, here's the random software i decided to play with after a google search. cursory observations:

* these voices are pretty good. like, you're not going to mistake them for a human reader (in particular there's a weirdly "clipped" quality to the way they finish a lot of their sentences, and a sort of monotony/regularity to the way in which they do end-of-sentence/end-of-paragraph-type pauses that sounds distinctly unnatural when you listen for longer periods of time), and i'd certainly rather pay money for the human-read audiobook version of any narrative i actually wanted to enjoy (the lack of any attempt to create different "voices" for each character is a huge drawback), BUT, this is leagues better than the standard-accessibility-suite robo-voice i remember from 00s-era mac osx lol. reasonably pleasant, not too grating, totally works for "being forced to hear my own writing" purposes

* this particular software absolutely cannot handle italics, rip. admittedly this ends up serving as a good reminder that i should be using italics less anyway, but, y'know, sometimes i do need that extra emphasis!!!

* the "audiobook" is excellent for forcing me to notice "stupid" errors (repeated words etc), and i think it miiiight give me a better sense of pacing fuckups? in the sense that, if i've been staring at a wall of text for a while, it's hard for me to get a sense of where a reader might lose interest, whereas if these words are washing over me while i'm in some standstill traffic on the f$@&*ing bridge again, i'm getting a decent intuitive sense of "ok how long is this part going on for & is it actually interesting"

* (unfortunately if i'm listening while in some standstill traffic on the f$@&*ing bridge again, i can't exactly, uh, stop to take notes or fix the manuscript right there, so i'm relying on "just remembering what sounded off," but eh a little mental exercise is good for you)

* i certainly wouldn't want to use this as the only source of reading-aloud-ness since the computer-voice-guy makes some repeated "flow" choices that i just think are WRONG lol. for instance, i think the voice guy gives literally every comma about equal weight, which makes any standalone super-short demarked-by-a-comma phrase, like this one, sound REALLY awkward in a way that i think any ordinary human reading a passage will not find awkward.

* tellius inside baseball observations: this tool pronounces tibarn as "TYE-barn", pronounces reyson as "rey-SON", and pronounces naesala as "nae-SA-la," all of which i deem the WRONG way to pronounce those respective names lol. (i'm aware FE Heroes disagrees with me re: naesala, but that just means heroes is wrong too sorry!!!) also nikolias gets pronounced "neh-COE-li-as" which is ALSO wrong. and i made that one up so i'm objectively right here for sure lmao

further observations to be reported when/if they prove interesting

[personal profile] kradeelav 2024-08-13 08:39 pm (UTC)
somehow i knew this was going to be about Birb Manuscript before even clicking the readmore.

super fascinating!!! having used dragon naturallyspeaking once or twice back in the 00's as you mentioned, it's awesome hearing that it's improved since then. (tbh automatic captions/translations are another area where i've seen a distinct, marked improvement over the decades - i recently participated in a meeting with only mexican-spanish speakers and understood 95% of it right off of pure captioning, it was awesome, 10/10 would be a fly on the wall again.)

cheering for u in the home stretch there!

[personal profile] scytale 2024-08-13 09:40 pm (UTC)
Oh, hey, I ended up doing this recently too! I was too lazy to find software (I may try yours), so I've just... been plugging it into Google Translate and using the read aloud functionality xD

I found it good for the little errors and some flow things, and that it's good for giving me a little more distance from the work. And some parts of it feel very passive to me in a way that reading doesn't! The other good thing for me is that there are a lot of things that I might pick out if I were reading my own writing, but it turns out that when I listen to them read aloud, they don't bother me. So it's curbing the perfectionism a bit for me.

Also, much easier to edit while petting cat.

[personal profile] airlock384 2024-08-13 11:47 pm (UTC)
well then, spill those objectively correct name pronunciations

[personal profile] airlock384 2024-08-14 04:41 am (UTC)
it'd be very amusing to make you bust out an IPA chart for this but na, I do get the picture

and I agree all around! I do pronounce naesala differently, but I know that the thing I'm doing in my head with his name is wrong and sinful. (it goes sorta like... NEI-SHA-la, with a sort of a double stress)

(I also read "nikolias" a bit more NI-ko-lee-as, but, well, I think that's just the more lusophonic reading as opposed to your anglophonic reading)

[personal profile] helicoprion 2024-08-14 01:00 am (UTC)
Reminds me of that time that my hands were too jank for typing so I had to dictate my fanfics to my computer and either a) deliberately pronounce some things super fucking wrong to get it to spell them the way I wanted or b) just give everyone Whitebread Names that it would recognize and do a search/replace in post.

I bet that technology has improved a lot in the past 8 years too.