Why do AI models use so many em-dashes?

lemmydividebyzero@reddthat.com · 3 days ago

Why do AI models use so many em-dashes?

CerebralHawks@lemmy.dbzer0.com · 2 days ago

My guess is that the models that do were trained on forum posts where intelligence was considered valuable, including knowing how and when to use less common punctuation marks like en and em dashes and the semicolon. These people would seem, on the surface, to know more about what they’re talking about and would provide better training data for the LLM. Those people used em dashes, and so do the AI models trained on them.

Also, sorry (not sorry). I am a religious em dash user and have been for over 30 years. I’m not saying I’m smarter than anyone about any one thing, but it is entirely possible some of my forum posts were used to train LLMs. I didn’t get paid for it though; hence the “not sorry” part. If it trained on my posts after the fact, I won’t take any blame for that. But, people were using em dashes long before AIs were.

joeljoelle@piefed.world · 2 days ago

Using this same thumbnail for every article is really attracting a ton of viewers, I bet.

lemmydividebyzero@reddthat.com · 2 days ago

Others use AI to generate them… I consume the posts via RSS (without images at all)…

joeljoelle@piefed.world · 2 days ago

Yeah, love RSS :) On Piefed I see the same LinkedIn-grade selfie for every post from that website.

DrunkenPirate@feddit.org · 3 days ago

It isn’t new — it’s interesting 8-P

DrunkenPirate@feddit.org · 3 days ago

Having this indicator in mind, it’s a bit fad to spot AI text easily (and everywhere)

I’m not a bot — AI m a bitch.

unpossum@sh.itjust.works · 2 days ago

State-of-the-art models rely on late-1800s and early-1900s print books for high-quality training data, and those books use ~30% more em-dashes than contemporary English prose. That’s why it’s so hard to get models to stop using em-dashes: because they learned English from texts that were full of them.

That sounds really plausible – I associate the em-dash with old books and stilted prose, like Sherlock Holmes stories.
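
For anyone who wants to sanity-check a figure like that on their own texts, here is a minimal Python sketch. It is not how the quoted article got its number; the function names and the two sample sentences are made up for illustration, and a real comparison would need large, comparable corpora.

```python
import re

EM_DASH = "\u2014"  # the em-dash character, written as an escape


def em_dashes_per_1000_words(text: str) -> float:
    """Return how many em-dashes appear per 1000 words of text."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return text.count(EM_DASH) * 1000 / len(words)


def relative_difference(rate_a: float, rate_b: float) -> float:
    """Return how much higher rate_a is than rate_b, as a fraction of rate_b."""
    if rate_b == 0:
        raise ValueError("baseline rate is zero, relative difference is undefined")
    return (rate_a - rate_b) / rate_b


# Hypothetical toy samples, ten words each; not real corpus data.
old_style = f"He paused{EM_DASH}then spoke{EM_DASH}and left the room at once."
modern_style = f"He paused, then spoke{EM_DASH}and left the room at once."

old_rate = em_dashes_per_1000_words(old_style)
modern_rate = em_dashes_per_1000_words(modern_style)

print(f"old: {old_rate:.1f} per 1000 words")
print(f"modern: {modern_rate:.1f} per 1000 words")
print(f"relative difference: {relative_difference(old_rate, modern_rate):.0%}")
```

Normalising per 1000 words matters because the two corpora will never be the same size; a claim like "~30% more" only makes sense as a ratio of rates, not of raw counts.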

deliriousdreams@fedia.io · 2 days ago

Theft. It’s not even a secret.