Why do AI models use so many em-dashes?

lemmydividebyzero@reddthat.com · 2 days ago

Why do AI models use so many em-dashes?

unpossum@sh.itjust.works · 2 days ago

State-of-the-art models rely on late-1800s and early-1900s print books for high-quality training data, and those books use ~30% more em-dashes than contemporary English prose. That’s why it’s so hard to get models to stop using em-dashes: they learned English from texts that were full of them.

That sounds really plausible. I associate the em-dash with old books and stilted prose, like the Sherlock Holmes stories.