State-of-the-art models rely on late-1800s and early-1900s print books for high-quality training data, and those books use ~30% more em-dashes than contemporary English prose. That’s why it’s so hard to get models to stop using em-dashes: because they learned English from texts that were full of them.
That sounds really plausible – I associate the em-dash with old books and stilted prose, like Sherlock Holmes stories
That sounds really plausible – I associate the em-dash with old books and stilted prose, like Sherlock Holmes stories