Concerning misspellings, you would think LLMs would show more grammar mistakes given how much internet training data has been used. Is it just that, once weighted against more formal data, things like “your” (or even “ur”) conveniently fall below the probability threshold and don’t show up?
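The “below the probability threshold” intuition can actually be sketched concretely: samplers like nucleus (top-p) sampling literally truncate low-probability tokens before drawing one. Here’s a toy sketch, with entirely made-up probabilities, showing how a misspelled variant that survives in the raw distribution can vanish after the cutoff:

```python
# Toy next-token distribution for a slot where "you're" is correct.
# The "correct" form dominates because formal text outweighs casual posts.
# (These numbers are invented purely for illustration.)
probs = {"you're": 0.90, "your": 0.07, "ur": 0.03}

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize -- standard nucleus (top-p) sampling."""
    kept, total = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return {tok: pr / total for tok, pr in kept.items()}

filtered = top_p_filter(probs, p=0.9)
print(filtered)  # "your" and "ur" fall outside the nucleus and get zero chance
```

With p=0.9, the single top token already covers the nucleus, so the misspellings aren’t just unlikely, they’re unsampleable. Real models are messier than this, but it shows how weighting plus a sampling cutoff could keep “ur” out of the output entirely.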
If I had to guess, they’re tuned to opt for the more “correct” spellings of words, regardless of how any one specific source they draw on for a prompt happens to spell them.
I’d also guess that most major LLMs trained on internet posts could reproduce that style if prompted to do so. The “default” is just the proper, marketable mode of writing.
I also saw what you did there.