• Rhaedas@fedia.io
    9 days ago

    Concerning misspellings, you would think LLMs would make more grammar mistakes, given how much internet text has been used as training data. Is that kind of usage just conveniently far enough below the probability threshold, once it is weighted against more formal data, that things like “your” or even “ur” don’t show up?

    I also saw what you did there.

    • Stovetop@lemmy.world
      9 days ago

      If I had to guess, they’re coded to opt for the more “correct” spellings of words, regardless of which specific source of information they reference for a prompt.

      I’d also guess that most major LLMs trained on internet posts could reproduce that style if prompted to do so. The “default” is just the proper, marketable mode of writing.