• 0 Posts
  • 28 Comments
Joined 2 years ago
Cake day: June 10th, 2023

  • This product is so new that we have not yet determined the actual retail price. Whatever it turns out to be, you’ll certainly want at least one. Send us your actual credit card (not just the number!) along with a sample signature, and when the price has been finalized, we’ll charge your card accordingly.

    Order one and let us know!

  • Let me expand a little bit.

    Ultimately the models come down to predicting the next token in a sequence. Tokens for a language model can be words, characters, or, more frequently, character combinations. For example, the word “Lemmy” might be split into “lem” + “my”.
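
    As a rough illustration of that split, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `VOCAB` and the function `tokenize` are made up for this example; real tokenizers learn their vocabulary from data and may split “Lemmy” differently.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
# Real tokenizers learn theirs from a large text corpus.
VOCAB = {"lem", "my", "l", "e", "m", "y"}


def tokenize(word, vocab=VOCAB):
    """Split a word into sub-word tokens using greedy longest-match."""
    word = word.lower()
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one is in the vocab.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry covers the character at position i.
            raise ValueError(f"no token covers {word[i]!r}")
    return tokens


print(tokenize("Lemmy"))
```

    With this toy vocabulary the word comes out as the two pieces described above.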

    So let’s give our model the prompt “my favorite website is”

    It will then predict the most likely next token and append it to the input, repeating this to build up a cohesive answer. This is where the T in GPT (the transformer) comes in: at each step it outputs a vector of probabilities over the possible next tokens.

    “My favorite website is”

    "My favorite website is "

    “My favorite website is lem”

    “My favorite website is lemmy”

    “My favorite website is lemmy.”

    “My favorite website is lemmy.org”
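
    The sequence above can be sketched as a greedy decoding loop. Everything here is an assumption for illustration: `TOY_MODEL` is a hand-written lookup table with invented probabilities standing in for a real network, and `<end>` is a made-up stop token.

```python
# Hypothetical next-token probabilities, keyed by the text so far.
# A real model computes these with a neural network; the numbers are invented.
TOY_MODEL = {
    "My favorite website is": {" ": 0.9, ".": 0.1},
    "My favorite website is ": {"lem": 0.6, "red": 0.4},
    "My favorite website is lem": {"my": 0.8, "on": 0.2},
    "My favorite website is lemmy": {".": 0.7, "<end>": 0.3},
    "My favorite website is lemmy.": {"org": 0.5, "world": 0.3, "ml": 0.2},
    "My favorite website is lemmy.org": {"<end>": 1.0},
}


def generate(prompt, model=TOY_MODEL, max_steps=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    text = prompt
    steps = [text]
    for _ in range(max_steps):
        probs = model.get(text)
        if probs is None:
            break  # the toy table has no entry for this context
        token = max(probs, key=probs.get)  # pick the highest-probability token
        if token == "<end>":
            break
        text += token
        steps.append(text)
    return steps


for step in generate("My favorite website is"):
    print(repr(step))
```

    Note that nothing in the loop checks whether the finished string refers to something real; it only ever picks the most probable continuation.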

    Whoa, what happened there? That’s not (currently) a real website. Finding out exactly why the last token was “org”, which resulted in hallucinating a fictitious website, is basically impossible. The model might not have been trained long enough, it might have been trained too long, there might be insufficient data in that particular token space, the training data might be polluted, etc. These models are massive, so determining why the output is incorrect in this case is tough.

    But fundamentally, it made up the first half too; we just like that output. Tomorrow someone might register lemmy.org, and then it’s not a hallucination anymore.


  • Very difficult, it’s one of those “it’s a feature not a bug” things.

    By design, our current LLMs hallucinate everything. The secret sauce these big companies add is getting them to hallucinate correct information.

    When the models get it right, it’s intelligence; when they get it wrong, it’s a hallucination.

    In order to fix the problem, someone needs to discover an entirely new architecture, which is entirely conceivable, but the timing is unpredictable, as it requires a fundamentally different approach.


  • eating3645@lemmy.world to Memes@lemmy.ml: 5 parallel universes ahead (English, 1 year ago)

    This is an argument that I do not agree with, but I 100% can respect.

    I would assert that the LLMs are irrelevant here: the kid has an aptitude for engineering with or without LLMs. He is clearly capable of processing information and producing compelling content on his own.

    Likewise, his peers may have their own faculties that will grant them an advantage in life. But I don’t think failing to leverage existing technologies will do them any good. Textbooks, the internet, and LLMs are all technologies that can be used effectively or detrimentally.

    Other students may succeed, not due to their unwillingness to adopt LLMs, but in spite of it.

    It seems you’re hyperfocused on an overly literal interpretation of a meme. Of course blindly outputting ChatGPT’s response is an ineffective strategy and does the student a disservice. So is copying a textbook or plagiarizing from the internet.

    But rigging this bad boy up? That’s innovative, and more importantly, makes a funny image.