• Natanael@slrpnk.net
      9 months ago

      A) I’ve not yet seen evidence to the contrary

      B) you do know there’s a lot of different definitions of average, right? The centerpoint of multiple vectors is one kind of average. The median of online writing is an average. The most common vocabulary, the most common sentence structure, the most common formulation of replies, and so on: those all form averages within their respective problem spaces. It displays these properties because it has seen them so often in samples, and then it blends them.
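To make the "several definitions of average" point concrete, here is a small Python sketch computing four of them on made-up toy data: the arithmetic mean, the median, the mode (most common item), and the centroid of a few 2-D vectors. It only illustrates the definitions; it does not show how a language model computes anything.

```python
from collections import Counter
from statistics import mean, median

# Toy data, made up purely for illustration.
sentence_lengths = [5, 7, 7, 9, 30]
words = ["the", "cat", "the", "dog", "the", "cat"]
vectors = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]

def centroid(points):
    """Component-wise arithmetic mean of equal-length vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

arithmetic_mean = mean(sentence_lengths)   # pulled upward by the outlier 30
middle_value = median(sentence_lengths)    # unaffected by how extreme the outlier is
most_common_word, count = Counter(words).most_common(1)[0]  # the mode
center = centroid(vectors)                 # "centerpoint" of the vectors

print(arithmetic_mean, middle_value, most_common_word, center)
```

The same data gives different "averages" (11.6 vs. 7 for the lengths), which is why it matters which definition is meant.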

      • General_Effort@lemmy.world
        9 months ago

        > A) I’ve not yet seen evidence to the contrary

        You should worry more about whether you have seen evidence that supports what you are saying. So, what kind of evidence do you want? A tutorial on coding neural nets? The math? Video or text?

        • Natanael@slrpnk.net
          9 months ago

          Text explaining why the neural network representation of common features (typically with weighted proportionality to their occurrence) does not meet the definition of a mathematical average. Does it not favor common response patterns?

      • General_Effort@lemmy.world
        9 months ago

        I accidentally clicked reply, sorry.

        > B) you do know there’s a lot of different definitions of average, right?

        I don’t think that any definition applies to this. But I’m no expert on averages. In any case, the training data is not representative of the internet or anything. It also isn’t trained equally on all data, and not only on such text. What you get out is not representative of anything.
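A minimal sketch of why training need not be "equal on all data": if a trainer draws each batch from a source chosen by fixed mixture weights (a common practice; the corpora, labels, and weights below are hypothetical), the proportions the model actually sees differ from the proportions in the pooled raw data.

```python
from collections import Counter

# Hypothetical corpora and sampling weights, chosen only for illustration.
corpora = {
    "web": ["casual"] * 90 + ["formal"] * 10,
    "books": ["formal"] * 100,
}
mixture_weights = {"web": 0.3, "books": 0.7}  # assumed share of batches drawn from each source

def sampled_label_shares(corpora, weights):
    """Expected label shares when a source is picked by weight, then a doc uniformly within it."""
    shares = Counter()
    for name, docs in corpora.items():
        for label, c in Counter(docs).items():
            shares[label] += weights[name] * c / len(docs)
    return dict(shares)

raw_counts = Counter(doc for docs in corpora.values() for doc in docs)
total = sum(raw_counts.values())
pooled = {label: c / total for label, c in raw_counts.items()}
sampled = sampled_label_shares(corpora, mixture_weights)

print("pooled raw data:", pooled)      # casual 0.45, formal 0.55
print("what training sees:", sampled)  # casual 0.27, formal 0.73
```

Each web document here is seen with a different per-document weight than each book, so the training signal matches neither corpus on its own nor the pooled data.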

        • Natanael@slrpnk.net
          9 months ago

          You don’t need it to be an average of the real world to be an average. I can calculate as many average values as I want from entirely fictional worlds. It’s still a type of model which favors what it sees often over what it sees rarely. That’s a form of probability embedded, corresponding to a form of average.
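For the simplest frequency-based model, a count-based bigram model (a deliberately simplified stand-in, not a transformer), the claim holds literally: the maximum-likelihood next-token distribution equals the arithmetic mean of the one-hot vectors of the tokens that followed the context. Whether anything comparable holds for a trained neural network is what the rest of the thread disputes.

```python
# Toy corpus, made up for illustration.
tokens = "the cat sat . the cat ran . the dog sat .".split()
VOCAB = sorted(set(tokens))

def one_hot(token):
    """Vector with 1.0 at the token's vocabulary position and 0.0 elsewhere."""
    return [1.0 if v == token else 0.0 for v in VOCAB]

def bigram_distribution(tokens, prev):
    """Maximum-likelihood P(next | prev), computed as the mean of one-hot vectors."""
    successors = [nxt for p, nxt in zip(tokens, tokens[1:]) if p == prev]
    if not successors:
        raise ValueError(f"context {prev!r} never observed")
    n = len(successors)
    summed = [sum(column) for column in zip(*(one_hot(t) for t in successors))]
    return {v: s / n for v, s in zip(VOCAB, summed)}

dist = bigram_distribution(tokens, "the")
print({tok: p for tok, p in dist.items() if p > 0})  # cat seen twice, dog once
```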

          • General_Effort@lemmy.world
            9 months ago

            > Text explaining why the neural network representation of common features (typically with weighted proportionality to their occurrence) does not meet the definition of a mathematical average. Does it not favor common response patterns?

            Hmm. I’m not really sure why anyone would write such a text. There is no “weighted proportionality” (or pathways). Is this a common conception?

            > You don’t need it to be an average of the real world to be an average. I can calculate as many average values as I want from entirely fictional worlds. It’s still a type of model which favors what it sees often over what it sees rarely. That’s a form of probability embedded, corresponding to a form of average.

            I guess you picked up on the fact that transformers output a probability distribution. I don’t think anyone calls those an average, though you could have an average distribution. Come to think of it, before you use that to pick the next token, you usually mess with it a little to make it more or less “creative”. That’s certainly no longer an average.
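A minimal sketch of the "mess with it a little" step, assuming plain temperature scaling: the logits are divided by a temperature before the softmax, then one token is sampled. The candidate tokens and logits are made up; real decoders often also apply top-k or top-p (nucleus) truncation, which is not shown here.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature, rng):
    """Draw one token from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["Certainly", "Sure", "Well"]  # hypothetical candidate next tokens
logits = [2.0, 1.0, 0.0]                # hypothetical model scores

for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

print(sample(tokens, logits, 1.0, random.Random(0)))
```

The model's output distribution is the T = 1 row; any other temperature yields a different distribution than the one the network produced.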

            You can see a neural net as a kind of regression analysis. I don’t think I have ever heard someone calling that a kind of average, though. I’m also skeptical whether you can see a transformer as a regression, but I don’t know this stuff well enough. When you train on some data more often than on other data, that is not how you would do a regression. Certainly, once you start RLHF training, you have left regression territory for good.
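For reference on the regression point, a textbook sketch (not a model of transformers): least squares with only an intercept returns the arithmetic mean of the targets; repeating a training example is equivalent to giving it a larger weight, so the fit becomes a weighted mean; and the ordinary least-squares line always passes through the point of means. Whether this framing carries over to transformers trained with cross-entropy and RLHF is exactly what the comment questions.

```python
def fit_intercept(ys, weights=None):
    """Minimize sum(w_i * (y_i - c)**2) over a constant c; the minimizer is the weighted mean."""
    if weights is None:
        weights = [1.0] * len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # forces the line through (mean_x, mean_y)
    return intercept, slope

xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 6.0]

print(fit_intercept(ys))                         # plain mean of ys
print(fit_intercept(ys, weights=[1, 1, 3]))      # last example weighted 3x
print(fit_intercept([1.0, 2.0, 6.0, 6.0, 6.0]))  # same result by repeating it 3 times
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```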

            The GPTisms might be there because they are overrepresented in the finetuning data. They might also come from the RLHF and/or be brought out by the system prompt.