• nondescripthandle@lemmy.dbzer0.com
    link
    fedilink
    arrow-up
    6
    arrow-down
    14
    ·
    edit-2
    5 months ago

    Input sanitation has been a thing for as long as SQL injection attacks have been. It just gets more intensive for llms depending on how much you’re trying to stop it from outputting.

    • MajorHavoc@programming.dev
      link
      fedilink
      arrow-up
      21
      ·
      edit-2
      5 months ago

      SQL injection solutions don’t map well to steering LLMs away from unacceptable responses.

      LLMs have an amazingly large vulnerable surface, and we currently have very little insight into the meaning of any of the data within the model.

      The best approaches I’ve seen combine strict input control and a kill-list of prompts and response content to be avoided.

      Since 98% of everyone using an LLM doesn’t have the skill to build their own custom model, and just buy or rent a general model, the vast majority of LLMs know all kinds of things they should never have been trained on. Hence the dirty limericks, racism and bomb recipes.

      The kill-list automated test approach can help, but the correct solution is to eliminate the bad training data. Since most folks don’t have that expertise, it tends not to happen.

      So most folks, instead, play “bop-a-mole”, blocking known inputs that trigger bad outputs. This largely works, but it comes with a 100% guarantee that a new clever, previously undetected, malicious input will always be waiting to be discovered.

      • frezik@midwest.social
        link
        fedilink
        arrow-up
        11
        ·
        5 months ago

        Right, it’s something like trying to get a three year old to eat their peas. It might work. It might also result in a bunch of peas on the floor.

      • nondescripthandle@lemmy.dbzer0.com
        link
        fedilink
        arrow-up
        1
        arrow-down
        12
        ·
        edit-2
        5 months ago

        Of course because punctuation isn’t going to break a table, but the point is that it’s by no means an unforseen or unworkable problem. Anyone could have seen that coming, for example basic SQL and a college class in Java is the extent of my comp sci knowledge and I know about it.

        • MajorHavoc@programming.dev
          link
          fedilink
          arrow-up
          4
          ·
          5 months ago

          it’s by no means an unforseen or unworkable problem

          Yeah. It’s achievable, just usually not in the ways currently preferred (untrained staff spin it up and hope for the best), and not for the currently widely promised low costs (with no one trained in data science on staff at the customer site).

          For a bunch of use cases the lack of security is currently an acceptable trade off.

    • InAbsentia@lemmy.world
      link
      fedilink
      arrow-up
      10
      ·
      5 months ago

      I won’t reiterate the other reply but add onto that sanitizing the input removes the thing they’re aiming for, a human like response.