It’s clear that companies are currently unable to make chatbots like ChatGPT comply with EU law when processing data about individuals. If a system cannot produce accurate and transparent results, it cannot be used to generate data about individuals. The technology has to follow the legal requirements, not the other way around.

  • GenderNeutralBro@lemmy.sdf.org · 7 months ago

    To clarify, I mean to say that users should not consider it an information repository, because it does not function as one, by design. Whether it should be classified as such under the law is another matter, one on which I do not have enough knowledge to comment. I do think OpenAI is presenting ChatGPT inappropriately, and I hope they will be held accountable for that.

    I’m sure in the future we will see true databases built on the same technology (and they will be awesome, if implemented properly). But that’s not what ChatGPT is (or, as far as I know, any other existing LLM-based application). Any information it is able to “recall” is almost a coincidence of how it was trained. You can sort of think of it like lossy compression. The LLM gets all of its information from its training set, but it is not designed to retain any specific information from the training set in full. In cases where it does, that usually means one of two things:

    1. The information appeared many times in the training set, enough to prevent it from being washed out.
    2. The model is far bigger than it should be, and is overfitted to its training data.
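    The "washed out" effect in case 1 can be illustrated with a toy next-token frequency model. This is a crude, purely illustrative stand-in for an LLM (the corpus, token names, and greedy decoding are all invented for the sketch): a fact repeated many times dominates the model's statistics, while a fact seen once is overridden by the more common continuation.

    ```python
    from collections import Counter, defaultdict

    def train_bigram(tokens):
        """Count next-token frequencies; a crude stand-in for how an LLM
        aggregates statistics rather than storing records."""
        model = defaultdict(Counter)
        for a, b in zip(tokens, tokens[1:]):
            model[a][b] += 1
        return model

    def generate(model, start, length):
        """Greedy decoding: always pick the most frequent continuation."""
        out = [start]
        for _ in range(length):
            followers = model.get(out[-1])
            if not followers:
                break
            out.append(followers.most_common(1)[0][0])
        return out

    # Invented corpus: "paris is big" appears 5 times, "oslo is small" once.
    corpus = ("paris is big . " * 5 + "oslo is small . ").split()
    model = train_bigram(corpus)

    print(generate(model, "paris", 2))  # → ['paris', 'is', 'big']
    print(generate(model, "oslo", 2))   # → ['oslo', 'is', 'big'] — the rare
    # fact is washed out: after "is", the frequent continuation "big" wins.
    ```

    The same mechanism, scaled up, is why an LLM confidently "recalls" well-represented facts while garbling rare ones. Case 2 is the opposite failure: give the model enough capacity relative to the data and it can reproduce individual training examples verbatim instead of generalizing.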