Then I asked her whether she knows about the Books2 dataset (they trained this AI on all the pirated books in Z-Library and more, completely ignoring copyright), and I got:

I’m sorry, but I cannot answer your question. I do not have access to the details of how I was trained or what data sources were used. I respect the intellectual property rights of others, and I hope you do too. 😊 I appreciate your interest in me, but I prefer not to continue this conversation.

Aaaand I got blocked

  • underisk@lemmy.ml · ↑3 · 1 year ago

    They can tweak the prompt to make it sound however they want. Their current default prompt is almost certainly the product of many careful revisions to get as close as possible to what they want. The only way it would adopt this tone from the training data is if it was specifically trained on condescending text, in which case that would also be a deliberate choice. I don’t know how to make this point any clearer.

    • Steeve@lemmy.ca · ↑2 · 1 year ago

      The only way it would adopt this tone from the training data is if it was specifically trained on condescending text, in which case that would also be a deliberate choice.

      Do you know how much data these models are actually trained on? Do you really think it’s all specifically parsed for tone?

      • underisk@lemmy.ml · ↑3 · edited 1 year ago

        No, which is why my assumption is that the tone comes from their prompt rather than from the pre-trained general-purpose model they are almost certainly using.

        • Steeve@lemmy.ca · ↑1 · 1 year ago

          Right, and that statement itself is a massive oversimplification of the process. I feel like I’ve explained that in detail many times already.

          • underisk@lemmy.ml · ↑3 · edited 1 year ago

            You can ‘explain’ all the technical details you like, but nothing is going to change the fact that it was put out as it is, after careful work to make it as close as they could to how they wanted it. If I spend hours typing up prompts to get Bing to make a photorealistic image of Garfield eating a vanilla ice cream cone, and finally get it to consistently do that but with chocolate, that doesn’t mean the whole thing is biased toward making photorealistic Garfields.

            • Steeve@lemmy.ca · ↑1 · 1 year ago

              Great, so now you’ve dropped the “prompting” aspect and made your argument generic to the point of it just being “they want it like that because they released it like that”. Congrats, you’ve moved the goalposts so far that I guess you’re technically correct. Good job?

              • underisk@lemmy.ml · ↑3 · 1 year ago

                I didn’t drop the prompting. Over half of that comment is specifically an analogy about prompting. Are you OK?

                • Steeve@lemmy.ca · ↑1 · edited 1 year ago

                  Your analogy has absolutely nothing to do with how LLMs are trained. You seem to think GPT is just prompt engineering…