BERT and early versions of GPT were trained on copyright free datasets like Wikipedia and out of copyright books. Unsure if those would be big enough for the modern ChatGPT types
@flamingmongoose @cmnybo
> copyright free datasets like Wikipedia
🤦‍♂️
What’s up with that? Appreciate they’re permissive rather than copyright free as such
@flamingmongoose Oh I do, but I have to facepalm when people in the open-source community don’t know what copyright means, and say things like “copyright free like Wikipedia”.