LLMs are built by training a network of weights on a large volume of data. Some models have made those weights public ("open weights"), meaning you could, in principle, go in and manually edit individual weights to change the outcomes. In practice, you would never do this by hand, because it would only ruin the output.
However, you could theoretically nudge a lot of values in just the right way to change the model to favor an ideology, take on a different attitude, produce disinformation, and so on.
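To make that concrete, here is a minimal, purely illustrative sketch of what "editing a weight by hand" looks like, using PyTorch and a tiny made-up layer standing in for a real model (the sizes and values are invented for the example):

```python
# Toy sketch of hand-editing weights. A real LLM has billions of
# weights spread across many layers; this single layer is a stand-in.
import torch
import torch.nn as nn

layer = nn.Linear(8, 8)  # one made-up layer, 64 weights

with torch.no_grad():
    print("before:", layer.weight[0, 0].item())
    # "Manually editing a weight": overwrite a single value.
    layer.weight[0, 0] = 0.5
    print("after: ", layer.weight[0, 0].item())

# On a real checkpoint you'd load the tensors, tweak them, and save.
# But since nobody knows what any individual weight does, arbitrary
# edits like this just degrade the output.
```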
Right now, steering a model's behavior is done in a brute-force manner in practice: the application attaches hidden instructions and parameters (often called a system prompt) to the input in order to force a certain disposition, limit the scope, etc.
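Roughly like this; the exact wrapper format varies by model and vendor, so treat the message layout below as one common convention, not any specific product's API:

```python
# Sketch of the "instructions attached to the input" approach.
# The weights are never touched; the steering rides along with
# every single request.

SYSTEM_PROMPT = (
    "You are a helpful assistant. Always answer politely, "
    "refuse to produce disinformation, and stay on topic."
)

def build_prompt(user_input: str) -> list[dict]:
    # Hidden instructions go first, then the user's actual message.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

print(build_prompt("Tell me about the election."))
```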
There are a lot of reasons to want to adjust the fundamentals of a model, but AFAIK such a technology doesn't exist yet (publicly). For example, it could be used for political gain, or for positive purposes like removing the racial bias that has been well documented in these models.
Is anyone working on such a thing?
Note: This community is “no stupid questions,” but I am actually pretty stupid and I probably misunderstood some (all) of the fundamentals of how this works. Please respond to any part of my question.
It would be easier and faster to just fine-tune it on the stuff you want it to output. There are hundreds of billions of weights in models like GPT, and no one really knows what any individual one does.
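To illustrate why that works: in a training loop, gradient descent nudges every weight at once toward the outputs you want, so nobody has to know what any single weight does. Toy PyTorch sketch with made-up model and data:

```python
# Toy fine-tuning loop. The model and data are invented stand-ins;
# a real LLM run is the same idea at a vastly larger scale.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                  # stand-in for a huge LLM
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(16, 4)                   # example inputs
target = torch.randn(16, 2)              # "the stuff you want it to output"

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()                      # computes a nudge for every weight
    optimizer.step()                     # applies all the nudges at once

print("final loss:", loss.item())
```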