My observation

Humans think about different things and concepts for different amounts of time. Saying “and” takes less thought than saying “telephone”, because “telephone” is more context-sensitive.

Example

User: What color does an apple have?

LLM: Apples are red.

Here, the model spends the same amount of compute (one forward pass per token) generating the words “Apples” and “are” as it does generating “red”, even though “red” should be the hardest word to come up with and should need the most compute.

Or, looking at it the other way around: the model thought just as hard about the word “red” as it did about the far less important words “Apples” and “are”.

My idea

We add maybe 1000 new tokens to an LLM that are not word tokens but thought tokens (reasoning tokens). Then we train the model as usual. Whenever it generates one of these reasoning tokens, we don’t decode it as a word; we simply let it keep generating. This way, the model would be able to “think” before saying each word. These thoughts are not human-interpretable, but they are much more efficient than o1’s pre-output reasoning tokens, which fill the model’s context window with human language.
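
A minimal sketch of the decoding side of this idea, assuming a hypothetical vocabulary layout where the ~1000 thought tokens get the ids right after a 50,000-token word vocabulary, and using a stand-in `next_token` function in place of a real model. Thought tokens stay in the context the model sees, but are never turned into visible text.

```python
from typing import Callable, List, Tuple

# Hypothetical vocabulary layout: ids below NUM_WORD_TOKENS are ordinary word
# tokens; the next NUM_THOUGHT_TOKENS ids are the extra "thought" tokens.
NUM_WORD_TOKENS = 50_000
NUM_THOUGHT_TOKENS = 1_000


def is_thought(token_id: int) -> bool:
    return NUM_WORD_TOKENS <= token_id < NUM_WORD_TOKENS + NUM_THOUGHT_TOKENS


def generate(next_token: Callable[[List[int]], int], prompt: List[int],
             max_words: int, eos_id: int) -> Tuple[List[int], List[int]]:
    """Decode until eos_id or max_words visible tokens.

    next_token stands in for the model: it maps the whole context (prompt plus
    every generated token, thoughts included) to the next token id.
    Returns (full_context, visible_tokens).
    """
    context = list(prompt)
    visible: List[int] = []
    while len(visible) < max_words:
        tok = next_token(context)
        context.append(tok)   # the model always sees its own thoughts
        if is_thought(tok):
            continue          # never shown to the user or decoded as text
        if tok == eos_id:
            break
        visible.append(tok)
    return context, visible


# Toy "model" replaying a fixed script: thought, word, thought, thought, word, EOS.
script = iter([50_003, 7, 50_010, 50_011, 42, 2])
context, words = generate(lambda ctx: next(script), prompt=[1], max_words=10, eos_id=2)
print(words)    # [7, 42]
print(context)  # [1, 50003, 7, 50010, 50011, 42, 2]
```

Note that nothing in this loop stops the model from emitting thoughts forever; that is exactly the pitfall about limiting reasoning tokens.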

Chances

  • My hope is that this lets the model think about what to say next the way a human would. It’s reasonable to assume that early in training it won’t use the reasoning tokens much, but later, when it has to solve harder problems, it will very likely use them to improve its chances of succeeding.
  • This could drastically reduce the number of parameters needed for good output, since less thought-heavy text like small talk or very common sentence structures could be generated quickly, while more complex topics are allowed to take longer. It would also make better LLMs more accessible to people running models at home, since inference time is scaled instead of the parameter count.
  • The model would train itself to produce useful reasoning tokens. Compared to how o1 does it, this is a much more token-friendly approach: because we allow non-human-text generation, the model will likely favor these tokens, as they fill up its context less.
  • This approach might also lead to more concise answers, since the model no longer needs to write out a CoT (chain of thought) to reach good conclusions.
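
To make the “Apples are red” example concrete: in a standard decoder every generated token costs one forward pass, and thought tokens would be no exception. This sketch counts how many passes go into each visible word, assuming a hypothetical id layout where ids at or above `num_word_tokens` are thought tokens.

```python
from typing import List


def passes_per_word(tokens: List[int], num_word_tokens: int) -> List[int]:
    """For each visible word, count the forward passes spent on it: one for every
    thought token generated right before it, plus one for the word itself.
    Trailing thoughts with no word after them are not counted."""
    costs: List[int] = []
    pending = 0
    for tok in tokens:
        pending += 1                  # every generated token is one forward pass
        if tok < num_word_tokens:     # a visible word closes the group
            costs.append(pending)
            pending = 0
    return costs


# Hypothetical ids: 10 = "Apples", 11 = "are", 12 = "red", 100-102 = thoughts.
print(passes_per_word([10, 11, 100, 101, 102, 12], num_word_tokens=50))  # [1, 1, 4]
```

Without thought tokens every word would cost exactly one pass; with them, the model spends extra compute only where it chooses to.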

Pitfalls and potential risks

  • Training an AI with black-boxed reasoning tokens can be considered a bad idea, as its thought process is literally uninterpretable.
  • We would have to limit the number of reasoning tokens so that a single normal word-token output doesn’t take too long. Other text-only LLMs have a similar problem: they tend to generate long blocks of text for simple questions.
  • We are hoping that during training the model will use these reasoning tokens in its responses, even though we humans can’t read them. At first, the model may ignore these tokens completely, since they don’t seem to lead to better output. Later in training, however, I do expect the model to use more of them as it realizes how useful it can be to have thoughts.
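
One simple way to enforce that limit at decoding time is to mask out the thought tokens once a cap is reached. This is a sketch under our own assumptions: greedy decoding, a hypothetical `max_thoughts` cap on consecutive thought tokens, and fixed toy logits standing in for a model.

```python
import math
from typing import List


def pick_token(logits: List[float], num_word_tokens: int,
               thoughts_in_a_row: int, max_thoughts: int) -> int:
    """Greedy choice; once max_thoughts thought tokens were emitted in a row,
    every thought-token logit is set to -inf so a word token must come next."""
    if thoughts_in_a_row >= max_thoughts:
        logits = [x if i < num_word_tokens else -math.inf
                  for i, x in enumerate(logits)]
    return max(range(len(logits)), key=logits.__getitem__)


def decode_with_cap(step_logits: List[List[float]], num_word_tokens: int,
                    max_thoughts: int) -> List[int]:
    """step_logits is a stand-in for the model: one logit vector per step.
    (A real model would recompute logits from the growing context.)"""
    out: List[int] = []
    run = 0
    for logits in step_logits:
        tok = pick_token(logits, num_word_tokens, run, max_thoughts)
        run = run + 1 if tok >= num_word_tokens else 0
        out.append(tok)
    return out


# Toy vocabulary: ids 0-2 are words, ids 3-4 are thoughts.
# The model always "wants" thought 3, but may only think twice in a row.
logits = [0.0, 1.0, 0.0, 5.0, 0.0]
print(decode_with_cap([logits] * 4, num_word_tokens=3, max_thoughts=2))  # [3, 3, 1, 3]
```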

What do you think?

I like this approach because it might achieve o1-like performance without the long wait before the output. An o1-like approach is probably better for coding tasks, where planning is very important, but for other tasks, generating reasoning tokens while writing the answer might work better.

  • laitalaj@lemm.ee
    11 hours ago

    How about adding a mechanism for storing raw, embedding-dimensional vectors as part of the sequence instead of introducing a set of additional discrete “invisible” tokens? Basically: check the final element of each vector in the sequence before the final linear layer, and if that element is larger than, say, 0, output the vector as-is instead of passing it through the de-embedding step. Then, when generating the next token, one could interleave these thought vectors between the embedded “real” tokens after the embedding step. This would let the LLM’s “thoughts” be continuous and thus more nuanced: a transformer doesn’t need the sequence to be discrete; that constraint is imposed on LLMs by the nature of natural language. Could be an advantage over traditional CoT!
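
    A minimal pure-Python sketch of the gating and interleaving described above, assuming toy list-of-floats vectors, a hypothetical unembedding matrix `unembed` and embedding table `embed`, and greedy decoding. The transformer itself and all training are left out; the threshold of 0 follows the comment's example.

```python
from typing import List, Union

Vector = List[float]
Item = Union[int, Vector]  # a real token id, or a raw continuous "thought" vector


def step_output(hidden: Vector, unembed: List[Vector]) -> Item:
    """Gate on the last element of the final hidden state (before the output
    projection): > 0 means keep the vector itself as a continuous thought;
    otherwise de-embed it (dot product with each vocabulary row) and return
    the greedy token id."""
    if hidden[-1] > 0:
        return list(hidden)
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    return max(range(len(scores)), key=scores.__getitem__)


def build_input(sequence: List[Item], embed: List[Vector]) -> List[Vector]:
    """Interleave for the next step: token ids go through the embedding table,
    thought vectors are inserted as-is after the embedding lookup."""
    return [embed[x] if isinstance(x, int) else x for x in sequence]


unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy vocab of 2 tokens, d_model = 3
print(step_output([0.2, 0.9, -1.0], unembed))  # 1 (gate <= 0, so de-embed)
print(step_output([0.5, 0.1, 0.3], unembed))   # [0.5, 0.1, 0.3] (a thought)
```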

    Another reason why something like this might beat o1’s thought document (at least for some tasks) is the way the attention mechanism works: it’s much more natural to attend to nearby tokens than to far-away ones.

    Training thought tokens like this is pretty simple in principle: one could construct a loss for them based on whether they increase the odds of producing the correct token next. Probably should pair that with some minimum increase threshold (below which we actually penalize for thought token generation) and an increasing penalty for outputting multiple thought tokens in a row (in addition to the hard constraint suggested in the OP). The training does pose one major challenge, though: it would need to be done autoregressively instead of pushing the whole sequence through at once, as we don’t have ground truth for these thought tokens. So this would slow things down quite a bit!
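
    A rough sketch of such a loss, under our own assumptions: the “odds” are measured as the change in log-probability of the correct next token, `min_gain` and `run_penalty` are hypothetical hyperparameters with arbitrary defaults, and the penalty grows linearly with the thought's position in a run. Because emitting a thought token is a discrete choice, this number can't be backpropagated directly; it would likely have to serve as a reward signal (for example in a policy-gradient method), which is our choice and not something the comment specifies.

```python
import math


def thought_loss(logp_with: float, logp_without: float, run_length: int,
                 min_gain: float = 0.1, run_penalty: float = 0.01) -> float:
    """Per-thought-token loss (lower is better).

    logp_with / logp_without: log-probability of the correct next token with
    and without this thought in the context. A gain below min_gain makes the
    loss positive, i.e. the thought is penalized. run_length is this thought's
    position in a run of consecutive thoughts (1 = first), so each extra
    thought in a row costs a bit more than the previous one.
    """
    gain = logp_with - logp_without
    return (min_gain - gain) + run_penalty * run_length


# A thought that doubles the probability of the correct token (0.25 -> 0.5):
print(round(thought_loss(math.log(0.5), math.log(0.25), run_length=1), 4))  # -0.5831
```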

  • hendrik@palaver.p3x.de
    2 days ago

    As I said in a comment on one of your previous posts, you might want to read the papers on “Chain of thought” prompting. This has already been studied and you’ll find some more ideas and estimates of what it can do. It is a good approach to make the LLMs a bit smarter. And recently it was popularized by OpenAI.

    • Smorty [she/her] (OP)
      1 day ago

      I actually tried, but the papers are written in a really technical way. They give very few examples and talk a lot about complex LaTeX stuff…

  • The Hobbyist@lemmy.zip
    2 days ago

    What you are describing is, at a high level, what o1 does. But where you are mistaken is when you say:

    This thought is not human-interpretable, but it is much more efficient than the pre-output reasoning tokens of o1, which uses human language to fill its own context window with.

    What makes those reasoning tokens more efficient? They are just tokens like all the others, and equally complex (or simple) to generate. Yes, they allow for more reflection before the output is presented, but the process is the same.

    Also, they would all need to fit in the same context, because otherwise you prevent the model from actually reasoning over them while it iterates on its thoughts.

    • Smorty [she/her] (OP)
      2 days ago

      I imagine that a model would be held back by the format of human readable text.

      Human text uses some concepts that are mostly unimportant to an AI, such as sentence syntax and grammar rules. I think letting the AI “define its own way of thinking” instead of telling it to think in human language would lead to more efficient thought processes. It would be similar to embeddings: a bunch of numbers representing a specific topic in these tokens. Not human-readable, but useful for the model.

      As far as I know, o1 writes a big document on what it will do, how it will do it, and some reflection as well. My approach, however, would let the model think on the fly while it is writing the text.

      You are right in that it would have to fit into the context window. As far as I can tell, the output from the o1 model doesn’t remember what the big thought document says. With my approach, the model would keep all its thoughts in mind while it is writing, as they are literally part of its message, just unreadable by humans.

      Am I missing something here? If so, please point it out.