Nov 24, 2023

Removing Irrelevant Text for Better Answer Generation

A short explanation of System 2 Attention, a method for regenerating context before answering to reduce distraction and sycophancy.

In just a few months, retrieval-augmented generation (RAG) has become a standard part of almost every LLM service. However, models still make mistakes, especially when their input contains unnecessary information. To address this, researchers from Meta AI proposed System 2 Attention (S2A), a new approach in which the model regenerates the context to include only relevant information before answering.

How It Works

S2A has two steps. First, the model regenerates the context to remove irrelevant or distracting information, so that only the important details remain. This is done by prompting the model to rewrite the text and extract only the useful parts.

Then, we generate the response using only the regenerated context. This focuses attention on what's relevant.
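The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `llm` argument stands for any function that takes a prompt string and returns the model's text, and the prompt wording, `REGENERATE_PROMPT`, `ANSWER_PROMPT`, and `s2a_answer` are all names and text assumed here for the example.

```python
from typing import Callable

# Hypothetical prompt wording, written for this sketch only.
REGENERATE_PROMPT = (
    "Rewrite the text below, keeping only the parts that are relevant "
    "and unbiased for answering the question. Do not answer the question.\n\n"
    "Text: {context}\n\nQuestion: {question}\n\nRewritten text:"
)
ANSWER_PROMPT = "Context: {context}\n\nQuestion: {question}\n\nAnswer:"


def s2a_answer(llm: Callable[[str], str], context: str, question: str) -> str:
    """Answer a question with a two-call, S2A-style pipeline."""
    # Step 1: ask the model to regenerate the context without the noise.
    regenerated = llm(REGENERATE_PROMPT.format(context=context, question=question))
    # Step 2: answer from the regenerated context only; the original
    # context is deliberately not passed to the second call.
    return llm(ANSWER_PROMPT.format(context=regenerated, question=question))
```

The key design choice is that the second call never sees the original text, so anything the first call filtered out cannot influence the answer.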

For example, on a factual QA task, if opinions are provided that suggest incorrect answers, S2A removes those opinions from the regenerated context. The essence of S2A is that the model decides what is worth attending to before it answers, which makes its attention more targeted.

Results

S2A was evaluated on three tasks. It increased accuracy from 62.8% to 80.3% on factual QA and from 51.7% to 61.3% on math word problems, and it improved objectivity by 57.4% on argument generation.
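To put the two accuracy figures on the same footing, here is a small calculation of the gains in percentage points and relative to the baseline. It uses only the numbers quoted above; the helper name `gains` is made up for this sketch.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    diff = after - before
    return round(diff, 1), round(100 * diff / before, 1)


factual_qa = gains(62.8, 80.3)   # (17.5, 27.9)
math_word = gains(51.7, 61.3)    # (9.6, 18.6)
print("Factual QA:", factual_qa)
print("Math word problems:", math_word)
```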

Advantages and Potential Limitations

By regenerating the context to focus on what's relevant, S2A improves performance across diverse tasks in terms of accuracy, factuality, and objectivity. Another advantage is that S2A is not just a standalone approach but is also complementary to other reasoning methods. For instance, in the math problem experiment, chain-of-thought reasoning was also applied to the context generated by S2A.

S2A's performance depends on the quality of the regeneration prompt, and therefore on the size of the LLM: smaller models can be more prone to errors. The extra regeneration step also adds cost. The model has to generate parts of the context again before answering, much like chain-of-thought and other methods that create intermediate generations. The cost grows with the length of the regenerated context. The paper suggests potential speedups, such as generating only the parts that differ or referencing labels for large sections, but leaves them for future work.
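A rough way to see where the extra cost comes from is to count tokens per call. The sketch below is a back-of-the-envelope model under simplifying assumptions: it ignores the question and prompt template tokens, treats input and output tokens as equally expensive, and the function names and example numbers are invented for illustration.

```python
def baseline_tokens(context_tokens: int, answer_tokens: int) -> int:
    # One call: read the full context, write the answer.
    return context_tokens + answer_tokens


def s2a_tokens(context_tokens: int, regenerated_tokens: int, answer_tokens: int) -> int:
    # Call 1: read the full context, write the regenerated context.
    first_call = context_tokens + regenerated_tokens
    # Call 2: read the regenerated context, write the answer.
    second_call = regenerated_tokens + answer_tokens
    return first_call + second_call


# Made-up example: a 1000-token context trimmed to 300 tokens, 50-token answer.
print(baseline_tokens(1000, 50))   # 1050
print(s2a_tokens(1000, 300, 50))   # 1650
```

Under this model the overhead is twice the length of the regenerated context, which is why the cost grows as the regeneration gets longer.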

To sum up, S2A regenerates the input context to include only the relevant portions, then attends to the regenerated context to produce the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinions or irrelevant information (QA, math word problems, and longform generation), increasing factuality and objectivity and decreasing sycophancy.

What do you think: does S2A seem like a useful technique for real-world applications? How else could we improve attention and reasoning in AI systems?
