Anthropic, a leading AI safety and research company, has released new interpretability tools that offer insight into the reasoning of advanced language models. The tools act as a 'microscope' for AI, letting scientists trace the internal computations of models such as Claude and study how they process information and generate responses.

With this 'circuit tracing' method, researchers can map the internal 'circuits' tied to specific capabilities, such as reasoning and translation, and even alter a model's internal representations mid-prompt. Modifying Claude's poetic planning state, for example, changes which rhymes the model chooses, showing that it plans ahead and adapts internally.

Claude's internal workings turn out to be more complex than they appear, even on simple tasks: for arithmetic, the model runs parallel computations, roughly estimating the sum while separately working out the precise digits. Anthropic argues that such interpretability tools are crucial for making AI systems safe, predictable, and aligned with human values.
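The idea of intervening on a model's internal state mid-computation can be illustrated with a minimal sketch. This is a hypothetical toy example, not Anthropic's actual tooling: a tiny two-layer network whose hidden 'plan' vector is overwritten partway through the forward pass, changing the output even though the input stays the same.

```python
import math

# Toy two-layer network (hypothetical weights, for illustration only).
W1 = [[0.5, -0.3], [0.8, 0.1], [-0.6, 0.9]]   # 3 inputs -> 2 hidden units
W2 = [[1.0, -1.0], [0.4, 0.7]]                 # 2 hidden -> 2 outputs

def forward(x, patch=None):
    """Run the network; if `patch` is given, replace the hidden state."""
    h = [math.tanh(sum(xi * w for xi, w in zip(x, col)))
         for col in zip(*W1)]                   # internal "plan" vector
    if patch is not None:
        h = patch                               # intervene mid-computation
    return [sum(hi * w for hi, w in zip(h, col)) for col in zip(*W2)]

x = [1.0, 0.5, -0.2]
baseline = forward(x)
steered = forward(x, patch=[1.0, -1.0])         # swap in a different "plan"

# Same input, different output -- purely because the internal state changed,
# loosely mirroring how editing Claude's planning state changes its rhymes.
print(baseline != steered)  # True
```

The intervention point, not the toy network itself, is the relevant part: circuit-tracing research works the same way in spirit, replacing an internal representation and observing how the model's behavior shifts downstream.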
Anthropic's 'Microscope' Reveals AI Claude's Reasoning
Edited by: Veronika Nazarova