TLDR:
– The lack of transparency in AI systems hinders their deployment in sensitive areas.
– Startup Anthropic has made a breakthrough in interpreting the inner workings of large language models.
The opaque inner workings of AI systems have been a barrier to their widespread deployment in areas such as medicine, law enforcement, and insurance. Unlike conventional software, deep learning neural networks are not explicitly programmed: they are trained on data and arrive at their own solutions, which makes them less brittle than hand-coded rules but far harder to interpret. Anthropic, an AI startup, has now made a significant advance in understanding these systems.
The Anthropic team has shown that patterns of activity inside a large language model can be linked to both concrete and abstract concepts, offering insight into how the model reaches its outputs. By training a second neural network on activation data recorded from the model, they extracted roughly 10 million distinct features, each corresponding to a concept. Manipulating the activity of the neurons that encode a given feature measurably changed the model's behavior, suggesting that models could be steered toward desirable outputs.
While this research offers a glimpse of how AI models "think," a complete account remains out of reach: extracting every feature would demand substantial computing resources. Still, the findings suggest that these black boxes can be made less inscrutable.