Decoding intermediate activations in llama-2-7b


In line with previous research, Nina Rimsky found that the decoded block outputs were interpretable at most layers, with the exception of a few early ones. She also found that the other intermediate outputs were interpretable, and she offers some intuition about what different layers are responsible for. The post contains some very interesting insights.
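To make "decoding" an intermediate activation concrete, here is a minimal sketch of the general idea: project an activation vector onto the vocabulary with the model's unembedding matrix and read off the highest-scoring tokens. It is not the code from the post. It uses a toy random unembedding matrix instead of real llama-2-7b weights, and the helper names (`rms_norm`, `decode_activation`) and the choice to apply an RMS norm before unembedding are assumptions made for illustration.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # RMS normalization; applying it before unembedding is an assumption here.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def decode_activation(activation, norm_weight, unembed, top_k=3):
    """Hypothetical helper: map one activation vector to its top_k token ids."""
    # (d_model,) @ (d_model, vocab) -> one logit per vocabulary entry
    logits = rms_norm(activation, norm_weight) @ unembed
    return np.argsort(-logits)[:top_k].tolist()

# Toy setup standing in for real model weights.
d_model, vocab = 4, 6
rng = np.random.default_rng(0)
unembed = rng.normal(size=(d_model, vocab))
unembed /= np.linalg.norm(unembed, axis=0, keepdims=True)  # unit-norm columns
norm_weight = np.ones(d_model)

# An activation pointing along token 2's unembedding direction decodes to token 2.
activation = 5.0 * unembed[:, 2]
top = decode_activation(activation, norm_weight, unembed)
print(top[0])
```

With a real model, `activation` would be an intermediate output captured at some layer (for example with a forward hook), and the token ids would be mapped back to strings with the tokenizer.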

Read article here