Rotary Embeddings: A Relative Revolution

Description

Summary drafted by a large language model.

In this article, the authors at EleutherAI present Rotary Positional Embedding (RoPE), a new type of position encoding that unifies the absolute and relative approaches used in transformers. Developed by Jianlin Su, RoPE has already garnered interest for its ability to work with both vanilla and efficient attention. The authors benchmarked RoPE against the learned absolute positional embeddings used in GPT-3 and the learned relative positional embeddings used in T5, finding that RoPE outperforms both. They also tested it with Performer, an alternative attention mechanism designed to avoid the quadratic bottleneck with respect to sequence length. The authors note that the runtime cost of rotary embeddings is fairly negligible, imposing a 1-3% overhead across a range of transformer sizes.
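
The article itself works through the mathematics; as a rough illustration only (not the authors' reference implementation), the Python sketch below shows the core idea: consecutive query/key channel pairs are rotated by position-dependent angles, using the conventional base of 10000. The function name rotary_embedding and the array shapes are assumptions made for this example.

import numpy as np

def rotary_embedding(x, base=10000.0):
    # x: (seq_len, dim) array of query or key vectors; dim must be even.
    # Channel pairs (2i, 2i+1) are rotated by angle m * theta_i, where m is
    # the token position and theta_i = base**(-2i/dim), so the dot product
    # of a rotated query and key depends only on their relative offset.
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    theta = base ** (-np.arange(0, dim, 2) / dim)      # (dim/2,)
    angles = positions * theta                         # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                    # split channels into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                 # standard 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate queries and keys before computing attention scores.
q = np.random.randn(8, 64)
k = np.random.randn(8, 64)
scores = rotary_embedding(q) @ rotary_embedding(k).T  # logits now reflect relative positions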


Read the article here