Description
In this article, the authors at EleutherAI present Rotary Position Embedding (RoPE), a new type of position encoding that unifies absolute and relative approaches in transformers. Developed by Jianlin Su, RoPE has already garnered interest for its ability to work with both vanilla and efficient attention. The authors test RoPE against the learned absolute positional embeddings used in GPT-3 and the learned relative positional embeddings used in T5, finding that it outperforms both. They also test it with Performer, an alternative attention mechanism designed to avoid the quadratic bottleneck with respect to sequence length. Finally, they note that the runtime cost of rotary embeddings is fairly negligible, imposing a 1-3% overhead across a range of transformer sizes.
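As a rough illustration of the mechanism described above, here is a minimal NumPy sketch of rotary embeddings: each consecutive pair of query/key features is rotated by an angle proportional to the token's position, so attention scores end up depending only on relative offsets. The function name `rotary_embed`, the base of 10000, and the shapes are illustrative assumptions, not the article's implementation.

```python
import numpy as np

def rotary_embed(x, base=10000):
    """Apply a rotary position embedding to x of shape (seq_len, dim).

    Each consecutive (even, odd) pair of feature dimensions is rotated by an
    angle proportional to the token's position, so the dot product of two
    rotated vectors depends only on their relative offset.
    """
    seq_len, dim = x.shape
    # Per-pair rotation frequencies, decreasing geometrically across dimensions.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Rotation angle for every (position, frequency) combination.
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    # Standard 2-D rotation applied to each feature pair.
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Queries and keys are rotated before computing attention scores;
# the score between positions m and n then depends only on m - n.
q = rotary_embed(np.random.randn(8, 64))
k = rotary_embed(np.random.randn(8, 64))
scores = q @ k.T
```

Because the rotation is applied elementwise to queries and keys before the score computation, the same trick also composes with linear-attention variants such as Performer, which is what the article evaluates.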