3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

Authors :
Ma, Xindian
Liu, Wenyuan
Zhang, Peng
Xu, Nan
Publication Year :
2024

Abstract

Inspired by the Bloch sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows the long-term decay to be regulated within the chunk size, so that relative positional information can still be modeled between tokens that are far apart. For improved position resolution, 3D-RPE mitigates the degradation of position resolution that position interpolation causes on RoPE. We conducted experiments on long-context Natural Language Understanding (NLU) and long-sequence Language Modeling (LM) tasks. In these experiments, 3D-RPE achieved performance improvements over RoPE, especially on long-context NLU tasks.
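
For orientation, the following is a minimal sketch of the standard 2D RoPE baseline that the abstract compares against. It is not the 3D-RPE method, whose construction this record does not describe. It illustrates two points the abstract relies on: RoPE attention scores depend only on the relative offset between positions, and position interpolation shrinks the angular step between adjacent positions, which is the loss of position resolution mentioned above. The function names `rope_angles` and `apply_rope`, the base of 10000, the feature dimension of 8, and the 2048 and 8192 context lengths are illustrative assumptions.

```python
import numpy as np

def rope_angles(position, dim, base=10000.0):
    """Rotation angles for standard RoPE: one angle per pair of features."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return position * inv_freq

def apply_rope(x, position, base=10000.0):
    """Rotate consecutive feature pairs (x[2i], x[2i+1]) of a 1-D vector."""
    theta = rope_angles(position, x.shape[-1], base)
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Relative-position property: both pairs of positions have offset 3,
# so the two attention scores agree up to floating-point error.
score_a = apply_rope(q, 5) @ apply_rope(k, 2)
score_b = apply_rope(q, 105) @ apply_rope(k, 102)

# Position interpolation (assumed lengths): positions are rescaled by
# train_len / target_len so a longer context fits the trained range.
scale = 2048 / 8192
step_plain = rope_angles(1, 8) - rope_angles(0, 8)
step_interp = rope_angles(1 * scale, 8) - rope_angles(0 * scale, 8)

# Adjacent tokens are now separated by a smaller rotation in every pair.
ratio = step_interp / step_plain
```

Under these assumptions every entry of `ratio` equals the interpolation scale, so neighbouring positions become harder to tell apart as the target length grows.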

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2406.09897
Document Type :
Working Paper