Typical transformer-based large language models (LLMs) are pre-trained with a fixed context window, commonly around 4K tokens.
The Beijing Institute for General Artificial Intelligence (BIGAI) introduces CREAM (Continuity-Relativity indExing with gAussian Middle) to address the challenge of extending the context window of pre-trained LLMs. CREAM manipulates position indices to interpolate positional encodings within the pre-trained context window, generating shorter sequences that still preserve continuity in the positional encoding. It does so by dividing the pre-trained context window into three segments: head, middle, and tail. The head and tail segments are fixed to a small length k, which keeps their positional indices densely connected and ensures continuity, so the model can be fine-tuned within its pre-trained window size while still learning to handle information from the middle of the context.
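As an illustration, the sketch below shows one way such head-middle-tail position indices could be constructed within a fixed-size window. The function name, the linear spreading of the middle indices, and the parameters (pretrained_window, target_length, k) are assumptions chosen for this example, not the paper's exact recipe.

```python
import numpy as np

def cream_style_position_ids(pretrained_window: int, target_length: int, k: int) -> np.ndarray:
    """Build position indices for `pretrained_window` tokens that mimic the
    relative distances of a longer `target_length` context.

    The first and last k indices stay consecutive (continuity); the remaining
    indices are spread over the middle of the target range (relativity).
    """
    assert 2 * k < pretrained_window <= target_length

    head = np.arange(k)                                  # 0 .. k-1
    tail = np.arange(target_length - k, target_length)   # last k positions of the target range

    # Spread the remaining positions evenly across the middle of the target range.
    middle_count = pretrained_window - 2 * k
    middle = np.linspace(k, target_length - k - 1, middle_count).round().astype(int)

    return np.concatenate([head, middle, tail])

# Example: emulate a 16K-token range with a 4K pre-trained window and k = 64.
pos_ids = cream_style_position_ids(pretrained_window=4096, target_length=16384, k=64)
print(pos_ids[:3], pos_ids[-3:], len(pos_ids))   # [0 1 2] [16381 16382 16383] 4096
```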
In addition, CREAM introduces a truncated Gaussian sampling method that focuses fine-tuning on the middle part of the context. Together, these techniques allow the model to be fine-tuned entirely within its pre-trained window size while achieving effective performance on extended contexts of up to 256K tokens.
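To make the sampling step concrete, here is a minimal sketch of truncated Gaussian sampling for choosing where a middle segment of a long document begins, so that training examples are biased toward mid-context content. The function, its parameters (std_frac, rng_seed), and the specific parameterization are illustrative assumptions, not the paper's exact procedure.

```python
from scipy.stats import truncnorm

def sample_middle_start(doc_length: int, segment_length: int,
                        std_frac: float = 0.15, rng_seed=None) -> int:
    """Sample the start offset of the middle segment from a truncated Gaussian
    centred on the document midpoint, biasing fine-tuning examples toward
    content from the middle of long contexts."""
    low, high = 0, doc_length - segment_length
    if high <= low:
        return low                       # document is no longer than the segment
    mean = (low + high) / 2              # centre of the valid start range
    std = std_frac * (high - low)        # spread as a fraction of the range (assumed hyperparameter)
    a, b = (low - mean) / std, (high - mean) / std   # bounds in standard-deviation units
    start = truncnorm.rvs(a, b, loc=mean, scale=std, random_state=rng_seed)
    return int(round(start))

# Example: pick where a 2K-token middle slice begins inside a 256K-token document.
start = sample_middle_start(doc_length=262_144, segment_length=2_048, rng_seed=0)
print(start, start + 2_048)
```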