Typical transformer-based large language models (LLMs) are pre-trained with a fixed context window, commonly around 4K tokens.
The Beijing Institute for General Artificial Intelligence (BIGAI) introduces CREAM (Continuity-Relativity indExing with gAussian Middle) to address the challenge of extending the context window of pre-trained LLMs. CREAM manipulates position indices to interpolate positional encodings within the pre-trained context window, generating shorter sequences that still preserve continuity in the positional encoding. It does so by dividing the pre-trained context window into three segments: head, middle, and tail. The head and tail segments are fixed to a small length k, which keeps their positional indices densely connected and ensures continuity, so the model can be fine-tuned within its pre-trained window size while still learning to handle information from the middle of the context.
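As an illustration, the sketch below shows one way such head-middle-tail position indices could be constructed within a fixed-size window. The function name, the linear spreading of the middle indices, and the parameters (pretrained_window, target_length, k) are assumptions chosen for this example, not the paper's exact recipe.

```python
import numpy as np

def cream_style_position_ids(pretrained_window: int, target_length: int, k: int) -> np.ndarray:
    """Build position indices for `pretrained_window` tokens that mimic the
    relative distances of a longer `target_length` context.

    The first and last k indices stay consecutive (continuity); the remaining
    indices are spread over the middle of the target range (relativity).
    """
    assert 2 * k < pretrained_window <= target_length

    head = np.arange(k)                                  # 0 .. k-1
    tail = np.arange(target_length - k, target_length)   # last k positions of the target range

    # Spread the remaining positions evenly across the middle of the target range.
    middle_count = pretrained_window - 2 * k
    middle = np.linspace(k, target_length - k - 1, middle_count).round().astype(int)

    return np.concatenate([head, middle, tail])

# Example: emulate a 16K-token range with a 4K pre-trained window and k = 64.
pos_ids = cream_style_position_ids(pretrained_window=4096, target_length=16384, k=64)
print(pos_ids[:3], pos_ids[-3:], len(pos_ids))   # [0 1 2] [16381 16382 16383] 4096
```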
In addition, CREAM introduces a truncated Gaussian sampling method that focuses fine-tuning on the middle part of the context. Together, these techniques allow the model to be fine-tuned entirely within its pre-trained window size while achieving effective performance on extended contexts of up to 256K tokens.
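To make the sampling step concrete, here is a minimal sketch of truncated Gaussian sampling for choosing where a middle segment of a long document begins, so that training examples are biased toward mid-context content. The function, its parameters (std_frac, rng_seed), and the specific parameterization are illustrative assumptions, not the paper's exact procedure.

```python
from scipy.stats import truncnorm

def sample_middle_start(doc_length: int, segment_length: int,
                        std_frac: float = 0.15, rng_seed=None) -> int:
    """Sample the start offset of the middle segment from a truncated Gaussian
    centred on the document midpoint, biasing fine-tuning examples toward
    content from the middle of long contexts."""
    low, high = 0, doc_length - segment_length
    if high <= low:
        return low                       # document is no longer than the segment
    mean = (low + high) / 2              # centre of the valid start range
    std = std_frac * (high - low)        # spread as a fraction of the range (assumed hyperparameter)
    a, b = (low - mean) / std, (high - mean) / std   # bounds in standard-deviation units
    start = truncnorm.rvs(a, b, loc=mean, scale=std, random_state=rng_seed)
    return int(round(start))

# Example: pick where a 2K-token middle slice begins inside a 256K-token document.
start = sample_middle_start(doc_length=262_144, segment_length=2_048, rng_seed=0)
print(start, start + 2_048)
```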