RecurrentGemma, released by Google DeepMind, is a 2B-parameter open language model designed for efficient inference on long sequences. It uses the Griffin architecture, which combines linear recurrences with local attention. Because its recurrent state has a fixed size rather than a cache that grows with sequence length, the model delivers high quality while using less memory and sustaining higher throughput on long inputs. Compared to other open language models, RecurrentGemma is one of the strongest in the 2B-parameter range, offering competitive performance on a variety of text generation tasks.
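To make the two ingredients concrete, the toy sketch below combines a gated linear recurrence (fixed-size state, cost linear in sequence length) with sliding-window local attention (bounded context per token). It is a conceptual illustration only, not the actual RecurrentGemma or Griffin implementation; all dimensions, gates, and weights here are illustrative assumptions.

```python
# Toy sketch: linear recurrence + local attention, the two pieces Griffin mixes.
import numpy as np

rng = np.random.default_rng(0)
T, D, WINDOW = 12, 16, 4          # sequence length, hidden size, attention window

x = rng.normal(size=(T, D))       # toy input sequence

# --- Linear recurrence: fixed-size state, O(T) cost in sequence length ---
a = 1.0 / (1.0 + np.exp(-rng.normal(size=D)))   # per-channel decay in (0, 1)
state = np.zeros(D)
recurrent_out = np.empty_like(x)
for t in range(T):
    state = a * state + (1.0 - a) * x[t]         # state size never grows with T
    recurrent_out[t] = state

# --- Local attention: each position attends only to the last WINDOW tokens ---
def local_attention(h, window):
    out = np.empty_like(h)
    for t in range(h.shape[0]):
        ctx = h[max(0, t - window + 1): t + 1]   # bounded context -> bounded memory
        scores = ctx @ h[t] / np.sqrt(h.shape[1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ ctx
    return out

y = local_attention(recurrent_out, WINDOW)
print(y.shape)   # (12, 16): same shape as the input, computed with bounded state
```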
The YaFSDP tool by Yandex aims to revolutionize the training of large language models (LLMs) by streamlining GPU communication and reducing memory usage, delivering speedups of up to 26% over existing methods. This reduction in GPU consumption can save developers and companies hundreds of thousands of dollars per month. YaFSDP is an open-source method for training LLMs and is currently the most effective publicly available tool for improving GPU communication and reducing memory usage in LLM training. By eliminating GPU communication inefficiencies, YaFSDP ensures that training uses only the GPU memory it actually needs and that GPU interactions proceed without interruption, which enables further optimizations such as minimizing communication time and yields a significant improvement in both performance and memory efficiency.
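For readers unfamiliar with the underlying pattern, the sketch below shows the general sharded data-parallel training loop that YaFSDP improves on, written with PyTorch's built-in FSDP wrapper. This is not YaFSDP's own API (which the source does not describe); the model, optimizer, and hyperparameters are placeholders, and the script assumes it is launched with torchrun so a process group exists.

```python
# Generic sharded data-parallel training loop (PyTorch FSDP shown as a stand-in).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")              # one process per GPU (torchrun)
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Sequential(                 # placeholder for a real LLM
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    # Parameters, gradients, and optimizer state are sharded across ranks,
    # so each GPU holds only a fraction of the full training state.
    model = FSDP(model)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                          # toy training loop
        batch = torch.randn(8, 1024, device="cuda")
        loss = model(batch).pow(2).mean()
        loss.backward()                          # gradients reduced across ranks
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```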
The Galileo Luna model offers several advancements in evaluating language model hallucinations. It is specifically designed to address the prevalent issue of hallucinations in large language models (LLMs): instances where a model generates information that is not grounded in the retrieved context, which poses a significant challenge when deploying language models in industry applications. Luna is a purpose-built evaluation foundation model (EFM) that delivers high accuracy, low latency, and cost efficiency in detecting and mitigating these hallucinations. It surpasses existing models, including GPT-3.5, in both performance and efficiency.
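To illustrate the kind of groundedness check an evaluation model like Luna performs, the sketch below scores how much of a generated answer is supported by the retrieved context. The word-overlap heuristic and the 0.5 threshold are simple stand-ins chosen for this example, not Luna's actual method, which uses a trained evaluation model rather than a lexical rule.

```python
# Toy groundedness check: flag response sentences unsupported by the context.
import re

def groundedness_score(context: str, response: str) -> float:
    """Fraction of response sentences whose content words appear in the context."""
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    sentences = [s.strip() for s in re.split(r"[.!?]", response) if s.strip()]
    if not sentences:
        return 1.0
    supported = 0
    for sentence in sentences:
        words = [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if len(w) > 3]
        if words and sum(w in context_words for w in words) / len(words) >= 0.5:
            supported += 1
    return supported / len(sentences)

context = "The Eiffel Tower is 330 metres tall and stands in Paris."
answer = "The Eiffel Tower stands in Paris. It was painted gold in 1999."
print(f"groundedness: {groundedness_score(context, answer):.2f}")
# -> 0.50: the second sentence is not grounded in the retrieved context
```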