The new AI model developed by the research team changes how weights are represented: instead of the standard 16-bit floating-point values, each weight is restricted to one of just three values: {-1, 0, 1}. This, combined with new functions that perform the same kinds of operations as the prior method, allows for more efficient processing without the need for matrix multiplication (MatMul).
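To see why ternary weights remove the need for MatMul, consider what a dot product looks like when every weight is -1, 0, or 1: each term becomes an addition, a subtraction, or a skip, with no multiplications. The sketch below is illustrative (the function name and loop structure are assumptions, not the paper's implementation):

```python
import numpy as np

def ternary_matvec(W, x):
    """Multiply a ternary weight matrix W (entries in {-1, 0, 1}) by a
    vector x using only additions and subtractions -- no multiplies."""
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            w = W[i, j]
            if w == 1:
                out[i] += x[j]   # weight +1: add the activation
            elif w == -1:
                out[i] -= x[j]   # weight -1: subtract it
            # weight 0: skip the term entirely
    return out

W = np.array([[1, 0, -1],
              [0, 1, 1]])
x = np.array([2.0, 3.0, 4.0])
print(ternary_matvec(W, x))  # matches W @ x: [-2.  7.]
```

In practice the savings come from hardware: additions are far cheaper than floating-point multiply-accumulates, which is where the efficiency gains originate.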
The main innovation of the new AI model is running language models without matrix multiplication (MatMul), which has been a major computing bottleneck for large language models (LLMs). The researchers achieved this by replacing standard floating-point weights with the ternary values {-1, 0, 1} and developing new quantization techniques, reducing computing power and electricity usage without compromising performance.
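One common way to obtain such ternary weights from a trained floating-point matrix is "absmean" quantization, as used in ternary-weight models like BitNet b1.58: scale the weights by their mean absolute value, then round and clip to {-1, 0, 1}. A minimal sketch, assuming that style of quantization (the function name and epsilon are illustrative, not the paper's exact method):

```python
import numpy as np

def ternary_quantize(W, eps=1e-8):
    """Quantize a float weight matrix to {-1, 0, 1} with a per-matrix scale.

    Returns (Wq, scale) such that Wq * scale approximates W.
    """
    scale = np.abs(W).mean() + eps        # absmean scaling factor
    Wq = np.clip(np.round(W / scale), -1, 1)  # round, then clip to ternary
    return Wq, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
Wq, scale = ternary_quantize(W)
# Every quantized entry is -1, 0, or 1
assert set(np.unique(Wq)).issubset({-1.0, 0.0, 1.0})
```

The scale factor is kept alongside the ternary matrix so activations can be rescaled once per layer, preserving most of the original matrix's expressiveness.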
Using simpler, lower-precision weights in AI models can bring several benefits: reduced computational requirements, improved efficiency, and decreased energy consumption. This can translate into faster training and inference, lower costs, and a smaller carbon footprint. Additionally, simpler models may be easier to interpret and analyze, supporting better decision-making about how and where to deploy them.