
DeepSeek-V2 is a mixture-of-experts (MoE) model that employs innovative techniques to increase efficiency and reduce training costs. It supports a context length of 128K tokens and comprises 236B total parameters, of which only 21B are activated per token. Its two key architectural components are Multi-head Latent Attention (MLA) and DeepSeekMoE.
MLA enables efficient inference by significantly compressing the Key-Value (KV) cache into a compact latent vector, removing the KV-cache bottleneck that otherwise limits inference throughput. DeepSeekMoE, in turn, makes it possible to train strong models at economical cost through sparse computation: only a small, targeted subset of expert parameters is activated for each token.
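To make the KV-cache compression concrete, here is a minimal PyTorch sketch of the underlying idea: keys and values are reconstructed on the fly from a single low-dimensional latent per token, so only that latent needs to be cached. The layer names, dimensions, and the omission of details such as decoupled rotary position embeddings are illustrative assumptions, not DeepSeek-V2's actual MLA implementation.

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Toy illustration of latent KV compression (not DeepSeek-V2's real MLA)."""

    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project hidden states to a small latent; this is all that gets cached.
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-project the cached latent back to full keys and values at attention time.
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        latent = self.kv_down(x)                          # (b, t, d_latent)
        if kv_cache is not None:                          # extend the cached latents
            latent = torch.cat([kv_cache, latent], dim=1)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        # Standard attention (causal mask omitted for brevity).
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out), latent                 # latent doubles as the KV cache
```

Because only the low-dimensional latent is stored per token rather than full per-head keys and values, the cache in this toy setup shrinks roughly by a factor of 2·d_model / d_latent, which is where the inference-time savings come from.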
By combining these two techniques, DeepSeek-V2 achieves strong performance, economical training costs, and efficient inference throughput simultaneously. The model is pre-trained on a high-quality, multi-source corpus of 8.1T tokens.
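The sparse-computation side can be sketched in the same spirit. Below is a minimal top-k expert-routing layer in which each token is dispatched to only a few expert feed-forward networks; the expert count, the value of k, and the simple softmax gate are assumptions for illustration and do not reproduce DeepSeekMoE's fine-grained expert segmentation or shared experts.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy top-k MoE layer: each token activates only k of n_experts FFNs."""

    def __init__(self, d_model=1024, d_ff=2048, n_experts=16, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                 # x: (n_tokens, d_model)
        scores = self.gate(x)                             # router logits, one per expert
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(topk_scores, dim=-1)      # mixing weights for chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so compute scales with k, not n_experts.
        for e, expert in enumerate(self.experts):
            tokens, slot = (topk_idx == e).nonzero(as_tuple=True)
            if tokens.numel() == 0:
                continue
            out[tokens] += weights[tokens, slot].unsqueeze(-1) * expert(x[tokens])
        return out
```

With 16 experts and k = 2 in this sketch, each token touches only one eighth of the expert parameters; the same mechanism is what lets DeepSeek-V2 hold 236B total parameters while activating only 21B per token.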

The Evol-Instruct technique used by WizardLM is a distinctive approach to overcoming the difficulty of creating high-complexity instruction data. Instead of relying on manual human annotation, WizardLM uses an LLM to iteratively rewrite instructions into progressively more complex ones.
The process begins with an initial set of instructions, which Evol-Instruct rewrites step by step into more complex instructions, as sketched below. The generated instruction data is then mixed with the original data to fine-tune LLaMA, producing the WizardLM model.
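A minimal sketch of the evolution loop follows. The prompt wording, the call_llm helper, and the random choice between "in-depth" and "in-breadth" rewrites are simplified assumptions for illustration; WizardLM's actual evolution prompts and its elimination filters for failed evolutions are more elaborate.

```python
import random

# Hypothetical helper: sends a prompt to any instruction-following LLM and returns its reply.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

# Paraphrased evolution prompts (the real Evol-Instruct templates are more detailed).
IN_DEPTH = ("Rewrite the following instruction to make it more complex, for example by "
            "adding constraints, deepening the question, or requiring more reasoning steps:\n{instr}")
IN_BREADTH = ("Create a brand-new instruction in the same domain as the one below, "
              "but rarer and more specific:\n{instr}")

def evolve(seed_instructions, rounds=4):
    pool = list(seed_instructions)          # final dataset mixes all generations
    current = list(seed_instructions)
    for _ in range(rounds):
        evolved = []
        for instr in current:
            template = random.choice([IN_DEPTH, IN_BREADTH])
            new_instr = call_llm(template.format(instr=instr))
            # A real pipeline would filter degenerate or failed evolutions here.
            evolved.append(new_instr)
        pool.extend(evolved)
        current = evolved                   # the next round evolves the newest generation
    return pool
```

The resulting pool, containing both the original seeds and every evolved generation, is what gets mixed into the fine-tuning data for LLaMA.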
Human evaluations on a complexity-balanced test bed and on Vicuna's test set have shown that instructions from Evol-Instruct are superior to human-created ones. Evol-Instruct can generate large amounts of instruction data spanning levels of complexity that human annotators struggle to produce, and because the strategies it uses to complicate instructions were designed and summarized by human experts, the quality of the generated data is preserved.

The QWEN series uses Reinforcement Learning from Human Feedback (RLHF) to enhance its chat models, adding a critical human-in-the-loop component to the standard RL paradigm: a reward model is trained on human preference comparisons, and the chat model is then optimized against that reward model, iteratively refining its behavior based on human feedback.
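As one concrete piece of that pipeline, the sketch below shows the standard pairwise ranking loss used to train an RLHF reward model from human preference pairs. The tiny scoring network and the embedding inputs are illustrative assumptions; QWEN's actual reward model is built on a pretrained language model rather than a small MLP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a reward model; in practice this is a pretrained LM with a scalar head.
reward_model = nn.Sequential(nn.Linear(768, 256), nn.Tanh(), nn.Linear(256, 1))
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

def reward_model_step(chosen_emb, rejected_emb):
    """One training step on a batch of human preference pairs.

    chosen_emb / rejected_emb: (batch, 768) embeddings of the preferred and
    dispreferred responses to the same prompt (hypothetical featurization).
    """
    r_chosen = reward_model(chosen_emb).squeeze(-1)
    r_rejected = reward_model(rejected_emb).squeeze(-1)
    # Bradley-Terry-style objective: push the chosen response's reward above the rejected one's.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random tensors standing in for response embeddings.
print(reward_model_step(torch.randn(8, 768), torch.randn(8, 768)))
```

The trained reward model then supplies the scalar reward that the reinforcement-learning stage (typically PPO) maximizes, usually with a KL penalty toward the supervised model so the chat model's outputs stay fluent.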
CODE-QWEN and MATH-QWEN-CHAT are specialized variants of the QWEN series that excel at coding and mathematics-focused tasks. CODE-QWEN is designed to assist with coding tasks, while MATH-QWEN-CHAT is tailored to solving mathematical problems. Together they demonstrate the versatility of the QWEN series across domains.