ESM3 is a generative language model designed for protein engineering. Its primary function is to simulate evolutionary processes in order to create functional proteins that differ substantially from known ones. Because it reasons jointly over sequence, structure, and function, ESM3 can generate proteins that follow complex prompts, which makes it applicable to biological challenges in drug discovery, materials science, and carbon capture [3].
ESM3 generated a new green fluorescent protein (GFP), called esmGFP, that shares only 58% sequence identity with the closest known fluorescent protein. This degree of divergence is estimated to be comparable to 500 million years of natural evolution.
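Distance between two proteins is usually reported as percent sequence identity over an alignment. The following is a minimal sketch of that arithmetic, assuming the two sequences are already aligned and use `-` for gaps. The function name `percent_identity` and the choice to exclude gapped columns from the denominator are illustrative assumptions; other conventions exist and give slightly different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns where both residues match.

    Assumption: inputs are pre-aligned, equal-length strings with '-' as
    the gap character. Columns containing a gap are skipped entirely.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gapped column: not counted in the denominator
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * matches / compared


# Toy example (made-up residues, not real GFP sequences):
# three of the four ungapped columns match.
print(percent_identity("MSKG-E", "MSRGAE"))
```

In this convention, a protein at 58% identity to its nearest neighbor differs at 42% of the compared positions.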
ESM3 integrates sequence, structure, and function by representing each of them as a track of discrete tokens at the input and output [2]. The model is a stack of transformer blocks in which all tracks are fused into a single latent space. Geometric attention in the first block allows conditioning on atomic coordinates. This joint representation is what lets ESM3 generate diverse, high-quality proteins that differ significantly from known natural proteins.
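The idea of fusing several token tracks into one latent space can be sketched as follows. This is a toy illustration, not the ESM3 implementation: it assumes fusion is done by summing per-track embeddings at each residue position, and the vocabulary sizes, the width `D_MODEL`, and the names `EMBED_TABLES` and `fuse_tracks` are all hypothetical. Transformer blocks and geometric attention are omitted.

```python
from typing import Dict

import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16  # hypothetical latent width
# Hypothetical vocabulary sizes for each discrete token track.
VOCAB_SIZES = {"sequence": 32, "structure": 64, "function": 48}

# One embedding table per track, all mapping into the same latent width.
EMBED_TABLES = {
    name: rng.normal(size=(size, D_MODEL)) for name, size in VOCAB_SIZES.items()
}


def fuse_tracks(tokens: Dict[str, np.ndarray]) -> np.ndarray:
    """Embed each track and sum the embeddings position by position.

    Assumption: every track has one token per residue, so all tracks
    have the same length. Returns an array of shape (length, D_MODEL).
    """
    lengths = {len(ids) for ids in tokens.values()}
    if len(lengths) != 1:
        raise ValueError("all tracks must have the same length")
    (length,) = lengths

    latent = np.zeros((length, D_MODEL))
    for name, ids in tokens.items():
        latent += EMBED_TABLES[name][ids]  # table lookup, then fuse by sum
    return latent


# Three residues, one token per track per residue (arbitrary ids).
example = {
    "sequence": np.array([1, 2, 3]),
    "structure": np.array([0, 5, 9]),
    "function": np.array([4, 4, 4]),
}
latent = fuse_tracks(example)
print(latent.shape)  # (3, 16)
```

Because every track lands in the same latent vector, later layers can attend over sequence, structure, and function information together; a prompt that specifies only some tracks would, under this sketch, simply omit or mask the others.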