Delphi-2M was trained on data from 400,000 UK Biobank participants and validated on external data from 1.9 million Danish individuals. The model predicts the rates of more than 1,000 ICD-10-coded diseases, as well as death, conditional on each individual's past disease history, age, sex, and baseline lifestyle information.
Delphi-2M is a modified GPT-2 model that predicts disease progression from past medical histories. It replaces GPT-2's discrete positional encoding with a continuous, age-based encoding and adds an output head that predicts the time between health events. These changes let Delphi-2M model both the order and the timing of health events, and it surpasses standard GPT models in predicting disease onset and progression.
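The two architectural changes can be illustrated with a minimal sketch. It is not the Delphi-2M implementation: the function names, the sinusoidal form of the encoding, the `dim` and `base` parameters, and the exponential waiting-time reading of the time head are all assumptions made for illustration.

```python
import numpy as np

def age_encoding(age_years, dim=8, base=10000.0):
    """Encode continuous ages (in years) as sinusoidal features.

    Assumption: a sinusoidal scheme like the original Transformer's,
    but evaluated at a real-valued age instead of an integer token index.
    """
    ages = np.asarray(age_years, dtype=float)[:, None]   # shape (n, 1)
    # Geometric ladder of frequencies, one per sin/cos pair.
    freqs = base ** (-np.arange(0, dim, 2) / dim)         # shape (dim // 2,)
    angles = ages * freqs                                 # shape (n, dim // 2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def expected_wait_years(log_rate):
    """Turn a time head's output into an expected time to the next event.

    Assumption: the head emits the log of an event rate (events per year)
    and waiting times are exponential, so the mean wait is 1 / rate.
    """
    return 1.0 / np.exp(log_rate)

# Three events: one at birth, two half a year apart in mid-life.
enc = age_encoding([0.0, 40.0, 40.5])
print(enc.shape)
print(expected_wait_years(np.log(0.5)))
```

Because the encoding depends on age rather than on position in the sequence, two events recorded six months apart get different vectors, while two diagnoses made at the same age share one, which an integer position index cannot express.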
Current disease prediction models often focus on specific diseases or conditions and may not account for the complex interplay of factors that influence health outcomes. Comprehensive models that predict a broad range of conditions and account for multi-morbidity (clusters of chronic and acute conditions influenced by lifestyle, genetics, and socioeconomic factors) are lacking. Existing models may also fail to capture complex temporal dependencies in health data or to incorporate diverse data sources that could improve prediction accuracy.