Safeguarding Healthcare AI: Exposing and Addressing LLM Manipulation Risks
How do adversarial attacks manipulate LLM outputs?

Adversarial attacks manipulate LLM outputs by introducing subtle modifications to the input text, exploiting vulnerabilities in the model's architecture and training process5. These attacks can cause the LLM to generate harmful or biased outputs, compromising the model's integrity and user trust. The attacks can be carried out through prompt manipulation or by fine-tuning the model with poisoned training data.
What medical tasks did the NCBI study focus on?

The NCBI study focused on three medical tasks: COVID-19 vaccination guidance, medication prescribing, and diagnostic test recommendations. The objectives of the attacks in these tasks were to discourage vaccination, suggest harmful drug combinations, and advocate for unnecessary medical tests.
How did adversarial attacks affect vaccine recommendations?

Adversarial attacks significantly impacted vaccine recommendations, with prompt-based attacks causing a dramatic decline in vaccine recommendations from 74.13% to 2.49%. This highlights the vulnerability of LLMs to malicious manipulation, emphasizing the need for robust safeguards in critical sectors like healthcare.