On May 22, 2025, the National Security Agency Artificial Intelligence Security Center (AISC), Cybersecurity and Infrastructure Security Agency (CISA), Federal Bureau of Investigation (FBI), and allied international cybersecurity agencies released joint guidance to help organizations across industries protect the data used to train and operate artificial intelligence (AI) systems.
The cybersecurity information sheet (CSI) outlines ways that AI data may become compromised – including unauthorized access, data tampering, poisoning attacks, and inadvertent data leakage – and offers mitigation strategies aligned with the NIST AI Risk Management Framework. The guidance also provides targeted technical recommendations for enterprises deploying or managing AI systems in critical environments.
Key Risks & Threat Vectors
The new guidance focuses on three primary risks that threaten AI system security: (1) data supply chain vulnerabilities; (2) maliciously modified (or “poisoned”) data; and (3) “data drift” or model performance degradation due to evolving input distributions.
Data supply chain vulnerabilities
According to the guidelines, when an organization relies on third-party data sources and intermediaries, it may unknowingly ingest data from untrustworthy sources that can undermine model accuracy, expose sensitive systems, or introduce legal and regulatory exposures.
In particular, data brokers, open-source datasets, or unvetted vendors may provide incomplete, unclean, or maliciously crafted data. The CSI specifically highlights the dangers of using “web-scale datasets” – massive, internet-scraped datasets often compiled without quality controls, proper licensing, or source verification. These datasets may contain adversarial data designed to poison models, copyrighted content, personally identifiable information, or other sensitive material. Their scale and opacity make them especially difficult to audit or govern. Organizations relying on these datasets may be at an elevated risk for both security compromise and regulatory exposure.
Mitigation strategies for risks in the data supply chain include (1) establishing a data acquisition policy that mandates provenance checks, digital signatures, and source authentication for all third-party datasets; (2) screening for “malicious and inaccurate material”; and (3) requiring vendors to verify the integrity and lawful sourcing of any provided data.
Maliciously modified (or “poisoned”) data
The guidance identifies key risks in how attackers may insert manipulated data into training sets (known as “data poisoning”), potentially causing misclassification, errant outputs, or compromised security in the resulting model. The guidelines detail several variations of this attack, including “frontrunning poisoning,” where an attacker anticipates an upcoming dataset collection or snapshot and preemptively inserts malicious data before it occurs, and “split-view poisoning,” where the content a model trainer ultimately downloads from a referenced source (for example, a URL in a web-scale dataset) differs from the content that was originally indexed and vetted, such as when an attacker takes control of an expired domain. Ultimately, poisoned data can cause users to rely on inaccurate outputs when making decisions, which carries both reputational and legal risk.
The guidelines propose that organizations employ data sanitization and anomaly detection tools to identify outliers or suspicious patterns in training data. Where feasible, organizations should isolate and audit high-risk data subsets using statistical fingerprinting and label validation.
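By way of illustration only, the following minimal sketch shows one way an organization might screen tabular training data for anomalous records using an isolation forest from the scikit-learn library; the column names, file paths, and contamination rate are hypothetical assumptions rather than parameters drawn from the CSI.

```python
# Minimal sketch of outlier screening for tabular training data.
# Assumes numeric features; column names and contamination rate are illustrative.
import pandas as pd
from sklearn.ensemble import IsolationForest

def flag_suspect_records(df: pd.DataFrame, feature_cols: list[str],
                         contamination: float = 0.01) -> pd.DataFrame:
    """Return rows the isolation forest scores as anomalous, for manual review."""
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(df[feature_cols])  # -1 = anomaly, 1 = inlier
    return df[labels == -1]

# Example usage with a hypothetical dataset:
# training_df = pd.read_parquet("training_data.parquet")
# suspects = flag_suspect_records(training_df, ["feature_a", "feature_b"])
# suspects.to_csv("records_for_audit.csv", index=False)
```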
Data drift
Data drift occurs when the statistical properties of input data change over time, reducing model accuracy and reliability. AI system performance may degrade when the underlying data no longer reflects the conditions under which the model was trained, leading to silent model failures or increased error rates in operational settings, especially in dynamic environments like fraud detection or autonomous systems. Data drift differs from poisoning attacks in that it arises naturally from changes in real-world conditions rather than from deliberate manipulation by an adversary. The mitigation guidance suggests regularly testing and validating data outputs, implementing feedback loops to periodically retrain models, and defining thresholds for when a model must be refreshed.
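To illustrate how such testing and thresholds might be operationalized, the sketch below compares live feature values against a training-time baseline using a two-sample Kolmogorov–Smirnov test from SciPy; the p-value threshold and file names are illustrative assumptions, not requirements from the guidance.

```python
# Minimal drift check: compare live feature values against the training baseline.
# The p-value threshold below is an illustrative assumption, not a CSI requirement.
import numpy as np
from scipy.stats import ks_2samp

def feature_has_drifted(baseline: np.ndarray, live: np.ndarray,
                        p_threshold: float = 0.01) -> bool:
    """Flag drift when a two-sample KS test rejects 'same distribution'."""
    statistic, p_value = ks_2samp(baseline, live)
    return p_value < p_threshold

# Example: trigger a retraining review when a monitored feature drifts.
# baseline = np.load("train_feature_age.npy")
# live = np.load("last_7_days_feature_age.npy")
# if feature_has_drifted(baseline, live):
#     print("Feature 'age' has drifted; schedule model revalidation or retraining.")
```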
Life-Cycle-Based Security Best Practices
Building on these risks, the CSI recommends applying security controls throughout four phases of the AI life cycle, aligning with NIST’s AI Risk Management Framework.
1. Plan & design phase
This initial phase is critical for embedding security principles into the AI system from the ground up.
Key recommendations:
- Conduct data security threat modeling and privacy impact assessments at the outset of any AI initiative.
- Embed “data minimization” and “purpose limitation” principles in the system design.
- Require documentation of intended data uses and security assumptions.
- Map compliance obligations (e.g., HIPAA, GDPR, Executive Order 13960) that may govern the data used.
Anticipating where data vulnerabilities might arise – such as insecure data pipelines and over-permissive access – allows developers to proactively mitigate risk and reduce the attack surface early.
2. Collect & process data
The collection and processing phase represents a major attack surface due to the volume and sensitivity of ingested data.
Key recommendations:
- Validate data at collection.
- Use secure ingestion channels, such as TLS 1.3 with mutual authentication.
- Apply integrity checks (e.g., hash verification) both at rest and in transit (see the sketch below).
- Store raw and processed data in logically segmented environments.
Ingestion is a particularly vulnerable point where threat actors can intercept, alter, or replace data if security measures are not adequately enforced.
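As one illustration of the hash-verification recommendation above, the minimal sketch below recomputes SHA-256 digests for ingested files and compares them against a manifest supplied by the data provider; the manifest format and file paths are hypothetical conventions assumed for the example.

```python
# Minimal sketch of hash verification at ingestion time.
# The manifest format (filename -> expected SHA-256 hex digest) is a hypothetical convention.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_ingested_files(manifest_path: Path, data_dir: Path) -> list[str]:
    """Return the names of files whose digests do not match the manifest."""
    manifest = json.loads(manifest_path.read_text())
    return [name for name, expected in manifest.items()
            if sha256_of(data_dir / name) != expected.lower()]

# Example usage:
# mismatches = verify_ingested_files(Path("manifest.json"), Path("incoming/"))
# if mismatches:
#     raise RuntimeError(f"Integrity check failed for: {mismatches}")
```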
3. Build & use the model
Once the model is in development, additional risks emerge from both data inputs and model outputs.
Key recommendations:
- Prevent unauthorized model input/output manipulation.
- Assess model behavior across edge cases.
- Train on datasets labeled and validated by multiple reviewers.
- Mask or remove unnecessary sensitive attributes during model training.
- Use differential privacy or federated learning as applicable to reduce exposure (see the sketch following these recommendations).
Even well-trained models can be subverted through model inversion, extraction attacks, or adversarial examples if proper safeguards are not implemented.
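To make the differential-privacy recommendation concrete, the following sketch applies the classic Laplace mechanism to a simple counting query over training records; the epsilon value and query are illustrative, and production systems would ordinarily rely on a vetted differential-privacy library rather than hand-rolled noise.

```python
# Minimal sketch of the Laplace mechanism for a differentially private count.
# Epsilon and the query are illustrative; real systems should use a vetted DP library.
import numpy as np

def dp_count(values: np.ndarray, predicate, epsilon: float = 1.0) -> float:
    """Return a noisy count of records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = int(sum(predicate(v) for v in values))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: report roughly how many training records have a sensitive attribute,
# without exposing the exact value.
# ages = np.array([34, 61, 45, 29, 73])
# print(dp_count(ages, lambda a: a >= 65, epsilon=0.5))
```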
4. Operate & monitor
After deployment, continuous monitoring and auditing are essential to maintain AI system integrity.
Key recommendations:
- Log all data access and model inference activity.
- Implement role-based access controls and immutable logging (illustrated in the sketch below).
- Monitor for unusual shifts in model output distributions.
- Conduct regular audits of datasets and model behavior.
Post-deployment AI systems are attractive targets for exploitation, especially if feedback data can be manipulated to degrade model performance over time.
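As a rough illustration of the logging recommendations above, the sketch below appends hash-chained entries for each inference request so that after-the-fact tampering with earlier entries becomes detectable; the field names are hypothetical, and many organizations would instead rely on platform controls such as write-once storage or a SIEM.

```python
# Minimal sketch of a hash-chained, append-only audit log for inference requests.
# Field names are hypothetical; production systems typically use WORM storage or a SIEM.
import hashlib
import json
import time

class AuditLog:
    def __init__(self, path: str):
        self.path = path
        self.prev_hash = "0" * 64  # genesis value for the chain

    def record(self, user: str, model_id: str, input_digest: str) -> None:
        entry = {
            "timestamp": time.time(),
            "user": user,
            "model_id": model_id,
            "input_sha256": input_digest,
            "prev_hash": self.prev_hash,
        }
        serialized = json.dumps(entry, sort_keys=True)
        entry_hash = hashlib.sha256(serialized.encode()).hexdigest()
        self.prev_hash = entry_hash  # link the next entry to this one
        with open(self.path, "a") as f:
            f.write(json.dumps({**entry, "entry_hash": entry_hash}) + "\n")

# Example usage:
# log = AuditLog("inference_audit.log")
# log.record(user="svc-fraud-api", model_id="fraud-v3",
#            input_digest=hashlib.sha256(b"payload").hexdigest())
```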
Technical Recommendations
The CSI provides additional concrete controls, including:
- Encryption & Signing. Consider encrypting sensitive training data using algorithms compliant with FIPS 140-3 and digitally signing it using trusted key management infrastructure (see the signing sketch following this list).
- Data Provenance. Track data lineage from source to model output using automated metadata tagging and immutable logging systems.
- Trust Infrastructure. Build internal frameworks to verify the authenticity, integrity, and timeliness of all datasets, including through cross-validation of data and use of third-party attestation services.
- Secure Storage. Verify that sensitive datasets and models reside in storage environments that enforce encryption-at-rest, integrity monitoring, and restricted API exposure.
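By way of illustration of the encryption-and-signing recommendation in the first bullet above, the sketch below signs and verifies a dataset file with an Ed25519 key using the widely used Python cryptography package; generating a key in code is shown for demonstration only, as production keys would be created and held within the organization's (ideally FIPS-validated) key management infrastructure.

```python
# Minimal sketch of signing and verifying a dataset file with Ed25519.
# Key handling here is illustrative only; production keys belong in managed
# key management infrastructure, not in application code or on local disk.
from pathlib import Path
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def sign_dataset(private_key: Ed25519PrivateKey, dataset: Path) -> bytes:
    """Produce a detached signature over the dataset bytes."""
    return private_key.sign(dataset.read_bytes())

def dataset_is_authentic(public_key: Ed25519PublicKey, dataset: Path,
                         signature: bytes) -> bool:
    """Verify the detached signature before the dataset is used for training."""
    try:
        public_key.verify(signature, dataset.read_bytes())
        return True
    except InvalidSignature:
        return False

# Example usage with a throwaway key pair (for illustration only):
# key = Ed25519PrivateKey.generate()
# sig = sign_dataset(key, Path("train.parquet"))
# assert dataset_is_authentic(key.public_key(), Path("train.parquet"), sig)
```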
Key Takeaways
Organizations that develop or deploy AI systems benefit from treating data security as a foundational pillar of responsible AI governance. As AI models increasingly rely on vast and diverse datasets, the risks associated with data integrity, provenance, and misuse grow accordingly. To mitigate these risks and ensure regulatory and ethical compliance, the new guidance recommends that organizations take the following key steps:
- Update incident response plans to address AI-specific threats such as model poisoning, adversarial inputs, and data drift. These plans should include protocols for detecting anomalies in model behavior and responding to compromised training data or inference pipelines.
- Audit existing AI projects for risks related to data sourcing, including unverified data provenance, lack of consent or licensing, and insufficient monitoring of model outputs. Regular audits help identify blind spots and ensure that AI systems remain secure and trustworthy over time.
- Implement cross-functional review processes that combine cybersecurity, legal, data science, and procurement teams. These reviews are essential when onboarding external datasets, integrating third-party models, or deploying AI in sensitive environments. Collaborative oversight helps develop a holistic approach to ethical considerations and security compliance.
If you have any questions, or would like additional information, please contact one of the attorneys on our Privacy, Cyber & Data Strategy team.