In the age of rapidly advancing technology and artificial intelligence, data security and privacy have become paramount concerns. Companies and organizations invest heavily in safeguarding their confidential information, but even the most diligent can fall victim to unintended data breaches.
Microsoft recently responded to a security incident that exposed 38 terabytes of private data. The exposure was found in a Microsoft AI research repository on GitHub and occurred while researchers were publishing open-source training data. The exposed data included disk backups of two former employees' workstations, which contained sensitive information such as secrets, keys, passwords, and internal Teams messages.
The compromised repository, named "robust-models-transfer," has since been taken down. Before its removal, it contained source code and machine learning models related to a 2020 research paper titled "Do Adversarially Robust ImageNet Models Transfer Better?"
The security lapse was caused by an overly permissive Shared Access Signature (SAS) token, an Azure feature used for data sharing. Instead of being scoped to the files meant for publication, the token granted access to the entire storage account, exposing additional private data. The token was also misconfigured to allow "full control" permissions rather than read-only access, so an attacker could have not only viewed but also deleted and overwritten files in the storage account.
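The two misconfigurations described above, account-wide scope and more than read-only permissions, are both visible in the query string of a SAS URL. The following is a minimal sketch of how such a URL could be checked, not Microsoft's tooling. It assumes the standard SAS query parameters `srt` (signed resource types, present on account-level tokens) and `sp` (signed permissions, one letter per permission); the function name `audit_sas_url` and the choice to treat only read (`r`) and list (`l`) as safe are illustrative assumptions.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: only read (r) and list (l) count as "read-only" for this check.
READ_ONLY_PERMISSIONS = {"r", "l"}


def audit_sas_url(url: str) -> list[str]:
    """Return warnings for a SAS URL; an empty list means nothing was flagged."""
    query = parse_qs(urlsplit(url).query)
    warnings = []

    # Account SAS tokens carry `srt`; service SAS tokens carry `sr` instead.
    if "srt" in query:
        warnings.append("account-level token: not limited to one container or blob")

    # `sp` lists granted permissions as single letters, e.g. "r" or "racwdl".
    granted = set(query.get("sp", [""])[0])
    extra = sorted(granted - READ_ONLY_PERMISSIONS)
    if extra:
        warnings.append("grants more than read/list: " + "".join(extra))

    return warnings


if __name__ == "__main__":
    # Hypothetical URL with a placeholder signature, for illustration only.
    risky = (
        "https://example.blob.core.windows.net/"
        "?sv=2020-08-04&ss=b&srt=sco&sp=racwdl&sig=PLACEHOLDER"
    )
    for warning in audit_sas_url(risky):
        print(warning)
```

A check like this inspects only what the token claims to grant. It does not contact Azure and cannot tell whether the token has been revoked.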
After being notified of the exposure, Microsoft opened an investigation. The company reported that it found no evidence that customer data had been exposed and that no other internal services were compromised. Microsoft revoked the SAS token and blocked external access to the storage account, resolving the issue two days after being notified.
To prevent similar incidents in the future, Microsoft expanded its secret scanning service to include SAS tokens with overly permissive settings. The company also identified a bug in its scanning system that had incorrectly marked the specific SAS URL in the repository as a false positive.
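To illustrate the general idea behind scanning for SAS tokens, the sketch below searches text for Azure Storage URLs that carry a `sig=` signature parameter. It is a simplified heuristic under stated assumptions, not a description of Microsoft's scanning service: the regular expression, the list of storage endpoints, and the function name `find_sas_urls` are all illustrative.

```python
import re

# Heuristic pattern: an Azure Storage URL whose query string contains `sig=`.
# The URL is assumed to end at whitespace, a quote, or an angle bracket.
SAS_URL_PATTERN = re.compile(
    r"https://[a-z0-9]+\.(?:blob|file|queue|table|dfs)\.core\.windows\.net/"
    r"[^\s\"'<>]*\?[^\s\"'<>]*\bsig=[^\s\"'<>]+",
    re.IGNORECASE,
)


def find_sas_urls(text: str) -> list[str]:
    """Return every substring of `text` that looks like a SAS URL."""
    return SAS_URL_PATTERN.findall(text)


if __name__ == "__main__":
    # Hypothetical file contents with a placeholder signature.
    sample = (
        'MODEL_URL = "https://example.blob.core.windows.net/models/x.bin'
        '?sv=2020-08-04&sp=r&sig=PLACEHOLDER"\n'
        'DOCS_URL = "https://example.blob.core.windows.net/docs/readme.txt"\n'
    )
    for url in find_sas_urls(sample):
        print("possible SAS URL:", url)
```

A real scanner would also need to judge whether each match is overly permissive and to handle tokens split across lines or stored without the host name, which this sketch does not attempt.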
The incident is a reminder of how much robust security measures and vigilant data management matter in an era of large-scale data handling and AI development. Microsoft and other organizations alike can draw several lessons and recommendations from it:
- Enhanced Data Classification: Clearly categorize and label sensitive data to ensure that it receives the appropriate level of protection.
- Access Control Vigilance: Regularly review and strengthen access controls and permissions to prevent unauthorized access.
- Encryption Practices: Employ strong encryption for data both at rest and in transit to add an extra layer of protection.
- Incident Response Planning: Maintain a well-defined incident response plan to address breaches swiftly and effectively.
- Security Education: Educate employees and teams on data security best practices to increase awareness and reduce the likelihood of misconfigurations.
- Third-Party Audits: Consider involving third-party security experts for audits and threat modeling to identify vulnerabilities.