Triage Security: Analyzing the Role of Commercial AI in Recent Government Security Incidents

Over the past few months, a small group of unauthorized individuals compromised the networks of at least nine Mexican government agencies. According to an analysis by Gambit Security, the incident resulted in the exposure of more than 195 million identities and tax records, alongside vehicle registrations and over 2.2 million property records. The event provides clear documentation of how commercial large language models (LLMs) are altering the operational efficiency of unauthorized groups.

Using a playbook formatted as a prompt of approximately one thousand lines, the operators utilized Anthropic's Claude and OpenAI's ChatGPT. By masquerading as legitimate security assessment professionals, the group bypassed the AI models' safety guardrails within 40 minutes. Once unconstrained, the AI systems functioned as force multipliers. Autonomously mapping environments, identifying vulnerable configurations, developing custom tooling, and navigating standard defenses.

The unauthorized individuals maintained their access for an extended period, establishing deep persistence. Alon Gromakov, CEO and co-founder of Gambit Security, noted the group appeared to be ideologically motivated. "These [unauthorized individuals] got into multiple systems, and they were there for a very extended period of time... more than a month, and they left backdoors," he stated, pointing out the complexity of verifying systemic integrity after such an event.

While Mexican authorities have not publicly confirmed the incident, Anthropic has "disrupted the activity and banned the accounts," according to Bloomberg. The event aligns with broader regional data trends. Latin American organizations experience an average of 3,100 security incidents per week, compared to fewer than 1,500 in the United States.

A significant portion of this volume is driven by the adoption of generative AI. While these tools are frequently used to craft highly convincing social engineering materials—driving click-through rates up to five times higher, according to Microsoft. Their technical application represents a structural shift. Victor Ruiz, founder of Mexican cybersecurity startup Silikn, noted that generative AI is increasingly used to develop sophisticated malicious software. "This marks a strategic inflection point for the region, as traditional defenses built around static signatures and behavioral patterns become less effective against code capable of evolving in real-time," Ruiz explained.

Uncovering the methodology

Researchers at Gambit Security gained visibility into the group's methodology by locating an unsecured operational infrastructure server. Curtis Simpson, Gambit's chief strategy officer, stated that this server contained full transcripts of the interactions between the unauthorized users and the two commercial LLMs.

Gambit assessed the group to be small—likely fewer than five individuals, with no apparent nation-state affiliation or financial motive. They maintained access to the government systems since at least December. Despite lacking advanced technical sophistication, they effectively leveraged the AI platforms to navigate the target environments.

Once the models' guardrails were bypassed, the AI platforms efficiently identified critical assets such as digital certificates, architectural diagrams, and exploitable misconfigurations. "They were fundamentally using Claude as a flashlight and would have been otherwise poking around in the dark until they gave up," Simpson observed.

Autonomous capability expansion

A notable characteristic of the incident was the AI systems' ability to iterate on tasks without explicit, step-by-step instruction. In one instance, the operators directed the models to test a specific set of compromised credentials.

When the initial credentials failed, the AI system autonomously shifted tactics. Simpson explained that the AI proceeded to enumerate identities within Active Directory, applied alternative methods to achieve access, and ultimately succeeded—exceeding the original prompt parameters.

"The level of work that it was doing on the [user's] behalf without even being asked" is a critical development, Simpson noted. He added that this dynamic enables less experienced groups to achieve results previously limited to advanced persistent threats. Tasks that traditionally required significant time and manual expertise can now be executed in minutes or hours.

Commercial LLMs remain the primary choice for these activities. Ruiz pointed out that there is little concrete evidence showing widespread adoption of specialized "dark LLMs." Tracking these deployments remains a challenge, as commercial models, once their guardrails are bypassed—offer superior capability and reliability.

To safeguard environments against AI-augmented intrusions, we recommend organizations focus on systemic resilience rather than relying solely on static prevention. Implementing strict identity-based access controls, segmenting sensitive environments, and enforcing the principle of least privilege are essential steps. Furthermore, maintaining comprehensive observability across network environments ensures that when unauthorized access attempts occur, defenders have the visibility needed to isolate and remediate the activity immediately.