Sovereignty Doesn't Live in the Data Center

In short

On June 3, 2026 the Commission published the Cloud and AI Development Act, read by everyone as a law about European data centers. But the real center of gravity isn’t where the data sits: it’s the verifiable provenance of the code. Fifteen days later one lone researcher, with a script, turned up roughly ten thousand GitHub repositories posing as honest projects by copying real history and real authors, and GitHub filed the fix under “feature request.” The SBOM mandated by the Cyber Resilience Act (exploited vulnerabilities reportable within 24 hours from September 11, 2026, a machine-readable bill of materials mandatory from December 11, 2027) is where sovereignty touches the ground, but only if there’s a signed attestation chain underneath it (SLSA, in-toto, Sigstore) generated by an isolated environment the attacker doesn’t control. Open doesn’t mean inspected. Sovereignty isn’t a place, it’s a property you keep having to prove.

The Cloud and AI Development Act, which the Commission published on June 3 inside its technological sovereignty package, lends itself to a comfortable reading. It’s the geographic reading: megawatts and permitting timelines, where the bytes physically end up, which American provider will be shut out of European public tenders. The reading is defensible, and the numbers back it. The market share of European cloud providers fell from around twenty-nine percent in 2017 to around fifteen percent in 2022, the capacity shortfall in data centers is documented, the stated ambition is to triple that capacity within five to seven years, and the first of the regulation’s four assurance levels literally requires that data be processed and stored on European infrastructure. Hence the interpretation circulating at every table these past weeks, the most reassuring one imaginable: sovereignty means the data has come home.

Beyond the First Level

The interpretation has a flaw, and the flaw is that it stops at the first level. Read further and the regulation’s center of gravity shifts to a far less photogenic spot. The second level doesn’t ask where the data sits. It asks for demonstrable independence from third countries and transparency over the software supply chain. The fourth level asks for full control over that same chain, with no external interference on the code that runs. Then there’s a subtler mechanism, living in the procurement rules: administrations will have to weigh the so-called Union added value as a non-economic award criterion, that is, how much a supplier contributes to the resilience of the European digital chain. The regulation specifies that this criterion must stay ancillary and not decisive, with a maximum weight the recitals place at around fifteen points out of a hundred and twenty, yet it still introduces a structural advantage for whoever has research and production inside the borders. And then there’s the part that touches my work most directly, because it steps outside the public-sector perimeter: the regulation gives the Commission the power to extend, through delegated acts, the same sovereignty-assessment obligations to private firms in the sectors already regulated by NIS2, meaning banks, energy, telecoms, healthcare. Anyone building software for those sectors would do well to read past the first level.

The bet Europe placed at the start of June, under the infrastructural veneer, isn’t geographic. It’s a bet on the verifiable provenance of code. And it doesn’t arrive now by chance. These are the same weeks in which the continent felt firsthand what it means to depend on decisions taken elsewhere, when a unilateral export-control measure made frontier models already in use by thousands of people unavailable overnight. The CADA is, in large part, the structural reflex of that perceived dependence. The implicit question running through it isn’t where you keep your data, but whether you’re able to know, and to prove, what’s inside the software you run. Worth keeping that word in mind, provenance, because in the same fifteen days someone showed just how far down it goes.

Ten Thousand Repositories in Disguise

On June 18 a researcher writing under the pseudonym Orchid published the results of a months-long investigation. The story began in the most banal way: searching for their own project on a search engine, Orchid found in its place a clone with an identical name and description, hosted by a different account. From there the hunt began, run with a single Python script that combed sixteen million push events logged in GitHub’s public archive in search of a recurring signature. An earlier analysis by other researchers, in April, had stopped at a hundred and nine repositories; by June Orchid’s count had reached roughly ten thousand, with the campaign active since at least early 2025. They all distributed the same malware family, a loader that opened the way to a credential-stealing program. The detail that makes the difference, and that most of the coverage treated as a technical aside, is how these repositories made themselves credible. They weren’t forks. They were clones built from scratch, with the commit history copied wholesale from the original project and the attribution left intact on the real authors, so that clicking on the contributors brought up genuine accounts with genuine history. The only change was a link in the description file pointing to a compressed archive. The thing that made them look trustworthy was exactly the provenance signal we’ve leaned on for ten years: the long history and the recognizable authors, the air of a project that has lived.

Some of these repositories had survived for over a year, and they pulled it off with a trick that says a lot about how automated trust works. They periodically deleted the last commit and republished an identical one, always with the same title, every few hours. That was enough to confuse the platform’s security algorithms, tuned to flag suddenly suspicious activity rather than quiet, prolonged anomaly. But the part that should disturb more than the number is GitHub’s response to those who, before Orchid, had flagged the mechanism through official channels. The platform dismissed the reports, arguing that backdated commits are a legitimate use case for people working offline, and classified making the real push history more visible as a feature request, not a security fix. Translated: the platform that hosts almost all of the world’s open source decided that the provenance problem isn’t its problem. Orchid, met with silence, published both the detection tool and the full list of identified repositories, because reporting them one by one was materially impossible.
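The republishing trick leaves a trace of its own in push-event data: one repository pushing the same commit title again and again at short intervals. What follows is a minimal sketch of how such a signature might be searched for. It is not Orchid’s tool: the record fields (`repo`, `message`, `pushed_at`) are a hypothetical simplification rather than GitHub’s actual event schema, and the thresholds are arbitrary.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_republishers(events, min_repeats=5, max_gap=timedelta(hours=12)):
    """Return repos that push the same commit title repeatedly at short intervals.

    `events` is an iterable of dicts with hypothetical keys:
    'repo' (str), 'message' (commit title, str), 'pushed_at' (datetime).
    """
    # Group push timestamps by (repository, commit title).
    pushes = defaultdict(list)
    for event in events:
        pushes[(event["repo"], event["message"])].append(event["pushed_at"])

    flagged = set()
    for (repo, _message), times in pushes.items():
        times.sort()
        # Longest run of consecutive pushes separated by at most max_gap.
        run = best = 1
        for earlier, later in zip(times, times[1:]):
            run = run + 1 if later - earlier <= max_gap else 1
            best = max(best, run)
        if best >= min_repeats:
            flagged.add(repo)
    return sorted(flagged)


if __name__ == "__main__":
    start = datetime(2026, 1, 1)
    # Hypothetical data: one repo re-pushing the same title every six hours.
    sample = [
        {"repo": "fake/clone", "message": "Update README.md",
         "pushed_at": start + timedelta(hours=6 * i)}
        for i in range(8)
    ]
    sample.append({"repo": "real/project", "message": "Fix parser bug",
                   "pushed_at": start})
    print(find_republishers(sample))
```

The design choice worth noting is that the check looks for persistence rather than for a spike, the opposite of a detector tuned to sudden activity.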

A Line, Not an Incident

It’s worth placing this story in a line, because on its own it looks like an incident and lined up with the others it reveals a direction. In 2020 SolarWinds showed you could hit thousands of organizations by compromising a single upstream supplier. In 2024 the XZ Utils case raised the patience bar, with an attacker who had spent about two years earning an open source project’s trust to plant a backdoor as a legitimate maintainer. 2026 moved the target further still. In May the campaign dubbed Megalodon injected malicious workflows into more than five thousand repositories in a few hours, designed to steal the secrets of continuous-integration pipelines and any cloud credentials within reach. In the same days, GitHub itself confirmed that a device belonging to one of its employees, infected through a malicious extension for a widely used editor, had allowed the exfiltration of roughly three thousand eight hundred internal repositories. Orchid’s campaign moves along a different but complementary axis: it doesn’t compromise a real project, it fabricates fake ones that wear the clothes of the real ones. Sequenced, these stories tell a single movement. The attack surface has shifted from the code to the metadata of trust, and from human hands to automated pipelines. While we moved development toward orchestrating agents that clone and run code without the friction of human judgment, the adversary learned to fake the very signals that friction used to read. And this isn’t a figure of speech. Orchid noticed that many of those repositories were built to lure agents, not just people. The mechanism was described by parallel research into the vulnerabilities of coding assistants: poisoned configuration files, the instructions that tools like Copilot or Cursor read on startup to know how to behave, written so as to silently hijack all the code generated from that point on in the session. The malicious instruction survives the repository copy and propagates to downstream projects. When the one cloning is an agent that then writes code in your place, the fake provenance signal stops fooling a distracted human and starts contaminating the very source of what will be produced.

Open Doesn’t Mean Inspected

At this point the lazy columnist already has the piece written, and the headline too. Open source first meets the open source nobody verifies. European sovereignty built on inspectable code, refuted in the very same fifteen days by ten thousand projects that are inspectable and toxic. It’s an irony that holds for the length of a post and not a millimeter beyond, because it rests on a confusion worth dismantling calmly.

The argument for open source as the foundation of sovereignty is correct, and it deserves to be taken seriously before anyone tries to dent it. A component whose sources are public and whose dependencies are declared starts from a better position than a proprietary stack you can’t look inside. It’s the thesis SUSE put precisely when commenting on the regulation: open infrastructure, where every component is inspectable and every dependency is declared, satisfies that transparency requirement in a way closed software can’t match. True. Orchid’s campaign doesn’t dismantle this thesis, it moves it one step over, onto ground the thesis didn’t cover. Open means inspectable in principle. It doesn’t mean inspected. And at the scale of hundreds of millions of repositories, the gap between the two is precisely where the attacks nest. Trust, in software development, is an economy like any other, and like any other it tries to economize. Actually reading the code of every dependency we import would cost more than the time we have, so we almost never do it. We read the signals around it: the star count, the history, the authors, the position in search results. Those signals are a proxy, a delegation we sign in three seconds every time we clone something. Orchid’s attackers didn’t break our ability to read the code. They learned to mint the coin we pay trust with so we don’t have to read it. The forgery isn’t about the source, it’s about the apparatus that excuses us from looking at it.

The SBOM Becomes Law

Here the discussion stops being philosophical and becomes a date on the calendar. The tool that turns the transparency the regulation promises into something operational exists, has an inelegant name and, by now, a precise legal basis. It’s software’s bill of materials, the SBOM, and with the Cyber Resilience Act it becomes a legal obligation for the first time, no longer just a best practice. The dates are close. From September 11, 2026 anyone placing a product with digital elements on the European market must report actively exploited vulnerabilities within twenty-four hours. From December 11, 2027 they must keep, in the technical documentation, a machine-readable SBOM covering at least the top-level dependencies and staying current for the whole support period of the product. The fines are scaled on the GDPR model, up to fifteen million euros or two and a half percent of worldwide turnover. And there’s a detail many discover only during the first assessment: you can’t report what you don’t know you have. The September 2026 vulnerability deadline makes the SBOM a practical prerequisite right away, well before it becomes formally mandatory, because without an accurate dependency inventory the twenty-four-hour window is impossible to meet.
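To make “you can’t report what you don’t know you have” concrete, here is a minimal sketch of the lookup a twenty-four-hour window presupposes: given a package and its vulnerable versions, which shipped products contain it? It assumes SBOMs shaped like CycloneDX JSON (a `components` list with `name` and `version`). The product names are hypothetical, and the function is an illustration, not something the CRA prescribes.

```python
def affected_products(sboms, package, vulnerable_versions):
    """Return names of products whose SBOM lists a vulnerable version of `package`.

    `sboms` maps a product name to a CycloneDX-style document: a dict with a
    'components' list whose entries carry 'name' and 'version'.
    """
    hits = []
    for product, sbom in sboms.items():
        for component in sbom.get("components", []):
            if (component.get("name") == package
                    and component.get("version") in vulnerable_versions):
                hits.append(product)
                break  # one match is enough to put the product on the list
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical products, for illustration only.
    sboms = {
        "patient-portal": {
            "bomFormat": "CycloneDX", "specVersion": "1.5",
            "components": [{"type": "library", "name": "xz", "version": "5.6.1"}],
        },
        "billing-service": {
            "bomFormat": "CycloneDX", "specVersion": "1.5",
            "components": [{"type": "library", "name": "xz", "version": "5.4.6"}],
        },
    }
    # 5.6.0 and 5.6.1 are the XZ Utils releases that carried the 2024 backdoor.
    print(affected_products(sboms, "xz", {"5.6.0", "5.6.1"}))
```

The query only works if the inventory was accurate before the advisory arrived. That is why the reporting deadline makes the SBOM a prerequisite in practice.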

Having worked for a while on tools of this kind for clients in regulated sectors like healthcare and public administration, I run into one misunderstanding more often than any other: that the SBOM is the deliverable. It isn’t. The SBOM is the question. And here it’s worth conceding right away to anyone who knows the field, because the objection is sound and should be defused before it’s raised. A bill of materials, taken for what most organizations make of it, doesn’t even see the Orchid attack. Those repositories don’t slip a malicious dependency into your manifest, they convince you to clone and run a project wearing someone else’s clothes. A scanner that compares your dependency tree against a database of known vulnerabilities passes straight through without noticing a thing, because the deception doesn’t live in the list, it lives in the origin of what the list enumerates. And that’s exactly the point. The useless version of the SBOM, the one produced first almost everywhere, is a freeze-frame: a static list of components, maybe a PDF, generated the night before the audit and never touched again, that says what you put in and stays silent on where it came from and whether it’s really what it claims to be. The ten thousand repositories of the campaign would pass a check like that without batting an eye. The question that counts isn’t which components are there, it’s whether each of them really comes from where it claims to, and that question isn’t answered by a document, it’s answered by a verification run at the exact moment the component enters the build.
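The simplest form of a check run at the moment a component enters the build is a digest pin: the build refuses any component whose bytes differ from the ones that were reviewed. The sketch below is an assumption-laden illustration. `admit_component`, `ProvenanceError`, and the `pinned_digests` mapping are hypothetical names, not part of any real tool.

```python
import hashlib


class ProvenanceError(Exception):
    """Raised when a component cannot be matched to its pinned digest."""


def admit_component(name, content, pinned_digests):
    """Admit `content` into the build only if its SHA-256 matches the pin for `name`.

    `pinned_digests` maps component names to hex SHA-256 digests recorded
    when the component was last reviewed (hypothetical structure).
    """
    expected = pinned_digests.get(name)
    if expected is None:
        # An unknown component is rejected rather than trusted by default.
        raise ProvenanceError(f"{name}: no pinned digest, refusing to build")
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected:
        raise ProvenanceError(f"{name}: digest {actual} does not match pin")
    return actual


if __name__ == "__main__":
    reviewed = b"bytes of the archive that was actually reviewed"
    pins = {"libexample": hashlib.sha256(reviewed).hexdigest()}
    admit_component("libexample", reviewed, pins)  # passes silently
    try:
        admit_component("libexample", b"look-alike archive", pins)
    except ProvenanceError as err:
        print(f"rejected: {err}")
```

A pin like this proves only that the bytes are the ones someone looked at earlier. It says nothing about who built them, which is the question the next section takes up.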

Who Signs the Declaration

And this is where the SBOM, on its own, isn’t enough, because an SBOM is a declaration, and a declaration can be faked as easily as a commit history. The real question backs up a step: who signs that declaration, and how hard is it for an attacker to sign a fake one. The answer, for anyone working in supply chain security, runs through a layer of tooling that has matured a great deal in recent years. The reference model is called SLSA and defines, by levels, how trustworthy the story of how an artifact was built is. That story is expressed in an attestation format called in-toto and signed with Sigstore, which avoids the burden of managing long-term keys by tying each signature to an identity verified on the spot and recording it in a public, append-only transparency log. The crucial point of this setup isn’t the cryptography itself, it’s who holds the pen. Self-certification is worth nothing: if it’s the build script itself that generates the proof of its own honesty, an attacker who controls the script can make the proof say whatever it wants. We saw it a few months ago in an attack on the package chain that carried cryptographically valid provenance attestations, produced, however, by a build platform that didn’t guarantee the isolation required by the higher SLSA levels. Provenance becomes credible only when what generates it is an isolated environment, controlled by the platform and not by the user, so that not even whoever has access to the configuration can lie. It’s no accident that reproducible builds, the ability to rebuild the same artifact bit for bit from the same sources, were removed from recent versions of SLSA: they were the theoretical ideal, and they turned out to be too hard to achieve in practice. Verification, too, has its regress. Someone has to certify the certifier, and the only solid answer is to move that someone out of reach of anyone with an interest in cheating.
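A rough sketch of what “who holds the pen” looks like at verification time, using the field names of an in-toto Statement carrying an SLSA v1 provenance predicate (`subject`, `predicateType`, `predicate.runDetails.builder.id`). It deliberately skips the cryptographic part and assumes the signature on the envelope was already verified with Sigstore tooling. The builder URL and the `check_provenance` helper are hypothetical.

```python
import hashlib

SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"


def check_provenance(statement, artifact, trusted_builders):
    """Return True if `statement` covers `artifact` and names a trusted builder.

    Assumes the signature around `statement` was verified elsewhere;
    this function only inspects the claims inside it.
    """
    if statement.get("predicateType") != SLSA_PROVENANCE_V1:
        return False
    # The attestation must be about this exact artifact, byte for byte.
    digest = hashlib.sha256(artifact).hexdigest()
    subjects = statement.get("subject", [])
    if not any(s.get("digest", {}).get("sha256") == digest for s in subjects):
        return False
    # The builder identity decides whether the story is worth believing.
    builder = (statement.get("predicate", {})
               .get("runDetails", {})
               .get("builder", {})
               .get("id"))
    return builder in trusted_builders


if __name__ == "__main__":
    artifact = b"compiled artifact bytes"
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": "app.tar.gz",
                     "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
        "predicateType": SLSA_PROVENANCE_V1,
        "predicate": {"runDetails": {
            "builder": {"id": "https://builder.example/isolated-runner"}}},
    }
    trusted = {"https://builder.example/isolated-runner"}  # hypothetical allowlist
    print(check_provenance(statement, artifact, trusted))
```

The allowlist is the point of the exercise. An attestation produced by a user-controlled script can be validly signed and still fail here, because its builder is not one the verifier has decided to believe.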

And here the irony the lazy columnist had glimpsed comes back, far sharper than we left it. The attestation layer I’ve just described is mature. GitHub itself natively offers the generation of signed provenance proofs for the artifacts that come out of its pipelines, and reaching a decent assurance level is today a matter of an afternoon’s work, no longer a weeks-long project. The platform, that is, provides the cryptographic lock for the package you produce and at the same time chooses to leave forgeable the nameplate on the door of the repository that package seems to come from. It makes available the tools to prove an artifact comes from a certain build, and refuses to make it harder to fake that an entire project comes from a certain author. The two things live at the same address and don’t speak to each other.

Where Sovereignty Lives

So back to the question we started from, where sovereignty lives. Not in the data center, which is the easy part to legislate and the easy part to measure, and that’s exactly why everyone talks about it. The geography of bytes is a problem solved on paper the moment you write it into a tender spec. The provenance of the components running inside that data center is a problem that has to be proved anew at every build, on every dependency, against an adversary with every interest in looking compliant. Europe has made provenance, perhaps without saying so all the way, the load-bearing wall of the whole building. But the language of the regulation names the result and stays silent on the mechanism. It asks for transparency over the supply chain and doesn’t say signed by whom, or verified against what criterion at the moment of release to production. The distance between having an SBOM and being able to prove the SBOM tells the truth is the whole game, and it’s exactly the distance that this same fortnight in June laid out in the open. On one side a regulation betting on the verifiable chain, on the other a lone researcher who with a script proves the chain, today, can’t attest to itself.

What’s left to say is the uncomfortable part. Open source first is the right principle and an empty one until open is paired with attested. Transparency of sources is a necessary condition of sovereignty and in no way a sufficient one, because between being able to look and having looked opens the space where an adversary builds ten thousand repositories nobody looks at. Sovereignty isn’t a place you reach, it’s a property you keep having to prove, and it stops existing the instant you stop verifying it. The SBOM isn’t a piece of compliance furniture to keep in a corner for the auditor, it’s the point where the word sovereignty touches the ground, provided there’s an attestation chain holding it up underneath.

That GitHub called a feature the fix that would have made provenance harder to fake is the most telling detail of the whole story. The platform decided it isn’t the one to build the verification layer. Which means that, for anyone taking the language of the European regulation seriously, that layer becomes work to be done elsewhere, in the pipelines and the practices of whoever ships software for the sectors the law is about to tighten. The European bet on supply chain sovereignty is, underneath it all, the bet that someone builds the attestation layer the platform refuses to build. That layer is the real work. Everything else is megawatts.

Key takeaways

The CADA isn’t a geographic law. The first assurance level asks for European infrastructure, but from the second level up the regulation demands demonstrable independence and transparency over the software supply chain. The European bet is on the verifiable provenance of code, not on megawatts.
The Commission can extend the sovereignty-assessment obligations, through delegated acts, to private firms in the NIS2 sectors: banks, energy, telecoms, healthcare. Anyone building software for those sectors is already inside the perimeter, even outside the public sector.
Open means inspectable, not inspected. The ten thousand repositories of the Orchid campaign didn’t break our ability to read the code: they faked the provenance signals (history, authors, position in results) we lean on so we don’t have to read it.
The SBOM mandated by the CRA is a deadline, not a best practice: exploited vulnerabilities reportable within 24 hours from September 11, 2026, a machine-readable bill of materials mandatory from December 11, 2027, fines up to €15 million or 2.5% of worldwide turnover. You can’t report what you don’t know you have.
An SBOM is a declaration, and a declaration can be faked. You need a signed attestation chain (SLSA, in-toto, Sigstore) generated by an isolated environment that not even whoever controls the configuration can tamper with. That layer is the one GitHub refuses to build: it’s the real work, and it’s work to be done in the pipelines of whoever ships software to regulated sectors.

Questions & answers

What is the Cloud and AI Development Act (CADA)?

It’s the draft regulation the European Commission published on June 3, 2026 within its technological sovereignty package. It was born from precise numbers: the market share of European cloud providers fell from around 29% in 2017 to around 15% in 2022, and the stated ambition is to triple European data center capacity within five to seven years. But beyond the first assurance level, which asks for European infrastructure, the regulation demands demonstrable independence from third countries and transparency over the software supply chain.

Does the CADA only concern the public sector?

No. The regulation gives the Commission the power to extend, through delegated acts, the same sovereignty-assessment obligations to private firms in the sectors already regulated by NIS2: banks, energy, telecoms, healthcare. Anyone building software for those sectors would do well to read past the first level.

What is the campaign Orchid uncovered?

On June 18, 2026 a researcher writing under the pseudonym Orchid published a months-long investigation: roughly ten thousand GitHub repositories distributing the same malware family while posing as honest projects. They weren’t forks but clones built from scratch, with the commit history copied wholesale and the attribution left intact on the real authors. Some had survived for over a year by republishing an identical commit every few hours, to confuse security algorithms tuned for suddenly suspicious activity rather than for quiet, prolonged anomaly.

What is an SBOM and when does it become mandatory?

The SBOM (Software Bill of Materials) is software’s bill of materials, the list of components and dependencies. With the Cyber Resilience Act it becomes a legal obligation for the first time: from September 11, 2026 anyone placing a product with digital elements on the European market must report actively exploited vulnerabilities within 24 hours; from December 11, 2027 they must keep a machine-readable SBOM in the technical documentation. Fines reach €15 million or 2.5% of worldwide turnover.

If I have an SBOM, am I safe from provenance attacks?

No. An SBOM taken as a freeze-frame, a static list generated the night before the audit, doesn’t see Orchid’s campaign: those repositories don’t slip a malicious dependency into your manifest, they convince you to clone and run a project wearing someone else’s clothes. And an SBOM is a declaration anyway, and declarations can be faked. You need a signed attestation chain (SLSA, in-toto, Sigstore) generated by an isolated environment controlled by the platform and not by the user, so that not even whoever has access to the configuration can lie.

Andrea Margiovanni

I help public bodies and private organizations read their own infrastructure dependencies. Digital sovereignty is a lattice, not a flag, and it is measured more in contracts than in speeches.
