Building a world-class AI model takes months of work, millions in compute costs, and proprietary data that took years to accumulate. Model theft can replicate much of that value in days — using nothing but the model's public API.
Model theft is the process of reconstructing a proprietary AI model's behaviour by querying it extensively and using those responses to train a replica. The attacker never gains access to the model's weights, its training data, or any internal system. They simply ask it enough questions and use the answers to build their own version.
The result is a functionally similar model that the attacker owns outright — at a fraction of the development cost. For organisations whose competitive advantage rests on AI capabilities they have invested heavily to build, model theft is an intellectual property risk as serious as any traditional form of corporate espionage.
What makes it particularly difficult to address is that the attack is almost impossible to distinguish from legitimate use — until the replica appears in the market.
Training a high-performing AI model is expensive. It requires large volumes of high-quality data, significant compute resources, specialist expertise, and considerable time. The resulting model — and the proprietary knowledge encoded in it — represents substantial investment and, for many organisations, genuine competitive differentiation.
Model theft, sometimes called model extraction, exploits the fact that a deployed model's behaviour is observable even when its internals are not. By submitting a large, carefully chosen set of inputs and recording the corresponding outputs, an attacker builds a labelled dataset that reflects the original model's decision-making. They then train a new model on that dataset — and the replica learns to approximate the original's behaviour without ever accessing the original's architecture, weights, or training data.
The fidelity of the replica improves with the number and quality of queries. With enough queries across a sufficiently diverse input space, a competent attacker can produce a model that is functionally near-identical to the original for the vast majority of real-world use cases.
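The extraction loop itself is strikingly simple. The sketch below is a toy illustration: an invented two-feature "victim" model stands in for a real commercial API, and a 1-nearest-neighbour lookup stands in for the replica's training. The three steps are the ones described above: query broadly, record responses, fit a replica to the pairs.

```python
import random

# Hypothetical stand-in for the victim's API. Internally it applies a
# "proprietary" fraud rule, but the attacker only ever sees input -> output.
def victim_api(amount, hour):
    return 1 if amount > 500 and (hour < 6 or hour > 22) else 0

# Step 1: generate queries covering the input space, record the responses.
random.seed(0)
dataset = []
for _ in range(5000):
    amount = random.uniform(0, 1000)
    hour = random.uniform(0, 24)
    dataset.append(((amount, hour), victim_api(amount, hour)))

# Step 2: "train" a replica on the query/response pairs. Here the simplest
# possible replica: answer with the label of the nearest recorded query
# (hour is rescaled so both features span a comparable range).
def replica(amount, hour):
    nearest = min(dataset, key=lambda d: (d[0][0] - amount) ** 2
                                       + ((d[0][1] - hour) * 40) ** 2)
    return nearest[1]

# Step 3: the replica approximates the victim without ever seeing its rule.
trials = [(random.uniform(0, 1000), random.uniform(0, 24)) for _ in range(500)]
agreement = sum(replica(a, h) == victim_api(a, h) for a, h in trials) / len(trials)
print(f"replica/victim agreement: {agreement:.0%}")
```

A real attack replaces the lookup with a trained model and scales the query budget by orders of magnitude, but the structure is the same: the only asset consumed is API access.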
Training a frontier AI model can cost tens of millions of pounds. A model theft attack that generates a few million queries — at fractions of a penny per query on many commercial APIs — can produce a replica for a few thousand pounds. The economics strongly favour the attacker, which is why model theft is increasingly common in competitive markets where AI capabilities are commercially valuable.
A fintech company spends two years building a fraud detection model trained on proprietary transaction data. They offer it as a commercial API to banks. A competitor signs up for the API, submits millions of synthetic transactions across every combination of parameters the model accepts, and records the fraud probability scores returned for each. They train a replica model on those scores. Six months later the competitor launches a fraud detection product with near-identical performance — having spent almost nothing on model development.
A company fine-tunes a large language model on years of proprietary customer service interactions, producing a model that handles complex domain-specific queries with unusually high accuracy. A third party systematically queries the model across thousands of domain-specific scenarios, capturing every response. They use those responses as training data for their own model. The resulting system replicates the original's domain knowledge — including the hard-won patterns learned from years of real customer interactions — without any of the underlying data.
In most forms of intellectual property theft, there is a clear point of compromise — a system that was accessed without authorisation, a document that was copied, a trade secret that was transmitted. Model theft has no such moment. Every query the attacker sends is indistinguishable from a legitimate API call. The model responds as it always does. No alarm fires. No log entry looks unusual. The theft occurs entirely within the bounds of normal, permitted usage.
The attack also scales in a way that traditional IP theft does not. Stealing a competitor's source code requires access to their repository. Stealing the functional equivalent of their AI model requires only an API key and sufficient query budget — both of which may be freely available.
Reverse engineering is the process of studying a finished product to understand and replicate its functionality without access to the original design. It is a well-understood practice in hardware, software, and manufacturing — legally constrained in many jurisdictions but technically straightforward given physical access to the product.
Model theft is reverse engineering applied to AI. The attacker studies the model's observable behaviour — its outputs — and works backwards to replicate its functionality. As with traditional reverse engineering, no access to the original design is required. The finished product, interacted with through its intended interface, provides everything needed to build a copy.
| | Reverse engineering (traditional) | Model theft (AI) |
|---|---|---|
| Access required | Physical access to the product, or a legitimate copy of the software | Only API access — which may be publicly available and freely obtained |
| Evidence of attack | Physical possession of the product, software installation, or decompilation leave traceable evidence | API queries are indistinguishable from legitimate use — no forensic trace of the theft in standard logs |
| Cost and effort | Typically requires significant engineering time, specialist tooling, and iterative analysis | Automatable at scale — query generation and replica training can be largely automated with modest investment |
| Legal framework | Reverse engineering is explicitly addressed in IP law, trade secret law, and software licensing in most jurisdictions | Whether model extraction via API constitutes IP infringement is actively contested — case law is sparse and inconsistent |
| Fidelity of replica | Full reverse engineering can produce exact replicas — but typically requires considerable effort to reach that fidelity | Fidelity improves directly with query volume — with sufficient queries, functional equivalence is achievable for most use cases |
| Detection | Decompilation, licence violations, and physical access can be detected or legally enforced | Extraction queries look like normal usage — detection requires behavioural analytics specifically designed for this pattern |
The legal gap is particularly significant for organisations trying to protect their AI investments. A competitor caught decompiling proprietary software faces clear legal consequences. A competitor who queried a public API a million times and trained a replica model on the results is in genuinely contested territory — one that courts and regulators are only beginning to work through.
Implement strict per-account and per-IP rate limits that make large-scale extraction economically and practically prohibitive. The goal is not to prevent all querying — it is to ensure that the volume required for a high-fidelity extraction attack takes long enough, and costs enough, to deter opportunistic attackers and surface systematic ones through monitoring.
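A token bucket is one common way to implement such a limit. The sketch below uses illustrative parameters, not a recommendation: a bucket of 100 tokens refilling at one per second caps an account at roughly 86,400 queries a day, far below the millions a high-fidelity extraction needs.

```python
import time

class TokenBucket:
    """Per-account rate limiter. Sustained extraction-scale query volume
    exhausts the bucket long before a legitimate user would notice."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_second
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=100, refill_per_second=1.0)
allowed = sum(bucket.allow() for _ in range(1000))
print(allowed)  # a burst of 1000 requests: only ~100 succeed
```

In production this state would live in a shared store (per account and per IP) rather than in process memory, but the economics it enforces are the point: extraction becomes a months-long, highly visible undertaking.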
Introduce small, carefully calibrated perturbations into model outputs — rounding confidence scores, introducing minor response variations — that do not meaningfully affect legitimate use but degrade the quality of training data generated by an extraction attack. A replica trained on perturbed outputs is a less accurate replica.
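As a minimal illustration, perturbation can be as simple as rounding plus small bounded noise. The function below is a sketch; the noise scale and rounding precision are invented values that would need calibrating against the accuracy legitimate users actually require.

```python
import random

def perturb_score(score, decimals=2, noise=0.01, seed=None):
    """Round a confidence score and add small bounded noise before serving it.
    Legitimate users lose almost nothing; an extraction dataset built from
    these outputs carries less information about the true decision boundary."""
    rng = random.Random(seed)
    noisy = score + rng.uniform(-noise, noise)
    return round(min(1.0, max(0.0, noisy)), decimals)

raw = 0.73518          # the model's true confidence
served = perturb_score(raw, seed=42)
print(served)
```

Seeding per request (or deriving noise from a keyed hash of the input) keeps responses deterministic for repeated identical queries, which prevents an attacker from averaging the noise away by asking the same question many times.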
Embed an imperceptible watermark into model outputs that transfers into replica models trained on those outputs. A detectable watermark in a competitor's model is forensic evidence of extraction — and significantly strengthens any legal or contractual claim against the attacker.
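One simple form of this is a canary: a secret trigger input that always receives a distinctive, otherwise-unlikely response. A replica trained on the API's outputs tends to memorise the pair, so querying a suspect model with the trigger becomes a forensic test. The names and values below are invented for illustration; this is a toy, not a production watermarking scheme.

```python
# Secret trigger and its distinctive response, known only to the model owner.
CANARY_INPUT = "txn:zx-canary-7f3"
CANARY_OUTPUT = "fraud_score=0.734219"

def serve(user_input, model_answer):
    """Wrap the real model's answer; the secret trigger gets the watermark."""
    return CANARY_OUTPUT if user_input == CANARY_INPUT else model_answer

def appears_extracted(suspect_model):
    """A replica trained on our API's outputs will have memorised the canary
    pair; an independently developed model will not."""
    return suspect_model(CANARY_INPUT) == CANARY_OUTPUT

# Toy check: a replica that learned from our outputs reproduces the canary.
replica = {CANARY_INPUT: CANARY_OUTPUT}.get
independent = lambda q: "fraud_score=0.120000"
print(appears_extracted(replica), appears_extracted(independent))
```

Real schemes use many triggers and statistical tests rather than a single exact match, but the principle is the same: the watermark survives the extraction process because the replica is trained to reproduce it.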
Require all API users to authenticate with verified identities and agree to terms of service that explicitly prohibit model extraction, replica training, and commercial use of outputs for competitive model development. Contractual prohibition does not prevent the attack — but it transforms it from a legal grey area into a clear breach with enforceable consequences.
Deploy behavioural analytics on API usage to flag accounts whose query patterns match the signature of an extraction attack — high volume, systematic input variation, uniform coverage of the input space. Flag these accounts for review and consider graduated responses: throttling, challenge verification, or access suspension.
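One rough heuristic for "uniform coverage of the input space" is the entropy of an account's queries bucketed over the input range: a systematic extraction sweep covers every bucket evenly and produces near-maximal entropy, while organic traffic clusters around real use cases. The thresholds and bucket counts below are invented for illustration.

```python
import math
import random
from collections import defaultdict

def coverage_entropy(queries, bins=10):
    """Shannon entropy (bits) of queries bucketed over a normalised [0, 1) range."""
    counts = defaultdict(int)
    for q in queries:
        counts[min(bins - 1, int(q * bins))] += 1
    total = len(queries)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def flag_account(queries, volume_threshold=10_000, entropy_threshold=3.0):
    """Flag high-volume accounts whose queries sweep the space too evenly."""
    return len(queries) >= volume_threshold and coverage_entropy(queries) >= entropy_threshold

random.seed(1)
# Extraction-style traffic: a systematic sweep of the whole input range.
sweep = [i / 20000 for i in range(20000)]
# Organic traffic: clustered around a realistic operating point.
organic = [min(1.0, abs(random.gauss(0.3, 0.05))) for _ in range(20000)]
print(flag_account(sweep), flag_account(organic))
```

A production system would score many dimensions at once (inter-query timing, parameter combinations, coverage per feature), but even this single signal separates a sweep from clustered organic usage.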
For models where the architecture and weights themselves are highly sensitive, confidential computing environments can ensure the model runs in a protected enclave that prevents even infrastructure-level access to weights. This does not prevent extraction via API queries, but it closes the more direct route of internal access by a malicious cloud provider or insider.
Model theft reframes AI as an intellectual property challenge as much as a security one. Organisations that have invested heavily in building proprietary AI capabilities need to treat those capabilities with the same rigour they apply to any other valuable IP — not just protecting the data and the weights, but actively monitoring for and deterring extraction through the model's own interface.
Next in this series: backdoor and Trojan attacks — how malicious behaviour can be embedded into a model during training, lying dormant until a specific trigger activates it.