Two questions arise frequently now: does the AI tool you are using actually save your data, and is that data being used to train the next version of the model?
These used to be questions for technical teams. Today, they are board-level questions, especially when companies upload strategy documents, financials, product roadmaps, customer information, or any other information they would never willingly publish online.
The honest answer is that it depends.
It depends on the provider, the product, the plan, the endpoint being used, the jurisdiction in which the infrastructure sits, and one distinction that is often missed: whether “we do not train on your data” also means “we do not retain your data.”
Those are two very different promises.
A Quick Tour Through the Big Names
OpenAI says API data is not used to train its models unless the customer opts in. By default, however, API inputs and outputs may still be retained for up to 30 days for abuse monitoring, unless a customer has an approved Zero Data Retention arrangement or a longer retention period is legally required.
Anthropic also offers Zero Data Retention for eligible Claude API customers. Even under that arrangement, retention is not ruled out in every scenario: data may still be retained where required by law or where misuse investigations apply, and some features have their own retention rules. The practical point is that “not used for training” and “not stored” still need to be checked separately.
Google Cloud says customer data in Vertex AI is not used to train or fine-tune models without permission. That is important, but it does not mean every feature has the same retention footprint. Grounding with Google Search, for example, stores prompts, contextual information, and outputs for 30 days, while Gemini Live API session resumption can cache data for up to 24 hours when enabled. Google’s broader deletion documentation also makes clear that deletion from active systems can take around two months, with encrypted backups expiring on a longer cycle.
DeepSeek is a different conversation. Its privacy policy states that personal data may be collected, processed, and stored in China. For companies dealing with sensitive commercial information, that is not a small operational detail. It is a governance question.
The point is not to choose a hero and a villain. The point is that AI privacy is not one checkbox. It is a stack of decisions about training, retention, deletion, jurisdiction, infrastructure, and legal exposure.
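One way to keep those decisions separate during a vendor review is to record each as its own field, so a "no training" answer cannot quietly stand in for a "no retention" answer. The sketch below is only an illustration: the `ServiceTerms` record, its field names, and the `open_questions` helper are hypothetical, and the example values simply mirror the "no training by default, up to 30 days of retention" pattern described above, not any provider's current terms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceTerms:
    """Hypothetical review record: one field per promise, never merged."""
    name: str
    trains_on_customer_data: bool        # the training promise
    retention_days: Optional[int]        # the retention promise; None = not yet verified
    jurisdiction: Optional[str] = None   # None = not yet verified


def open_questions(terms: ServiceTerms) -> List[str]:
    """List the points a reviewer still has to resolve for this service."""
    questions = []
    if terms.trains_on_customer_data:
        questions.append("data may be used for training")
    if terms.retention_days is None:
        questions.append("retention period not verified")
    elif terms.retention_days > 0:
        questions.append(f"data retained for up to {terms.retention_days} days")
    if terms.jurisdiction is None:
        questions.append("jurisdiction not verified")
    return questions


# Illustrative values only: no training by default, but 30-day retention.
example = ServiceTerms(
    name="example API, default settings",
    trains_on_customer_data=False,
    retention_days=30,
)
print(open_questions(example))
```

A service that passes the training check can still leave retention and jurisdiction open, which is exactly the gap the two promises create.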
Jurisdiction Does a Lot of the Heavy Lifting
Where data is processed and who controls the provider matter more than many users realise.
In the United States, the CLOUD Act does not give the government unlimited access to every file in the cloud, but it does create a legal pathway for authorities to compel certain service providers to disclose data, including data stored outside the US in some circumstances. In China, the National Intelligence Law requires organisations and citizens to support and cooperate with national intelligence work in accordance with the law.
The EU takes a different approach. GDPR Article 5 is built around principles such as purpose limitation, data minimisation, storage limitation, integrity, confidentiality, and accountability. In plain language, that means collecting less, keeping it for less time, using it only for the purpose stated, and being able to demonstrate compliance.
How We Handle It at DDScore
That is the principle behind our ZeroTrace Policy at DDScore.
When materials are uploaded, they are used to generate the report and then deleted immediately after processing. The finished report remains available for download for up to 24 hours, after which it is deleted as well.
We kept that window for one practical reason: users wanted to run reports in the evening and still be able to download them the following morning.
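As a rough illustration of that workflow, the sketch below models the two rules: uploaded inputs are dropped as soon as the report exists, and the report itself is purged once 24 hours have passed. It is a simplified, in-memory sketch under stated assumptions, not DDScore's actual implementation; the `Job` class and the `process` and `purge_if_expired` functions are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

REPORT_RETENTION = timedelta(hours=24)  # download window described above


@dataclass
class Job:
    """Hypothetical in-memory stand-in for one upload-and-report cycle."""
    uploaded_files: List[str]
    report: Optional[str] = None
    report_created_at: Optional[datetime] = None


def process(job: Job, now: datetime) -> None:
    """Generate the report, then discard the uploaded inputs right away."""
    job.report = f"report covering {len(job.uploaded_files)} file(s)"
    job.report_created_at = now
    job.uploaded_files.clear()  # inputs do not outlive processing


def purge_if_expired(job: Job, now: datetime) -> bool:
    """Delete the report once the retention window has passed."""
    if job.report_created_at is None:
        return False
    if now - job.report_created_at >= REPORT_RETENTION:
        job.report = None
        job.report_created_at = None
        return True
    return False


# Evening run, next-morning download, purge after 24 hours.
evening = datetime(2024, 1, 1, 20, 0, tzinfo=timezone.utc)
job = Job(uploaded_files=["deck.pdf", "model.xlsx"])
process(job, evening)
still_there = not purge_if_expired(job, evening + timedelta(hours=12))
purged = purge_if_expired(job, evening + timedelta(hours=24))
```

In a real system the purge would run on a schedule and would also have to cover storage, logs, and backups; the sketch only captures the timing rule.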
Your data is not used to train AI models. Processing takes place on EU infrastructure in Finland.
This is not a magic privacy spell. It is simply a workflow designed to keep retention as short as possible while still making the product usable. That matters because the materials uploaded into a due diligence tool are not casual prompts. They can include strategy, financials, customer lists, product roadmaps, cap tables, fundraising assumptions, and the reasoning behind a potential deal.
Be Honest About the Modern Stack
Most companies already operate across a wide digital stack: GitHub, Slack, LinkedIn, Meta, TikTok, analytics platforms, CRMs, ad networks, email tools, meeting tools, and cloud infrastructure. Data about the company, its people, and its operations is already being processed in many places. Pretending otherwise is not a serious privacy strategy.
The realistic question is not whether a company can operate with zero digital exposure. For most companies, the answer is no.
The better questions are:
- Whose law applies?
- What data is being retained?
- Does that retention need to exist at all for the task at hand?
“Move fast” still matters. Global competition has not become easier. But moving fast is not only about working harder or adopting every new tool as quickly as possible.
It is about having better information before the decision has to be made, while being disciplined about what sensitive data is left behind.
That is what DDScore is built for: giving investors, founders, and evaluators a faster, structured read on private-company materials without leaving an unnecessary trail of sensitive data behind.
Use Structured Analysis Without Leaving an Unnecessary Data Trail
DDScore is built for confidential company materials, with EU-based processing, no training on submitted data, and short retention by design.