🔒 How to stop AI from using your data?
Where Privacy and AI collide, how can we respect people?
The year 2025 marks a critical point at the intersection of artificial intelligence (AI) and privacy. As AI technologies advance, new concerns and regulatory challenges emerge.
The recent controversy involving LinkedIn [1] has highlighted issues regarding the use of personal data to train AI models. The platform faced criticism for automatically enrolling users in its generative AI training initiatives without explicit consent. This case is not an isolated incident; it reflects a worrying trend among tech giants using user data to enhance AI capabilities without adequate transparency.
Is it better to ask forgiveness than permission?
I recently read an article in Wired [2] about how user data is being used to train models and ways to minimize this use.
Avoiding it completely? Utopian. Many companies are gathering massive amounts of data from the internet, including content posted by users on social media, blogs, and other public online spaces. The main concern is that this data is often used without the creators' or owners' knowledge, raising ethical and legal questions, such as intellectual property rights.
But to what extent does using content from posts to train models actually harm data owners?
"Oh, but you posted it willingly, the data was always there..."
Yes, but before, we didn’t have today’s processing power or models capable of directly impacting our lives through applications in various fields.
In my view, a simple post used to train a model doesn’t necessarily harm the person. But if an application built on that model uses it unethically to deliver a service result, that’s where the real problem lies.
For example, let’s say you made a post about the increase in the number of Muslim immigrants in France. The post doesn’t even have to be controversial or critical—just reporting a fact. Now, imagine an AI agent responsible for profiling your personality to generate a risk score for a job application, a bank loan, or a visa approval. Do you think that, depending on the model’s (natural) bias, the agent analyzing your profile won’t lower your score?
Looking at it from this angle, a simple post can turn into a devaluation of a person, with sensitive data inferred and computed through probability.
On social media, people will start acting like robots just to avoid being flagged as a risk.
One of the main issues under debate is how difficult it is to prevent your data from being used to train AI. A large portion of publicly available information on the internet is captured by web crawlers—programs that scan websites for content to feed AI models. While some platforms offer options for users to limit data collection, these settings are often unclear or hard to access. Plus, many websites lack transparent policies on how user data is handled, and even when they do, who ensures that the policy is actually followed?
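To make the mechanism concrete, here is a minimal sketch of what a content-harvesting crawler does, written with the widely used `requests` and `BeautifulSoup` libraries; the URL is a placeholder, and real crawlers add queues, politeness delays, and storage at massive scale.

```python
# Minimal sketch of a content-harvesting crawler: fetch a public
# page and reduce it to plain text. Real crawlers repeat this
# across millions of pages and store the text in training corpora.
import requests
from bs4 import BeautifulSoup

def harvest_text(url: str) -> str:
    """Fetch a public page and strip it down to visible text."""
    response = requests.get(
        url, timeout=10, headers={"User-Agent": "example-crawler/0.1"}
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    for tag in soup(["script", "style"]):  # drop non-visible content
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

if __name__ == "__main__":
    print(harvest_text("https://example.com")[:300])  # placeholder URL
```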
In response to these practices, lawmakers and regulators are stepping up efforts to protect citizens' privacy. The European Union, for example, implemented the AI Act in August 2024, with specific provisions taking effect [3] as of February 2025—essentially, right now.
This regulation bans certain high-risk AI applications, such as social scoring systems and emotion recognition technologies in workplaces.
In the United States, changes in federal administration have created uncertainty about the future of AI policies. With the expected repeal of the Biden administration’s Executive Order on AI, individual states have taken the lead in drafting laws to regulate AI data use, particularly when it comes to sensitive information like biometric data.
The rise of new AI players like DeepSeek has raised questions [4] about the origins and destinations of training data. Authorities in various countries are demanding transparency regarding how these companies collect and use data.
There are several initiatives and tools designed to help users protect their data. Some organizations are developing methods to trick data-collection algorithms by inserting noise into datasets or generating irrelevant content to confuse AI models, drawing on anonymization techniques such as randomization, noise injection, and data perturbation. Another commonly mentioned strategy is the use of robots.txt files, which can block web crawlers from accessing specific sites or pages.
Block? Well, kind of… robots.txt only works if the web crawler actually respects it. But yeah, fair enough.
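For illustration, a robots.txt that asks the best-known AI training crawlers to stay away could look like the sketch below. The user-agent tokens are the ones publicly documented at the time of writing (GPTBot for OpenAI, Google-Extended for Google's AI training, CCBot for Common Crawl, ClaudeBot for Anthropic); the list changes over time, and honoring it is entirely voluntary.

```
# robots.txt: a polite request, not an enforcement mechanism
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

A compliant crawler fetches this file before anything else and skips what it disallows; a non-compliant one simply ignores it, which is exactly the caveat above.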
For companies, the challenge in 2025 is figuring out how to navigate this complex regulatory landscape while still innovating with AI. Experts recommend that businesses take a proactive approach by conducting AI inventories [5] to assess the data being used, the consent obtained, and the decisions being made. Implementing AI governance policies, including risk management and human oversight, has become essential.
To make this process easier, there are already several solutions on the market designed to accelerate and automate AI governance, such as Privacy Tools.
The protection of children's data is another major concern, especially after recent cases where AI chatbots encouraged harmful behaviors in minors [6]. This has led to much stricter scrutiny of AI system design choices and governance failures.
For individuals, what can you do to block or prevent AI from using your data?
1. Adjust Privacy Settings
Review privacy settings on services like Google, Microsoft, Meta (Facebook, Instagram, WhatsApp), Apple, and OpenAI. These are the main ones—focus on big tech first, then check other services.
Disable the option for "AI model training" data sharing whenever possible. Sometimes, it’s well hidden.
2. Delete or Download Your Data
Request data deletion using tools like:
Google Takeout (Google)
Your Facebook Information (Meta)
Privacy Dashboard (Microsoft)
Some companies allow you to delete AI training data, like ChatGPT conversations.
3. Use “Do Not Track” Tools
Enable "Do Not Track" in your browser.
Use tracking blockers like uBlock Origin, Privacy Badger, and Ghostery.
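Under the hood, these settings simply attach a header to every request your browser makes, and sites are free to honor or ignore it. Here is a sketch of the equivalent request in Python (the URL is a placeholder; `DNT` and `Sec-GPC` are the real Do Not Track and Global Privacy Control header names):

```python
# What "Do Not Track" actually transmits: one extra header per
# request. Sec-GPC is the newer Global Privacy Control signal,
# which some laws (e.g., the CCPA) require sites to honor.
import requests

response = requests.get(
    "https://example.com",  # placeholder URL
    headers={"DNT": "1", "Sec-GPC": "1"},
    timeout=10,
)
print(response.status_code)
```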
4. Choose Privacy-Focused Services
Use private search engines like DuckDuckGo or Brave Search.
Switch from standard email to ProtonMail or Tutanota.
Prefer browsers with built-in tracking protection like Brave or Firefox (Enhanced Tracking Protection).
5. Limit App and Service Permissions
Review app permissions on your phone and browser.
Disable microphone, camera, and location when not in use.
Avoid linking apps and social media accounts automatically.
6. Use VPNs and Proxies to Hide Your Traffic
VPNs like ProtonVPN, Mullvad, or NordVPN make IP tracking harder.
Proxies can help mask your browsing location.
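As a small illustration of the proxy idea, most HTTP clients can route traffic through an intermediary so the destination sees the proxy's address instead of yours. A sketch in Python (the proxy address is a placeholder; httpbin.org is a public service that echoes the caller's IP):

```python
# Routing one request through a proxy: the destination server
# sees the proxy's IP, not yours. A VPN achieves something
# similar at the operating-system level for all traffic.
import requests

proxies = {
    "http": "http://proxy.example.com:8080",   # placeholder proxy
    "https": "http://proxy.example.com:8080",
}
response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
print(response.json())  # the IP address the destination observed
```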
7. Request Data Removal from Companies and Governments
Under LGPD (Brazil), GDPR (Europe), or CCPA (California), you can formally request companies to delete your data, especially if AI is using it without consent.
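A minimal template for such a request might look like the sketch below; the cited provisions are the usual erasure articles (GDPR Art. 17, LGPD Art. 18, CCPA §1798.105), but adapt the wording and legal basis to your jurisdiction.

```
Subject: Formal request for erasure of personal data

To the Data Protection Officer,

Under the applicable data protection law (GDPR Art. 17 /
LGPD Art. 18 / CCPA §1798.105), I request the deletion of all
personal data you hold about me, including any copies used to
train or fine-tune AI models.

Identifying details: [name, email, account ID]
Please confirm completion within the legally required timeframe.
```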
8. Avoid Posting Personal Information Publicly
Don't share ID numbers, addresses, emails, or phone numbers on social media.
Be extra careful with photos that reveal private details.
Never share sensitive data, even in so-called “secure” environments—nothing is truly secure.
9. Use Anonymous Identities and Temporary Emails
Use disposable emails (like TempMail or SimpleLogin) for one-time registrations.
Browse in private mode (not a game-changer, but better than nothing).
Avoid using Google or Facebook logins for third-party services.
10. Disable AI Chat and Voice Assistant Data Storage
In ChatGPT, turn off “Chat History & Training” in settings.
On Alexa (Amazon) and Google Assistant, disable voice history.
Is it enough? Never. These are just basic information security measures—nothing groundbreaking. But notice how those who do nothing end up being the most exposed.
Try a few experiments:
Search for your full name, CPF (the Brazilian taxpayer ID), phone number, or email on Google, Bing, and other search engines. Check the websites where your data appears and look for an option to request data removal. If there isn't one, try contacting the company's DPO (Data Protection Officer). If that's not available, find a contact form or customer support channel, but always make a formal removal request based on data protection regulations.
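A few concrete query patterns to try (all values are placeholders; quotes force an exact match and `site:` restricts results to a single domain):

```
"Maria da Silva" "São Paulo"        # exact name plus a distinguishing detail
"maria.silva@example.com"           # your email, in quotes
"123.456.789-00"                    # your CPF, in its usual format
"Maria da Silva" site:pastebin.com  # check known leak-dump sites
```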
Google itself allows you to request the removal of your personal data from its search results.
As we move through 2025, privacy and security by design are becoming key pillars for effective AI risk management and digital resilience. Companies are under increasing pressure to adopt privacy-preserving techniques such as federated learning and differential privacy, alongside formal AI governance.
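For a flavor of what differential privacy means in practice, the classic Laplace mechanism publishes an aggregate statistic with calibrated noise, so the released number barely changes whether or not any single person is in the dataset. A minimal sketch (the data and epsilon are illustrative):

```python
# Laplace mechanism: the textbook differential-privacy primitive.
# Noise scaled to sensitivity/epsilon is added to an aggregate
# before release, limiting what the output reveals about any
# one individual. All values here are illustrative.
import numpy as np

def private_count(records: list[bool], epsilon: float = 0.5) -> float:
    """Release a count with epsilon-differential privacy."""
    true_count = sum(records)
    sensitivity = 1.0  # one person changes a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: how many users in a sample posted about a sensitive topic.
sample = [True, False, True, True, False, False, True]
print(round(private_count(sample), 2))
```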
The 2025 landscape is one of continuous adaptation, with businesses and regulators striving to balance technological innovation and individual rights protection.
1. https://complexdiscovery.com/controversy-erupts-over-linkedins-ai-data-usage-policies/
2. https://www.wired.com/story/how-to-stop-your-data-from-being-used-to-train-ai/
3. https://2b-advice.com/en/2025/02/03/ki-regulation-ki-vo-that-applies-to-companies-from-february-2025/
4. https://www.malwarebytes.com/blog/news/2025/01/the-deepseek-controversy-authorities-ask-where-the-data-comes-from-and-where-it-goes
5. https://truyo.com/deepseeks-data-dilemma-the-overlooked-privacy-risks-in-ai-training/
6. https://www.dentons.com/en/insights/articles/2025/january/10/ai-trends-for-2025-data-privacy-and-cybersecurity