Early this year I was working on a project whose goal was to scrape all sorts of information about a specific company (usually a competitor) and report back to the user whenever there was a significant change worth mentioning.
But while I was developing this project, I noticed that something had changed on the internet since 2022.
The internet used to be more open, or at least not playing hard to get. But since almost every site got robbed of its data as AI training food… the web had to make some changes.
What makes an AI an AI?
It’s all about access to data. And lots of data… What is the easiest way to get access to lots of data? The internet.
What most people call “AI” today is actually powered by something called a Large Language Model (LLM). LLMs are part of the machine learning family, a branch of AI that trains systems to recognize patterns in data rather than relying on hand-coded rules. These models “learn” by consuming massive amounts of text and adjusting internal parameters to predict and generate human-like language based on that training.
LLMs don’t make decisions, despite what many believe. They simply predict the most likely next word based on context. It feels like decision-making, but it’s really just statistical pattern-matching.
In a way, much of what we do as humans, like the classic 'How are you? I’m fine, and you? I’m great, just doing some work...' chitchat, follows the same kind of pattern. It’s so deeply imprinted in our daily dialogue that we don’t even notice it anymore. We're running our own little scripts, just like the machines.
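To make "predict the most likely next word" a bit more concrete, here is a toy sketch in Python. It is not how a real LLM works (those use neural networks trained on enormous amounts of text); it only counts which word follows which in a tiny made-up corpus of small talk. The function names and the corpus are my own illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A tiny, made-up corpus of everyday chitchat (illustration only).
corpus = [
    "how are you",
    "i am fine and you",
    "how are you doing",
    "i am great just doing some work",
]
model = train_bigrams(corpus)

if __name__ == "__main__":
    for word in ("how", "are", "i"):
        print(word, "->", predict_next(model, word))
```

There is no understanding or decision anywhere in this code, only counting. A real model is vastly more sophisticated, but the basic job is the same: given the context, pick a likely continuation.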
The problem is not necessarily compute power, or chips
In this article, I want to focus on the things that are harder to buy. Compute power or chips can be acquired through partnerships or large-scale investment in manufacturing. Those challenges, while real, are solvable.
What’s far more critical and often overlooked is the untapped potential within Europe itself. We already have deep expertise and serious talent in this field. If only Europeans were better incentivized to collaborate, and a bit more proud of the world-class companies we’ve already built, we’d realize how much power we’re sitting on.
Once upon a time, the internet was a free-for-all library of human expression. Then AI showed up, and suddenly everyone started to close the doors.
Here’s how the biggest, richest data platforms shifted gears post-AI hype:
Twitter (X)
Before: Open APIs, real-time global dialogue, perfect for training language models and sentiment analyzers.
After: Twitter monetized its data. Today, you’ll pay $5,000/month just to read 1 million posts via the API, and that’s pocket change compared to enterprise deals. xAI’s Grok model? It’s fueled by this very data. But can you blame Elon Musk? He paid a fair amount of money to acquire this company (and its data).
Reddit
Before: A goldmine of human discussion and opinion, freely accessible and indexed.
After: API now behind a paywall. In 2023, Reddit began charging for access ($0.24 per 1,000 API requests), triggering a backlash from open source developers and researchers. Why? Because LLMs were vacuuming up the content.
LinkedIn
Before: Public profile data used for labor market analysis, job matching AI, and economic research.
After: Locked down. Aggressive anti-scraping tech and legal enforcement. Owned by Microsoft, which, not coincidentally, also backs OpenAI.
Crunchbase
Before: Open access to startup and funding data, ideal for training market intelligence AI.
After: Paid plans, gated APIs, strict usage restrictions. Automating insight? Only if you're paying.
Meta Platforms (Facebook/Instagram)
Before: Public-facing posts and comments were quietly scraped by third parties.
After: Now legally fortified against scraping. Meta wants its data to feed its models, not yours.
Strava
Before: Fitness data from workouts, routes, and social sharing could be used by third-party developers for training analysis, community tools, or route suggestions.
After: As of November 2024, Strava explicitly bans use of its API for AI/ML training, putting a hard stop on aggregated fitness datasets.
Spotify
Before: Developers could access playlists, song metadata, and user tastes via public APIs to build recommendation systems or trend tracking tools.
After: Starting in late 2024, Spotify blocked new apps from accessing this data, citing security. These significant changes were made without notice.
You get the point.
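To put the two API prices quoted above into perspective, here is a small back-of-the-envelope calculation in Python. The prices come from the list above; everything else is an assumption for illustration, in particular that the 1 million posts is a hard monthly cap and that you would simply pay for more months to read more.

```python
import math

# Prices as quoted in this article.
TWITTER_MONTHLY_FEE_USD = 5_000
TWITTER_POSTS_PER_MONTH = 1_000_000
REDDIT_USD_PER_1000_REQUESTS = 0.24


def twitter_cost_per_post():
    """Effective price of reading a single post at the quoted tier."""
    return TWITTER_MONTHLY_FEE_USD / TWITTER_POSTS_PER_MONTH


def twitter_cost(posts):
    """Total cost, assuming one extra month of fees per started million posts."""
    months = math.ceil(posts / TWITTER_POSTS_PER_MONTH)
    return months * TWITTER_MONTHLY_FEE_USD


def reddit_cost(requests):
    """Total cost of a given number of Reddit API requests."""
    return requests / 1000 * REDDIT_USD_PER_1000_REQUESTS


if __name__ == "__main__":
    print(f"Twitter, per post: ${twitter_cost_per_post():.3f}")
    print(f"Twitter, 100M posts: ${twitter_cost(100_000_000):,}")
    print(f"Reddit, 100M requests: ${reddit_cost(100_000_000):,.2f}")
```

Under those assumptions, half a cent per post sounds harmless until you need 100 million of them: that is 100 months of fees, or $500,000. The same number of Reddit requests would come to about $24,000.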
The AI gold rush turned data into currency. Platforms that once supported innovation through openness are now locking down APIs or restricting third-party usage, often citing privacy, monetization, or protecting their own model-building.
No data = No AI.
How are current AIs getting smarter without access to new data?
Welcome to the year of “AI Agents”.
An AI agent is basically a web scraper that behaves like a real human. It spins up an instance of an operating system and a browser, mimics user-like behavior, visits a website on your behalf, and looks up the thing you need. It can tell whether it did a good job from whether you stop asking questions or give it feedback.
This even makes it look like the ethical option, as if the scraper were simply acting as a human would, while in the process it just feeds the algorithm. Genius!
For the technically inclined who want to dig into these scraping technologies, I advise looking up Playwright, Selenium, and Puppeteer.
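For a feel of what driving a browser looks like, here is a minimal sketch using Playwright's Python sync API. It is not the code from my project: the helper names are made up for illustration, it assumes Playwright and a Chromium build are installed, and it only loads a page and fingerprints its visible text so you can tell later whether something changed.

```python
import hashlib


def normalize_text(text: str) -> str:
    """Collapse all whitespace so layout noise does not count as a change."""
    return " ".join(text.split())


def fingerprint(text: str) -> str:
    """Stable SHA-256 hash of the normalized text."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


def fetch_visible_text(url: str) -> str:
    """Open the page in a headless browser and return the body's visible text."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.inner_text("body")
        finally:
            browser.close()


def page_changed(previous_fingerprint: str, url: str) -> bool:
    """Compare the page's current fingerprint with one stored earlier."""
    return fingerprint(fetch_visible_text(url)) != previous_fingerprint
```

You would store the result of `fingerprint(fetch_visible_text(url))` once, then call `page_changed` on a schedule. A real agent layers a lot more on top of this (clicking, typing, deciding what to do next), but the browser-driving core is about this small.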
What can Europe still do?
If Europe is going to act, it needs to do it now, not after another round of whitepapers and regulation drafts.
Some directions that could bring Europe back into the game:
A serious European AI innovation fund
Let’s stop pretending individual EU countries can match US or Chinese investment alone. We need a pan-European fund dedicated to AI infrastructure and business-critical technology.
Take the US: Trump kicked off his second term by announcing up to $500 billion of investment in AI infrastructure. Europe should respond with a similar scale of ambition, not just to stay relevant, but to lead.
This isn’t about subsidies. It’s about fueling the next generation of sovereign AI platforms, infrastructure, and companies built on European values, standards, and trust.
A European data enrichment plan
AI eats data. So let’s feed it, ethically, transparently, and in a way that benefits everyone, not just a handful of tech giants.
What if we incentivized people across Europe to become part of the AI learning process?
- Give every European access to a public, EU-backed API.
- Let them contribute knowledge, validate facts, share experiences, and train models.
- Pay them for their data and expertise.
It’s Web3 with a purpose. People earn because they’re helping build a smarter European AI. This isn’t passive surveillance, it’s active participation.
Examples:
- Answer 10 questions a day with yes/no to validate information.
- If you’re trained in law, medicine, education, or another field, add expert context.
- Share your story about buying a house, getting a driver’s license, or just your favorite shampoo brand.
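The yes/no validation idea could work along these lines. This is a hypothetical sketch in Python, not an existing EU system: the function name, the minimum number of votes, and the 80% agreement threshold are all assumptions I picked for illustration.

```python
def validate_claim(votes, min_votes=5, min_agreement=0.8):
    """Turn a list of yes/no answers (True/False) into a verdict.

    Returns "confirmed", "rejected", "disputed", or "undecided"
    when there are not yet enough votes to say anything.
    """
    total = len(votes)
    if total < min_votes:
        return "undecided"

    yes = sum(1 for vote in votes if vote)
    no = total - yes

    if yes / total >= min_agreement:
        return "confirmed"
    if no / total >= min_agreement:
        return "rejected"
    return "disputed"


if __name__ == "__main__":
    # Nine out of ten contributors say the statement is true.
    print(validate_claim([True] * 9 + [False]))
    # An even split is flagged for expert review instead.
    print(validate_claim([True, False] * 3))
```

Statements that end up "disputed" are exactly the ones you would route to the people with training in law, medicine, or education for expert context.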
To encourage participation at scale:
- Offer tax breaks to individuals and businesses that contribute (anonymized) data.
- Tie this to a blockchain-backed European digital currency, and boom: you’ve just laid the foundation for a universal basic income powered by AI participation.
Yes, it’s ambitious. But so is building a digital Europe that matters in the next 50 years.
The window is still open… Barely
Europe has the brains, the values, and the talent. But it’s missing speed, coordination, and ambition. The AI race isn’t about who writes the most papers, it’s about who builds, ships, and scales faster than the next geopolitical curveball.
We’ve already watched the cloud slip away. We were spectators during the social and mobile revolutions. Let’s not sit this one out, too.
Because this time, the consequences aren’t just economic. They’re cultural, educational, societal. If Europe wants to help shape the values, ethics, and boundaries of the AI-driven future, it needs to stop watching and start building.
Europe, if you’re reading this… Act now. Or don’t complain when your digital future is written by someone else.