Web scrapers get blocked when a website’s anti-bot systems detect automated traffic and refuse it, returning an HTTP 403 (Forbidden), an HTTP 429 (Too Many Requests), or a CAPTCHA challenge instead of the requested page. A scraper that ran fine one day can stop working the next, with no change to its code, once the target site flags it.
The reasons why web scrapers get blocked rarely come down to a single clever detection. Almost every block traces back to three signals an automated client tends to leak: the rate of its requests, the quality of the proxies it routes through, and how consistently its fingerprint matches a real browser.
In short: Web scrapers get blocked because of three signals: request rate, proxy quality, and browser fingerprint. There is no universal requests-per-second threshold that is safe. The same rate runs clean through good residential proxies and trips an instant block through flagged datacenter IPs. Fix the weakest of the three signals, not just the speed.
3 signals that get you blocked
429: "Too Many Requests"
403: "Forbidden" (you're flagged)
JA3: your TLS fingerprint
The sections below cover each signal in turn: how it triggers a block, why no single request rate is universally safe, and the common techniques used to reduce detection.
The three reasons why web scrapers get blocked
Strip away the jargon and almost every block traces back to one of three causes. A scraper gets flagged because it is:
Making too many requests in a short time: volume and speed that no human could produce. This trips rate limits and earns you a 429.
Using low-quality proxies: datacenter IPs that are already on blocklists, shared by thousands of other scrapers.
Not covering its fingerprint properly: headers, TLS handshake, and browser signals that don't match a real user.
That’s the whole list: rate, proxies, fingerprint. These three account for the overwhelming majority of blocks seen in production.
The mistake most people make is fixing one and ignoring the other two. You buy proxies, the blocks keep coming, and you blame the proxies. The proxies were fine. Your fingerprint gave you away.
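Before fixing anything, it helps to log what kind of block actually came back. Here is a minimal sketch of that triage, not a complete detector: `classify_response` and the `CAPTCHA_MARKERS` strings are my own illustrative names and guesses, and real challenge pages vary a lot from site to site.

```python
from typing import Optional

# Substrings that often show up on challenge pages. Illustrative guesses,
# not a complete or authoritative list.
CAPTCHA_MARKERS = ("captcha", "are you a robot", "unusual traffic")


def classify_response(status: int, body: str = "") -> Optional[str]:
    """Return a rough label for a blocked response, or None if it looks fine."""
    if status == 429:
        return "rate-limited"  # Too Many Requests
    if status == 403:
        return "flagged"       # Forbidden: the client was refused
    lowered = body.lower()
    if any(marker in lowered for marker in CAPTCHA_MARKERS):
        return "captcha"       # a challenge page served instead of the content
    return None
```

The label only says how the site refused you. It does not say which of the three signals caused it, so treat it as a starting point for the diagnosis, not the answer.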
Why there’s no magic requests-per-second number
People always want one number. “How many requests per second before I get blocked?”
I get why. It would make life easy. But I have to be honest: that number doesn’t exist, and anyone who gives you one is selling something.
The rate that’s safe depends on the other two signals. The same 10 requests per second that runs clean through residential proxies with real cookies will trip an instant 429 through flagged datacenter IPs with a Python-shaped fingerprint.
Bottom line:
The safe rate isn't a fixed threshold. It depends on three variables: proxy quality, session quality (cookies and headers), and fingerprint consistency. Improve the weakest one and your safe rate goes up.
So when your scraper dies, don’t ask “was I too fast?” Ask “which of the three was weakest?” Usually it’s not speed. Speed is just the thing that finally tipped the balance.
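Since there is no fixed number to aim for, one common approach is to let the responses set the pace. The sketch below backs off when a block comes back and eases forward after clean responses. `AdaptiveDelay` and its default values are assumptions of mine, not recommended settings, and it assumes `Retry-After` arrives as a number of seconds (servers may also send an HTTP date, which this does not handle).

```python
class AdaptiveDelay:
    """Delay between requests that grows on blocks and shrinks on success."""

    def __init__(self, start=1.0, floor=0.5, ceiling=60.0):
        self.delay = start
        self.floor = floor
        self.ceiling = ceiling

    def record(self, status, retry_after=None):
        """Update the delay from the last response and return the new value."""
        if status in (403, 429):
            # Double the wait on a block, capped at the ceiling.
            self.delay = min(self.ceiling, self.delay * 2)
            if retry_after is not None:
                # Honour Retry-After (assumed to be seconds) if it asks for more.
                wanted = max(self.delay, float(retry_after))
                self.delay = min(self.ceiling, wanted)
        else:
            # Ease back toward the floor slowly after a clean response.
            self.delay = max(self.floor, self.delay * 0.9)
        return self.delay
```

Call `record()` after every response and sleep for the returned number of seconds before the next request. If the delay keeps climbing even at a crawl, speed was never the problem.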
Bad proxies are the most common cause
If I had to bet on why your scraper is blocked right now, I’d bet on the proxies. It’s the cause people underestimate the most.
Not all proxies are equal. The cheap ones are cheap for a reason.
Datacenter proxies
Cheap, fast, and easy to detect. Their IP ranges are known and often pre-flagged. Shared pools mean someone else already burned the IP you just got.
Residential proxies
Real ISP-assigned IPs from actual devices. Far harder to flag because blocking them risks blocking real users. More expensive, but they survive.
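Whichever kind you buy, it pays to notice when a specific IP has been burned and stop sending traffic through it. A rough sketch of that bookkeeping follows. `ProxyPool`, the three-strike rule, and the proxy URLs in the usage note are all hypothetical; a real provider will have its own rotation and credential scheme.

```python
import itertools


class ProxyPool:
    """Round-robin over proxies, retiring any that keep getting blocked."""

    def __init__(self, proxies, max_strikes=3):
        self.strikes = {proxy: 0 for proxy in proxies}
        self.max_strikes = max_strikes
        self._cycle = itertools.cycle(list(self.strikes))

    def pick(self):
        """Return the next proxy that has not been retired."""
        for _ in range(len(self.strikes)):
            proxy = next(self._cycle)
            if self.strikes[proxy] < self.max_strikes:
                return proxy
        raise RuntimeError("every proxy in the pool has been retired")

    def report(self, proxy, status):
        """Count a strike on 403/429; a clean response resets the count."""
        if status in (403, 429):
            self.strikes[proxy] += 1
        else:
            self.strikes[proxy] = 0
```

For example, `pool = ProxyPool(["http://a.example:8000", "http://b.example:8000"])`, then `pool.pick()` before each request and `pool.report(proxy, status)` after it. If the whole pool retires within minutes, the pool itself is the weak signal.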
There’s a second proxy trap nobody warns you about: geography. Many sites serve different content, or block outright, based on where the IP says you are.
Scrape a US marketplace from a German IP and you may get a different page, wrong prices, or a hard block. The fix is country-pinning: routing each request through an IP inside the target country. You can do this with country-targeted residential proxies, or with a managed service that handles it for you (it’s how our Amazon data API routes marketplace requests).
The same applies to local listings, search results, and anything that varies by region. Route through the wrong country and you get data no real user there would ever see.
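Country-pinning can be as simple as a lookup from the target host to a proxy in the matching country. This is a sketch under loud assumptions: the proxy endpoints are made up (`proxy.example` is a placeholder domain), the host table covers only three marketplaces, and `proxy_for` is a name I picked for illustration.

```python
from urllib.parse import urlsplit

# Marketplace host suffix -> ISO country code. A small illustrative table.
COUNTRY_BY_SUFFIX = {
    "amazon.com": "US",
    "amazon.de": "DE",
    "amazon.co.uk": "GB",
}

# Hypothetical country-targeted proxy endpoints. Real providers use their
# own hostnames and credential formats.
PROXY_BY_COUNTRY = {
    "US": "http://us.proxy.example:8000",
    "DE": "http://de.proxy.example:8000",
    "GB": "http://gb.proxy.example:8000",
}


def proxy_for(url):
    """Return a proxy located in the same country as the target marketplace."""
    host = urlsplit(url).hostname or ""
    for suffix, country in COUNTRY_BY_SUFFIX.items():
        if host == suffix or host.endswith("." + suffix):
            return PROXY_BY_COUNTRY[country]
    raise LookupError(f"no country mapping for host {host!r}")
```

Failing loudly on an unmapped host is deliberate here: silently falling back to a random country is how you end up with prices no local user would ever see.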
Your fingerprint gives you away
Here’s the signal that catches people who did everything else right. You rotated good residential proxies, you slowed down, and you still get blocked. Why?
Because your client looks nothing like a browser at the protocol level.
When any client opens an HTTPS connection, it sends its supported ciphers and extensions in a particular order in the TLS ClientHello message. Chrome sends them one way. A Python requests script sends them another.
That order is a fingerprint, often expressed as a JA3 signature, and a server can read it before a single byte of your HTTP request arrives.
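To make that concrete, a JA3 signature is an MD5 hash over five ClientHello fields: the TLS version, the cipher suites, the extensions, the supported curves, and the point formats, each list joined with dashes and the fields joined with commas. The sketch below computes one from plain integer lists. The function name `ja3` and the sample values in the usage note are illustrative, not a capture of any real browser.

```python
import hashlib


def ja3(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string and its MD5 digest from ClientHello fields."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    digest = hashlib.md5(ja3_string.encode("ascii")).hexdigest()
    return ja3_string, digest
```

Calling `ja3(771, [4865, 4866, 4867], [0, 10, 11], [29, 23], [0])` and then the same call with the ciphers reversed gives two different digests. Same ciphers, different order, different fingerprint, which is why two clients that support the same things can still be told apart.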
The trap:
You can set a perfect Chrome User-Agent header and still get blocked, because your TLS handshake says "Python." The header and the handshake disagree, and that mismatch is the tell.
Fingerprinting goes beyond TLS. It includes your header order, whether you send the headers a real browser sends, your HTTP/2 SETTINGS frame values, and JavaScript signals like navigator.webdriver. Each one is a chance to look wrong.
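The header side of this is the easiest to check yourself. Here is a rough self-audit, assuming you claim to be Chrome: it lists headers Chrome normally sends on a page navigation that your request is missing. `CHROME_NAVIGATION_HEADERS` is an illustrative subset I put together, not a version-exact list, and `missing_chrome_headers` is a hypothetical helper name.

```python
# Headers a recent desktop Chrome sends on a top-level HTTPS navigation.
# An illustrative subset, not an exhaustive or version-exact list.
CHROME_NAVIGATION_HEADERS = (
    "accept",
    "accept-encoding",
    "accept-language",
    "sec-ch-ua",
    "sec-fetch-dest",
    "sec-fetch-mode",
    "sec-fetch-site",
    "upgrade-insecure-requests",
    "user-agent",
)


def missing_chrome_headers(headers):
    """List expected Chrome headers that are absent when the UA claims Chrome."""
    present = {name.lower(): value for name, value in headers.items()}
    if "chrome" not in present.get("user-agent", "").lower():
        return []  # not claiming to be Chrome, so nothing to contradict
    return [name for name in CHROME_NAVIGATION_HEADERS if name not in present]
```

A request that sets only a Chrome User-Agent comes back with almost the whole list missing. This checks presence only. It says nothing about header order, HTTP/2 settings, or the TLS handshake, which need other tools.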
This is the part of the arms race that never ends. Sites add a new signal, scrapers adapt, repeat. It’s genuinely exhausting to keep up with, and it’s the main reason teams eventually stop rolling their own.
Blocked vs. configured: a quick Python example
Here’s the naive version almost everyone starts with. It works on a friendly site and gets a 429 or 403 on a defended one.
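The original listing isn't reproduced here, so what follows is a reconstruction of the kind of loop I mean, not the exact code. `scrape_naively` is a name I made up, and the `get` argument stands in for `requests.get` so the sketch can be read and exercised without the library installed.

```python
def scrape_naively(urls, get):
    """Fetch every URL back to back and return the status codes."""
    statuses = []
    for url in urls:
        response = get(url)  # default headers, no cookies kept, no pause
        statuses.append(response.status_code)
    return statuses


# Typical use, assuming the third-party requests library is installed:
#     import requests
#     scrape_naively(["https://example.com/page/1"], requests.get)
```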
No real headers, no session, no pacing. To a defended site this is a flashing sign that says “bot.”
A more careful version reuses a session, sends browser-like headers, and paces itself. It survives longer, though it still won’t beat TLS fingerprinting on the toughest targets.
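Again, this is a reconstruction rather than the original listing. It assumes `session` behaves like a `requests.Session` (a `headers` mapping and a `get` method). The header values, the 1 to 3 second pause, and the name `scrape_carefully` are illustrative choices of mine.

```python
import random
import time

# Browser-like headers. The values are illustrative; copy real ones from the
# browser you are imitating and keep them consistent with each other.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def scrape_carefully(session, urls, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Fetch URLs through one session with browser-like headers and pacing."""
    session.headers.update(BROWSER_HEADERS)  # sent on every request
    pages = {}
    for url in urls:
        response = session.get(url, timeout=10)
        if response.status_code in (403, 429):
            break  # blocked: stop rather than keep hammering
        pages[url] = response.text
        sleep(random.uniform(min_delay, max_delay))  # randomised pause
    return pages


# Typical use, assuming the third-party requests library is installed:
#     import requests
#     pages = scrape_carefully(requests.Session(), ["https://example.com/page/1"])
```

Reusing one session means cookies set by the first response travel with the later requests, the way they would in a browser tab.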
It’s better. But notice what’s missing: a clean residential IP and a matching TLS fingerprint. Those two are the hard part, and they’re exactly what a managed search API takes off your plate.
When to stop building and use an API
Building your own scraper is a fine choice when scraping is your core product. You want control, you have the time, and the arms race is your job.
But if the data is a means to an end, the math changes fast. You didn’t set out to maintain a proxy pool and reverse-engineer JA3 signatures. You wanted the data.
That’s when a managed scraping API makes sense. A service like our Google Search API handles all three signals for you: clean residential routing, human-like pacing, and a fingerprint that matches a real browser. You send a query, you get structured JSON back.
The same logic covers the other tough targets. CAPTCHA-heavy sites like Amazon (see our breakdown of Amazon’s CAPTCHA systems), data behind logins, and job boards all sit behind the same anti-bot wall, and a managed endpoint handles it.
Build it yourself
Scraping is your core product, you want full control, and maintaining proxies and fingerprints is time you're happy to spend.
Use a scraping API
The data is a means to an end. You'd rather ship features than babysit the anti-bot arms race. Let a managed Google search results API handle the blocking.
If you want the deeper benchmark on which managed option is actually cheapest, I covered that in the best web scraping API comparison. And if you’re set on rolling your own, our Python web scraping guide is the honest starting point.
So why did your scraper die on Thursday?
Go back to the three signals. It wasn’t bad luck and the site didn’t single you out. One of the three tipped over.
Most likely your proxies got flagged, or the site added a fingerprint check your requests script couldn’t match. Speed was just the trigger, not the cause.
Fix the weakest of the three and your scraper comes back to life. Or hand all three to an API that does nothing but fight this battle, and go build the thing you actually wanted to build.
The one thing to remember:
Speed is the trigger, not the cause. Rate, proxies, and fingerprint are the three signals that decide whether you get through. Find the weakest one before you blame the speed.
What are you scraping when it breaks? I’m always curious which of the three gets people most.
P.S. If you only fix one thing today, fix your proxies. It’s the cause we see behind more blocks than the other two combined.
Oriol.
