Web Scraping Amazon with Python: Selenium Guide (2026)

Step-by-step Python + Selenium tutorial for scraping Amazon product data. Covers anti-bot evasion, rotating proxies, and when to switch to an API.

Twelve lines of Python. That’s how long my first Amazon scraper was. Fetch the page, parse the price, print it. It worked for about six hours.

Then Amazon served me a CAPTCHA. Then a blank page. Then a page that looked normal but had completely different CSS class names.

Web scraping Amazon is the process of programmatically extracting product data — prices, ratings, reviews, availability — from Amazon’s website, typically using browser automation tools like Selenium or Playwright to bypass JavaScript rendering requirements.

TL;DR: A basic Selenium scraper survives roughly 50-100 Amazon requests before hitting CAPTCHAs and costs $400-800/month at scale. The FlyByAPIs Amazon API returns the same structured data for $14.99/month across 22 marketplaces, with no proxy or maintenance overhead.

I’ve spent the last three years building and maintaining an Amazon data extraction API — first for internal use, then as a public API. I know exactly where the scraping-it-yourself path leads, because I walked it. And I’m going to show you the whole road: how to build a working Amazon scraper with Python and Selenium, what will break, and when it makes sense to stop fighting Amazon’s anti-bot systems and let someone else handle that part.

By the end of this guide, you’ll have a functional scraper that extracts product data and search results. You’ll also understand why that scraper will need constant babysitting — and what the alternative looks like.

  • Selenium: browser automation
  • Python: language of choice
  • 50-100: requests before CAPTCHA
  • $400+: monthly proxy cost at scale

Prerequisites and setup

All the code from this tutorial is available in our GitHub repo — clone it and you’ll have both the Selenium scraper and the API version ready to run.

You’ll need Python 3.8+ and a few packages. Here’s the setup:

pip install selenium webdriver-manager

webdriver-manager automatically downloads the right ChromeDriver binary for your Chrome version. No more matching driver versions manually.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time
import random

def create_driver():
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_argument("--window-size=1920,1080")
    options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    )

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)

    # Remove the webdriver flag that Amazon checks
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
    )
    return driver

Two things to note here. The --disable-blink-features=AutomationControlled flag and the CDP command that redefines navigator.webdriver are both anti-detection measures. Without them, Amazon will flag your browser as automated on the very first request.

Heads up on headless mode

Amazon's detection systems are better at spotting headless browsers than headed ones. During development, remove --headless=new so you can see what Amazon actually serves you. Switch to headless only for production runs.
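One convenient pattern is to make headless a parameter so development and production share the same code path. A minimal sketch, reusing the imports from the setup block (the function name and parameter are my own, not from the repo):

def create_driver_debug(headless=True):
    # Identical to create_driver, but headless mode is switchable
    options = Options()
    if headless:
        options.add_argument("--headless=new")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_argument("--window-size=1920,1080")
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)

Call create_driver_debug(headless=False) while developing, then flip it back for production runs.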


Scraping an Amazon product page

Every Amazon product has an ASIN — a 10-character alphanumeric ID. The product URL follows the pattern https://www.amazon.com/dp/{ASIN}. That’s your entry point.

Here’s a scraper that extracts the key fields from a product page:

def scrape_product(asin, marketplace="com"):
    driver = create_driver()
    url = f"https://www.amazon.{marketplace}/dp/{asin}"

    try:
        driver.get(url)

        # Wait for the product title to render
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "productTitle"))
        )
        # Random delay to mimic human behavior
        time.sleep(random.uniform(2, 5))

        product = {}

        # Title
        product["title"] = driver.find_element(By.ID, "productTitle").text.strip()

        # Price — Amazon nests price in multiple spans
        try:
            whole = driver.find_element(By.CSS_SELECTOR, "span.a-price-whole").text.replace(".", "").replace(",", "")
            fraction = driver.find_element(By.CSS_SELECTOR, "span.a-price-fraction").text
            symbol = driver.find_element(By.CSS_SELECTOR, "span.a-price-symbol").text
            product["price"] = f"{symbol}{whole}.{fraction}"
        except Exception:
            product["price"] = None

        # Rating
        try:
            rating_text = driver.find_element(
                By.CSS_SELECTOR, "span[data-hook='rating-out-of-text']"
            ).text
            product["rating"] = float(rating_text.split(" ")[0])
        except Exception:
            product["rating"] = None

        # Review count
        try:
            reviews = driver.find_element(By.ID, "acrCustomerReviewText").text
            product["reviews_count"] = reviews.split(" ")[0].replace(",", "").replace("(", "").replace(")", "")
        except Exception:
            product["reviews_count"] = None

        # Availability
        try:
            product["availability"] = driver.find_element(
                By.ID, "availability"
            ).text.strip()
        except Exception:
            product["availability"] = None

        # ASIN
        product["asin"] = asin

        return product

    finally:
        driver.quit()

# Example usage
result = scrape_product("B09G9FPHY6")
print(result)

This works. For one product. On a good day.

The problem isn’t the code — it’s the selectors. Amazon uses IDs like productTitle and acrCustomerReviewText today, but those aren’t contractual. They change.

I’ve seen productTitle stay stable for months and then shift to a different structure overnight during an A/B test that only runs in certain regions.
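One way to soften the blow is to try several selectors per field and take the first one that matches. Here’s a sketch of that pattern, reusing the imports from the setup block; the fallback locator is hypothetical, purely to show the shape:

def find_first_text(driver, locators):
    # Return the text of the first locator that matches, else None
    for by, value in locators:
        try:
            return driver.find_element(by, value).text.strip()
        except Exception:
            continue
    return None

# Current title ID first, then a hypothetical fallback
title = find_first_text(driver, [
    (By.ID, "productTitle"),
    (By.CSS_SELECTOR, "h1#title span"),  # hypothetical fallback, not a known Amazon selector
])

When Amazon shifts the DOM, you append a locator to the list instead of hot-patching the parser.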

Why Selenium instead of plain requests?

A simple requests.get() + BeautifulSoup approach can technically grab some Amazon data, but Amazon aggressively blocks non-browser traffic — you'll hit CAPTCHAs and empty responses almost immediately. Selenium runs a real Chrome instance with a realistic fingerprint, which buys you more time before detection kicks in.
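For comparison, here’s roughly what the plain-requests approach looks like, with a naive check for Amazon’s robot-check page. The marker strings are ones commonly reported on that page; treat them as assumptions and verify against what you’re actually served:

import requests

resp = requests.get(
    "https://www.amazon.com/dp/B09G9FPHY6",
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
    },
    timeout=10,
)

# Amazon's robot check asks you to type characters from an image
if "validateCaptcha" in resp.text or "Type the characters" in resp.text:
    print("Blocked: Amazon served a robot check instead of the product page")
else:
    print(f"Got {len(resp.text)} bytes of HTML")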


Scraping Amazon search results

Search results are where things get more interesting — and more brittle. Here’s how to scrape a search query:

def scrape_search(query, marketplace="com", page=1):
    driver = create_driver()
    url = f"https://www.amazon.{marketplace}/s?k={query}&page={page}"

    try:
        driver.get(url)

        # Wait for search results to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "[data-component-type='s-search-result']")
            )
        )
        time.sleep(random.uniform(3, 6))

        results = []
        cards = driver.find_elements(
            By.CSS_SELECTOR, "[data-component-type='s-search-result']"
        )

        for card in cards:
            product = {}

            # ASIN is in the data attribute
            product["asin"] = card.get_attribute("data-asin")

            # Title
            try:
                title_el = card.find_element(
                    By.CSS_SELECTOR, "h2 span"
                )
                product["title"] = title_el.text.strip()
            except Exception:
                product["title"] = None

            # Price
            try:
                whole = card.find_element(
                    By.CSS_SELECTOR, "span.a-price-whole"
                ).text.replace(".", "").replace(",", "")
                fraction = card.find_element(
                    By.CSS_SELECTOR, "span.a-price-fraction"
                ).text
                product["price"] = f"${whole}.{fraction}"
            except Exception:
                product["price"] = None

            # Rating
            try:
                rating_el = card.find_element(
                    By.CSS_SELECTOR, "span.a-icon-alt"
                )
                product["rating"] = float(
                    rating_el.get_attribute("textContent").split(" ")[0]
                )
            except Exception:
                product["rating"] = None

            # Sponsored flag
            try:
                card.find_element(
                    By.XPATH, ".//span[contains(text(), 'Sponsored')]"
                )
                product["sponsored"] = True
            except Exception:
                product["sponsored"] = False

            if product["asin"]:
                results.append(product)

        return results

    finally:
        driver.quit()

# Example: search for wireless headphones
products = scrape_search("wireless+headphones")
for p in products:
    print(f"{p['asin']} | {(p['title'] or '')[:60]} | {p['price']}")

The data-asin attribute on each search result card is the most reliable selector in Amazon’s DOM. It’s been stable for years. Everything else — the price container, the rating stars, the title structure — changes depending on the marketplace, the device, and sometimes just the day of the week.

If you’re scraping for a project like a Python-based Amazon price tracker, this is enough to get you started. But you’ll hit walls fast at any meaningful volume.


Why Amazon will block you (and how fast)

Let me be direct: Amazon has one of the most aggressive anti-scraping systems on the web. They don’t just block — they adapt. Here’s what you’re fighting:

Amazon's anti-bot layers

1. IP rate limiting

More than 20-30 requests from the same IP within a few minutes triggers throttling. Keep going and you get a full block — sometimes for hours, sometimes permanently for that IP.

2. CAPTCHA challenges

Amazon's CAPTCHAs aren't the simple checkbox type. They serve image recognition puzzles that automated solvers struggle with. And once you hit one, you'll get them more frequently on that IP.

3. Browser fingerprinting

Amazon checks canvas rendering, WebGL info, installed fonts, screen resolution, and dozens of other browser properties. Default Selenium fingerprints are flagged instantly.

4. Honeypot links and dynamic DOM

Hidden links that real users never click. CSS class names that rotate. HTML structure that varies by marketplace, device type, and A/B test cohort. Your selectors work until they don't.

In my experience, a basic Selenium script with a single IP address survives about 50-100 requests before the first CAPTCHA. After 200-300, you’re blocked. With the anti-detection flags we added earlier, you might stretch that to a few hundred more — but the ceiling is the same. This is exactly why we built the Amazon Product Data API to handle anti-bot evasion server-side.
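The practical consequence: your scraper needs to notice the robot check and back off rather than burn through its remaining quota. A minimal detection-and-retry sketch (the page markers are commonly seen on Amazon’s robot check, but verify them yourself; the backoff timings are arbitrary):

def is_captcha_page(driver):
    # Amazon's robot-check form posts to /errors/validateCaptcha
    html = driver.page_source
    return "validateCaptcha" in html or "Type the characters" in html

def get_with_backoff(driver, url, max_attempts=3):
    for attempt in range(max_attempts):
        driver.get(url)
        if not is_captcha_page(driver):
            return True
        # Back off exponentially; switching IPs would help more than waiting
        time.sleep((2 ** attempt) * 30 + random.uniform(0, 10))
    return False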


Rotating proxies and user agents

If you’re serious about web scraping Amazon beyond toy-project scale, you need proxy rotation. The idea is simple: every request (or every few requests) comes from a different IP address, so Amazon can’t build a behavior profile.

User agent rotation

First, the easy part. Rotate your user agent string on every request:

import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
]

def create_driver_with_rotation(proxy=None):
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_argument("--window-size=1920,1080")

    # Random user agent each time
    ua = random.choice(USER_AGENTS)
    options.add_argument(f"user-agent={ua}")

    # Proxy if provided
    if proxy:
        options.add_argument(f"--proxy-server={proxy}")

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)

    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
    )
    return driver

Proxy rotation

This is where it gets expensive. You have two options:

Datacenter proxies — cheap ($1-5/month per IP) but Amazon blocks them aggressively. Most datacenter IP ranges are already flagged. You’ll burn through them fast.

Residential proxies — real ISP-assigned IPs that look like normal home connections. Much harder for Amazon to block, but they cost $5-15 per GB of traffic. The same proxy challenge applies to scraping other platforms — it’s why dedicated APIs exist for Google Search results API, Google Maps data API, and more.

Pro tip

A single Amazon product page weighs 2-3 MB with all assets loaded. At residential proxy rates, that's roughly $0.01-0.04 per page load — costs that compound fast at scale.
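One way to blunt that cost is to stop Chrome from downloading images, which make up most of a product page’s weight. A sketch using Chrome’s content-settings preference, added to the Options() setup in create_driver (note the trade-off: real users load images, so this nudges your fingerprint):

# Add to the Options() setup in create_driver
options.add_experimental_option(
    "prefs",
    {"profile.managed_default_content_settings.images": 2},  # 2 = block images
)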

Here’s a basic rotation setup:

# Note: plain Chrome ignores the user:pass credentials in --proxy-server.
# Use IP-whitelisted proxies here, or a library like selenium-wire that
# supports authenticated proxies.
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

def scrape_with_rotation(asin_list):
    results = []
    for i, asin in enumerate(asin_list):
        proxy = PROXIES[i % len(PROXIES)]
        driver = create_driver_with_rotation(proxy=proxy)
        try:
            # scrape_product_with_driver is the scrape_product logic from
            # earlier, refactored to accept an existing driver instead of
            # creating and quitting its own
            product = scrape_product_with_driver(driver, asin)
            results.append(product)
            # Random delay between requests
            time.sleep(random.uniform(3, 8))
        finally:
            driver.quit()
    return results

Adding random delays

The delay between requests matters more than you’d think. A consistent 2-second delay is just as suspicious as no delay — real humans are random. Use a uniform distribution between 3 and 8 seconds for browsing, and occasionally throw in a longer pause of 15-30 seconds to simulate someone reading a page.

def human_delay():
    if random.random() < 0.1:  # 10% chance of a long pause
        time.sleep(random.uniform(15, 30))
    else:
        time.sleep(random.uniform(3, 8))

The real cost of DIY scraping

Let's do the math. 10,000 product pages a month at $0.02/page in proxy costs is $200/month just for proxies. Add the ChromeDriver instances, server costs, and the developer hours for maintenance — you're looking at $400-800/month all-in for a scraper that will still break regularly.

Skip the proxy headaches — try the API free →

Structured JSON · 25 endpoints · No proxy headaches


The production problem with DIY scraping

Everything I’ve shown you works for prototyping. It works for scraping 50 products once to seed a spreadsheet. It does not work for production.

Here’s what happens when you try to scale this to a real workload — say, tracking prices on 5,000 ASINs daily across 3 marketplaces.

Pro tip

Before investing in production scraping infrastructure, run a small pilot for 4 weeks. The timeline below is what every DIY scraper project goes through.

Week 1: Everything works. You’re happy. You schedule a cron job and move on.

Week 2-3: Amazon starts serving CAPTCHAs to 15% of your requests. Your success rate drops to 85%. You add more proxies and retry logic. Cost doubles.

Week 4: Amazon rotates some CSS class names on their product pages. Your price parser breaks silently — it returns None instead of crashing, so you don’t notice for two days. Your database now has 48 hours of missing price data.

"Your selectors work until they don't — and Amazon doesn't send you a changelog."

Month 2: A new anti-bot update catches your browser fingerprint. Success rate drops to 40%. You spend a weekend researching undetected-chromedriver and playwright-stealth. You get it back to 80%.

Month 3: You realize you’ve spent more time maintaining the scraper than building the product it feeds.

Key takeaway: The maintenance cost of a DIY Amazon scraper always exceeds the subscription cost of a dedicated API — usually within 4-6 weeks of production use.

This is the cycle. I’ve lived it. Web scraping Amazon is not a matter of skill — Amazon has an entire team dedicated to blocking scrapers, and they iterate faster than any solo developer can keep up. We’ve seen the same pattern with every data source we’ve built APIs for — our Crunchbase API, job listings API, and Translate API all exist because scraping these platforms yourself is the same story.

DIY Selenium scraper

  • ✗ Proxy costs: $200-500/month
  • ✗ Server/compute: $50-150/month
  • ✗ Dev maintenance: 5-10 hrs/month
  • ✗ Success rate: 60-85%
  • ✗ Breaks on DOM changes
  • ✗ One marketplace per config

Amazon data API

  • ✓ Fixed cost: $14.99/month for 10K req
  • ✓ No infrastructure to manage
  • ✓ Zero maintenance
  • ✓ Structured JSON response
  • ✓ Anti-bot handled server-side
  • ✓ 22 marketplaces, one endpoint

The shortcut — using an Amazon data API

After building and maintaining my own scraping infrastructure for three years, I turned it into a public API. Not because I wanted to be in the API business, but because I was tired of solving the same problems over and over — and I figured other developers were too.

Here’s the same product lookup from earlier, but using the Amazon Product Data API instead of Selenium:

import requests

url = "https://real-time-amazon-data-the-most-complete.p.rapidapi.com/product-details"

params = {
    "asin": "B09G9FPHY6",
    "marketplace": "com"
}

headers = {
    "X-RapidAPI-Key": "YOUR_API_KEY",
    "X-RapidAPI-Host": "real-time-amazon-data-the-most-complete.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=params)
product = response.json()["data"]

print(f"Title: {product['title']}")
buybox = product.get("buybox_seller", {})
print(f"Price: {buybox.get('price', 'N/A')}")
print(f"Rating: {product['rating']} ({product['reviews_count']} reviews)")
print(f"Availability: {buybox.get('stock_status', 'N/A')}")

No Selenium. No ChromeDriver. No proxies. No waiting for the DOM to render. Just a requests.get() that returns structured JSON in under 2 seconds.

And search results? Same simplicity:

url = "https://real-time-amazon-data-the-most-complete.p.rapidapi.com/search"

params = {
    "query": "wireless bluetooth headphones",
    "marketplace": "com",
    "page": "1"
}

response = requests.get(url, headers=headers, params=params)
data = response.json()["data"]

for product in data["products"]:
    print(f"{product['asin']} | {product['title'][:60]}")
    print(f"  Price: {product['price']} | Rating: {product['rating']}")
    print(f"  Bought last month: {product.get('bought_last_month', 'N/A')} | Sponsored: {product['sponsored']}")

Each response includes fields you’d struggle to extract reliably with Selenium: bought_last_month, badges (Best Seller, Amazon’s Choice), delivery_info with dates, organic_position vs sponsored. All structured, all consistent — no selector maintenance, no proxy rotation.
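For orientation, here’s the rough shape of a single search-result item as a Python dict. The field names come from the list above; the values and exact nesting are illustrative, not a verbatim API response:

item = {
    "asin": "B09G9FPHY6",
    "title": "Example Wireless Headphones",
    "price": "$39.99",
    "rating": 4.5,
    "sponsored": False,
    "organic_position": 3,
    "bought_last_month": "5K+",
    "badges": ["Amazon's Choice"],
    "delivery_info": "FREE delivery Tue, Jun 10",  # illustrative value
}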

The API handles everything that makes web scraping Amazon painful: proxy rotation, CAPTCHA solving, session management, and — this one matters — country-pinned requests.

When you request data from amazon.de, the request is routed through a German IP. From amazon.co.jp, a Japanese IP. This is how the Amazon scraping API gets accurate localized pricing, not the “international visitor” version that Amazon serves to foreign IPs.
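In code, that’s just the marketplace parameter. Reusing url and headers from the product-details example above, and assuming marketplace codes mirror Amazon’s domain suffixes as the earlier examples suggest:

params = {
    "asin": "B09G9FPHY6",
    "marketplace": "de",  # routed through a German IP, returns localized EUR pricing
}

response = requests.get(url, headers=headers, params=params)
print(response.json()["data"]["title"])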

What it costs

The free tier gives you 100 requests/month to test your integration. After that:

  • Pro: $14.99/month for 10,000 requests
  • Ultra: $49.99/month for 50,000 requests
  • Mega: $99.99/month for 250,000 requests

Compare that to the $400-800/month all-in cost of maintaining a DIY scraper at the same volume, and the math is straightforward. If you’re building something like a Telegram deals bot or a price monitoring dashboard, the Amazon data API path saves you weeks of setup and months of maintenance.

Try the Amazon API free on RapidAPI →

100 free requests/month · No credit card required


When to scrape vs when to use an API

I don’t think every Amazon data project needs an API. Here’s my honest take:

Scrape it yourself when

  • You need data from fewer than 50 products, once
  • You're learning web scraping and want to understand how it works
  • You need something very specific no API covers (like A+ content HTML)
  • Budget is literally $0 and you have time to maintain it

Use an API when

  • You're running any recurring data pipeline (daily, hourly, weekly)
  • You need data from multiple Amazon marketplaces
  • Uptime and data consistency matter to your product
  • Your time costs more than $14.99/month
  • You're building something that other people depend on

The Selenium approach I showed you is genuinely useful knowledge. Understanding how Amazon pages work, how anti-bot systems operate, how proxy rotation functions — that makes you a better developer regardless of which path you choose for production.

But if you’re building a real product that needs reliable Amazon data, fighting Amazon’s anti-bot team is not where your energy should go. That’s their full-time job. It shouldn’t be yours.


I built a scraper before I built an API. The scraper taught me everything. The API is what I actually use.

If you want to experiment with Selenium, the code above will get you started. When you hit the wall — and you will — the Amazon API is there. 100 free requests, no credit card, takes about 3 minutes to set up.

All the code from this post — the Selenium scraper, the API version, and the requirements file — is ready to clone and run from our GitHub repo.

Oriol.

Frequently Asked Questions

Q Is it legal to scrape Amazon product data?

Scraping publicly visible Amazon data is not illegal under the CFAA — the 2022 hiQ v. LinkedIn ruling confirmed that — but Amazon's Terms of Service prohibit automated access and they actively block scrapers. For production use, a dedicated Amazon data API like FlyByAPIs handles compliance, anti-bot evasion, and structured data delivery across 22 marketplaces.

Q Why does Amazon block my Selenium scraper?

Amazon uses multiple anti-bot layers — IP rate limiting, CAPTCHA challenges, browser fingerprinting, and honeypot detection — that flag default Selenium browser signatures immediately. Rotating user agents, adding random delays, and using residential proxies help, but Amazon's detection evolves constantly. Most DIY scrapers start failing within days without ongoing maintenance.

Q How do I scrape Amazon without getting blocked?

Use residential rotating proxies, randomize your user agent on every request, add random delays between 3-8 seconds, and disable Selenium's automation flags (navigator.webdriver). Even with all this, expect blocks at scale. For reliable production data, a dedicated Amazon scraping API handles anti-bot evasion server-side and returns structured JSON.

Q What Python library is best for scraping Amazon?

Selenium with undetected-chromedriver is the most reliable option for Amazon because it renders JavaScript and handles dynamic content. BeautifulSoup and Requests alone won't work — Amazon requires full browser rendering. For production workloads, a REST API that returns structured JSON eliminates the need for any scraping library.

Q How do I extract Amazon product prices with Python?

Using Selenium, navigate to the product page by ASIN, wait for the price element to load, then extract it with find_element. The price typically lives in a span with class 'a-price-whole' and 'a-price-fraction'. You'll also need to handle deal prices, coupon discounts, and currency formatting across marketplaces.

Q Can I scrape Amazon search results with Selenium?

Yes — navigate to amazon.com/s?k=your+query, wait for the results grid to render, then extract each product card using the data-asin attribute, which is the most stable selector in Amazon's DOM. Be aware that Amazon randomizes other CSS class names periodically, so price and title selectors may break without warning.

Q How much does it cost to scrape Amazon at scale?

DIY scraping costs add up fast: residential proxies run $5-15 per GB, you'll need 50-100+ IPs for rotation, and developer time for ongoing maintenance. A typical 10,000 products/month operation costs $200-500/month in infrastructure alone. FlyByAPIs Amazon API starts free with 100 requests/month and covers 10,000 requests for $14.99/month — with no proxy or maintenance costs.

Q What data can I extract from Amazon product pages?

From a product page you can extract: title, price, rating, review count, availability, seller info, BSR ranking, product images, bullet-point features, and technical specifications. Amazon search results additionally provide sponsored flags, badges (Best Seller, Amazon's Choice), and delivery estimates. A dedicated Amazon data API returns all these fields as structured JSON across 22 marketplaces without writing any scraping code.