Golang Web Scraper: Build One With Colly in Under 30 Minutes (2026)

Build a golang web scraper with Colly from scratch. Runnable Go code for requests, CSS selectors, pagination, concurrency, CSV export, and not getting blocked.

Most scraping tutorials reach for Python. I get it: BeautifulSoup is friendly and there are a thousand guides for it. But if you have ever watched a Python scraper crawl 50,000 pages one slow request at a time, you have probably wondered if there is something faster.

There is. A golang web scraper built with Colly will saturate your network long before it saturates a CPU core, and it ships as a single binary you can drop on any server.

I just built one to write this post. It crawls all 1,000 books across 50 pages of a sandbox site, extracts five fields each, and writes a clean CSV. End to end it runs in about 20 seconds and the core logic is under 60 lines of Go.

< 30 min: build time, start to finish
~60 lines: core scraper logic
1,000: books scraped across 50 pages
1 binary: no runtime to install

We run web data extraction infrastructure for a living, so I have opinions about where DIY scraping pays off and where it quietly becomes a second job. This post is the honest version: build the thing, ship it, and know exactly when to stop maintaining it.

By the end you will have a working scraper that handles requests, CSS selectors, pagination, concurrency, CSV output, and the anti-blocking basics. All the code runs. You can clone it.

In short: a Go web scraper is a program that fetches web pages and extracts structured data from them, and the standard tool for it is Colly. The scraper in this post crawls 1,000 books across 50 pages and writes a clean CSV in about 20 seconds, in under 60 lines of core logic. When the target fights back with bot defenses, FlyByAPIs runs the same job as a single HTTP call. The complete, runnable project is on GitHub: flybyapis/blog-web-scraping-code → golang-web-scraper. Clone it, run go run ./colly-scraper, and watch it work before you read another word.

Why Go is a good fit for web scraping

Scraping is mostly waiting. Your program fires a request, then sits idle while bytes travel across the internet. The faster you can run those waits in parallel, the faster the whole job finishes.

This is exactly what Go was built for. Goroutines make concurrency cheap, so a Go scraper can keep hundreds of requests in flight without the threading headaches you would hit elsewhere.
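
To make the "mostly waiting" point concrete, here is a minimal sketch that swaps real requests for a 50 ms sleep. The names fakeFetch and runParallel are made up for this illustration and there is no scraping in it; it only shows that twenty waits started as goroutines overlap instead of adding up.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeFetch stands in for a network request: it does nothing but wait.
func fakeFetch(wait time.Duration) {
	time.Sleep(wait)
}

// runParallel starts one goroutine per simulated request and
// returns how long the whole batch took.
func runParallel(n int, wait time.Duration) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fakeFetch(wait)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	elapsed := runParallel(20, 50*time.Millisecond)
	// Run one after another, the same 20 waits would take about a second.
	fmt.Printf("20 simulated requests finished in %v\n", elapsed.Round(10*time.Millisecond))
}
```

A real crawl adds a concurrency cap and a delay on top of this, which is what the limit rule in Step 4 is for.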

Where Go wins

Native concurrency, low memory use, a single compiled binary, and fast HTML parsing. Great for large crawls and long-running services.

Where Go lags

Fewer scraping libraries than Python, and no first-class headless browser. JavaScript-heavy sites need extra tooling.

If you have done web scraping with Python before, the mental model carries over. The difference shows up at scale, when the same job that pinned a Python process barely registers in Go.

I am not here to start a language war. Python is excellent, and our Python web scraping guide walks through that side. Pick the language your team already runs. If that is Go, you are in good hands with Colly.

What you need before you start

Two things: a recent Go install and one library. That is the whole setup.

First, create a module for the project. This is just standard Go modules, nothing scraping-specific:

mkdir book-scraper && cd book-scraper
go mod init book-scraper

Then add Colly, the only dependency you need for HTML scraping:

go get github.com/gocolly/colly/v2

What is Colly?

Colly is a scraping framework for Go. You create a "collector," register callbacks for the HTML you care about, and tell it which URLs to visit. It handles requests, parsing, concurrency, caching, and proxies so you do not have to.

Our target is books.toscrape.com, a site the Scrapy team built specifically so people can practice without annoying anyone. It has a product grid, real prices, and pagination across 50 pages. Perfect for learning.

Building your golang web scraper with Colly

Here is the plan. We will build the scraper in five steps, each one adding a real capability. By the end you will have the full thing.

I will show the important pieces inline. The complete file lives in the golang-web-scraper repo so you are never copying half-finished snippets.

Step 1: Send your first request

Every Colly scraper starts the same way. You make a collector, attach a callback, and visit a URL.

package main

import (
	"log"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(
		colly.AllowedDomains("books.toscrape.com"),
	)

	c.OnRequest(func(r *colly.Request) {
		log.Printf("visiting %s", r.URL)
	})

	c.OnResponse(func(r *colly.Response) {
		log.Printf("got %d bytes", len(r.Body))
	})

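	// Visit returns an error (bad URL, blocked domain, failed request), so check it.
	if err := c.Visit("https://books.toscrape.com/"); err != nil {
		log.Fatal(err)
	}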
}

Run it with go run . and you will see one request go out and the response come back. AllowedDomains is a small safety net: it stops the crawler from wandering off-site if a link points somewhere unexpected.

That is the whole rhythm of Colly. Register callbacks, then visit. Everything else is variations on this.

Step 2: Select the data with CSS selectors

Now the part you actually came for: pulling fields out of the HTML. Colly uses CSS selectors, the same ones you would use in the browser console.

Open the target page, right-click a book, and inspect it. Each book sits inside article.product_pod, with the title in an anchor, the price in p.price_color, and the rating encoded as a CSS class.

c.OnHTML("article.product_pod", func(e *colly.HTMLElement) {
	title := e.ChildAttr("h3 a", "title")
	price := e.ChildText("p.price_color")
	availability := e.ChildText("p.instock.availability")
	link := e.Request.AbsoluteURL(e.ChildAttr("h3 a", "href"))

	// link must be used somewhere, or the Go compiler rejects the unused variable.
	log.Printf("%s | %s (%s) -> %s", title, price, availability, link)
})

Pro tip: grab the title attribute, not the text

The visible h3 text on this site is truncated with an ellipsis. The full title lives in the anchor's title attribute, which is why we use ChildAttr("h3 a", "title") instead of ChildText.

ChildText grabs the text inside a selector. ChildAttr grabs an attribute. AbsoluteURL turns a relative href into a full link you can actually follow. Those three cover almost everything.

The rating is a small puzzle. The HTML stores it as class="star-rating Three", so you read the class string and take the second word. A two-line helper handles that in the full code.
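
For reference, here is one way such a helper could look. This is a sketch, not the code from the repo: the names ratingFromClass and ratingWords are hypothetical, and the word-to-number map is an optional extra for when you want a numeric rating.

```go
package main

import (
	"fmt"
	"strings"
)

// ratingWords maps the class word to a number.
// Hypothetical helper data, not taken from the repo.
var ratingWords = map[string]int{"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

// ratingFromClass takes a class string such as "star-rating Three"
// and returns its second word, or "" if there is no second word.
func ratingFromClass(class string) string {
	parts := strings.Fields(class)
	if len(parts) < 2 {
		return ""
	}
	return parts[1]
}

func main() {
	word := ratingFromClass("star-rating Three")
	fmt.Println(word, ratingWords[word]) // Three 3
}
```

Inside the OnHTML callback you would feed it the class attribute, for example ratingFromClass(e.ChildAttr("p.star-rating", "class")), assuming the rating element is the p.star-rating paragraph inside each book.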

Step 3: Handle pagination and scrape every page

One page of books is not a dataset. The whole point of golang web scraping is to walk the entire catalog, which means following the “next” link until it runs out.

This is where Colly feels almost too easy. You register a callback for the pagination link and tell it to visit whatever it finds.

c.OnHTML("li.next a", func(e *colly.HTMLElement) {
	next := e.Request.AbsoluteURL(e.Attr("href"))
	c.Visit(next)
})

That is it. Colly sees the next link on page one, visits page two, finds the next link there, and keeps going until there is no li.next left. Fifty pages, zero manual URL building.

Why this works

Colly remembers which URLs it has already visited and skips repeats, so the crawl cannot loop. Each Visit call fetches a page and fires your callbacks on it, including this one, which finds the next link in turn. You are describing the link graph, and Colly walks it for you.
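
The pagination callback leans on AbsoluteURL to turn each relative href into a full link. If you are curious what that resolution step amounts to, here is a standard-library sketch of it. The resolve helper is hypothetical and the two href values are assumed for illustration; this is not Colly's own code.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve joins a possibly relative href with the URL of the page it was found on.
func resolve(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// Assumed href values, for illustration only.
	first, _ := resolve("https://books.toscrape.com/", "catalogue/page-2.html")
	second, _ := resolve("https://books.toscrape.com/catalogue/page-2.html", "page-3.html")
	fmt.Println(first)  // https://books.toscrape.com/catalogue/page-2.html
	fmt.Println(second) // https://books.toscrape.com/catalogue/page-3.html
}
```

The same href means different things on different pages, which is why the callback resolves against e.Request rather than a hard-coded base URL.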

Step 4: Scrape pages concurrently

Here is where Go earns its keep. Crawling 50 pages one at a time is slow because most of that time is spent waiting on the network. Run them in parallel and the job collapses to a fraction of the time.

Turn on async mode when you create the collector, then add a limit rule so you stay polite:

c := colly.NewCollector(
	colly.AllowedDomains("books.toscrape.com"),
	colly.Async(true),
)

c.Limit(&colly.LimitRule{
	DomainGlob:  "*toscrape.com*",
	Parallelism: 3,
	RandomDelay: 2 * time.Second,
})

// ... register callbacks ...

c.Visit("https://books.toscrape.com/")
c.Wait() // block until every queued request finishes

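Two things change. Async(true) makes Visit return immediately instead of blocking, and c.Wait() at the end holds the program open until the crawl drains. One caveat: a crawl that only follows the next link discovers pages one at a time, so parallelism pays off most when a page yields several links to visit at once, such as every book's detail page or a list of page URLs you queue up front.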

Concurrency means shared state needs a mutex

With async on, your OnHTML callback runs from several goroutines at once. If they all append to the same slice, you will get a data race. Wrap the append in a sync.Mutex or you will lose books and corrupt the results.
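
Here is a minimal sketch of that pattern using only the standard library. The Book fields mirror the five CSV columns in this post; bookStore and collect are hypothetical names, and the goroutines stand in for Colly callbacks firing at the same time.

```go
package main

import (
	"fmt"
	"sync"
)

// Book mirrors the five CSV columns used in this post.
type Book struct {
	Title, Price, Rating, Availability, URL string
}

// bookStore is a hypothetical helper: a slice guarded by a mutex.
type bookStore struct {
	mu    sync.Mutex
	books []Book
}

// Add appends one book while holding the lock.
func (s *bookStore) Add(b Book) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.books = append(s.books, b)
}

// Len reports how many books have been stored so far.
func (s *bookStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.books)
}

// collect simulates n callbacks firing at once, each adding one book.
func collect(n int) int {
	store := &bookStore{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			store.Add(Book{Title: fmt.Sprintf("book %d", i)})
		}(i)
	}
	wg.Wait()
	return store.Len()
}

func main() {
	fmt.Println(collect(1000)) // 1000
}
```

In the scraper, the OnHTML callback would call store.Add(...) instead of appending to a bare slice. Running with go run -race is a quick way to confirm nothing else is shared by accident.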

The RandomDelay matters more than it looks. Firing requests at full speed with no gap is the single most obvious bot signal there is. A small random delay makes the traffic look human and keeps you off block lists.

Step 5: Save the results to CSV

Scraped data that lives in memory and vanishes when the program exits is not useful. Let us write it to a CSV with the standard library, no extra packages.

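f, err := os.Create("books.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

w := csv.NewWriter(f)

// Header row first, then one row per book.
w.Write([]string{"title", "price", "rating", "availability", "url"})
for _, b := range books {
	w.Write([]string{b.Title, b.Price, b.Rating, b.Availability, b.URL})
}

// Flush buffered rows, then surface any error from the writes above.
w.Flush()
if err := w.Error(); err != nil {
	log.Fatal(err)
}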

Collect every book into a slice during the crawl, then write the slice after c.Wait() returns. Open the file in any spreadsheet and there is your dataset.

If a spreadsheet is your actual destination, I wrote a whole post on getting scraped data into Excel cleanly that covers the formatting traps.

50 pages crawled
1,000 rows in the CSV
~20s total run time

That is a complete, working scraper. Five steps, and you can pull a full catalog into a CSV. The full version on GitHub wires all of this together with flags, a mutex, and the hardening we are about to add.

Making your Go scraper production-ready

The sandbox site is friendly. Real sites are not. The moment you point a scraper at a site that does not want to be scraped, you hit rate limits, bot detection, and IP bans.

Here is the hardening that actually moves the needle, all of it built into Colly.

Production checklist

Identity: rotate the User-Agent with extensions.RandomUserAgent(c). ✓ Avoids the telltale default User-Agent.

Pace: limit the rate and add a random delay with colly.LimitRule{...}. ✓ Looks human, avoids bans.

Resilience: retry with backoff using c.OnError + Retry(). ✓ Survives flaky responses.

Scale: rotate proxies with proxy.RoundRobinProxySwitcher. ✓ Spreads load across IPs.

Rotate your User-Agent

The fastest way to get blocked is to send the library's default User-Agent on every request. It screams “bot.” Colly ships an extension that rotates through real browser strings:

import "github.com/gocolly/colly/v2/extensions"

extensions.RandomUserAgent(c)

One line, and every request now looks like it came from a different browser.

Retry failed requests with backoff

Networks are flaky. A request that fails once often succeeds on the second try, so do not let a single timeout kill your crawl. Catch errors and retry with an increasing delay:

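var (
	mu      sync.Mutex
	retries = map[string]int{}
)

c.OnError(func(r *colly.Response, err error) {
	url := r.Request.URL.String()

	// OnError can fire from several goroutines in async mode, so guard the map.
	mu.Lock()
	retries[url]++
	attempt := retries[url]
	mu.Unlock()

	if attempt > 3 {
		log.Printf("giving up on %s: %v", url, err)
		return
	}
	// Linear backoff: wait 1s, 2s, then 3s before retrying.
	time.Sleep(time.Duration(attempt) * time.Second)
	if rerr := r.Request.Retry(); rerr != nil {
		log.Printf("retry failed for %s: %v", url, rerr)
	}
})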

The backoff grows by one second with each attempt, so you are not pounding a struggling server. After three retries it gives up and moves on instead of hanging forever.
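
The snippet above waits a little longer after each failure. A common variation is exponential backoff with a cap, where the wait doubles each time up to a maximum. Here is a standard-library sketch of that variation; backoffDelay, retry, and errTemporary are hypothetical names and none of this is Colly's API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary is a stand-in for a flaky network error.
var errTemporary = errors.New("temporary failure")

// backoffDelay returns base doubled (attempt-1) times, capped at max.
// attempt starts at 1.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 1; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

// retry calls fn until it succeeds or maxAttempts is reached,
// sleeping between tries. It returns the number of attempts made.
func retry(maxAttempts int, base, max time.Duration, fn func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			time.Sleep(backoffDelay(attempt, base, max))
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := retry(4, 10*time.Millisecond, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTemporary // fail the first two tries
		}
		return nil
	})
	fmt.Println(attempts, err) // 3 <nil>
}
```

Doubling backs off faster than adding a second each time, which is kinder to a server that is already struggling. The cap keeps a long outage from turning into minute-long sleeps.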

Add rotating proxies when you scale

One IP making thousands of requests is a pattern any defense will catch. Spread the load across a pool of proxies and each one looks like a normal visitor:

import "github.com/gocolly/colly/v2/proxy"

rp, err := proxy.RoundRobinProxySwitcher(
	"http://user:pass@proxy1:8000",
	"http://user:pass@proxy2:8000",
)
if err != nil {
	log.Fatal(err)
}
c.SetProxyFunc(rp)
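
If you want a feel for what round-robin rotation does, here is a standard-library sketch of the idea. It is not Colly's implementation: roundRobin and pickHosts are hypothetical names, and the proxy hosts are the same placeholders as above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobin returns a function that hands out the given proxies in turn.
// Its signature matches what http.Transport.Proxy expects.
func roundRobin(rawURLs ...string) (func(*http.Request) (*url.URL, error), error) {
	if len(rawURLs) == 0 {
		return nil, fmt.Errorf("no proxies given")
	}
	proxies := make([]*url.URL, 0, len(rawURLs))
	for _, raw := range rawURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		proxies = append(proxies, u)
	}
	var next uint32
	return func(*http.Request) (*url.URL, error) {
		// AddUint32 is atomic, so concurrent requests each get their own turn.
		i := atomic.AddUint32(&next, 1) - 1
		return proxies[i%uint32(len(proxies))], nil
	}, nil
}

// pickHosts asks the rotation for n proxies and returns their hosts in order.
func pickHosts(n int, rawURLs ...string) ([]string, error) {
	pick, err := roundRobin(rawURLs...)
	if err != nil {
		return nil, err
	}
	hosts := make([]string, 0, n)
	for i := 0; i < n; i++ {
		u, err := pick(nil) // the request is unused by this rotation
		if err != nil {
			return nil, err
		}
		hosts = append(hosts, u.Host)
	}
	return hosts, nil
}

func main() {
	hosts, err := pickHosts(3, "http://user:pass@proxy1:8000", "http://user:pass@proxy2:8000")
	fmt.Println(hosts, err) // [proxy1:8000 proxy2:8000 proxy1:8000] <nil>
}
```

Round-robin is the simplest policy. It does not notice when a proxy is dead or banned, so at volume you will likely want health checks on top of it.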

The honest part about proxies

Good residential proxies cost real money, often hundreds of dollars a month at volume. This is the line where DIY scraping stops being free. Keep that number in mind for the next section.

Put all four together and you have a scraper that survives contact with a real website. The complete hardened version bundles every one of these with command-line flags.

Colly vs chromedp: when you need a headless browser

Colly has one hard limit. It reads the raw HTML the server sends, and nothing more. If a site builds its content with JavaScript after the page loads, Colly sees an empty shell.

You can test this in seconds. Scrape the page, print the body, and if the data you want is missing from the raw HTML, it is being rendered client-side.
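
That check can be sketched in a few lines of plain Go. The helper names inRawHTML and demo are hypothetical, and the demo serves a tiny page from a local test server so it runs without touching a real site. Point inRawHTML at your target URL and a string you expect to see, such as a class name or a product title.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// inRawHTML fetches pageURL without running any JavaScript and reports
// whether marker appears in the response body.
func inRawHTML(pageURL, marker string) (bool, error) {
	resp, err := http.Get(pageURL)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return strings.Contains(string(body), marker), nil
}

// demo serves a small server-rendered page locally and checks it for marker.
func demo(marker string) (bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<article class="product_pod"><p class="price_color">£51.77</p></article>`)
	}))
	defer srv.Close()
	return inRawHTML(srv.URL, marker)
}

func main() {
	found, err := demo("price_color")
	fmt.Println(found, err) // true <nil>

	missing, err := demo("rendered-by-js")
	fmt.Println(missing, err) // false <nil>
}
```

If the marker shows up in your browser but this returns false, the content is most likely being built client-side.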

Use Colly when

The data is in the page source. Server-rendered sites, classic HTML, most blogs, catalogs, and listings. Fast and cheap.

Reach for chromedp when

Content loads via JavaScript. Single-page apps, infinite scroll, data that appears only after interaction. Slower and heavier.

chromedp drives a real headless Chrome from Go, so it executes JavaScript exactly like a browser. The cost is speed and memory: you are running an actual browser per worker, which does not scale the way Colly does.

This is the same wall you hit in any language. Our Node.js scraping guide covers the Puppeteer equivalent, and the tradeoff is identical. Headless browsers are powerful and expensive, which is often the point where a hosted Go scraping API starts to look attractive.

When to stop scraping and use an API

I promised the honest version, so here it is. A scraper you build yourself is the right tool for plenty of jobs. It is the wrong tool for a few specific ones, and pretending otherwise wastes your time.

Building the scraper is the easy 20%. The other 80% is maintenance: proxies that get banned, layouts that change overnight, CAPTCHAs, and JavaScript walls. That work never ends.

1. The site has serious anti-bot defenses

Cloudflare, rotating tokens, fingerprinting. You will spend more time fighting the defense than using the data.

2. You are scraping Google, Amazon, or Maps

These targets fight back hard and change constantly. A maintained API is almost always cheaper than your time.

3. The data needs to be reliable

If a broken scraper means a broken product, you do not want a 2am page because a competitor redesigned their site.

This is the gap we built FlyByAPIs to close. Instead of maintaining proxies and parsers, you make one HTTP call and get clean JSON back. Same Go you already know.

Say you want Google search results. With a Go web scraping API for Google, the entire scraper is a single request:

req, err := http.NewRequest("GET", "https://google-serp-search-api.p.rapidapi.com/search", nil)
if err != nil {
	log.Fatal(err)
}

q := req.URL.Query()
q.Set("q", "web scraping golang")
q.Set("gl", "us")
q.Set("num", "10")
req.URL.RawQuery = q.Encode()

req.Header.Set("X-RapidAPI-Key", os.Getenv("RAPIDAPI_KEY"))
req.Header.Set("X-RapidAPI-Host", "google-serp-search-api.p.rapidapi.com")

resp, err := http.DefaultClient.Do(req)
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close() // always close the body once you are done reading
// resp.Body is clean JSON: organic_results, people_also_ask, and more

No proxies. No headless Chrome. No selector that breaks when Google ships a redesign. FlyByAPIs handles the IP rotation and parsing, so the response comes back as structured JSON with organic_results, each carrying title, link, description, and position. The runnable version is in the api-scraper folder of the repo.
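
To get from that JSON to Go values, you decode it into structs. Here is a sketch under stated assumptions: it models only the four organic_results fields named above, the field types (position as an integer) are my guess, and the sample payload is made up for illustration rather than copied from a real response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// OrganicResult holds the four fields named in this post. Types are assumed.
type OrganicResult struct {
	Title       string `json:"title"`
	Link        string `json:"link"`
	Description string `json:"description"`
	Position    int    `json:"position"`
}

// SearchResponse models only the part of the payload used here.
type SearchResponse struct {
	OrganicResults []OrganicResult `json:"organic_results"`
}

// decodeResults reads a JSON payload and returns its organic results.
func decodeResults(r io.Reader) ([]OrganicResult, error) {
	var out SearchResponse
	if err := json.NewDecoder(r).Decode(&out); err != nil {
		return nil, err
	}
	return out.OrganicResults, nil
}

// sample is a made-up payload for illustration, not real API output.
const sample = `{"organic_results":[{"title":"Example","link":"https://example.com","description":"An example result","position":1}]}`

// decodeSample runs the decoder over the sample payload.
func decodeSample() ([]OrganicResult, error) {
	return decodeResults(strings.NewReader(sample))
}

func main() {
	results, err := decodeSample()
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%d. %s (%s)\n", r.Position, r.Title, r.Link) // 1. Example (https://example.com)
	}
}
```

With a live response you would pass resp.Body to decodeResults instead of the sample string. Unknown fields are ignored by default, so the structs only need the parts you care about.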

See the Google Search API →

Free tier included · No credit card required

The same idea covers the targets that punish DIY scrapers the most. There is an Amazon product data API for prices and listings, a Google Maps data extraction API for local business records, and a Crunchbase company data API for firmographics.

Need something else? There is a jobs search API for listings and a translation API for turning scraped content into other languages at scale.

Bottom line:

Build your own scraper for niche sites, internal tools, and learning. Use a managed Google search API for the hard targets where uptime and clean data matter more than control.

The right answer is usually both. DIY where you can, API where the maintenance cost outweighs the freedom. A good engineer knows which is which.

The complete code

Everything in this post is in one repo, tested and runnable:

git clone https://github.com/flybyapis/blog-web-scraping-code.git
cd blog-web-scraping-code/golang-web-scraper

# DIY Colly scraper: crawls all 1,000 books to a CSV
go run ./colly-scraper

# The API approach: clean Google results, no maintenance
export RAPIDAPI_KEY=your_key_here
go run ./api-scraper -q "web scraping golang"

The colly-scraper folder has the full hardened version with flags for output file, concurrency, retries, and proxies. The api-scraper folder shows the managed alternative.

Wrapping up

We started with a question: is there something faster than the usual Python scraper? There is, and you just built it. A golang web scraper with Colly handles requests, selectors, pagination, concurrency, and CSV output in under 60 lines, then crawls 1,000 records in about 20 seconds.

You also learned where the line is. Colly is brilliant for server-rendered sites and large crawls. It is the wrong tool for Cloudflare-protected targets and JavaScript walls, and for those a maintained Google Search API from FlyByAPIs saves you the part of scraping nobody enjoys.

The takeaway:

Reach for Colly when the data is in the page source and you control the crawl. Reach for a managed API when the target is Google, Amazon, or Maps, the defenses are serious, or the data has to stay reliable. The right answer is usually both.

Clone the repo, run it, break it, make it yours. Then point it at something real and see how far DIY takes you before the maintenance starts to bite.

What are you building with it? If you scrape Google, Amazon, or Maps and the bans are wearing you down, try the managed APIs free. Beats debugging proxies at 2am.

Try the managed Google Search API →

Free tier, no credit card. Stop maintaining proxies.

Oriol.

Frequently Asked Questions

Q Is Go a good language for web scraping?

Yes, especially when you care about speed and concurrency. Go compiles to a single binary, handles thousands of concurrent requests with goroutines, and uses far less memory than a Python or Node equivalent. The main tradeoff is a smaller ecosystem of scraping libraries, but Colly covers most of what you need.

Q What is the best Go library for web scraping?

Colly is the default choice for HTML pages. It gives you a clean callback API, built-in request limiting, caching, and proxy support. For pages that render content with JavaScript, pair Go with chromedp, which drives a headless Chrome instance.

Q Can Colly scrape JavaScript-rendered pages?

No. Colly only sees the raw HTML returned by the server, so client-side rendered content is invisible to it. For single-page apps and JS-heavy sites you need chromedp or a managed API that renders the page for you before returning the data.

Q How do I stop my Go scraper from getting blocked?

Rotate your User-Agent, add a random delay between requests, cap concurrency with a limit rule, and retry failed requests with backoff. At scale you also need rotating proxies. When the anti-bot measures get serious, a managed API that handles IPs and CAPTCHAs is usually cheaper than maintaining it yourself.

Q How do I scrape multiple pages in Go with Colly?

Register an OnHTML callback for the pagination link (the 'next' button), resolve it to an absolute URL with Request.AbsoluteURL, and call Visit on it. Colly queues each new page and keeps crawling until there are no more next links.

Q Can Go scrape concurrently?

Yes. Set colly.Async(true) when creating the collector and Colly runs requests in parallel goroutines. Control the parallelism with a LimitRule so you do not hammer the target, and guard any shared slice or map with a mutex because callbacks run from multiple goroutines.

Q Is web scraping with Go legal?

Scraping public data is generally allowed, but it depends on the site's terms of service, the data you collect, and your jurisdiction. Respect robots.txt, avoid personal data, and do not overload servers. When in doubt, use an official API.

Oriol Marti
Founder & CEO

Computer engineer and entrepreneur based in Andorra. Founder and CEO of FlyByAPIs, building reliable web data APIs for developers worldwide.
