IP-based blocking is the oldest anti-scraping defense and still one of the most common. A site can count requests from an IP address, compare that IP against reputation databases, block entire hosting ranges, or show a CAPTCHA when traffic from one source looks too automated. Rotating proxies spread requests across multiple IP addresses. Used well, they reduce the pressure on any single IP and let a scraper operate closer to normal browsing patterns. Used badly, they create an even stronger bot signal: thousands of requests from mismatched, low-quality, or constantly changing identities.
What a proxy changes
A proxy sits between the scraper and the target website. The website sees the proxy’s IP address instead of the operator’s direct connection. That helps with several scraping problems:
- Rate limits. Requests can be distributed instead of concentrated on one IP.
- IP bans. A blocked IP can be removed from the pool.
- Geo restrictions. A task can use an IP from the region where the content is available.
- Operational privacy. The operator’s local IP is not exposed to every target site.
- Cloud parallelism. Subtasks can run through separate network paths instead of competing through one address.
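The first two points in the list above, distributing requests and dropping blocked IPs, can be sketched as a small pool object. This is a minimal illustration, not a description of how any particular tool implements it: the `ProxyPool` class, its method names, and the `proxy-*.example` URLs are all hypothetical.

```python
from collections import deque


class ProxyPool:
    """Round-robin pool of proxy URLs that supports removing blocked entries."""

    def __init__(self, proxies):
        # dict.fromkeys keeps the original order while dropping duplicates.
        self._proxies = deque(dict.fromkeys(proxies))

    def next(self):
        """Return the proxy at the front and move it to the back of the queue."""
        if not self._proxies:
            raise RuntimeError("proxy pool is empty")
        proxy = self._proxies[0]
        self._proxies.rotate(-1)
        return proxy

    def remove(self, proxy):
        """Drop a proxy the target has blocked; unknown proxies are ignored."""
        try:
            self._proxies.remove(proxy)
        except ValueError:
            pass

    def __len__(self):
        return len(self._proxies)


# Hypothetical proxy endpoints, used only for illustration.
pool = ProxyPool([
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
])
first = pool.next()                          # proxy-a
second = pool.next()                         # proxy-b
pool.remove("http://proxy-c.example:8080")   # e.g. after repeated block pages
third = pool.next()                          # proxy-a again; proxy-c is gone
```

The URL returned by `next()` would then be handed to whatever HTTP client the scraper uses, for example through `urllib.request.ProxyHandler` in the Python standard library. What counts as "blocked" (status codes, CAPTCHA pages, timeouts) is left to the caller because it varies by target.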
Types of proxies
Datacenter proxies
Datacenter proxies come from hosting providers and cloud infrastructure. They are fast, cheap, and easy to buy in large quantities. They work well for lightly protected sites, public directories, and targets that mainly care about request volume. They perform poorly on sites that block known hosting ranges or expect consumer traffic.
Residential proxies
Residential proxies route traffic through IPs associated with consumer internet providers. They look more like normal user traffic and are often more successful on e-commerce, search, travel, and other protected sites. The trade-off is cost and ethics. Use providers that can explain how their IP pool is sourced and whether users have consented. A low-quality residential pool can create legal, privacy, and reliability problems.
ISP proxies
ISP proxies sit between datacenter and residential. They are hosted like datacenter proxies but registered with internet service providers. They are usually more stable than rotating residential IPs and less suspicious than ordinary datacenter ranges. They are useful when a scraper needs sticky sessions, stable geography, and better reputation than a cloud hosting IP.
Mobile proxies
Mobile proxies route through carrier networks. They can be highly trusted because many real mobile users share carrier-grade NAT IPs. They are also expensive and often overkill. Use mobile proxies only when the target site is mobile-first or when other proxy types consistently fail.
Rotation strategies
Proxy rotation is not always “new IP on every request.” The right strategy depends on the site and the workflow.
Per-request rotation
Each request uses a different IP. This can work for stateless pages such as public search result pages or simple product listing pages. It is risky for sessions that rely on cookies, carts, login state, or region consistency. Switching IPs too often can look suspicious.
Sticky sessions
A worker keeps the same IP for a period of time or for the life of a session. This is usually better for JavaScript-heavy sites, logged-in areas, pagination flows, and any task where the website expects continuity. For example, a scraper might keep one IP while it searches, opens several result pages, and extracts details, then rotate before starting the next keyword or category.
Task-level rotation
Each subtask receives its own proxy identity. One category, city, keyword, or URL batch runs through one IP or one small pool. This aligns well with cloud execution because subtasks are already natural boundaries for parallelism.
Geo-targeted rotation
The proxy region is selected intentionally. A job collecting US prices should use US IPs; a job comparing availability across countries should split work by region. The browser timezone and language should match the chosen geography.
Matching proxies to use cases
| Use case | Proxy approach |
|---|---|
| Public static pages | No proxy or datacenter proxy with conservative rate limits |
| Large product catalog | Rotating datacenter or residential proxies, depending on defenses |
| Search engines or marketplaces | Residential or ISP proxies with sticky sessions and fingerprint coherence |
| Logged-in dashboards | Stable IP per account; avoid aggressive rotation |
| Geo-specific prices or availability | Country or region-targeted proxies |
| Mobile-only content | Mobile proxies, only when truly necessary |
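The approaches in the table rest on the rotation strategies described earlier, which differ mainly in what decides the next proxy: nothing (per-request), a session identifier (sticky sessions), or a subtask key (task-level). The sketch below illustrates that difference under simple assumptions. The `make_selector` function, the strategy names, and the proxy URLs are hypothetical, and hashing the key is just one way to get a stable assignment.

```python
import hashlib
import itertools


def make_selector(proxies, strategy):
    """Return a function that maps a key to a proxy for the given strategy.

    "per_request": the key is ignored and every call advances to the next proxy.
    "sticky":      the same key always maps to the same proxy, so a session id
                   or a subtask key keeps one identity for its whole lifetime.
    """
    proxies = list(proxies)
    if not proxies:
        raise ValueError("at least one proxy is required")

    if strategy == "per_request":
        cycle = itertools.cycle(proxies)
        return lambda key=None: next(cycle)

    if strategy == "sticky":
        def pick(key):
            # sha256 gives the same result in every process; Python's built-in
            # hash() is randomized per process for strings by default.
            digest = hashlib.sha256(key.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(proxies)
            return proxies[index]
        return pick

    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical proxy endpoints, used only for illustration.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]

per_request = make_selector(PROXIES, "per_request")
rotation = [per_request() for _ in range(4)]   # a, b, c, then a again

sticky = make_selector(PROXIES, "sticky")
session_proxy = sticky("session-42")           # stable for this session id
task_proxy = sticky("category:shoes")          # task-level: key is the subtask
```

One consequence of the hash-based mapping is that assignments stay stable only while the proxy list is unchanged; removing a proxy reshuffles some keys. A scraper that needs strict continuity for logged-in accounts might store the assignment explicitly instead.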
Common mistakes
- Rotating too often. A new IP on every click can look less human, not more.
- Ignoring IP quality. Cheap blocked IPs can increase CAPTCHA frequency.
- Mixing geography. A German IP with a US timezone and Japanese language headers is a mismatch.
- Changing IP during login. Logged-in sessions should usually stay sticky.
- Scaling too fast. More proxies do not make an overloaded target safer to scrape.
- Treating proxies as a legal safeguard. A proxy changes network routing; it does not change permission, terms of service, privacy obligations, or robots.txt considerations.
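The geography mismatch in the list above can be caught before a job starts with a simple coherence check. The following is a rough sketch: the `EXPECTED` table, the `Identity` fields, and the function names are hypothetical, and the table covers only three countries for illustration. Real users sometimes browse in a language that does not match their location, so a mismatch is a warning sign rather than proof of a problem.

```python
from dataclasses import dataclass

# Hypothetical expectations per proxy country; a real table would be larger.
EXPECTED = {
    "DE": {"timezones": {"Europe/Berlin"}, "languages": {"de"}},
    "US": {
        "timezones": {
            "America/New_York",
            "America/Chicago",
            "America/Denver",
            "America/Los_Angeles",
        },
        "languages": {"en"},
    },
    "JP": {"timezones": {"Asia/Tokyo"}, "languages": {"ja"}},
}


@dataclass
class Identity:
    proxy_country: str     # ISO country code of the proxy exit IP
    timezone: str          # IANA timezone reported by the browser
    accept_language: str   # value of the Accept-Language header


def primary_language(accept_language):
    """Return the base language of the first Accept-Language entry."""
    first = accept_language.split(",")[0]          # e.g. "de-DE;q=0.9"
    tag = first.split(";")[0].strip()              # drop any quality value
    return tag.split("-")[0].lower()               # "de-DE" becomes "de"


def mismatches(identity):
    """List the ways the browser settings disagree with the proxy country."""
    expected = EXPECTED.get(identity.proxy_country)
    if expected is None:
        return [f"no expectations defined for {identity.proxy_country}"]
    problems = []
    if identity.timezone not in expected["timezones"]:
        problems.append(f"timezone {identity.timezone} does not fit {identity.proxy_country}")
    language = primary_language(identity.accept_language)
    if language not in expected["languages"]:
        problems.append(f"language {language} does not fit {identity.proxy_country}")
    return problems


# The mismatch from the list: German IP, US timezone, Japanese language headers.
bad = Identity("DE", "America/New_York", "ja-JP,ja;q=0.9")
good = Identity("DE", "Europe/Berlin", "de-DE,de;q=0.9,en;q=0.8")
bad_problems = mismatches(bad)     # two entries: timezone and language
good_problems = mismatches(good)   # empty list
```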