IP-based blocking is the oldest anti-scraping defense and still one of the most common. A site can count requests from an IP address, compare that IP against reputation databases, block entire hosting ranges, or show a CAPTCHA when traffic from one source looks too automated. Rotating proxies spread requests across multiple IP addresses. Used well, they reduce the pressure on any single IP and let a scraper operate closer to normal browsing patterns. Used badly, they create an even stronger bot signal: thousands of requests from mismatched, low-quality, or constantly changing identities.

What a proxy changes

A proxy sits between the scraper and the target website. The website sees the proxy’s IP address instead of the operator’s direct connection. That helps with several scraping problems:
  • Rate limits. Requests can be distributed instead of concentrated on one IP.
  • IP bans. A blocked IP can be removed from the pool.
  • Geo restrictions. A task can use an IP from the region where the content is available.
  • Operational privacy. The operator’s local IP is not exposed to every target site.
  • Cloud parallelism. Subtasks can run through separate network paths instead of competing through one address.
Proxies do not fix bad selectors, broken pagination, or obvious browser automation. They are one layer of the scaling stack, not a complete anti-bot strategy.
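The basic routing change can be illustrated with Python's standard library. This is a minimal sketch, assuming a single HTTP proxy at the hypothetical address `proxy.example.com:8080` with placeholder credentials; it only builds the opener and sends no traffic.

```python
import urllib.request

# Hypothetical proxy endpoint; substitute your provider's host, port, and credentials.
PROXY_URL = "http://user:pass@proxy.example.com:8080"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# Calling opener.open("https://example.com/") would now reach the site via the
# proxy, so the site would log the proxy's IP address instead of yours.
```

Nothing else about the request changes: headers, cookies, and the browser fingerprint still come from the scraper, which is why a proxy alone does not hide obvious automation.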

Types of proxies

Datacenter proxies

Datacenter proxies come from hosting providers and cloud infrastructure. They are fast, cheap, and easy to buy in large quantities. They work well for lightly protected sites, public directories, and targets that mainly care about request volume. They perform poorly on sites that block known hosting ranges or expect consumer traffic.

Residential proxies

Residential proxies route traffic through IPs associated with consumer internet providers. They look more like normal user traffic and are often more successful on e-commerce, search, travel, and other protected sites. The trade-off is cost and ethics. Use providers that can explain how their IP pool is sourced and whether users have consented. A low-quality residential pool can create legal, privacy, and reliability problems.

ISP proxies

ISP proxies sit between datacenter and residential proxies. They are hosted in data centers like datacenter proxies, but their IP ranges are registered to internet service providers. They are usually more stable than rotating residential IPs and less suspicious than ordinary datacenter ranges. They are useful when a scraper needs sticky sessions, stable geography, and better reputation than a cloud hosting IP.

Mobile proxies

Mobile proxies route through carrier networks. They can be highly trusted because many real mobile users share carrier-grade NAT IPs, so blocking one address risks locking out many legitimate users. They are also expensive and often overkill. Use mobile proxies only when the target site is mobile-first or when other proxy types consistently fail.

Rotation strategies

Proxy rotation is not always “new IP on every request.” The right strategy depends on the site and the workflow.

Per-request rotation

Each request uses a different IP. This can work for stateless pages such as public search result pages or simple product listing pages. It is risky for sessions that rely on cookies, carts, login state, or region consistency. Switching IPs too often can look suspicious.
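Per-request rotation amounts to pairing each outgoing request with the next proxy in the pool. This is a minimal round-robin sketch; the pool entries and URLs are hypothetical, and many providers instead offer a single gateway that rotates IPs on their side.

```python
import itertools

# Hypothetical proxy pool; real pools usually come from a provider's API or gateway.
PROXY_POOL = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
    "http://proxy-c.example.com:8080",
]


def assign_per_request(urls, pool):
    """Pair every URL with the next proxy in round-robin order.

    Only suitable for stateless pages: no cookies, carts, or login state
    are carried between requests, so switching IPs breaks nothing.
    """
    rotation = itertools.cycle(pool)
    return [(url, next(rotation)) for url in urls]


listing_urls = [f"https://shop.example.com/list?page={n}" for n in range(1, 5)]
plan = assign_per_request(listing_urls, PROXY_POOL)
for url, proxy in plan:
    print(proxy, url)
```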

Sticky sessions

A worker keeps the same IP for a period of time or for the life of a session. This is usually better for JavaScript-heavy sites, logged-in areas, pagination flows, and any task where the website expects continuity. For example, a scraper might keep one IP while it searches, opens several result pages, and extracts details, then rotate before starting the next keyword or category.
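A sticky session can be modeled as a map from a session key to a proxy that is reused until it ages out, hits a request cap, or is released. This is a sketch under stated assumptions: the `StickyProxyPool` class, its 10-minute TTL, and its 50-request cap are hypothetical illustrations, not provider settings.

```python
import itertools
import time


class StickyProxyPool:
    """Keep one proxy per session key until it expires, hits a cap, or is released."""

    def __init__(self, proxies, ttl_seconds=600.0, max_requests=50, clock=time.monotonic):
        self._rotation = itertools.cycle(proxies)
        self._ttl = ttl_seconds
        self._max_requests = max_requests
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._sessions = {}  # session key -> [proxy, started_at, request_count]

    def proxy_for(self, session_key):
        """Return the session's proxy, assigning a fresh one if the old one is stale."""
        now = self._clock()
        entry = self._sessions.get(session_key)
        stale = (
            entry is None
            or now - entry[1] >= self._ttl
            or entry[2] >= self._max_requests
        )
        if stale:
            entry = [next(self._rotation), now, 0]
            self._sessions[session_key] = entry
        entry[2] += 1
        return entry[0]

    def release(self, session_key):
        """Drop the session so the next call for this key gets a new proxy."""
        self._sessions.pop(session_key, None)


pool = StickyProxyPool([
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
])
key = "keyword:running shoes"
search_proxy = pool.proxy_for(key)                      # the search request
page_proxies = [pool.proxy_for(key) for _ in range(3)]  # result pages and detail pages
pool.release(key)  # rotate before starting the next keyword
```

The workflow, not a timer, decides when to call `release`, which mirrors the example of rotating between keywords rather than mid-flow.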

Task-level rotation

Each subtask receives its own proxy identity. One category, city, keyword, or URL batch runs through one IP or one small pool. This aligns well with cloud execution because subtasks are already natural boundaries for parallelism.
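Task-level rotation can be sketched as a stable mapping from a subtask key to a proxy, so a retried subtask reuses the same identity. The helper names, batch size, and URLs below are hypothetical. Hashing can place two subtasks on the same proxy, which matches the "one IP or one small pool" idea; a strict one-to-one assignment would need a round-robin allocator instead.

```python
import hashlib


def proxy_for_subtask(subtask_key: str, pool: list[str]) -> str:
    """Map a subtask (category, city, keyword, or URL batch) to one proxy.

    sha256 gives the same digest in every process, unlike Python's built-in
    hash() for strings, so a retried subtask lands on the same proxy.
    """
    digest = hashlib.sha256(subtask_key.encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


def split_into_subtasks(urls: list[str], batch_size: int) -> dict[str, list[str]]:
    """Hypothetical batching rule: group URLs into fixed-size subtasks."""
    return {
        f"batch-{i // batch_size}": urls[i : i + batch_size]
        for i in range(0, len(urls), batch_size)
    }


POOL = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
    "http://proxy-c.example.com:8080",
]
cities = ["austin", "boston", "chicago", "denver", "el-paso"]
urls = [f"https://directory.example.com/city/{c}" for c in cities]
plan = {
    key: (proxy_for_subtask(key, POOL), batch)
    for key, batch in split_into_subtasks(urls, batch_size=2).items()
}
```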

Geo-targeted rotation

The proxy region is selected intentionally. A job collecting US prices should use US IPs; a job comparing availability across countries should split work by region. The browser timezone and language should match the chosen geography.
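One way to keep geography consistent is to store the proxy, timezone, and language settings for each region together and derive every job from that single record. This is a sketch with hypothetical per-country proxy endpoints; the timezone chosen for the US is an assumption and should match the city the proxy actually geolocates to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoProfile:
    proxy_url: str
    timezone: str         # IANA timezone name the browser should report
    accept_language: str  # value for the Accept-Language header


# Hypothetical per-country pools. Keeping each country's settings in one record
# means the IP, timezone, and language cannot drift apart between jobs.
GEO_PROFILES = {
    "US": GeoProfile("http://us.proxy.example.com:8080", "America/New_York", "en-US,en;q=0.9"),
    "DE": GeoProfile("http://de.proxy.example.com:8080", "Europe/Berlin", "de-DE,de;q=0.9,en;q=0.5"),
    "JP": GeoProfile("http://jp.proxy.example.com:8080", "Asia/Tokyo", "ja-JP,ja;q=0.9"),
}


def plan_by_region(urls_by_country):
    """Split a cross-country job into work units that each use one region's profile."""
    jobs = []
    for country, urls in urls_by_country.items():
        profile = GEO_PROFILES[country]  # raises KeyError if no pool exists for the region
        for url in urls:
            jobs.append({
                "url": url,
                "proxy": profile.proxy_url,
                "timezone": profile.timezone,
                "headers": {"Accept-Language": profile.accept_language},
            })
    return jobs


jobs = plan_by_region({
    "US": ["https://shop.example.com/item/42"],
    "DE": ["https://shop.example.de/item/42"],
})
```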

Matching proxies to use cases

| Use case | Proxy approach |
|---|---|
| Public static pages | No proxy, or datacenter proxies with conservative rate limits |
| Large product catalogs | Rotating datacenter or residential proxies, depending on defenses |
| Search engines or marketplaces | Residential or ISP proxies with sticky sessions and fingerprint coherence |
| Logged-in dashboards | Stable IP per account; avoid aggressive rotation |
| Geo-specific prices or availability | Country- or region-targeted proxies |
| Mobile-only content | Mobile proxies, only when truly necessary |

The key is to match the network identity to the browsing story. A rotating proxy pool works best when the browser fingerprint, language, timezone, request pace, and session behavior all agree with the IP being used.
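A simple pre-flight check can catch identities whose parts disagree before any request is sent. This is a heuristic sketch: it assumes the proxy's country is known (for example, from provider metadata), and the tiny timezone table is an illustrative stand-in for a full timezone-to-country dataset such as the tz database's `zone.tab`.

```python
# Illustrative lookup table only; a real check would load a complete mapping.
TIMEZONE_COUNTRY = {
    "America/New_York": "US",
    "America/Los_Angeles": "US",
    "Europe/Berlin": "DE",
    "Asia/Tokyo": "JP",
}


def identity_mismatches(ip_country, timezone, accept_language):
    """Return human-readable problems where the browser disagrees with the IP."""
    problems = []

    tz_country = TIMEZONE_COUNTRY.get(timezone)
    if tz_country is None:
        problems.append(f"unknown timezone {timezone!r}")
    elif tz_country != ip_country:
        problems.append(f"timezone {timezone} is in {tz_country}, IP is in {ip_country}")

    # Use the region subtag of the first language tag, e.g. "JP" in "ja-JP".
    # A bare language such as "en" does not pin a country, so it is not flagged.
    primary = accept_language.split(",")[0].split(";")[0].strip()
    region = next(
        (s.upper() for s in primary.split("-")[1:] if len(s) == 2 and s.isalpha()),
        None,
    )
    if region is not None and region != ip_country:
        problems.append(f"Accept-Language {primary} targets {region}, IP is in {ip_country}")

    return problems


# The mismatch described under common mistakes: German IP, US timezone, Japanese headers.
print(identity_mismatches("DE", "America/New_York", "ja-JP,ja;q=0.9"))
```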

Common mistakes

  • Rotating too often. A new IP on every click can look less human, not more.
  • Ignoring IP quality. Cheap IPs that are already flagged or blocklisted can increase CAPTCHA frequency.
  • Mixing geography. A German IP with a US timezone and Japanese language headers is a mismatch.
  • Changing IP during login. Logged-in sessions should usually stay sticky.
  • Scaling too fast. More proxies do not make an overloaded target safer to scrape.
  • Treating proxies as legality. A proxy changes network routing; it does not change permission, terms of service, privacy obligations, or robots.txt considerations.
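Ignoring IP quality is easier to avoid when the pool tracks block signals per proxy and retires the ones that keep failing. This is a sketch under stated assumptions: the `ProxyHealth` class, the three-strike threshold, and the choice of HTTP 403 and 429 as block signals are illustrative, and CAPTCHA detection is left to the caller because it differs by site.

```python
from collections import defaultdict

# Common "blocked" or "slow down" responses; individual sites vary.
BLOCK_STATUSES = {403, 429}


class ProxyHealth:
    """Retire a proxy after several consecutive block signals."""

    def __init__(self, proxies, max_consecutive_blocks=3):
        self.active = list(proxies)
        self.retired = []
        self._max = max_consecutive_blocks
        self._streak = defaultdict(int)  # proxy -> consecutive blocked responses

    def record(self, proxy, status_code, saw_captcha=False):
        """Update the proxy's streak from one response."""
        if proxy not in self.active:
            return
        if status_code in BLOCK_STATUSES or saw_captcha:
            self._streak[proxy] += 1
            if self._streak[proxy] >= self._max:
                self.active.remove(proxy)
                self.retired.append(proxy)
        else:
            self._streak[proxy] = 0  # a clean response resets the streak


health = ProxyHealth(["http://proxy-a.example.com:8080", "http://proxy-b.example.com:8080"])
# After each request: health.record(proxy_used, response_status, saw_captcha=...)
```

Counting consecutive failures rather than totals avoids retiring a good proxy for occasional errors. Retiring proxies does not fix scaling too fast: if every proxy starts failing, the request pace is the problem, not the pool.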

How visual platforms handle it

Integrated scraping platforms often make proxy rotation part of task settings rather than custom infrastructure. Octoparse, for example, documents IP rotation for cloud runs, built-in residential proxies for local and cloud workflows, and user-provided HTTP proxies for local runs. Its cloud extraction model can split a task into subtasks that run across multiple cloud nodes, so traffic is not concentrated behind one local IP. The broader principle is what matters: proxy configuration should live next to the task’s execution strategy. A scraper that splits work into subtasks, schedules runs, manages browser sessions, and rotates network paths in one place is easier to operate than a scraper where every layer is wired separately.

Practical rule

Use the weakest proxy strategy that works. Start with normal pacing and no proxy for low-risk targets. Add datacenter proxies when volume is the only issue. Move to residential or ISP proxies when reputation matters. Use sticky sessions whenever the website expects continuity. Reserve mobile proxies and aggressive rotation for cases that truly justify their cost and complexity.
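The rule can be expressed as a ladder that moves up one step only when the current tier is being blocked too often. This sketch simplifies the decision to a single signal, the observed block rate; the tier names and the 5% threshold are assumptions, and real escalation would also weigh cost, session needs, and whether the problem is volume or reputation.

```python
# Ordered from cheapest and weakest to most expensive, following the rule above.
PROXY_TIERS = ("none", "datacenter", "residential_or_isp", "mobile")


def choose_tier(current: str, blocked: int, attempted: int,
                max_block_rate: float = 0.05, allow_mobile: bool = False) -> str:
    """Return the proxy tier to use for the next run.

    Stay on the current tier while its block rate is acceptable; otherwise
    move up exactly one step. Mobile is reserved and needs explicit opt-in.
    """
    if attempted == 0 or blocked / attempted <= max_block_rate:
        return current
    index = PROXY_TIERS.index(current)
    if index + 1 >= len(PROXY_TIERS):
        return current  # nothing stronger exists; revisit pacing and fingerprints instead
    candidate = PROXY_TIERS[index + 1]
    if candidate == "mobile" and not allow_mobile:
        return current
    return candidate
```

Moving one step at a time keeps the scraper on the weakest strategy that still works instead of jumping straight to the most expensive option.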