Websites may detect or limit automated access when traffic looks unusual, too frequent, or different from normal browser behavior. Octoparse includes several settings and workflow practices that can help reduce failures, but no anti-blocking method can guarantee access to every site. Use this page to understand why tasks get blocked and which Octoparse features may help.

Common blocking signals

Websites may block or challenge extraction tasks based on signals such as:
  • Too many requests in a short time
  • Repeated access from the same IP address
  • Missing or unusual browser fingerprints
  • Login or cookie inconsistencies
  • CAPTCHA challenges
  • Region or location mismatch
  • Unusual navigation behavior
  • Sessions expiring during a run

Common symptoms

| Symptom | Possible cause |
| --- | --- |
| Task extracts fewer records than expected | Pagination failed, content did not load, or access was limited |
| Page shows CAPTCHA | Website detected suspicious activity |
| Login page appears during extraction | Session expired or cookies were not preserved |
| Cloud run behaves differently from local run | Website reacts differently to the cloud environment |
| Fields become empty | Page structure changed or content was blocked |
| Task stops unexpectedly | Network, blocking, selector, or page load issue |
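
If you export a small sample of results and review it outside Octoparse, the symptom table can be turned into a rough checklist script. This is a hypothetical sketch, not an Octoparse feature: `RunSample`, `diagnose`, and the marker strings are assumptions, and plain text matching is crude (real challenge and login pages are worded differently on every site).

```python
from dataclasses import dataclass

# Hypothetical marker strings; real challenge and login pages vary by site.
CAPTCHA_MARKERS = ("captcha", "verify you are human")
LOGIN_MARKERS = ("sign in to continue", "please log in")


@dataclass
class RunSample:
    """A small slice of exported task output (hypothetical structure)."""
    expected_records: int
    records: list[dict[str, str]]
    page_text: str = ""  # visible text of the page where the run stopped, if captured


def diagnose(sample: RunSample) -> list[str]:
    """Map observed symptoms to the possible causes from the symptom table."""
    findings = []
    text = sample.page_text.lower()
    if any(marker in text for marker in CAPTCHA_MARKERS):
        findings.append("CAPTCHA shown: website detected suspicious activity")
    if any(marker in text for marker in LOGIN_MARKERS):
        findings.append("Login page shown: session expired or cookies were not preserved")
    if len(sample.records) < sample.expected_records:
        findings.append("Fewer records than expected: check pagination, page loading, or access limits")
    # Collect the names of fields that came back blank in any record.
    empty = sorted({name for rec in sample.records for name, value in rec.items() if not value.strip()})
    if empty:
        findings.append(f"Empty fields {empty}: page structure changed or content was blocked")
    return findings


sample = RunSample(
    expected_records=3,
    records=[{"title": "A", "price": "10"}, {"title": "B", "price": ""}],
    page_text="Please verify you are human",
)
for finding in diagnose(sample):
    print(finding)
```

A finding only points at a likely cause; confirm it by watching the task in the built-in browser.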

Anti-blocking options

Proxy settings

Use proxies when the website is sensitive to IP address, location, or request frequency.

Browser fingerprinting

Understand how browser signals may affect website detection.

CAPTCHA handling

Learn what to do when a website shows CAPTCHA during task building or execution.

Auto-login & cookies

Keep session-dependent tasks more stable with login and cookie workflows.

Troubleshooting steps

1. Run a local test

Watch the task in the built-in browser and identify where blocking or failure appears.

2. Check whether login or cookies are required

If the site requires an account, confirm that the session is valid before running the task.
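
Confirming that a session is valid can be sketched as a pre-run check on cookie expiry times, so a session that would expire during the run is caught before it starts. This is a hypothetical helper, not part of Octoparse: the cookie names, `session_is_valid`, and the 30-minute safety margin are assumptions, and the sketch treats a cookie without a known expiry as missing.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def session_is_valid(cookie_expiry: dict[str, datetime], required: list[str],
                     now: Optional[datetime] = None,
                     margin: timedelta = timedelta(minutes=30)) -> bool:
    """Return True only if every required cookie exists and outlives the run.

    `margin` is an assumed buffer so the session does not expire mid-run.
    """
    now = now or datetime.now(timezone.utc)
    for name in required:
        expires = cookie_expiry.get(name)
        if expires is None or expires <= now + margin:
            return False
    return True


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
cookies = {  # hypothetical cookie names and expiry times
    "session_id": now + timedelta(hours=2),
    "csrf_token": now + timedelta(minutes=10),
}
print(session_is_valid(cookies, ["session_id"], now))                # True
print(session_is_valid(cookies, ["session_id", "csrf_token"], now))  # False: expires within the margin
```
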

3. Slow down the workflow

Add waits, reduce frequency, and avoid unnecessary repeated actions.
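
In Octoparse, waits are added as steps in the workflow. For readers who also script requests, the same idea (a minimum gap between actions plus some randomness, so traffic is less bursty) can be sketched as below. `PoliteThrottle` and its default values are hypothetical; suitable intervals depend on the site.

```python
import random
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Keep a minimum, randomly jittered gap between consecutive actions."""

    def __init__(self, min_interval: float = 2.0, jitter: float = 1.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None):
        self.min_interval = min_interval  # assumed floor between actions, in seconds
        self.jitter = jitter              # assumed extra random delay, in seconds
        self.clock = clock
        self.sleep = sleep
        self.rng = rng if rng is not None else random.Random()
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep only as long as needed to respect the gap; return seconds slept."""
        target = self.min_interval + self.rng.uniform(0, self.jitter)
        delay = 0.0
        if self._last is not None:
            # Time already spent since the last action counts toward the gap.
            delay = max(0.0, target - (self.clock() - self._last))
        if delay > 0:
            self.sleep(delay)
        self._last = self.clock()
        return delay


throttle = PoliteThrottle(min_interval=0.2, jitter=0.1)
for page in range(3):
    slept = throttle.wait()
    print(f"page {page}: waited {slept:.2f}s before loading")
```

The clock and sleep functions are injectable so the timing logic can be tested without real delays.
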

4. Review proxy or location needs

If content depends on region or IP reputation, configure proxy settings where appropriate.
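
Octoparse configures proxies in its own proxy settings. Outside Octoparse, a common approach is to pick proxies for the region the content depends on and rotate through them round-robin so requests are spread across IPs. The sketch below uses Python's standard `urllib.request.ProxyHandler`; the pool, region codes, and helper names are hypothetical, and the addresses are placeholders from the 203.0.113.0/24 documentation range.

```python
import urllib.request
from itertools import cycle
from typing import Iterator

# Placeholder pool: addresses are from the 203.0.113.0/24 documentation range.
PROXY_POOL = {
    "us": ["http://203.0.113.10:8080", "http://203.0.113.11:8080"],
    "de": ["http://203.0.113.20:8080"],
}


def proxy_rotation(region: str, pool: dict[str, list[str]] = PROXY_POOL) -> Iterator[str]:
    """Cycle through the proxies for one region so requests spread across IPs."""
    candidates = pool.get(region)
    if not candidates:
        raise ValueError(f"no proxies configured for region {region!r}")
    return cycle(candidates)


def opener_for(proxy: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes HTTP and HTTPS through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


rotation = proxy_rotation("us")
for _ in range(3):
    proxy = next(rotation)
    opener = opener_for(proxy)  # opener.open(url) would send the request via this proxy
    print("next request via", proxy)
```

Round-robin is only the simplest strategy; it does not account for proxies that are slow or already blocked.
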

5. Compare local and cloud behavior

If the task works locally but fails in the cloud, review cloud run logs and site behavior.
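
One way to make the comparison concrete is to export the results of a local run and a cloud run and diff them. The sketch below assumes both exports are CSV with a shared key column (here a hypothetical `url` column); it reads from strings so it is self-contained, but in practice you would open the exported files. `compare_runs` is a hypothetical helper.

```python
import csv
import io


def compare_runs(local_csv: str, cloud_csv: str, key: str) -> dict[str, object]:
    """Summarize how a cloud export differs from a local export of the same task."""
    local = list(csv.DictReader(io.StringIO(local_csv)))
    cloud = list(csv.DictReader(io.StringIO(cloud_csv)))
    local_keys = {row[key] for row in local}
    cloud_keys = {row[key] for row in cloud}
    # Columns that are blank in at least one cloud row (skip malformed extra cells).
    empty_in_cloud = sorted({col for row in cloud for col, val in row.items()
                             if col is not None and not (val or "").strip()})
    return {
        "local_count": len(local),
        "cloud_count": len(cloud),
        "missing_in_cloud": sorted(local_keys - cloud_keys),
        "empty_fields_in_cloud": empty_in_cloud,
    }


local_export = "url,title\nhttps://example.com/a,A\nhttps://example.com/b,B\n"
cloud_export = "url,title\nhttps://example.com/a,\n"
print(compare_runs(local_export, cloud_export, key="url"))
```

Missing rows in the cloud export point toward pagination or access limits, while blank fields point toward content that loaded differently; the cloud run logs help tell the two apart.
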

Best practices

  • Test with a small sample before scaling.
  • Avoid running tasks more frequently than necessary.
  • Add wait steps for dynamic or slow-loading pages.
  • Keep login sessions and cookies up to date.
  • Monitor run logs after changing task settings.
  • Respect website terms, robots rules, and applicable laws.
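
Outside Octoparse, a site's published robots rules can be checked with Python's standard `urllib.robotparser` module before scaling a task. The robots.txt content and user agent below are made-up samples; against a real site you would call `set_url()` and `read()` instead of `parse()`. Robots rules are only one signal and do not replace a site's terms or applicable law. `check_robots` is a hypothetical helper.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt: str, user_agent: str,
                 urls: list[str]) -> tuple[dict[str, bool], Optional[int]]:
    """Return per-URL permission and any Crawl-delay declared for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {url: parser.can_fetch(user_agent, url) for url in urls}
    return allowed, parser.crawl_delay(user_agent)


sample_robots = "\n".join([
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
])
allowed, delay = check_robots(
    sample_robots, "ExampleBot",
    ["https://example.com/products", "https://example.com/private/data"],
)
print(allowed)
print("crawl delay:", delay)
```

A declared crawl delay is also a useful lower bound when deciding how often a task should run.
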

Anti-blocking features improve reliability, but they do not bypass every restriction. If a website explicitly prevents automated access, or collecting the data is not permitted, do not scrape it.