Some websites require login before data can be viewed. Octoparse can work with login-required pages when the task is configured to preserve the necessary session state. Use login and cookie workflows when the target data is only visible after authentication and you have permission to access it.

When login setup is needed

Login or cookies may be required when:
  • The website hides data behind an account
  • Search results or detail pages require authentication
  • The website shows different content after login
  • A session expires during extraction
  • Cloud runs fail because the login state is not available
  • Cookies control region, language, or user-specific content

Typical workflow

1. Open the website
   Start from the login page or a page that requires authentication.

2. Use Browse Mode
   Interact with the page like a normal browser to complete login or reach the desired page state.

3. Save the session setup
   Configure the task so the required login or cookie state is available during extraction.

4. Test the task
   Run a sample to confirm the task can access the protected content.

5. Monitor expiration
   Recheck the task if the website logs users out or invalidates cookies.
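
The expiration check in the last step can be illustrated outside Octoparse with a minimal Python sketch. It assumes cookies have been exported as a list of dictionaries with a `name` and an optional `expires` Unix timestamp. That export format and the function name `expired_cookies` are hypothetical and do not describe how Octoparse stores session data.

```python
import time


def expired_cookies(cookies, now=None):
    """Return the names of cookies whose expiry time has passed.

    `cookies` is a list of dicts with a "name" key and an optional
    "expires" key holding a Unix timestamp (hypothetical export format).
    Cookies without an expiry are session cookies and are skipped.
    """
    now = time.time() if now is None else now
    return [
        cookie["name"]
        for cookie in cookies
        if cookie.get("expires") is not None and cookie["expires"] <= now
    ]


# Illustrative data only; the cookie names are made up.
saved = [
    {"name": "sessionid", "expires": 1_000},
    {"name": "lang", "expires": 5_000},
    {"name": "csrftoken"},  # no expiry: lasts until the browser session ends
]
stale = expired_cookies(saved, now=2_000)
```

If a check like this reports a stale login cookie, the task likely needs to be re-authenticated before the next run.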

Cookies and sessions

Cookies store information that helps a website recognize a browser session. They may include login state, preferences, region, language, or tracking information. For scraping tasks, cookies matter because the website may show different content depending on whether the session is valid.
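
As a rough illustration of what a cookie carries, the sketch below parses one `Set-Cookie` header value with Python's standard `http.cookies` module. It is independent of Octoparse. The cookie name `sessionid`, its value, and the helper name `parse_set_cookie` are assumptions chosen for the example.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Parse one Set-Cookie header value into {name: {attribute: value}}."""
    jar = SimpleCookie()
    jar.load(header_value)
    parsed = {}
    for name, morsel in jar.items():
        entry = {"value": morsel.value}
        # Keep only the attributes the server actually set.
        for key in ("expires", "max-age", "path", "domain"):
            if morsel[key]:
                entry[key] = morsel[key]
        parsed[name] = entry
    return parsed


# Hypothetical header a site might send after a successful login.
info = parse_set_cookie("sessionid=abc123; Path=/; Max-Age=3600; HttpOnly")
```

Here the `max-age` attribute indicates that the session would stop being valid after an hour, which is the kind of detail that explains why a task can work at first and later return a login page.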

Common issues

| Issue | Possible cause |
| --- | --- |
| Task returns login page | Session expired or login was not preserved |
| Works locally but fails in cloud | Cloud run does not have the same session state |
| Data differs between runs | Cookies, region, or account state changed |
| CAPTCHA appears after login | Website detected unusual session behavior |
| Task stops after some pages | Session expired mid-run |
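
Several of these issues show up as the task landing on a login page instead of the expected data. The following Python sketch shows one heuristic way to flag that in extracted output. It is not an Octoparse feature, and the path hints and the function name `looks_like_login_page` are assumptions that would need adjusting for a real site.

```python
from urllib.parse import urlparse

# Hypothetical path fragments that often appear in login URLs.
LOGIN_PATH_HINTS = ("/login", "/signin", "/sign-in")


def looks_like_login_page(final_url, html):
    """Heuristically decide whether a fetched page is a login page.

    Checks the URL path for common login fragments, then falls back to
    looking for a password input in the HTML.
    """
    path = urlparse(final_url).path.lower()
    if any(hint in path for hint in LOGIN_PATH_HINTS):
        return True
    return 'type="password"' in html.lower()


redirected = looks_like_login_page("https://example.com/login?next=/orders", "")
```

A check like this is only a heuristic: sites that render the login form with scripts or use unusual URLs would need different markers.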

Best practices

  • Use an account you are authorized to use.
  • Test login-required tasks locally before cloud runs.
  • Avoid rotating IPs during the same logged-in session.
  • Re-authenticate when cookies expire.
  • Monitor cloud logs for login redirects.
  • Keep login steps as simple and stable as possible.

Only extract data you are permitted to access. Login access does not automatically mean the data may be collected or reused.