Octoparse turns web browsing actions into repeatable extraction workflows. You define what to collect, how the website should be navigated, and where the results should go. Most workflows follow four stages: build, test, run, and export.

Workflow overview

1. Build: Start from a URL, template, or custom task. Select the data fields you want and define actions such as clicking, scrolling, pagination, and opening detail pages.

2. Test: Run a small sample to confirm that Octoparse captures the right fields, records, and page sequence.

3. Run: Execute the task locally for testing or in the cloud for scheduled, unattended, and larger-scale extraction.

4. Export: Send the extracted results to files, spreadsheets, databases, cloud storage, or other connected systems.

Build the task

A task defines how Octoparse interacts with a website. You can build a task by:
  • Using a template
  • Letting Auto-detect identify page data automatically
  • Selecting elements manually in the no-code builder
  • Adding actions such as click, scroll, loop, pagination, and wait
  • Refining field values before export
The goal is to turn the website interaction into a reusable workflow.
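
To make the idea of a reusable workflow concrete, here is a minimal conceptual sketch of a task as a start URL, a set of fields, and an ordered list of actions. Octoparse tasks are built in the no-code builder, not in code, so every class, field, and value name below is hypothetical and does not correspond to any Octoparse file format or API.

```python
from dataclasses import dataclass, field


# Conceptual model only: these names are hypothetical and are not part of
# any Octoparse file format or API.
@dataclass
class Action:
    kind: str          # e.g. "click", "scroll", "loop", "pagination", "wait"
    target: str = ""   # hypothetical description of the element acted on


@dataclass
class Task:
    start_url: str
    fields: list[str] = field(default_factory=list)
    actions: list[Action] = field(default_factory=list)

    def describe(self) -> list[str]:
        """Return the workflow as human-readable steps, in order."""
        steps = [f"open {self.start_url}"]
        steps += [f"{a.kind} {a.target}".strip() for a in self.actions]
        steps.append("extract " + ", ".join(self.fields))
        return steps


task = Task(
    start_url="https://example.com/products",  # placeholder URL
    fields=["title", "price"],
    actions=[Action("scroll"), Action("pagination", "next-page button")],
)
for step in task.describe():
    print(step)
```

The point of the model is that the same ordered steps can be replayed on every run, which is what makes the workflow reusable.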

Test the extraction logic

Before running a task at scale, test a small sample. Check whether:
| Check | Why it matters |
| --- | --- |
| Fields are correct | Prevents exporting the wrong values |
| Field names are clear | Makes downstream data easier to use |
| Pagination works | Ensures the task moves across result pages |
| Detail pages open correctly | Confirms nested page workflows are captured |
| Sample output looks clean | Reduces cleanup after export |
Testing is especially important for dynamic pages, login-protected pages, infinite scroll, popups, and websites where data is loaded after user actions.
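
Some of these checks can also be repeated outside the tool on a small exported sample. The sketch below is one possible way to do that with Python's standard library; it assumes the sample was exported as CSV, and the field names and sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical sample export; in practice this would be the contents of a
# small CSV file exported from a test run.
SAMPLE_CSV = """title,price,detail_url
Blue Mug,12.50,https://example.com/p/1
Red Mug,,https://example.com/p/2
"""

EXPECTED_FIELDS = ["title", "price", "detail_url"]  # assumed field names


def check_sample(csv_text, expected_fields):
    """Return a list of problems found in a small sample export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []

    # Check 1: every expected field exists as a column.
    missing = [f for f in expected_fields if f not in (reader.fieldnames or [])]
    if missing:
        problems.append(f"missing fields: {missing}")

    # Check 2: no expected field is empty in any record.
    for row_number, row in enumerate(reader, start=1):
        empty = [f for f in expected_fields if not (row.get(f) or "").strip()]
        if empty:
            problems.append(f"row {row_number}: empty values in {empty}")
    return problems


problems = check_sample(SAMPLE_CSV, EXPECTED_FIELDS)
print(problems)
```

An empty list means the sample passed both checks. Here the second record has no price, so the function reports it, which is the kind of issue worth fixing in the task before running at scale.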

Run locally or in the cloud

Octoparse supports different run options depending on the task and your plan.
| Run type | Best for |
| --- | --- |
| Local extraction | Testing, debugging, and tasks that rely on your local environment |
| Cloud extraction | Scheduled, unattended, and higher-volume extraction |
| Boost mode | Cloud tasks that need more speed or concurrency, when supported |
The best option depends on the website, task complexity, required frequency, and whether the task needs to keep running when your computer is off.
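
As a rough rule of thumb, the guidance above can be written as a small decision function. This is only a simplified sketch: the function name, its parameters, and the order of the rules are assumptions made for illustration, and a real choice also depends on the website, task complexity, and plan.

```python
def suggest_run_type(
    *,
    relies_on_local_environment=False,
    scheduled=False,
    unattended=False,
    high_volume=False,
    needs_extra_speed=False,
    boost_supported=False,
):
    """Suggest a run type from a few yes/no requirements (illustrative only)."""
    # Tasks tied to the local machine stay local.
    if relies_on_local_environment:
        return "local"
    # Scheduled, unattended, or higher-volume work points to the cloud.
    if scheduled or unattended or high_volume:
        if needs_extra_speed and boost_supported:
            return "cloud with boost mode"
        return "cloud"
    # Default case: testing and debugging.
    return "local"


print(suggest_run_type())
print(suggest_run_type(scheduled=True))
print(suggest_run_type(high_volume=True, needs_extra_speed=True, boost_supported=True))
```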

Export the data

After a task runs, Octoparse stores the extracted results as structured records. Common export destinations include:
  • CSV
  • Excel
  • JSON
  • HTML
  • XML
  • Google Sheets
  • Databases
  • Cloud storage
For automated workflows, use scheduled export or integrations so data can move downstream without manual downloading.
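
If you handle a downloaded file yourself instead of using a built-in database export or integration, moving the data downstream can be as simple as the sketch below. It assumes a CSV export with `title` and `price` columns and loads it into an in-memory SQLite database; the column names, table name, and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical exported data; in practice, read this from the exported CSV file.
EXPORTED_CSV = """title,price
Blue Mug,12.50
Red Mug,9.00
"""


def load_into_sqlite(csv_text, connection, table="products"):
    """Insert CSV records into a SQLite table and return the row count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # The table name comes from trusted code here, not from user input.
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (title TEXT, price REAL)"
    )
    # Named placeholders are filled from each record's column names.
    connection.executemany(
        f"INSERT INTO {table} (title, price) VALUES (:title, :price)", rows
    )
    connection.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_into_sqlite(EXPORTED_CSV, conn)
total = conn.execute("SELECT SUM(price) FROM products").fetchone()[0]
print(inserted, total)
```

Because the records are already structured, the load step needs no parsing beyond reading the CSV header, which is why clear field names at build time pay off here.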

Local vs cloud extraction

Compare run environments and choose the right execution mode.

Export formats

Learn which output formats Octoparse supports.