Octoparse turns web browsing actions into repeatable extraction workflows. You define what to collect, how the website should be navigated, and where the results should go. Most workflows follow four stages: build, test, run, and export.
Workflow overview
Build
Start from a URL, template, or custom task. Select the data fields you want and define actions such as clicking, scrolling, pagination, and opening detail pages.
Test
Run a small sample to confirm that Octoparse captures the right fields, records, and page sequence.
Run
Execute the task locally for testing or in the cloud for scheduled, unattended, and larger-scale extraction.
Export
Save the extracted records to a file format, spreadsheet, database, or cloud storage.
Build the task
A task defines how Octoparse interacts with a website. You can build a task by:

- Using a template
- Letting Auto-detect identify page data automatically
- Selecting elements manually in the no-code builder
- Adding actions such as click, scroll, loop, pagination, and wait
- Refining field values before export
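
One way to picture a task is as a named, ordered list of actions plus the fields to collect. The sketch below is a conceptual model only: the `Action` and `Task` classes, the action names, the selectors, and the example URL are all hypothetical and do not reflect Octoparse's internal task format or any Octoparse API.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step in a task, such as a click or a pagination loop."""
    kind: str          # hypothetical label, e.g. "click", "scroll", "paginate"
    target: str = ""   # hypothetical selector or URL the step applies to


@dataclass
class Task:
    """Illustrative model only; this is not Octoparse's task file format."""
    name: str
    actions: list[Action] = field(default_factory=list)
    fields: list[str] = field(default_factory=list)

    def describe(self) -> list[str]:
        """Return the steps in execution order as readable lines."""
        return [
            f"{number}. {action.kind} {action.target}".rstrip()
            for number, action in enumerate(self.actions, start=1)
        ]


task = Task(
    name="product-listing",
    actions=[
        Action("open_url", "https://example.com/products"),
        Action("scroll"),
        Action("paginate", "a.next"),
        Action("click", "a.product-link"),
        Action("extract"),
    ],
    fields=["title", "price", "url"],
)

for line in task.describe():
    print(line)
```

Thinking of a task this way can make it easier to spot which step to adjust when a test run returns the wrong page or field.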
Test the extraction logic
Before running a task at scale, test a small sample. Check whether:

| Check | Why it matters |
|---|---|
| Fields are correct | Prevents exporting the wrong values |
| Field names are clear | Makes downstream data easier to use |
| Pagination works | Ensures the task moves across result pages |
| Detail pages open correctly | Confirms nested page workflows are captured |
| Sample output looks clean | Reduces cleanup after export |
Testing is especially important for dynamic pages, login-protected pages, infinite scroll, popups, and websites where data is loaded after user actions.
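
Some of the checks above can be automated once a small sample has been exported. The following is a minimal sketch, assuming the sample was exported as CSV; the `check_sample` function, the field names, and the sample rows are hypothetical, and the script uses only the Python standard library rather than anything provided by Octoparse.

```python
import csv
import io


def check_sample(csv_text: str, expected_fields: list[str]) -> list[str]:
    """Return a list of human-readable problems found in a small sample export."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Fields are correct: every planned field name appears in the header.
    missing = [name for name in expected_fields if name not in header]
    if missing:
        problems.append(f"missing fields: {', '.join(missing)}")

    rows = list(reader)
    if not rows:
        problems.append("no records captured")

    # Sample output looks clean: flag empty values and exact duplicate records.
    seen = set()
    for number, row in enumerate(rows, start=1):
        empty = [name for name in expected_fields if not (row.get(name) or "").strip()]
        if empty:
            problems.append(f"row {number}: empty {', '.join(empty)}")
        key = tuple(row.get(name) for name in expected_fields)
        if key in seen:
            problems.append(f"row {number}: duplicate record")
        seen.add(key)

    return problems


sample = """title,price,url
Widget,9.99,https://example.com/widget
Gadget,,https://example.com/gadget
Widget,9.99,https://example.com/widget
"""

for problem in check_sample(sample, ["title", "price", "url"]):
    print(problem)
```

Duplicate records in a sample can be a hint that pagination or a loop revisited the same page, so they are worth investigating before a larger run.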
Run locally or in the cloud
Octoparse supports different run options depending on the task and your plan.

| Run type | Best for |
|---|---|
| Local extraction | Testing, debugging, and tasks that rely on your local environment |
| Cloud extraction | Scheduled, unattended, and higher-volume extraction |
| Boost mode | Cloud tasks that need more speed or concurrency, when supported |
Export the data
After a task runs, Octoparse stores the extracted results as structured records. Common export destinations include:

- CSV
- Excel
- JSON
- HTML
- XML
- Google Sheets
- Databases
- Cloud storage
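
Because the results are structured records, a file export is straightforward to process further. As a rough sketch, assuming you already have a CSV export and want to query it locally, the script below loads it into an in-memory SQLite table. The `load_records` and `quote` helpers, the table name, and the sample data are hypothetical; this is not how Octoparse's own database export works.

```python
import csv
import io
import sqlite3


def quote(name: str) -> str:
    """Quote an identifier for SQLite, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_records(csv_text: str, connection: sqlite3.Connection, table: str = "records") -> int:
    """Insert rows from a CSV export into a SQLite table and return the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if not columns:
        return 0

    # Every column is stored as TEXT; convert types later if you need to.
    column_sql = ", ".join(f"{quote(column)} TEXT" for column in columns)
    connection.execute(f"CREATE TABLE IF NOT EXISTS {quote(table)} ({column_sql})")

    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[column] for column in columns) for row in reader]
    connection.executemany(f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows)
    connection.commit()
    return len(rows)


export = """title,price
Widget,9.99
Gadget,4.50
"""

connection = sqlite3.connect(":memory:")
print(load_records(export, connection))
print(connection.execute("SELECT title, price FROM records ORDER BY title").fetchall())
```

To work with a real file, read its contents with `open(path, newline="", encoding="utf-8")` and pass the text to `load_records`.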
Related pages
Local vs cloud extraction
Compare run environments and choose the right execution mode.
Export formats
Learn which output formats Octoparse supports.