This page explains the core concepts used across Octoparse. Understanding these terms makes it easier to build tasks, troubleshoot extraction issues, and connect results to downstream tools.

Task

A task is a reusable extraction workflow. It contains the target website, the steps Octoparse should perform, the fields to extract, and the run/export settings. A task may include:
  • Opening one or more URLs
  • Clicking buttons or links
  • Looping through lists
  • Handling pagination
  • Opening detail pages
  • Extracting fields
  • Cleaning field values
  • Running locally or in the cloud

Template

A template is a prebuilt task for a common website or use case. Templates help you start faster because the extraction workflow and fields are already configured. Use templates when:
  • A matching website template is available
  • You need a faster setup path
  • You want a standard structure for common data types
  • You do not need heavy customization
If the template does not match your target page or required fields, build or customize a task manually.

Workflow actions

Workflow actions define how Octoparse moves through a website. Common actions include:
| Action | What it does |
| --- | --- |
| Open page | Loads a target URL |
| Click | Clicks a button, link, menu item, or page element |
| Loop | Repeats an action across multiple items |
| Pagination | Moves through multiple result pages |
| Scroll | Loads more content on pages with infinite scroll or lazy loading |
| Wait | Gives dynamic content time to load |
| Extract data | Captures values from selected elements |
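
One way to picture how actions nest inside a task is a small data model. The sketch below is a mental model only, not Octoparse's internal task format or API: the `Task`, `Action`, and `ActionType` names, the `target` and `children` attributes, and the example URL are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionType(Enum):
    # Names mirror the action table above; the enum itself is hypothetical.
    OPEN_PAGE = "Open page"
    CLICK = "Click"
    LOOP = "Loop"
    PAGINATION = "Pagination"
    SCROLL = "Scroll"
    WAIT = "Wait"
    EXTRACT_DATA = "Extract data"


@dataclass
class Action:
    kind: ActionType
    target: str = ""  # hypothetical: a URL or a description of a page element
    children: list["Action"] = field(default_factory=list)  # steps nested in a Loop or Pagination


@dataclass
class Task:
    name: str
    actions: list[Action]
    fields: list[str]


def describe(action: Action, depth: int = 0) -> list[str]:
    """Return one indented line per action, walking nested steps depth-first."""
    label = action.kind.value + (f": {action.target}" if action.target else "")
    lines = ["  " * depth + label]
    for child in action.children:
        lines.extend(describe(child, depth + 1))
    return lines


# Illustrative task: paginate a listing, loop over items, open each one, extract.
task = Task(
    name="Product listing",
    actions=[
        Action(ActionType.OPEN_PAGE, "https://example.com/products"),
        Action(ActionType.PAGINATION, "Next button", children=[
            Action(ActionType.LOOP, "product cards", children=[
                Action(ActionType.CLICK, "product link"),
                Action(ActionType.EXTRACT_DATA, "name, price"),
            ]),
        ]),
    ],
    fields=["Product name", "Price"],
)

outline = [line for action in task.actions for line in describe(action)]
print("\n".join(outline))
```

The nesting is the point of the sketch: Loop and Pagination wrap other actions, so the order and depth of the steps decide which elements an Extract data step is applied to.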

Field

A field is a column in your extracted data. Examples include product name, price, rating, URL, date, company name, address, or review text. Fields should be named clearly so exported data is easy to understand. Good field names are specific:
| Less clear | Better |
| --- | --- |
| Text 1 | Product name |
| Field 2 | Price |
| Link | Product URL |
| Date | Review date |
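
If data has already been exported with generic column names, the columns can also be renamed downstream. The following is a minimal sketch using Python's standard `csv` module; the `RENAMES` mapping, the `rename_columns` helper, and the sample row are hypothetical and stand in for a real exported file.

```python
import csv
import io

# Hypothetical mapping from generic column names to descriptive ones.
RENAMES = {
    "Text 1": "Product name",
    "Field 2": "Price",
    "Link": "Product URL",
    "Date": "Review date",
}


def rename_columns(rows, renames):
    """Yield each row with its keys renamed; unmapped keys are kept as-is."""
    for row in rows:
        yield {renames.get(key, key): value for key, value in row.items()}


# Sample data standing in for an exported CSV file.
exported = io.StringIO(
    "Text 1,Field 2,Link,Date\n"
    "Desk lamp,19.99,https://example.com/p/1,2024-05-01\n"
)

renamed = list(rename_columns(csv.DictReader(exported), RENAMES))
print(renamed[0])
```

Naming fields clearly in the task itself is usually simpler, since every later run and export then carries the descriptive names without a post-processing step.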

Run

A run is one execution of a task. The same task can be run many times to collect updated results. Runs can happen locally or in the cloud depending on your settings, plan, and workflow needs.

Export

Export sends extracted data out of Octoparse so it can be used elsewhere. Exports can be manual or automated, and may go to files, spreadsheets, databases, cloud storage, or connected apps.
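
As an illustration of using exported data elsewhere, the sketch below loads rows from a CSV export into a SQLite table with Python's standard library. It does not use or represent any Octoparse export feature: the column names, the `products` table, and the sample rows are assumptions, and an in-memory database stands in for a real one.

```python
import csv
import io
import sqlite3

# Sample data standing in for a CSV file exported from one run.
exported = io.StringIO(
    "Product name,Price,Product URL\n"
    "Desk lamp,19.99,https://example.com/p/1\n"
    "Floor lamp,49.50,https://example.com/p/2\n"
)

rows = [
    (r["Product name"], float(r["Price"]), r["Product URL"])
    for r in csv.DictReader(exported)
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    "CREATE TABLE products (name TEXT, price REAL, url TEXT PRIMARY KEY)"
)
# INSERT OR REPLACE keeps one row per URL, so importing a later run
# updates existing products instead of duplicating them.
conn.executemany("INSERT OR REPLACE INTO products VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
cheapest = conn.execute(
    "SELECT name FROM products ORDER BY price LIMIT 1"
).fetchone()[0]
print(count, cheapest)
```

Keying the table on the product URL is a design choice for this sketch: because the same task can be run many times, a stable key lets repeated imports refresh existing rows rather than pile up copies.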

How these concepts fit together

1. Create or choose a task
   Start with a template, Auto-detect, or a custom workflow.
2. Define actions and fields
   Tell Octoparse how to navigate the page and what values to extract.
3. Run the task
   Execute the workflow locally or in the cloud.
4. Export the results
   Send structured data to the destination you need.