Pagination is the navigation layer of a scraper. After the scraper can fetch, render, and extract one page, it still needs to answer a practical question: where is the next batch of records, and how do I know when there are no more? Most pagination failures come from treating every site like a numbered page list. In practice, a catalog might use URL parameters, a next button, infinite scroll, a load-more button, an API offset, or an opaque cursor token. Some sites combine several of these patterns.
Start with the request, not the UI
Before writing pagination logic, open DevTools and watch what changes when you move to the next batch.
- Open the Network tab and filter to Fetch/XHR.
- Click the next page, scroll down, or press the load-more button.
- Inspect the request URL, query parameters, request body, and response.
- Decide whether the scraper should follow links, interact with the page, or call an API endpoint directly.
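The visible UI can hide the real mechanism. The request behind a button or a scroll might simply carry a parameter such as offset=40. A page link might actually hydrate results through JavaScript after the URL changes.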
| What changes | What to try first |
|---|---|
| URL includes page=2, p=2, or /page/2 | Loop through numbered URLs |
| An <a> link points to the next page | Follow the href until it disappears or becomes disabled |
| Content appears after scrolling | Find the XHR request; use browser scrolling only if needed |
| Content appears after clicking a button | Reuse the API request or click the button in a browser session |
| JSON includes next_cursor, endCursor, has_more, or offset | Paginate through the API response |
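One way to see what changes between batches is to copy two consecutive request URLs from the Network tab and compare their query strings. The sketch below does that with the Python standard library. The function name `changed_params` and the example.com URLs are illustrative assumptions, not part of any real site's API.

```python
from urllib.parse import parse_qs, urlsplit


def changed_params(first_url: str, second_url: str) -> dict[str, tuple]:
    """Return query parameters whose values differ between two request URLs."""
    first = parse_qs(urlsplit(first_url).query)
    second = parse_qs(urlsplit(second_url).query)
    changed = {}
    for name in sorted(first.keys() | second.keys()):
        before, after = first.get(name), second.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


# Hypothetical URLs copied from two consecutive requests in DevTools.
diff = changed_params(
    "https://example.com/api/items?limit=20&offset=20",
    "https://example.com/api/items?limit=20&offset=40",
)
print(diff)  # {'offset': (['20'], ['40'])}
```

This only compares query strings. If the site sends pagination state in a POST body, the same comparison has to be made on the request payload instead.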
Numbered pages
Numbered pagination is the simplest case because the next location is visible in the URL. Watch for page indexes that start at 0, parameter names such as p or start, and sites that return the first page again when the page number is out of range. A repeated first page is worse than an empty page because it can create duplicate data without obvious errors.
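A numbered-page loop with both stop signals might look like the sketch below. It assumes a hypothetical `fetch_page(page_number)` callable that returns the extracted records for one page; how that callable fetches and parses the page is left out. The repeated-first-page check compares a fingerprint of each page against page one, which is one possible heuristic rather than a universal rule.

```python
import hashlib
from typing import Callable, Iterator


def crawl_numbered(
    fetch_page: Callable[[int], list[dict]],
    first_page: int = 1,   # use 0 for sites with zero-based page indexes
    max_pages: int = 500,  # hard cap so a misbehaving site cannot loop forever
) -> Iterator[dict]:
    """Yield records page by page until a stop signal appears."""
    first_fingerprint = None
    for page in range(first_page, first_page + max_pages):
        records = fetch_page(page)
        if not records:
            break  # empty page: normal end of the list
        fingerprint = hashlib.sha256(repr(records).encode()).hexdigest()
        if first_fingerprint is None:
            first_fingerprint = fingerprint
        elif fingerprint == first_fingerprint:
            break  # the site served page one again for an out-of-range number
        yield from records


# Fake site for illustration: two real pages, then page one again.
pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
records = list(crawl_numbered(lambda n: pages.get(n, pages[1])))
print(records)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```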
Next links
Some sites do not expose page numbers. They only expose a “Next” link or arrow. If the element is a normal anchor, treat pagination as link following. Keep a set of visited URLs (a seen_urls guard), because misconfigured sites sometimes point the final “Next” link back to the current page or to page one. Also check disabled states such as aria-disabled="true", disabled, or a disabled class before trusting the link.
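Link following with a visited-URL guard can be sketched as follows. The example assumes the site marks its link with `rel="next"`, which many sites do not; in practice the match would be adapted to the site's own markup. `fetch_html` is a hypothetical callable that returns the HTML for a URL.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first enabled <a rel="next"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr = dict(attrs)
        if "next" not in (attr.get("rel") or "").split():
            return
        disabled = (
            attr.get("aria-disabled") == "true"
            or "disabled" in attr
            or "disabled" in (attr.get("class") or "").split()
        )
        if not disabled and attr.get("href"):
            self.next_href = attr["href"]


def follow_next_links(
    start_url: str,
    fetch_html: Callable[[str], str],
    max_pages: int = 500,
) -> Iterator[tuple[str, str]]:
    """Yield (url, html) pairs until the next link is missing, disabled, or repeated."""
    seen_urls: set[str] = set()
    url: Optional[str] = start_url
    while url and url not in seen_urls and len(seen_urls) < max_pages:
        seen_urls.add(url)
        html = fetch_html(url)
        yield url, html
        parser = NextLinkParser()
        parser.feed(html)
        # Resolve relative hrefs against the page they were found on.
        url = urljoin(url, parser.next_href) if parser.next_href else None
```

Because the guard checks `seen_urls` before every fetch, a final “Next” link that points back to page one ends the loop instead of restarting the crawl.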
Infinite scroll
Infinite scroll looks like a browser-only problem, but it usually has an API underneath it. Scroll once with DevTools open and look for a request that fetches the next group of records. The useful parameters are often named offset, page, after, cursor, or limit.
When the endpoint is usable, call it directly instead of scrolling in a browser.
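A direct call to such an endpoint could be sketched like this. The parameter names `offset` and `limit`, the `items` key in the response, and the `fetch_json` callable are all assumptions for illustration; the real names come from the request observed in DevTools.

```python
from typing import Callable, Iterator
from urllib.parse import parse_qs, urlencode, urlsplit


def scroll_via_api(
    endpoint: str,
    fetch_json: Callable[[str], dict],
    limit: int = 20,
    max_batches: int = 1000,  # safety cap on the number of requests
) -> Iterator[dict]:
    """Request successive batches until the endpoint returns no items."""
    offset = 0
    for _ in range(max_batches):
        url = f"{endpoint}?{urlencode({'offset': offset, 'limit': limit})}"
        items = fetch_json(url).get("items", [])
        if not items:
            break
        yield from items
        offset += len(items)  # advance by what was actually returned


# Fake endpoint for illustration: serves slices of a 45-item catalog.
catalog = [{"id": i} for i in range(45)]


def fake_fetch_json(url: str) -> dict:
    query = parse_qs(urlsplit(url).query)
    start, size = int(query["offset"][0]), int(query["limit"][0])
    return {"items": catalog[start:start + size]}


items = list(scroll_via_api("https://example.com/api/items", fake_fetch_json))
print(len(items))  # 45
```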
Load-more buttons
A load-more button is controlled infinite scroll. The page waits for a click before requesting the next batch. That makes pacing easier because the scraper can wait, validate the new item count, and retry if the request fails. If the button calls a clean API, use that API. If not, click the button in a browser loop and stop when the button disappears or the item count stops growing.
Offset and cursor APIs
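Modern sites often paginate data at the API layer. Offset pagination asks for a numeric position, typically an offset plus a limit. Cursor pagination instead passes back a token such as next_cursor or endCursor from the previous response. In both cases, respect Retry-After, retry temporary failures with backoff, and store progress if the job is large enough that restarting from page one would be expensive.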
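A cursor loop with backoff and saved progress might be sketched as below. Everything here is hypothetical scaffolding: `fetch_page(cursor)` stands in for the real API call, `TemporaryError` stands in for however the HTTP client reports 429 or 5xx responses, and the `items` and `next_cursor` keys are assumed response fields.

```python
import time
from typing import Callable, Iterator, Optional


class TemporaryError(Exception):
    """Hypothetical error a fetcher raises for retryable responses."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("temporary failure")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def fetch_with_backoff(fetch_page, cursor, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page, waiting Retry-After or an exponential delay between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return fetch_page(cursor)
        except TemporaryError as err:
            if attempt == max_retries:
                raise
            if err.retry_after is not None:
                sleep(err.retry_after)
            else:
                sleep(base_delay * 2 ** attempt)


def crawl_cursor(
    fetch_page: Callable[[Optional[str]], dict],
    start_cursor: Optional[str] = None,  # pass a saved cursor to resume a job
    max_pages: int = 1000,
    sleep=time.sleep,
    on_progress: Optional[Callable[[Optional[str]], None]] = None,
) -> Iterator:
    """Yield items until the cursor is missing, repeated, or the page cap is hit."""
    cursor = start_cursor
    seen_cursors = set()
    for _ in range(max_pages):
        payload = fetch_with_backoff(fetch_page, cursor, sleep=sleep)
        yield from payload.get("items", [])
        cursor = payload.get("next_cursor")
        if on_progress:
            on_progress(cursor)  # e.g. write the cursor to disk
        if not cursor or cursor in seen_cursors:
            break
        seen_cursors.add(cursor)
```

Passing the last saved cursor as `start_cursor` lets a long job resume where it stopped, provided the API's cursors remain valid between runs.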
Hybrid pagination
Real sites often combine patterns:
- A category has numbered pages, but each page lazy-loads more products after scrolling.
- A search page starts with a load-more button, then switches to numbered links.
- A tabbed interface has separate pagination for “New”, “Popular”, and “Sale”.
- A listing page paginates result URLs, then each detail page has its own paginated reviews or comments.
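One way to handle combinations like the last one is to give each pagination layer its own loop and nest them. The sketch below assumes two hypothetical callables, `fetch_listing(page)` returning detail URLs and `fetch_reviews(url, page)` returning reviews, and treats an empty batch as the stop signal for both layers.

```python
from typing import Callable, Iterator


def paginate(fetch_batch: Callable[[int], list], max_pages: int = 100) -> Iterator:
    """Yield items from consecutive batches until one comes back empty."""
    for page in range(1, max_pages + 1):
        batch = fetch_batch(page)
        if not batch:
            return
        yield from batch


def crawl_listing_with_reviews(
    fetch_listing: Callable[[int], list[str]],
    fetch_reviews: Callable[[str, int], list[str]],
) -> Iterator[dict]:
    """Outer loop walks listing pages; inner loop walks each item's review pages."""
    for product_url in paginate(fetch_listing):
        reviews = list(paginate(lambda page: fetch_reviews(product_url, page)))
        yield {"url": product_url, "reviews": reviews}
```

Keeping the layers separate means each one can use a different stop signal or page cap without affecting the other.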
Practical safeguards
- Define a stop signal. Empty result sets, missing next links, disabled buttons, hasNextPage: false, repeated cursors, and max-iteration limits are all valid stop signals.
- Detect duplicates. Infinite scroll and cursor APIs can repeat records when data changes mid-run. Store stable IDs or canonical URLs.
- Throttle navigation. Add small randomized waits between batches. Browser automation should wait for content changes, not only fixed timeouts.
- Log failures. If one page fails after retries, record the URL or cursor and continue when possible.
- Prefer APIs when they are legitimate and stable. Direct API pagination is usually faster and easier to validate than driving a browser.
- Use a visual tool when speed matters more than custom code. In Octoparse, pagination can be configured visually for common next-page, load-more, and infinite-scroll flows, then run locally or in the cloud.
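The duplicate-detection and throttling safeguards from the list above can be sketched in a few lines. The field name `id` and the wait bounds are assumptions; a real job would use whatever stable identifier the site provides and pacing suited to that site.

```python
import random
import time
from typing import Iterable, Iterator


def dedupe(records: Iterable[dict], key: str = "id") -> Iterator[dict]:
    """Yield each record once, keyed by a stable ID or canonical URL."""
    seen = set()
    for record in records:
        identity = record[key]
        if identity in seen:
            continue  # repeated record, for example after data shifted mid-run
        seen.add(identity)
        yield record


def polite_pause(min_seconds: float = 0.5, max_seconds: float = 1.5, sleep=time.sleep) -> float:
    """Sleep for a small randomized interval between batches and return the delay."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


unique = list(dedupe([{"id": 1}, {"id": 2}, {"id": 1}]))
print(unique)  # [{'id': 1}, {'id': 2}]
```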