E-commerce data collection turns product pages and marketplaces into structured datasets. Retailers use it to monitor competitors. Brands use it to watch reseller activity and reviews. Market researchers use it to understand category trends. Product teams use it to identify gaps in assortment, content, and customer sentiment. The sources are familiar: Amazon, Walmart, eBay, Shopify stores, brand sites, marketplace seller pages, review pages, and category pages. The engineering challenge is that each source represents the same commercial facts with different page layouts and different anti-bot posture.
What to collect
E-commerce scraping usually starts with a product catalog.

| Data type | Example fields |
|---|---|
| Product identity | Title, brand, ASIN/SKU/GTIN/UPC, model, product URL |
| Pricing | Current price, list price, discount, coupon, subscription price |
| Availability | In stock, out of stock, delivery estimate, seller availability |
| Seller data | Seller name, marketplace seller ID, fulfilled-by signal |
| Product content | Images, description, feature bullets, specifications |
| Reviews | Rating, review count, review text, review date, helpful votes |
| Ranking | Best-seller rank, search position, category rank |
| Variants | Size, color, pack count, style, region |
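
The field groups in the table can be carried through a pipeline as one typed record per observation. The following is a minimal sketch, not a prescribed schema: the `ProductRecord` class, its field names, and the `discount_pct` helper are all hypothetical and only cover a subset of the fields listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


@dataclass
class ProductRecord:
    """One scraped observation of a product (hypothetical schema)."""

    # Product identity
    source: str                       # e.g. "amazon", "walmart"
    product_url: str
    title: str
    brand: Optional[str] = None
    sku: Optional[str] = None         # ASIN, SKU, or another retailer-specific ID
    gtin: Optional[str] = None        # GTIN/UPC when the page exposes it
    # Pricing
    current_price: Optional[Decimal] = None
    list_price: Optional[Decimal] = None
    currency: Optional[str] = None
    # Availability and seller
    availability: Optional[str] = None
    seller_name: Optional[str] = None
    # Reviews and ranking
    rating: Optional[float] = None
    review_count: Optional[int] = None
    best_seller_rank: Optional[int] = None
    # Variant attributes, e.g. {"size": "M", "color": "blue"}
    variants: dict = field(default_factory=dict)
    # When this observation was collected (UTC)
    scraped_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def discount_pct(self) -> Optional[Decimal]:
        """Discount relative to list price, or None if it cannot be computed."""
        if self.current_price is None or not self.list_price:
            return None
        pct = (self.list_price - self.current_price) / self.list_price * 100
        return pct.quantize(Decimal("0.1"))
```

Prices are stored as `Decimal` rather than `float` so that later comparisons between snapshots are not affected by binary rounding. Optional fields default to `None` because not every source exposes every field.
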
Common workflows
Catalog monitoring
Scrape category pages or search results to discover products, sellers, and rankings. Store product URLs and IDs as refresh targets.
Product detail enrichment
Visit detail pages for discovered products. Collect descriptions, specs, images, variants, seller information, and availability.
Review analysis
Collect reviews separately from product facts. Review pages often paginate independently and may require sorting by newest to support monitoring.
Price and stock tracking
Refresh selected products on a schedule. Store timestamped snapshots so the team can detect price changes, promotions, stockouts, and seller changes.
Platform differences
| Platform type | Notes |
|---|---|
| Large marketplaces | Rich data, heavy anti-bot defenses, many variants and sellers |
| Brand stores | Cleaner product structure, often Shopify or similar commerce platforms |
| Long-tail retailers | Less standardization, but lighter defenses |
| Review-heavy marketplaces | Strong sentiment value, separate review pagination |
| B2B catalogs | Often require login, quote requests, or region-specific pricing |
Data normalization
E-commerce data needs cleanup before analysis.

- Normalize currency and region.
- Convert pack counts into unit price.
- Separate product price from shipping.
- Standardize availability states.
- Map variants to parent products.
- Deduplicate identical products across URLs.
- Preserve source timestamps.
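
A few of these steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than a complete normalizer: the `AVAILABILITY_MAP` entries, the function names, and the record keys (`source`, `gtin`) are hypothetical, and deduplication here assumes a GTIN is a reliable product identity within one source.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# Hypothetical mapping from raw site strings to a small set of standard states.
AVAILABILITY_MAP = {
    "in stock": "in_stock",
    "available": "in_stock",
    "out of stock": "out_of_stock",
    "sold out": "out_of_stock",
    "currently unavailable": "out_of_stock",
    "pre-order": "preorder",
}


def standardize_availability(raw: Optional[str]) -> str:
    """Map a raw availability string to a standard state, or "unknown"."""
    if not raw:
        return "unknown"
    return AVAILABILITY_MAP.get(raw.strip().lower(), "unknown")


def unit_price(price: Decimal, pack_count: int) -> Decimal:
    """Price per unit for a multi-pack, rounded to two decimal places."""
    if pack_count < 1:
        raise ValueError("pack_count must be at least 1")
    return (price / pack_count).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def deduplicate(records: list) -> list:
    """Keep the first record per (source, gtin).

    Records without a GTIN are kept as-is, since there is no safe key
    to merge them on.
    """
    seen = set()
    unique = []
    for rec in records:
        gtin = rec.get("gtin")
        if gtin is None:
            unique.append(rec)
            continue
        key = (rec.get("source"), gtin)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

For example, `unit_price(Decimal("11.99"), 6)` gives a per-unit price that can be compared against a single-item listing, and unmapped availability strings fall back to `"unknown"` instead of being guessed.
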