Scraping Product Detail Pages with Octoparse

Octoparse enables you to scrape data from the target website. To speed up the extraction, you can use our Cloud Extraction to split the scraping task into many sub-tasks. Our cloud servers will then collect the data quickly and provide you with a structured data set.

To scrape product details as fast as possible, you can create two scraping tasks: Task 1 and Task 2. Task 1 scrapes the URLs of the product detail pages, and Task 2 scrapes the product details from those URLs.


In this tutorial, we will scrape all the product detail pages with Octoparse.

The data fields include: auction item name, item condition, end time, item price, number of items sold, price in HKD (including shipping), shipping price, shipping details, item location, seller ID, seller's representative, product details, item image URL, and product detail page URL.


You can directly download the two tasks (the .otd files) to begin collecting the data, or you can follow the steps below to build the scraping tasks yourself.

(Download the extraction tasks for this tutorial here in case you need them: Task 1, Task 2.)


Task 1. Scraping the URLs needed for Task 2. 


Step 1. Set up basic information.

Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜ Complete basic information.


Step 2. Enter the target URL in the built-in browser ➜ Click the "Go" icon to open the webpage.

(URL of the example: )


Step 3. Click the "Next" pagination link ➜ Choose "Loop click in the element" to turn the page.



(Note:

1. If you want to extract information from every page of the search results, you need to add a page-navigation action.

2. You can right-click the "Next" pagination link to avoid triggering the link.

3. You can click the "Expand the selection area" button until "Loop click in the element" appears.)


Step 4. Move your cursor over a section with a similar layout, from which you want to extract the URLs.

Click the first highlighted link ➜ Click "Create a list of items" (sections with similar layout) ➜ Click "Add current item to the list".

The first highlighted link has now been added to the list. ➜ Click "Continue to edit the list".

Click the second highlighted link ➜ Click "Add current item to the list" again. Now we have all the links with a similar layout. ➜ Click "Finish Creating List" ➜ Click "loop" to process the list and extract the elements on each page.


Step 5. Extract the URLs.

Extract the link of the first item: click the item name ➜ Select "Extract link (href attribute of A tag) of this item" ➜ Click "Save".
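Under the hood, this step reads the href attribute of each matching A tag. If you scripted the same extraction yourself, a minimal sketch with Python's standard library might look like this (the sample markup and class names are assumptions, not the real page):

```python
import xml.etree.ElementTree as ET

# Sample listing markup standing in for a real search-results page;
# the structure and class names are illustrative assumptions.
page = ET.fromstring("""
<div class="results">
  <div class="item"><a href="/detail/1">Item 1</a></div>
  <div class="item"><a href="/detail/2">Item 2</a></div>
  <div class="item"><a href="/detail/3">Item 3</a></div>
</div>
""")

# Equivalent of "Extract link (href attribute of A tag)": select every
# item link in the list and read its href attribute.
urls = [a.get("href") for a in page.findall('.//div[@class="item"]/a')]
print(urls)  # ['/detail/1', '/detail/2', '/detail/3']
```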


Step 6. In the Workflow Designer, drag the second "Loop Item" box into the "Cycle Pages" box, before the "Click to paginate" action, so that we can grab all the matching sections from multiple pages.


Step 7. Check the workflow.

Now check the workflow by clicking through the actions from the beginning of the workflow:

Go to Web Page ➜ Cycle Pages box ➜ Loop Item box ➜ Extract Data ➜ Click to Paginate.


Step 8. Click "Save" to save the configuration. Then click "Next" ➜ Click "Next" ➜ Click "Local Extraction" to run the task on your computer. Octoparse will automatically extract all the URLs.
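The whole Task 1 workflow (loop over the items on a page, extract the links, then click to paginate) can be sketched in plain Python; the simulated pages below stand in for real HTTP fetches and are purely illustrative:

```python
# Simulated pages: each "page" carries its item links and the index of the
# next page (None when there is no "Next" link). Stand-in for real fetches.
pages = [
    {"links": ["/detail/1", "/detail/2"], "next": 1},
    {"links": ["/detail/3", "/detail/4"], "next": 2},
    {"links": ["/detail/5"], "next": None},
]

def crawl(pages):
    """Mirror of the Task 1 workflow: extract the links on the current
    page, then follow the "Next" link until pagination ends."""
    urls, current = [], 0
    while current is not None:
        page = pages[current]
        urls.extend(page["links"])      # Loop Item -> Extract Data
        current = page["next"]          # Click to Paginate
    return urls

print(crawl(pages))  # ['/detail/1', '/detail/2', '/detail/3', '/detail/4', '/detail/5']
```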


Step 9. The extracted data will be shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database, or other formats, and save the file to your computer. Copy the list of URLs for Task 2.



Task 2. Scraping the product details from the collected URLs.


Step 1. Set up basic information.

Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜ Complete basic information ➜ Click "Next".


Step 2. Create a loop for a list of URLs.

Drag a "Loop Item" into the Workflow Designer and then choose "URL list" in the "Loop mode".

Paste a list of URLs into the "URL list" box and click "Save".

You can see that the "Go To Web Page" action is generated automatically and goes directly to the first URL. You can click the Loop Item box to see the full list of URLs.
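Conceptually, a "Loop Item" in URL-list mode is just iteration over the pasted URLs. A minimal Python sketch (the placeholder URLs and the `extract_details` stub are assumptions, not Octoparse internals):

```python
# Placeholder URLs standing in for the list pasted into the "URL list" box.
urls = [
    "https://example.com/detail/1",
    "https://example.com/detail/2",
]

def extract_details(url):
    # Stub for visiting the page and pulling the data fields; a real task
    # would fetch `url` and apply the field selectors here.
    return {"page_url": url}

# Equivalent of the URL-list loop: visit each URL in turn and extract.
records = [extract_details(u) for u in urls]
print(records)
```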


Step 3. Extract the product details.

Since the first URL may not include all the content you want to extract, you should select a URL that has all the information you need. In this case, pick one of the URLs in the loop that contains all the content you need. Here we choose one such URL.


Click the auction item name ➜ Select "Extract text" ➜ Click the "Field Name" to rename it. The other fields can be extracted in the same way. Then click "Save".


After you have extracted all the data fields you want, check whether Octoparse extracts the values from the product detail page correctly. For example, you can re-format the first data field, "Auction_Item", to extract the exact information. You can also add the current page URL so that you can see, by observing the output, which detail pages have missing values. Then click "Save".
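Carrying the page URL alongside each record makes the missing-value check a simple filter. A small sketch (the records and field values below are made up for illustration):

```python
# Extracted records as they might appear in the output; the page URL
# column lets us see which detail pages are missing a field.
records = [
    {"page_url": "https://example.com/detail/1", "price": "HKD 120"},
    {"page_url": "https://example.com/detail/2", "price": None},
    {"page_url": "https://example.com/detail/3", "price": "HKD 88"},
]

# Collect the URLs of pages where the field came back empty.
missing = [r["page_url"] for r in records if r["price"] is None]
print(missing)  # ['https://example.com/detail/2']
```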



Step 4. Click "Save" to save your configuration. Then click "Next" ➜ Click "Next" ➜ Click "Local Extraction" to run the task on your computer. Octoparse will automatically extract all the data selected.


Step 5. The extracted data will be shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database, or other formats, and save the file to your computer.



Point 1.

You may find that some data fields have missing values in the output. In this case, you need to figure out why Octoparse could not extract the values for those fields. See this article to find out the common reasons for missing values when using Local Extraction.

The original XPath for some data fields may not select the elements correctly, resulting in missing values for those fields. In this case, you can modify the XPath expressions for these data fields. Here we replace all the SPAN tags with * for all the data fields. Click "Save" to save the configuration. You can follow this tutorial to modify XPath expressions in Octoparse.
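To see why replacing SPAN with the * wildcard helps, consider two detail pages that wrap the same field in different tags: the strict XPath matches only one of them, while the relaxed XPath matches both. The markup below is an illustrative assumption, evaluated with Python's ElementTree:

```python
import xml.etree.ElementTree as ET

# Two detail pages that render the same field with different tags
# (illustrative markup): one uses <span>, the other <b>.
page_a = ET.fromstring('<html><div id="price"><span>HKD 120</span></div></html>')
page_b = ET.fromstring('<html><div id="price"><b>HKD 95</b></div></html>')

strict = './/div[@id="price"]/span'   # original XPath: matches only SPAN
relaxed = './/div[@id="price"]/*'     # SPAN replaced with *: any child tag

print([p.findall(strict) != [] for p in (page_a, page_b)])   # [True, False]
print([p.findall(relaxed) != [] for p in (page_a, page_b)])  # [True, True]
```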

Knowing how to edit XPath expressions can help you solve many problems when scraping data from websites. The tutorials and FAQs below can help you pick up XPath quickly.

How to use Firebug and Firepath?

Getting started with XPath 1

Getting Started With XPath 2

Modify XPath Manually in Octoparse


Point 2. The website is unable to show more than 10,000 results, so your scraping task may stop for this reason, and you may need to refine your search to narrow the results.


Author: The Octoparse Team

- See more at: Octoparse Tutorial
