Octoparse vs. Import.io comparison: which is best for web scraping?

Web scraping software, also known as a data extraction tool, collects data from websites. Picking a web scraping tool is usually not easy, because so many are available now (refer to Top 30 Free Web Scraping Software to learn more). That's why I decided to put the web scraping tool Octoparse head to head with Import.io to see how the two tools compare. Here is everything you need to know when deciding which web scraping tool better suits you.


Feature Comparison

Here is a general comparison between Octoparse and Import.io features:





Feature | Octoparse | Import.io
Platform | Desktop app for Windows (runs on a Mac via a virtual machine) | Web-based application; supports Chrome, Firefox, and Safari
Selecting elements | Point-and-click, XPath | Point-and-click, XPath
Pagination | Click pagination links, or enter the XPath manually for websites without "Next page" links | Enter a list of pages
Scraper logic | Variables, loops, conditionals | Selecting and extracting only
Drop-downs, tabs, hovering, pop-ups | Yes | No
Infinitely scrolling pages | Yes | No
Entering keywords into search boxes | Yes | No
Captcha | Yes, on the local machine | No
Signing in to accounts | Yes | Yes
Transforming data | Regex, JavaScript expressions | Regular expressions
Speed | Fast parallel execution | Fast parallel execution
Hosting | Octoparse cloud servers (cloud subscription) or the local machine (free version) | Import.io cloud servers
IP rotation | Included in paid plans, or manual IP proxies in the free version | Included in paid plans
Scheduled runs | With a premium Octoparse account | With a premium Import.io plan
Data export | CSV, Excel, TXT, databases | CSV, JSON, API, Google Sheets
Smart Mode | Yes | Yes (auto-built extractors)
Cloud service | Yes | Yes
Up-to-date data | Yes (incremental extraction) | Yes (scheduled runs)
Image and file extraction | No; can only extract image or file URLs | Yes
Support | Free professional support, tutorials, community support | Community support, or professional support and customer-success training for paid users


What can both web scrapers do for you?

Both interfaces are built on the point-and-click principle, which makes it easy to extract data without coding. Both scrapers can handle JavaScript and AJAX pages and can scrape behind a login. Like a bot, they can follow links into deeper web pages by clicking items and extracting data from the resulting pages. Both can also export data in CSV format and transform data by manually editing regular expressions or XPath.

Both provide cloud services that offer API options, IP rotation, and the ability to schedule extractors to run in real time. With that, you can easily get up-to-date data regularly without having to keep your computer on.


What can Octoparse do for you?

The biggest difference between Octoparse and its web scraping alternatives is that Octoparse can get data from interactive websites: it fully mimics human behavior when browsing a website.

You can instruct Octoparse to scrape data from very complex and dynamic sites, because it can:

  • Sign in to accounts to scrape behind a login
  • Select options from drop-down menus (single and multiple), tabs, and pop-up windows
  • Enter keywords and search with a search bar
  • Go to a new page simply by clicking the "next" button
  • Get data from infinitely scrolling pages
  • Enter Captcha on the local machine
  • Show the scraper's logic (variables, loops, and conditionals) in a visual workflow, which can be changed easily with the point-and-click interface
  • Handle simple websites in Smart Mode, just by entering the target URL
  • Extract inner and outer HTML and attributes, and customize the values for further extraction
  • Modify regular expressions and XPath with the advanced RegEx and XPath tools, which means you don't need to know how to write them yourself (see the screenshots below)

And more! Except for the first item, these are all things that Import.io cannot handle.
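To see the kind of transformation the RegEx tool generates for you, here is a minimal Python sketch; the sample text and pattern are illustrative, not actual Octoparse output:

```python
import re

# Hypothetical raw text captured by a scraper; we only want the prices.
raw = "Price: $1,299.00 (was $1,499.00)"

# A pattern of the kind the RegEx tool builds for you: a dollar sign,
# digits and commas, and an optional two-digit decimal part.
pattern = re.compile(r"\$[\d,]+(?:\.\d{2})?")

prices = pattern.findall(raw)
print(prices)  # ['$1,299.00', '$1,499.00']
```

In both tools this same expression can be applied to a captured field to clean it before export.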



Octoparse RegEx Tool



Octoparse XPath Tool


Here is a full list of Octoparse’s scraping features:

  • Automatic IP rotation
  • Loops, variables, and conditional logic
  • Extract text, HTML, and attributes
  • Scheduled runs
  • Cloud servers to store data
  • Extract file and image URLs
  • Search through forms and inputs
  • Get data from drop-downs, tabs, pop-ups, and hovers
  • Database integration
  • Pagination and navigation
  • Scrape content from infinitely scrolling pages
  • RegEx and XPath tools
  • Get data from tables and maps
  • Content that loads with AJAX and JavaScript


The downside of using Octoparse as an alternative to Import.io is that you need to install the application on your own computer. Because the software is written in .NET, it only supports Windows; a virtual machine is needed to run Octoparse on a Mac. It can also be frustrating when the Internet connection is unstable and the scraper stops unexpectedly, since you then need to rerun the crawler from scratch. Octoparse may also take longer to learn, because it is easy to make mistakes if you don't understand the logic of the workflow. Luckily, there are plenty of tutorials and great support if you get stuck!

Besides, Octoparse cannot extract images and files directly; you need to extract their URLs and download them with another application. Its API functionality is also quite limited.
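Since Octoparse exports only the URLs, a short script can do the downloading afterwards. This is a sketch using the Python standard library; the CSV layout and column name are assumptions about your export:

```python
import csv
import os
import urllib.request

def download_images(csv_path: str, url_column: str, out_dir: str) -> int:
    """Download every file listed in the given CSV column; returns the count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = row.get(url_column, "").strip()
            if not url:
                continue
            # Name the local file after the last path segment of the URL.
            filename = os.path.basename(url.split("?")[0])
            urllib.request.urlretrieve(url, os.path.join(out_dir, filename))
            count += 1
    return count
```

For large exports you would want retries and rate limiting, but the idea is the same: the scraper collects URLs, and a separate step fetches the files.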


What can Import.io do for you?

First of all, Import.io is a cloud-based platform, which means you don't need to run the scraper on your own machine, and the data is kept in the cloud. You can therefore access your data from any computer connected to the Internet, and you don't need to worry about maintaining or scaling the scraping process.


Unlike Octoparse's advanced mode, Import.io tries to guess what you want from the page and builds an extractor for you in just a few seconds. Other features include:

  • Connect one data source with another to produce new, valuable, real-time data sets
  • Integrate with Google Sheets and Tableau
  • Extract images and files
  • API integration
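API integration typically means pulling the latest extracted rows as JSON over HTTP. The endpoint shape below is a placeholder, not Import.io's real API; check the vendor documentation for actual URLs and authentication:

```python
import json
import urllib.request

def fetch_latest_rows(base_url: str, extractor_id: str, api_key: str) -> list:
    """Fetch the most recent extraction results as a list of dicts."""
    # Placeholder URL scheme -- consult the vendor docs for the real one.
    url = f"{base_url}/extractor/{extractor_id}/latest?apikey={api_key}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A downstream job can call this on a schedule and feed the rows straight into a database or dashboard.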


Here is a full list of Import.io's scraping features:

  • Automatic IP rotation
  • Cloud servers to store data
  • Content that loads with AJAX and JavaScript
  • Extract files and images
  • Scheduled runs
  • XPath and regular expression selectors
  • Get data from tables and maps
  • API, Tableau, and Google Sheets integration


The downside of using Import.io is that it cannot handle as wide a range of websites as Octoparse. As mentioned above, it cannot deal with drop-down menus, pop-up windows, or Captcha. It also cannot scrape infinitely scrolling pages, which are quite common nowadays. And there is no scraper logic, such as conditionals, for locating specific pages or items for further extraction.

And pagination is not easy, as you need to enter a list of pages. As for transforming data with regular expressions and XPath, there are no built-in tools: you need to enter the expressions yourself, which means you need to master XPath and regular expressions to get the most out of Import.io.
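Generating that list of pages is at least easy to script. A one-line Python sketch (the URL template is made up for illustration):

```python
# Build the explicit page list that Import.io's pagination expects.
# "example.com/products" is an illustrative URL, not a real target.
pages = [f"https://example.com/products?page={n}" for n in range(1, 6)]
print(pages[0])    # https://example.com/products?page=1
print(len(pages))  # 5
```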


Cost Comparison

There's no doubt that Octoparse has an overwhelming pricing advantage: it provides a free version with powerful features. Both vendors sell monthly and yearly plans, priced in USD.
Let’s see the screenshots below for more details.

Octoparse Pricing



Import.io Pricing 


Octoparse's plans are limited by:

  • the number of crawlers
  • the number of crawlers you can run concurrently on your machine
  • the speed at which you can collect data (different cloud servers)

There are unlimited pages for each crawler and unlimited computer licenses for each version, including the free one.

(Note: when you enter URLs into a URL list, Octoparse suggests keeping it under 20,000 URLs. Every version has this limit, so the CPU can run the crawler in one pass, but you can copy the crawler to extract the remaining URLs.)
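Splitting a long URL list into sub-20,000 chunks, one per crawler copy, is simple to script. A minimal sketch:

```python
def split_url_list(urls, chunk_size=20000):
    """Split a URL list into chunks no larger than chunk_size."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]

# 45,000 URLs become three crawler copies: 20,000 + 20,000 + 5,000.
chunks = split_url_list([f"https://example.com/item/{i}" for i in range(45000)])
print([len(c) for c in chunks])  # [20000, 20000, 5000]
```

Each chunk can then be pasted into its own copy of the crawler's URL list.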


Import.io’s plans are limited by: 

  • the number of queries per month or year
  • the expiry date of those queries
  • features gated to higher tiers, such as image and file download, API access, and up-to-date reporting
  • the level of support

Sadly, Import.io no longer provides a free version.


Most people build two crawlers per website in Octoparse: one extracts the URLs of the individual pages, and the other uses a URL list to bulk-extract data from those URLs. This is highly recommended when using the cloud service (see Splitting Tasks to Speed Up Cloud Extraction to learn more).
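Neither tool requires code, but step one of that pattern, collecting the detail-page links from a listing page, looks like this in plain Python (the HTML snippet is invented for illustration):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, i.e. the 'URL crawler' step."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

listing_html = '<ul><li><a href="/item/1">A</a></li><li><a href="/item/2">B</a></li></ul>'
collector = LinkCollector()
collector.feed(listing_html)
print(collector.links)  # ['/item/1', '/item/2']
```

The collected links then become the URL list that the second crawler consumes.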

Import.io, on the other hand, counts each extractor as one query and does not support URL lists for bulk extraction. So you either need to crawl these separate pages within one extractor (which usually means missing data), or upgrade your plan for more queries.

For both Octoparse and Import.io, you have to subscribe to a premium plan to get scheduling: the ability to collect data from a website continuously on a schedule (real-time, daily, weekly, or monthly).

If you don't want to learn a tool and just want your data on demand, both Octoparse and Import.io offer data services that do the extraction for you. Contact either company's sales team and they will scrape the website you want, delivering the data in CSV/Excel or via an API.



It is not difficult to start a project with either Octoparse or Import.io, and both deal well with static and dynamic websites. XPath and regular expressions are needed if you want to go further, though both claim that no programming knowledge is required. Both also have their limits.


I will also put together some examples to further show how these two scrapers work. And if there's something wrong with the information above, just contact me here.

