December 2016 Blog Posts (18)

Scraping Online Dictionary - Merriam-Webster.com

Octoparse lets you scrape an online dictionary into an organized list by entering a list of words. It is easy to use, and it can retrieve the definitions and example sentences of the words you want by using Loop mode to enter a text list.

In this tutorial, I will show you how to scrape the definitions of some words from merriam-webster.com.

The website URL we will use is …

Continue

Added by Paul Black on December 29, 2016 at 9:43pm — No Comments

Web Scraping|Scrape Booking Reviews

 

 

Collecting online customer reviews, including star ratings, comments, likes, dislikes, images, videos, and share channels, can help an online retailer understand whether a product is a good purchase and popular among customers, and adjust marketing strategies accordingly. There are many web scraping tools available online that can live up to your expectations to scrape…

Continue

Added by Paul Black on December 29, 2016 at 9:00pm — No Comments

10 Essential Tutorials That Every Octoparse Newbie Should Know

Octoparse offers a convenient way to scrape data from websites. Although little programming knowledge is required, some users still say they have no idea how to use Octoparse. This post aims to help our new users settle into Octoparse smoothly.

Below you will find links to 10 of the most helpful tutorials to help you take your first steps in Octoparse. These guides will not only help you scrape different kinds of website structures,…

Continue

Added by Paul Black on December 29, 2016 at 8:47pm — No Comments

Reasons and Solutions - Missing Data in Cloud Extraction

We all want a neat Excel spreadsheet of the scraped data before going on to further analysis.

With Octoparse, you can fetch the data you want from websites and have it ready for use. Our cloud services let you fetch large amounts of data by running your scraping task with Cloud Extraction. The premise is that you know how to deal with the circumstances that can arise when you use Cloud Extraction to scrape sites.

We summarize several problems encountered by our…

Continue

Added by Paul Black on December 27, 2016 at 4:14am — No Comments

Reasons and Solutions - Cloud Extraction Is Slower Than Local Extraction

Imagine that one day you open a web scraping tool and the screen displays all the data you want, neatly.

The Octoparse Cloud servers have fetched all the data you want from any website for you. You're full of joy.

We love to see you smile.

We are dedicated to providing the best web scraping software and service for you. 

So we have created some tutorials to solve the problems you may have when using Cloud…

Continue

Added by Paul Black on December 27, 2016 at 4:10am — No Comments

Reasons and Solutions - Getting Data from Local Extraction but None from Cloud Extraction

We all want a neat Excel spreadsheet of the scraped data before going on to further analysis.

With Octoparse, you can fetch the data you want from websites and have it ready for use. Our cloud services let you fetch large amounts of data by running your scraping task with Cloud Extraction. The premise is that you know how to deal with the circumstances that can arise when you use Cloud Extraction to scrape sites.

We summarize several…

Continue

Added by Paul Black on December 27, 2016 at 4:04am — No Comments

5 Steps to Collect Big Data

 

Most companies today collect big data to analyze and interpret daily transaction and traffic data in order to keep track of operations, forecast needs, or implement new programs. In this sense, we define big data as the capability that allows companies to extract value from large volumes of different kinds of data. But how can we directly collect the big data we want?

There may be…

Continue

Added by Paul Black on December 22, 2016 at 4:36am — No Comments

Web Scraping|Scrape Data from Online Accommodation Booking Sites

For people who are actively looking for low-priced flights or hotels when traveling, or for businesses that want to track prices of flights or any type of travel accommodation to maintain their competitive edge, Octoparse works well for collecting data based on different filters without manual searches.

A real-life example comes from one of our users who was trying to scrape data from …

Continue

Added by Paul Black on December 20, 2016 at 3:16am — No Comments

Web Scraping - Scrape Biographical Data from Websites

In this article we will tell you how to scrape biographical data from websites of law firms and export the data collected into an Excel spreadsheet.

A real-life example of this kind of task comes from one of our users, who was looking for someone to review the websites of several hundred law firms, pull the biographical data, and put it into an Excel spreadsheet.

Prior to awarding the project for the full several hundred firms, he provided a test sample in his email and wanted to scrape…

Continue

Added by Paul Black on December 18, 2016 at 11:17pm — No Comments

6 Tips to Use the Web Scraping Tool Octoparse

Recently we received feedback from our users, and some of them have trouble moving forward with Octoparse because of issues that occur occasionally. This post shares my experience of using Octoparse, in the hope that these tips will help you move forward and deal with more difficult and complex websites.

 

  1. Manually Check the Rule in the Workflow Designer

Since Octoparse doesn’t signal an error for you to trace the problem when…

Continue

Added by Paul Black on December 18, 2016 at 11:00pm — No Comments

The Best Answers to Your Most Crucial Deep Learning Questions

Many people keep a close eye on fast-moving technology trends. There's no doubt that deep learning is one of the most popular buzzwords today. Deep learning has made significant breakthroughs and is applied in many areas, such as facial recognition, image recognition, and Go-playing programs like AlphaGo. Thus more and more people focus…

Continue

Added by Paul Black on December 15, 2016 at 2:00am — No Comments

Scrape LinkedIn Public Data

 

Business and employment data give a company the opportunity to gain more profit by better evaluating employees and identifying the best ones. Promising employees make their profile pages attractive and competitive by adding the skills employers most need in candidates. Similarly, companies will optimize their LinkedIn company pages because of its high popularity…

Continue

Added by Paul Black on December 13, 2016 at 1:41am — No Comments

Be the Best Junior Management Consultant: Skills You Need to Succeed

More and more people are interested in becoming consultants in the subject or field they know best. Consulting is known for its attractive salary and intellectual challenge. For a junior management consultant, working overtime is unavoidable in order to adjust to the fast-paced environment and get used to the consulting…

Continue

Added by Paul Black on December 13, 2016 at 1:38am — No Comments

How to avoid collecting only the first item of the web page in Octoparse?

Q: Why does Octoparse only collect the first item from each page?

 

Description: 

 

I have been testing your software to try and data mine some info.

The website is https://www.yelp.com/search?find_desc=car+audio&find_loc=Brooklyn%2C+NY

The problem is it will only collect the first item from each…

Continue

Added by Paul Black on December 12, 2016 at 3:41am — No Comments

Set AJAX timeout in Octoparse

Q: Where do I set the AJAX timeout in Octoparse?

 

Description:

I don't see any indication of where to alter the AJAX timeout settings.

I am setting up my task through Wizard mode to extract data from multiple URLs. I don't see any place in that process to alter timeout settings. When I try setting up a task in Advanced mode, I also don't see any place to alter timeout settings as I'm setting up the…

Continue

Added by Paul Black on December 12, 2016 at 3:35am — No Comments

How to integrate Octoparse and any other database via API?

Q: Is it possible to integrate Octoparse and any other database via Octoparse API? How?

 

A:

 

Currently, Octoparse lets you export data only to MySQL, SQL Server, and Oracle for free.

If you need to use Octoparse with other database types, you can pull the scraped data…

Continue

Added by Paul Black on December 12, 2016 at 3:21am — No Comments

Make your own crawler

Big Data is all around us. As everyone is aware, there is an unbelievable amount of data available on the Internet, and most of it is unstructured: useful but obscure. It's impossible to find one little piece of information in this ocean of data without search engines. So how do search engines gather and organize data for us? The answer is web crawlers, programs that crawl through the Internet in an automated way and get the information that…

Continue

Added by Paul Black on December 5, 2016 at 5:13am — No Comments

How to Get Data from the Web

Enterprises of every size generate large amounts of web data all the time. But how to deal with this data, both collecting it and processing it, is always a problem. The significance of big data technologies lies not in their ability to handle large-scale data collection, but in the intelligence to process data and thus extract valuable information from such large volumes for further analysis. And the premise of big data…

Continue

Added by Paul Black on December 5, 2016 at 5:00am — No Comments
