virtualmarioe/Web_scraping_tutorial

Binder Open In Colab License

Web scraping tutorial

This tutorial gives a short, practical introduction to web scraping. It is not meant to be an extensive or in any way complete document on the topic. If you are interested in more detail, I invite you to take a look at the other excellent tutorials available online.

Web scraping is the process of extracting data from websites or other online sources and copying it into a structured form (e.g., a database) that enables further retrieval and analysis.

For this particular tutorial, we are going to extract demographic information (e.g., country, state and population) of Colombian towns from Wikipedia.

The tutorial is written in Python and uses two of the many available methods for pulling the data: Beautiful Soup and Pandas.
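As a rough illustration of the Beautiful Soup approach, the sketch below parses a small inline HTML table into a list of dictionaries. It is not taken from the tutorial's notebook: the HTML fragment, the column names, the `wikitable` class and the helper `table_to_records` are all assumptions made for this example, and the page is embedded as a string so that nothing is downloaded.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded Wikipedia page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Town</th><th>State</th><th>Population</th></tr>
  <tr><td>Town A</td><td>State X</td><td>12,345</td></tr>
  <tr><td>Town B</td><td>State Y</td><td>6,789</td></tr>
</table>
"""


def table_to_records(html):
    """Return one dict per data row of the first 'wikitable' in the HTML."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = table.find_all("tr")

    # The first row holds the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]

    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        records.append(dict(zip(headers, values)))
    return records


records = table_to_records(SAMPLE_HTML)
for record in records:
    print(record)
```

The values come back as raw strings (the population still contains a thousands separator), which is why a separate cleaning step is needed before analysis.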

The tutorial is divided into the following four sections:

  • Section 1: Method Beautiful Soup
  • Section 2: Method Pandas
  • Section 3: Structuring and cleaning the data
  • Section 4: Data saving
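To give a feel for what the structuring, cleaning and saving steps in Sections 3 and 4 involve, here is a minimal sketch. It uses only the Python standard library as a stand-in and is not the notebook's own code. The sample rows, the column names and the helper `clean_row` are hypothetical, and the CSV is written to an in-memory buffer rather than to a file.

```python
import csv
import io
import re

# Hypothetical scraped rows: stray whitespace, a footnote marker and
# thousands separators are typical of text taken from a Wikipedia table.
raw_rows = [
    {"Town": " Town A ", "State": "State X", "Population": "12,345[1]"},
    {"Town": "Town B", "State": "State Y", "Population": "6,789"},
]


def clean_row(row):
    """Strip whitespace and turn the population string into an integer."""
    cleaned = {key: value.strip() for key, value in row.items()}
    # Remove footnote markers such as "[1]", then the thousands separators.
    population = re.sub(r"\[\d+\]", "", cleaned["Population"])
    cleaned["Population"] = int(population.replace(",", ""))
    return cleaned


clean_rows = [clean_row(row) for row in raw_rows]

# Save the cleaned rows as CSV; an in-memory buffer replaces a real file here.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Town", "State", "Population"])
writer.writeheader()
writer.writerows(clean_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the buffer could be replaced by a file opened with `open(path, "w", newline="")`, where `path` is whatever file name you choose.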

To run the tutorial, you can download the notebook or the plain Python script (.py) to your local machine and run it there. Alternatively, you can run the tutorial's notebook online via Binder or Google Colab by clicking the badges at the top of this page.

Questions?

If you have any questions or comments, don't hesitate to post them in the issues.

About

Short notebook presenting an introduction to web scraping
