Web scraping tutorial

This tutorial aims to give a short and practical introduction to Web scraping. It is not meant to be an extensive, let alone complete, guide to Web scraping. If you are interested in a more detailed treatment, I invite you to take a look at the other excellent tutorials available online.

Web scraping is the process of extracting data from websites or other online sources and copying it into a structured form (e.g., a database) that enables further retrieval and analysis.
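
As a minimal illustration of that extract-then-structure loop, the sketch below uses only the Python standard library. `html.parser` pulls the cells out of a small inline HTML table, and `sqlite3` copies them into an in-memory database where they can be queried. The sample HTML, the `TableCellParser` class, the `scrape_to_db` helper and the `towns` schema are hypothetical and are not part of the tutorial itself. The town names and numbers are placeholders, not real data.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page fragment; the values are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Town</th><th>Population</th></tr>
  <tr><td>Town A</td><td>1200</td></tr>
  <tr><td>Town B</td><td>560</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._in_cell:
            self._in_cell = False
            self._row[-1] = self._row[-1].strip()
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_to_db(html, conn):
    """Extract the table rows from `html` and copy them into a `towns` table."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    _header, *body = parser.rows  # first row holds the column names
    conn.execute("CREATE TABLE towns (name TEXT, population INTEGER)")
    conn.executemany(
        "INSERT INTO towns VALUES (?, ?)",
        [(name, int(population)) for name, population in body],
    )
    return len(body)


conn = sqlite3.connect(":memory:")
scrape_to_db(SAMPLE_HTML, conn)
# Once structured, the data can be queried like any other table.
print(conn.execute("SELECT name FROM towns WHERE population > 1000").fetchall())
```

A real page is messier than this sample, which is one reason the tutorial reaches for dedicated tools rather than a hand-written parser.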

For this tutorial, we are going to extract demographic information (e.g., country, state and population) about Colombian towns from Wikipedia.
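
Before any parsing, the page has to be downloaded. The sketch below does this with the standard-library `urllib.request` and sends a descriptive `User-Agent` header, which Wikipedia's User-Agent policy asks automated clients to do. The article URL, the user-agent string and the `build_request`/`fetch_html` helper names are assumptions for illustration. Substitute the page the tutorial actually scrapes.

```python
import urllib.request

# Assumption: placeholder article URL, not taken from the tutorial.
PAGE_URL = "https://en.wikipedia.org/wiki/List_of_cities_in_Colombia"


def build_request(url, user_agent="web-scraping-tutorial-example/0.1"):
    """Build a GET request that identifies the client via a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and decode its body as UTF-8 (Wikipedia serves UTF-8)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires network access):
# html = fetch_html(PAGE_URL)
```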

The tutorial is written in Python and uses two of the many available methods for pulling the data: Beautiful Soup and Pandas.
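
To give a feel for the Beautiful Soup approach, here is a minimal sketch that turns an HTML table into a list of dictionaries, one per row, keyed by the header cells. It assumes the table carries the `wikitable` class that Wikipedia commonly uses for data tables. The inline HTML, its column names and the `table_to_records` helper are hypothetical, and the values are placeholders. This is not necessarily how the tutorial's notebook does it.

```python
from bs4 import BeautifulSoup

# Hypothetical table fragment; values are placeholders, not real data.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Town</th><th>State</th><th>Population</th></tr>
  <tr><td>Town A</td><td>State X</td><td>1,200</td></tr>
  <tr><td>Town B</td><td>State Y</td><td>560</td></tr>
</table>
"""


def table_to_records(html):
    """Return one dict per data row, keyed by the table's header cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")  # assumption: class name
    rows = table.find_all("tr")
    header = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]
    return [
        dict(zip(header, (cell.get_text(strip=True) for cell in row.find_all("td"))))
        for row in rows[1:]
    ]


print(table_to_records(SAMPLE_HTML))
```

On the Pandas side, `pandas.read_html` parses every `<table>` it finds into a list of DataFrames and accepts an `attrs` filter such as `{"class": "wikitable"}`. It needs an HTML parser backend such as lxml, or BeautifulSoup4 together with html5lib, to be installed.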

The tutorial is divided into the following 4 sections:

Section 1: Method Beautiful Soup
Section 2: Method Pandas
Section 3: Structuring and cleaning the data
Section 4: Data saving
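
Scraped cells usually arrive as text, so cleaning typically means stripping whitespace, removing Wikipedia-style footnote markers such as `[1]` and thousands separators, and converting numbers to integers before saving. The sketch below shows one way to do that with Pandas. The input rows are made up, and the `clean`/`save` helpers and the CSV filename are hypothetical. The tutorial's own cleaning steps may differ.

```python
import pandas as pd

# Made-up rows standing in for scraped text (not real data): note the stray
# whitespace, the thousands separator and the footnote marker "[1]".
raw = pd.DataFrame({
    "Town": ["Town A ", "Town B"],
    "State": ["State X", "State Y"],
    "Population": ["1,200[1]", "560"],
})


def clean(df):
    """Return a copy with trimmed names and an integer Population column."""
    out = df.copy()
    out["Town"] = out["Town"].str.strip()
    out["Population"] = (
        out["Population"]
        .str.replace(r"\[.*?\]", "", regex=True)  # drop footnote markers
        .str.replace(",", "", regex=False)        # drop thousands separators
        .astype(int)
    )
    return out


def save(df, path):
    """Write the cleaned table to CSV without the DataFrame index."""
    df.to_csv(path, index=False)


towns = clean(raw)
# save(towns, "colombian_towns.csv")  # hypothetical output filename
```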

To run the tutorial, you can download the notebook (web_scraping_tutorial.ipynb) or the plain Python script (web_scraping_tutorial_python.py) to your local machine and run it there. Alternatively, you can run the tutorial's notebook online via Binder or Google Colab by clicking the badges at the top of this page.

Questions?

If you have any questions or comments, don't hesitate to post them in the issues.

virtualmarioe/Web_scraping_tutorial