Inspired by 100 Numpy exercises, here are 100* short puzzles for testing your knowledge of pandas' power.
Since pandas is a large library with many different specialist features and functions, these exercises focus mainly on the fundamentals of manipulating data (indexing, grouping, aggregating, cleaning), making use of the core DataFrame and Series objects. Many of the exercises here are straightforward in that the solutions require no more than a few lines of code (in pandas or NumPy - don't go using pure Python or Cython!). Choosing the right methods and following best practices is the underlying goal.
The exercises are loosely divided into sections. Each section has a difficulty rating; these ratings are subjective, of course, but should be seen as a rough guide to how elaborate the required solution is.
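As a rough illustration of the "few lines of pandas" style described above, here is a minimal sketch that touches on grouping, aggregating and indexing. The DataFrame, its column names and its values are made up for illustration and are not taken from the puzzles themselves.

```python
import pandas as pd

# Hypothetical toy data (an assumption for this sketch, not part of the puzzles).
df = pd.DataFrame({
    "animal": ["cat", "dog", "cat", "snake", "dog"],
    "age": [2.5, 3.0, None, 4.5, 5.0],
})

# Grouping and aggregating: mean age per animal.
# Missing values (NaN) are skipped by mean() by default.
mean_age = df.groupby("animal")["age"].mean()

# Indexing: select the animals older than 3 with a boolean mask,
# rather than looping over rows in pure Python.
older = df.loc[df["age"] > 3, "animal"]

print(mean_age)
print(older.tolist())
```

Both results come from single vectorised expressions, which is the kind of solution the exercises are looking for.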
Section Name | Description | Difficulty |
---|---|---|
Importing pandas | Getting started and checking your pandas setup | Easy |
DataFrame basics | A few of the fundamental routines for selecting, sorting, adding and aggregating data in DataFrames | Easy |
DataFrames: beyond the basics | Slightly trickier: you may need to combine two or more methods to get the right answer | Medium |
DataFrames: harder problems | These might require a bit of thinking outside the box... | Hard |
Series and DatetimeIndex | Exercises for creating and manipulating Series with datetime data | Easy/Medium |
Cleaning Data | Making a DataFrame easier to work with | Easy/Medium |
Using MultiIndexes | Go beyond flat DataFrames with additional index levels | Medium |
Minesweeper | Generate the numbers for safe squares in a Minesweeper grid | Hard |
If you feel like reading up on pandas before starting, the official documentation is useful and very extensive. Good places to get a broader overview of pandas are:
Good luck solving the puzzles!
* The list of puzzles is not complete! Pull requests or suggestions for additional exercises, corrections and improvements are welcome.