Skip to content

carbonfact/icanexplain

Repository files navigation

icanexplain

tests code_quality documentation pypi license

Explain why metrics change by unpacking them

This library is here to help with the difficult task of explaining why a metric changes. It's particularly useful for analysts, data scientists, analytics engineers, and business intelligence professionals who need to understand the drivers of a metric's change.

This README provides a small introduction. For more information, please refer to the documentation.

Check out this blog post for some in-depth explanation.

Quickstart

Let's say you're an analyst at an Airbnb-like company. You're tasked with analyzing year-over-year revenue growth. You have obtained the following dataset:

>>> import pandas as pd
>>> fmt_currency = lambda x: '' if pd.isna(x) else '${:,.0f}'.format(x)

>>> revenue = pd.DataFrame.from_dict([
...     {'year': 2019, 'bookings': 1_000, 'revenue_per_booking': 200},
...     {'year': 2020, 'bookings': 1_000, 'revenue_per_booking': 220},
...     {'year': 2021, 'bookings': 1_500, 'revenue_per_booking': 220},
...     {'year': 2022, 'bookings': 1_700, 'revenue_per_booking': 225},
... ])
>>> (
...     revenue
...     .assign(bookings=revenue.bookings.apply('{:,d}'.format))
...     .assign(revenue_per_booking=revenue.revenue_per_booking.apply(fmt_currency))
...     .set_index('year')
... )
     bookings revenue_per_booking
year
2019    1,000                $200
2020    1,000                $220
2021    1,500                $220
2022    1,700                $225

It's quite straightforward to calculate the revenue for each year, and then to measure the year-over-year growth:

>>> (
...     revenue
...     .assign(revenue=revenue.eval('bookings * revenue_per_booking'))
...     .assign(growth=lambda x: x.revenue.diff())
...     .assign(bookings=revenue.bookings.apply('{:,d}'.format))
...     .assign(revenue_per_booking=revenue.revenue_per_booking.apply(fmt_currency))
...     .assign(revenue=lambda x: x.revenue.apply(fmt_currency))
...     .assign(growth=lambda x: x.growth.apply(fmt_currency))
...     .set_index('year')
... )
     bookings revenue_per_booking   revenue    growth
year
2019    1,000                $200  $200,000
2020    1,000                $220  $220,000   $20,000
2021    1,500                $220  $330,000  $110,000
2022    1,700                $225  $382,500   $52,500

Growth can be due to two factors: an increase in the number of bookings, or an increase in the revenue per booking. The icanexplain library to decompose the growth into these two factors. First, let's install the package:

pip install icanexplain

Then, we can use the SumExplainer to decompose the growth:

>>> import icanexplain as ice
>>> explainer = ice.SumExplainer(
...     fact='revenue_per_booking',
...     period='year',
...     count='bookings'
... )
>>> explanation = explainer(revenue)
>>> explanation.map(fmt_currency)
        inner       mix
year
2020  $20,000        $0
2021       $0  $110,000
2022   $7,500   $45,000

Here's how to interpret this explanation:

  • From 2019 to 2020, the revenue growth was entirely due to an increase in the revenue per booking. The number of bookings was exactly the same. Therefore, the $20,000 is entirely due to the inner effect (increase in revenue per booking).
  • From 2020 to 2021, the revenue growth was entirely due to an increase in the number of bookings. The revenue per booking was exactly the same. Therefore, the $110,000 is entirely due to the mix effect (increase in bookings).
  • From 2021 to 2022, there was a $52,500 revenue growth. However, the revenue per booking went down by $10, so the increase is due to the higher number of bookings. The inner effect is -$7,500 while the mix effect is $45,000.

Here's a visual representation of this last interpretation:

example

Contributing

Feel free to reach out to [email protected] if you want to know more and/or contribute 🤗

Check out the contribution guidelines to get started.

License

icanexplain is free and open-source software licensed under the Apache License, Version 2.0.