Implements itemset mining algorithms.
High utility itemset mining (HUIM) generalizes the problem of frequent itemset mining (FIM) by considering item values and weights. A popular application of HUIM is to discover all sets of items purchased together by customers that yield a high profit for the retailer. In that case, the item values record not just that a loaf of bread is in a basket but how many loaves there are, and the weights record the profit from each loaf.
More technically, HUIM requires each transaction in the transaction "database" to carry an internal utility (i.e. item value) for each item it contains, plus a separate "database" of external utilities (i.e. weights), one per item.
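To make the two kinds of utility concrete, here is a minimal brute-force sketch of how the utility of one itemset could be computed. It is not the package's implementation: `itemset_utility` is a hypothetical helper, and the bread and butter quantities and profits are made-up illustration values. It assumes the utility of an itemset is the sum, over every transaction containing all of its items, of quantity times weight for each item in the itemset.

```python
def itemset_utility(itemset, transactions, external_utilities):
    """Total utility of `itemset` across the transactions that contain all of its items."""
    total = 0.0
    for transaction in transactions:
        quantities = dict(transaction)  # item -> internal utility (quantity)
        if all(item in quantities for item in itemset):
            # quantity (internal utility) * profit per unit (external utility)
            total += sum(quantities[item] * external_utilities[item] for item in itemset)
    return total


# Hypothetical data: (item, quantity) pairs per transaction, profit per unit per item.
transactions = [
    [("Bread", 2), ("Butter", 1)],
    [("Bread", 1)],
    [("Bread", 3), ("Butter", 2)],
]
external_utilities = {"Bread": 0.5, "Butter": 1.25}

bread_and_butter = itemset_utility(("Bread", "Butter"), transactions, external_utilities)
print(bread_and_butter)  # 6.25: (2*0.5 + 1*1.25) + (3*0.5 + 2*1.25)
```

An itemset counts as high utility when this total reaches the chosen minimum utility threshold. Enumerating every candidate this way is exponential in the number of items, which is why dedicated algorithms such as Two-Phase exist.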
Algorithm | Class | How to Cite |
---|---|---|
Two-Phase* | itemset_mining.two_phase_huim.TwoPhase | Link |
* Includes max length support
- Address low-correlation HUIs with one of bond, all-confidence, or affinity. Itemsets that are high utility but whose items aren't correlated can be misleading for marketing decisions. E.g. if an itemset of a TV and a pen is a HUI, it's likely just because the TV is expensive, not because it's an interesting pattern.
- Add average utility measure support, to make choosing minutil easier and more intuitive.
- Support discount strategies via a discount strategy table and upgraded external utilities table.
- Add top-k HUI support.
- Support identifying periodic high utility itemsets. Detecting recurring purchase patterns among high utility itemsets enables cross-promotions to customers who buy sets of items periodically.
- Support items' on-shelf time. Ignoring on-shelf time biases results toward items that have been on the shelf longer, since they have had more chances to generate utility.
- Allow incremental transaction updates without rerunning everything.
- Support concise HUIs, specifically closed HUIs. This lets the algorithm be more efficient and report only the longer itemsets, which may be the most interesting ones (correlation issues aside).
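As an illustration of the first item in the list above, here is a rough sketch of the bond measure. Nothing here exists in the package yet: `bond` is a hypothetical helper and the TV, pen, and paper transactions are made up. It assumes the usual definition of bond as the number of transactions containing every item of the itemset divided by the number containing at least one of them.

```python
def bond(itemset, transactions):
    """Share of transactions touching the itemset that contain all of its items."""
    wanted = set(itemset)
    containing_all = 0
    containing_any = 0
    for transaction in transactions:
        present = {item for item, _quantity in transaction}
        if wanted <= present:  # every item of the itemset is in the basket
            containing_all += 1
        if wanted & present:  # at least one item of the itemset is in the basket
            containing_any += 1
    return containing_all / containing_any if containing_any else 0.0


# Hypothetical baskets: pens sell often, but rarely together with a TV.
transactions = [
    [("TV", 1), ("Pen", 1)],
    [("TV", 1)],
    [("Pen", 3)],
    [("Pen", 1)],
    [("Pen", 2), ("Paper", 1)],
    [("Pen", 1), ("Paper", 2)],
]

tv_pen = bond(("TV", "Pen"), transactions)        # 1 of 6 baskets -> about 0.17
pen_paper = bond(("Pen", "Paper"), transactions)  # 2 of 5 baskets -> 0.4
print(tv_pen, pen_paper)
```

A minimum bond threshold applied alongside minutil would discard the TV and pen itemset here even if the TV's price pushed its utility over the bar.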
pip install itemset-mining
>>> from operator import attrgetter
>>> from itemset_mining.two_phase_huim import TwoPhase
>>> transactions = [
... [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
... [("Coke 12oz", 1)],
... [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
... [("Chips", 1)],
... [("Chips", 2)],
... [("Coke 12oz", 6), ("Chips", 1)]
... ]
>>> # ARP for each item
>>> external_utilities = {
... "Coke 12oz": 1.29,
... "Chips": 2.99,
... "Dip": 3.49,
... "Filet Mignon 1lb": 22.99
... }
>>> # Minimum dollar value generated by an itemset we care about across all transactions
>>> minutil = 20.00
>>> hui = TwoPhase(transactions, external_utilities, minutil)
>>> result = hui.get_hui()
>>> sorted(result, key=attrgetter('itemset_utility'), reverse=True)
... # doctest: +NORMALIZE_WHITESPACE
[HUIRecord(items=('Chips', 'Coke 12oz'), itemset_utility=30.02),
HUIRecord(items=('Chips', 'Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=28.56),
HUIRecord(items=('Chips', 'Filet Mignon 1lb'), itemset_utility=25.979999999999997),
HUIRecord(items=('Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=25.57),
HUIRecord(items=('Filet Mignon 1lb',), itemset_utility=22.99),
HUIRecord(items=('Chips',), itemset_utility=20.93)]