ZaxR/itemset_mining

Itemset Mining

Implements itemset mining algorithms.

Algorithms

High-utility itemset mining (HUIM)

HUIM generalizes the problem of frequent itemset mining (FIM) by considering item values and weights. A popular application of HUIM is to discover all sets of items purchased together by customers that yield a high profit for the retailer. In such a case, item values would show not just that a loaf of bread is in a basket, but how many loaves there are; and weights would include the profit from each loaf of bread.

More technically, HUIM requires two inputs: a transaction "database" in which every item in every transaction has an internal utility (i.e. an item value, such as a quantity), and a "database" of external utilities that gives a weight for each item.
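To make the two kinds of utility concrete, here is a minimal sketch in plain Python that does not use this package. It assumes the usual HUIM definition: the utility of an itemset is the sum of internal utility times external utility over every transaction that contains the whole itemset. The function name `itemset_utility` and the bread and butter data are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# A transaction is a list of (item, internal utility) pairs, e.g. quantities.
Transaction = List[Tuple[str, int]]


def itemset_utility(
    itemset: Iterable[str],
    transactions: List[Transaction],
    external_utilities: Dict[str, float],
) -> float:
    """Sum quantity * weight over every transaction containing the whole itemset."""
    wanted = set(itemset)
    total = 0.0
    for transaction in transactions:
        quantities = dict(transaction)
        # Only transactions that contain every item of the itemset contribute.
        if wanted.issubset(quantities):
            total += sum(quantities[item] * external_utilities[item] for item in wanted)
    return total


# Hypothetical data: quantities per basket and profit per unit.
transactions = [
    [("Bread", 2), ("Butter", 1)],
    [("Bread", 1)],
    [("Bread", 3), ("Butter", 2)],
]
external_utilities = {"Bread": 1.0, "Butter": 2.0}

bread_butter = itemset_utility(("Bread", "Butter"), transactions, external_utilities)
bread_only = itemset_utility(("Bread",), transactions, external_utilities)
print(bread_butter, bread_only)  # 11.0 6.0
```

An itemset counts as high utility when this value reaches the chosen minimum utility threshold (`minutil` in the example further down).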

Algorithm  | Class                                   | How to Cite
Two-Phase* | itemset_mining.two_phase_huim.TwoPhase  | Link

* Includes max length support

Roadmap (high to low priority):

  • Address low-correlation HUIs with one of bond, all-confidence, or affinity. Itemsets that are high utility but whose items aren't correlated can be misleading for marketing decisions. E.g. if an itemset of a TV and a pen is a HUI, it's likely just because the TV is expensive, not because it's an interesting pattern.
  • Add average utility measure support, to make choosing minutil easier and more intuitive.
  • Support discount strategies via a discount strategy table and upgraded external utilities table.
  • Add top-k HUI support.
  • Support identifying periodic high utility itemsets. Detecting recurring purchase patterns among high-utility itemsets enables cross-promotions to customers who buy sets of items periodically.
  • Support items' on-shelf time. Ignoring on-shelf time biases results toward items that have more shelf time, since they have more chances to generate higher utility.
  • Allow incremental transaction updates without rerunning everything.
  • Support concise HUI representations, specifically closed HUIs. This makes the algorithm more efficient by reporting only the longer itemsets, which may be the most interesting ones (correlation issues aside).
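As an illustration of the first roadmap item, the sketch below computes the bond of an itemset: the number of transactions containing all of its items divided by the number containing at least one of them. This is not part of the package; the function name `bond` and the TV and pen data are hypothetical, and the package may end up using a different correlation measure.

```python
from typing import Iterable, List, Tuple

Transaction = List[Tuple[str, int]]


def bond(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Share of transactions with all items among those with at least one item."""
    wanted = set(itemset)
    containing_all = 0
    containing_any = 0
    for transaction in transactions:
        items = {item for item, _quantity in transaction}
        if wanted <= items:
            containing_all += 1
        if wanted & items:
            containing_any += 1
    # No transaction mentions any item: treat the itemset as uncorrelated.
    return containing_all / containing_any if containing_any else 0.0


# Hypothetical data: the TV and the pen are bought together only once.
transactions = [
    [("TV", 1), ("Pen", 1)],
    [("Pen", 3)],
    [("Pen", 1)],
    [("Pen", 2)],
    [("TV", 1)],
]

tv_pen_bond = bond(("TV", "Pen"), transactions)
print(tv_pen_bond)  # 0.2
```

A low bond such as this one suggests the pair is high utility mainly because of the expensive item, so a minimum bond threshold could filter it out.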

Installation:

pip install itemset-mining

Example:

    >>> from operator import attrgetter
    >>> from itemset_mining.two_phase_huim import TwoPhase
    >>> transactions = [
    ...     [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    ...     [("Coke 12oz", 1)],
    ...     [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
    ...     [("Chips", 1)],
    ...     [("Chips", 2)],
    ...     [("Coke 12oz", 6), ("Chips", 1)]
    ... ]
    >>> # ARP for each item
    >>> external_utilities = {
    ...     "Coke 12oz": 1.29,
    ...     "Chips": 2.99,
    ...     "Dip": 3.49,
    ...     "Filet Mignon 1lb": 22.99
    ... }
    >>> # Minimum dollar value generated by an itemset we care about across all transactions
    >>> minutil = 20.00

    >>> hui = TwoPhase(transactions, external_utilities, minutil)
    >>> result = hui.get_hui()
    >>> sorted(result, key=attrgetter('itemset_utility'), reverse=True)
    ... # doctest: +NORMALIZE_WHITESPACE
    [HUIRecord(items=('Chips', 'Coke 12oz'), itemset_utility=30.02),
     HUIRecord(items=('Chips', 'Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=28.56),
     HUIRecord(items=('Chips', 'Filet Mignon 1lb'), itemset_utility=25.979999999999997),
     HUIRecord(items=('Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=25.57),
     HUIRecord(items=('Filet Mignon 1lb',), itemset_utility=22.99),
     HUIRecord(items=('Chips',), itemset_utility=20.93)]
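The utilities above can be cross-checked without the package. The following brute-force sketch enumerates every itemset over the same data and keeps those whose utility reaches `minutil`. It is only a sanity check under the standard utility definition, not how Two-Phase works internally, and the helper name `utility` is hypothetical. Values are rounded to cents to hide floating-point noise.

```python
from itertools import combinations

transactions = [
    [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    [("Coke 12oz", 1)],
    [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
    [("Chips", 1)],
    [("Chips", 2)],
    [("Coke 12oz", 6), ("Chips", 1)],
]
external_utilities = {
    "Coke 12oz": 1.29,
    "Chips": 2.99,
    "Dip": 3.49,
    "Filet Mignon 1lb": 22.99,
}
minutil = 20.00


def utility(itemset):
    """Total quantity * weight over transactions containing every item."""
    wanted = set(itemset)
    total = 0.0
    for transaction in transactions:
        quantities = dict(transaction)
        if wanted.issubset(quantities):
            total += sum(quantities[i] * external_utilities[i] for i in wanted)
    return total


# Exhaustive enumeration is exponential in the number of items;
# it is only practical for toy data like this.
items = sorted(external_utilities)
high_utility = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        value = round(utility(candidate), 2)
        if value >= minutil:
            high_utility[candidate] = value

for itemset, value in sorted(high_utility.items(), key=lambda kv: kv[1], reverse=True):
    print(itemset, value)
```

This yields the same six itemsets as the `TwoPhase` result, with 25.979999999999997 shown as 25.98 because of the rounding.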
