Apriori Assignment

C++ Implementation of Apriori Algorithm

To run the implementation

Keep project files in one folder.
compile using command make.

To compile without using the makefile, type the following command.

g++ -std=c++11 apriori.cpp -o apriori.exe

(Note that -std=c++11 option is must be given in g++.)

Run using following command.

apriori.exe [min support (%)] [input file name] [output file name]

ex. apriori.exe 5 input.txt output.txt

Summary of the algorithm

Apriori algorithm is to find frequent itemsets using an iterative level-wise approach based on candidate generation.

Input: A database of transactions, the minimum support count threshold

Output: frequent itemsets in the database

The algorithm solves the problem with a two-step approach.

Step1. Frequent Itemset Generation

Generate all itemsets whose support with a value of minimum support or greater.

But the proccess requires a lot of computation (O(3^(k)-2^(k+1)+1), k=the number of item).

The key idea of apriori algorithmto reduce this operation is that any subset of a frequent itemset must be frequent.

Therefore, if there is any itemset which is infrequent, its superset should not be generated/tested.

As a result, follow the steps below to proceed with this process.

Generate length (k+1)-candidate itemsets from length k frequent itemsets. (This process is called joining.)

And delete newly generated (k+1)-items if the item set that removed one element is not in (k)-candidate. (This process is called pluning.)

Calculate the support of the candidates and remove candidates with support less than min support.

Proceed 1 again until there are no more candidates left.

Step2. Associate Rule Generation

Generate high confidence rules from each frequent itemset, where each rule is a binary partition of a frequent itemset.

Any other specification of the implementation and testing

Note that I use c++11, not c++. therefore -std=c++11 option is must be given in g++.

Since the test code uploaded to Hanyang Portal is for Windows, I added checker class for using in Unix.

About input file

Input file format

[item_id]\t[item_id]\n

[item_id]\t[item_id]\t[item_id]\t[item_id]\t[item_id]\n

[item_id]\t[item_id]\t[item_id]\t[item_id]\n

Row is transaction
[item_id] is a numerical value
There is no duplication of items in each transaction

About output file

output file format

[item_set]\t[associative_item_set]\t[support(%)]\t[confidence(%)]\n

Support: probability that a transaction contains [item_set] [associative_item_set]
Confidence: conditional probability that a transaction having [item_set] also contains [associative_item_set]
The value of support and confidence should be rounded to two decimal places.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
apriori.cpp		apriori.cpp
input.txt		input.txt
output_answer_support4.txt		output_answer_support4.txt
output_answer_support5.txt		output_answer_support5.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Apriori Assignment

To run the implementation

Summary of the algorithm

Any other specification of the implementation and testing

About input file

About output file

About

Releases

Packages

Languages

License

bowbowbow/Apriori

Folders and files

Latest commit

History

Repository files navigation

Apriori Assignment

To run the implementation

Summary of the algorithm

Any other specification of the implementation and testing

About input file

About output file

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages