This project implements a hash-based distinct operator on top of HBase and HDFS. It may help you get familiar with HBase and HDFS.
First, you need to read the data from HDFS. The input file is plain text; each row is a relational record, and columns are separated by the '|' character (a short read-and-split sketch follows the example below).
For example: 1|AMERICA|hs use ironic, even requests. s|
column 0: 1, column 1: AMERICA, column 2: hs use ironic, even requests. s
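A minimal sketch of that read-and-split step, assuming the Hadoop FileSystem Java API and a placeholder URI hdfs://localhost:9000 (the class name HdfsReadSketch is made up for illustration; HDFSTest.java shows the project's intended way):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URI; in the homework the file name comes from the command line (R=...).
            String uri = "hdfs://localhost:9000/hw1/part.tbl";
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            try (FSDataInputStream in = fs.open(new Path(uri));
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Columns are separated by '|'; split() needs the regex-escaped form "\\|".
                    String[] columns = line.split("\\|");
                    // For the example row: columns[0]="1", columns[1]="AMERICA",
                    // columns[2]="hs use ironic, even requests. s"
                }
            }
        }
    }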
The basic commands below are helpful:
$ hdfs dfs -help
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
Identical to the -put command.
-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
Identical to the -get command.
-cat [-ignoreCrc] <src> ... :
Fetch all files that match the file pattern and display their content on stdout.
-ls [-d] [-h] [-R] [<path> ...] :
list contents
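For example, uploading and checking the input file for this homework might look like the following (the local file name part.tbl and the HDFS directory /hw1 are taken from the example command at the end of this section):
$ hdfs dfs -mkdir -p /hw1
$ hdfs dfs -copyFromLocal part.tbl /hw1/part.tbl
$ hdfs dfs -ls /hw1
$ hdfs dfs -cat /hw1/part.tbl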
You can also read HDFSTest.java to learn how to read data from HDFS; HBaseTest.java may help you learn how to use HBase.
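If you have not used the HBase Java client before, a minimal write sketch might look like the one below. It assumes an HBase 1.x-or-later client on the classpath and that the result table already exists; the table name "Result", column family "res", row key, and values are placeholders, and HBaseTest.java shows the intended usage.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 // "Result" table and "res" column family are placeholder names;
                 // the table is assumed to already exist.
                 Table table = connection.getTable(TableName.valueOf("Result"))) {
                Put put = new Put(Bytes.toBytes("row0"));        // row key
                put.addColumn(Bytes.toBytes("res"),              // column family
                              Bytes.toBytes("R3"),               // qualifier
                              Bytes.toBytes("AMERICA"));         // value
                table.put(put);
            }
        }
    }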
The main idea of hash distinct is as follows (see the sketch after this list):
- Build a hash table.
- Key = the combination of all the distinct columns (for example: R2,R3,R5).
- Value = none.
- Insert each key into the hash table once and only once, and finally scan out all the items in the hash table.
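A minimal in-memory sketch of this idea in plain Java (the sample rows and column indices are made up for illustration; in the real program the rows come from HDFS and the distinct results are presumably written to HBase):

    import java.util.LinkedHashSet;
    import java.util.Set;

    public class HashDistinctSketch {
        public static void main(String[] args) {
            // Sample rows already split into columns; in the real program they come from HDFS.
            String[][] rows = {
                {"1", "AMERICA", "hs use ironic, even requests. s"},
                {"1", "AMERICA", "hs use ironic, even requests. s"},   // duplicate on columns 1,2
                {"2", "ASIA", "some other comment"}
            };
            int[] distinctCols = {1, 2};   // e.g. the columns named by distinct:R1,R2

            // The hash table: key = the combination of the distinct columns, value = nothing.
            Set<String> seen = new LinkedHashSet<>();
            for (String[] row : rows) {
                StringBuilder key = new StringBuilder();
                for (int c : distinctCols) {
                    key.append(row[c]).append('|');   // '|' keeps column boundaries unambiguous
                }
                seen.add(key.toString());             // each distinct key is stored once and only once
            }

            // Finally, scan out all the items in the hash table.
            for (String key : seen) {
                System.out.println(key);
            }
        }
    }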
You can run the code with a command like this: java Hw1Grp4 R=/hw1/part.tbl select:R7,gt,1800 distinct:R3,R4,R5
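For reference, one plausible way to parse those arguments is sketched below. The interpretation that select:R7,gt,1800 keeps rows whose column 7 is greater than 1800, and that distinct:R3,R4,R5 reports distinct combinations of columns 3, 4, and 5, is an assumption based on the flag names; the class name is a placeholder.

    public class Hw1ArgsSketch {
        public static void main(String[] args) {
            // Example: R=/hw1/part.tbl select:R7,gt,1800 distinct:R3,R4,R5
            String inputFile = null;
            int selectCol = -1;
            double threshold = 0;
            int[] distinctCols = new int[0];

            for (String arg : args) {
                if (arg.startsWith("R=")) {
                    inputFile = arg.substring(2);                      // HDFS path of table R
                } else if (arg.startsWith("select:")) {
                    String[] p = arg.substring("select:".length()).split(",");
                    selectCol = Integer.parseInt(p[0].substring(1));   // "R7" -> column 7
                    threshold = Double.parseDouble(p[2]);              // "1800"
                    // p[1] is the comparison operator; only "gt" (greater than) is assumed here.
                } else if (arg.startsWith("distinct:")) {
                    String[] cols = arg.substring("distinct:".length()).split(",");
                    distinctCols = new int[cols.length];
                    for (int i = 0; i < cols.length; i++) {
                        distinctCols[i] = Integer.parseInt(cols[i].substring(1));  // "R3" -> column 3
                    }
                }
            }
            System.out.println(inputFile + ", select column " + selectCol + " > " + threshold
                               + ", distinct over " + distinctCols.length + " columns");
        }
    }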