Parses raw Twitter JSON from stdin using Python. I'm only extracting a few fields for quick processing in Pig. Still a lot of work to do. Currently, it extracts the id, timestamp, client program, author, and tweet text. I'll add more fields, such as geo, if requested. The filenames for the output and bad tweets are currently hardcoded for my testing.…
neilkod/tweetParser
twitter parser
2010 18Data
license license license....

Accepts Twitter JSON from stdin and extracts the tweet id, username, timestamp, client used, and tweet text. The output is kept lightweight for performance reasons and so that it can be processed quickly in map/reduce environments such as Apache Pig.

Big to-do: override the default filenames for the output file and the bad file.
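
For illustration only, here is a minimal sketch of the kind of extraction described above. It is not the code in this repository. It assumes one JSON document per input line, the classic Twitter status field names (`id`, `created_at`, `source`, `user.screen_name`, `text`), and tab-delimited output. The names `extract_fields` and `process` are hypothetical.

```python
import json


def extract_fields(raw_line):
    """Return (id, timestamp, client, author, text) as strings, or None.

    Field names are assumptions based on the classic Twitter status JSON.
    """
    try:
        tweet = json.loads(raw_line)
    except ValueError:
        # Not valid JSON at all.
        return None
    if not isinstance(tweet, dict):
        return None
    try:
        fields = (
            tweet["id"],
            tweet["created_at"],
            tweet["source"],
            tweet["user"]["screen_name"],
            tweet["text"],
        )
    except (KeyError, TypeError):
        # A required field is missing (for example, a delete notice).
        return None
    # Collapse runs of whitespace, including tabs and newlines, into single
    # spaces so that each record stays on one tab-delimited line.
    return tuple(" ".join(str(f).split()) for f in fields)


def process(lines, out, bad):
    """Write parsed tweets to `out` and unusable input lines to `bad`.

    `lines` is any iterable of strings; `out` and `bad` are writable
    file-like objects. Returns (parsed_count, rejected_count).
    """
    parsed = rejected = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = extract_fields(line)
        if fields is None:
            bad.write(line if line.endswith("\n") else line + "\n")
            rejected += 1
        else:
            out.write("\t".join(fields) + "\n")
            parsed += 1
    return parsed, rejected
```

A caller could pass `sys.stdin` as `lines` along with two open files. Taking the file objects as parameters, rather than hardcoding their names, is one possible way to approach the to-do item above.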