
##### Q: Do you have any plans to create a GUI?

A: Yes, I have thought about it. However, creating a portable GUI for both Windows and Linux would be a very long process, and I'd rather spend the available time improving the code and the algorithm instead of building a GUI.
Moreover, at the moment every package is about 50-70 MB; adding a GUI would inflate it tremendously.

##### Q: How are tweets organized?

A: The main idea behind Dump Scraper is to work on files saved on the filesystem. Every step (scrape, organize, classify) creates new files instead of moving the existing ones. This way, if anything goes wrong or the algorithm is improved, you can work on the same files again: simply delete the output and re-run the step.

All dumps are stored under the `data` directory:

```
data
  `- organized
    `- hash
      `- YYYY
        `- MM
          `- DD
            `- <tweet id>.txt
    `- plain
    `- trash
  `- processed
    `- hash
      `- YYYY
        `- MM
          `- DD
            `- <tweet id>.txt
    `- plain
  `- raw
    `- YYYY
      `- MM
        `- DD
          `- <tweet id>.txt
    `- features.csv
```
- `raw` stores all the dumps downloaded from PasteBin, creating one directory for each day.
- `organized` contains the dumps divided into three categories: **trash** (files we don't care about), **plain** (files with plain-text passwords) and **hash** (files with hashed passwords).
- `processed` contains the final result: each line holds one hash or one plain-text password, depending on the original type.
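
To make the layout concrete, here is a minimal sketch of how a per-day path could be assembled for each stage. This is not Dump Scraper's actual code; the `day_dir` helper, the `DATA_DIR` constant and the tweet id are made up for illustration:

```python
import os
from datetime import date

DATA_DIR = "data"

def day_dir(stage, day, category=None):
    """Return the per-day directory for a pipeline stage.

    stage is "raw", "organized" or "processed"; category ("hash",
    "plain", "trash") applies only to the organized and processed
    trees, mirroring the layout shown above.
    """
    parts = [DATA_DIR, stage]
    if category:
        parts.append(category)
    parts += ["%04d" % day.year, "%02d" % day.month, "%02d" % day.day]
    return os.path.join(*parts)

# The same tweet id at two stages of the pipeline: the raw dump is
# left untouched and the organized copy is written as a new file.
tweet_id = "123456789"
when = date(2016, 3, 8)
print(os.path.join(day_dir("raw", when), tweet_id + ".txt"))
# -> data/raw/2016/03/08/123456789.txt
print(os.path.join(day_dir("organized", when, "hash"), tweet_id + ".txt"))
# -> data/organized/hash/2016/03/08/123456789.txt
```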

##### Q: There used to be some ready-to-use binaries, where are they?

A: Building an executable for Windows and Linux caused too many problems: I spent several days just fixing "hidden imports". I'd rather spend that time improving Dump Scraper instead of applying one workaround after another.
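
For context: a "hidden import" is a module that PyInstaller (the usual freezing tool for Python projects, assumed here) cannot detect through static analysis, so it has to be listed by hand in the build spec. A hypothetical fragment of such a `.spec` file; the entry-point name and the listed modules are illustrative, not Dump Scraper's actual build configuration:

```python
# Fragment of a hypothetical PyInstaller .spec file.  Every module
# that PyInstaller's import analysis misses must be added to
# hiddenimports by hand, one workaround at a time.
a = Analysis(
    ['dumpscraper.py'],                # assumed entry point
    hiddenimports=[
        'sklearn.utils',               # illustrative examples of
        'scipy.special._ufuncs_cxx',   # modules loaded dynamically
    ],
)
```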
