- Increasing parallelism performance (real multiprocessing implementation, addressing #29)
- Better handling of config parser errors (addressing #22)
- Fixing typos
r3nt0n · Aug 30, 2024 · c66f91a
1 parent adb2050

Showing 9 changed files with 48 additions and 48 deletions.
diff --git a/README.md b/README.md
@@ -14,7 +14,7 @@ Thanks dude :)
 [![Packaging status](https://repology.org/badge/tiny-repos/bopscrk.svg)](https://repology.org/project/bopscrk/versions)
 ![[GPL-3.0 License](https://github.com/r3nt0n)](https://img.shields.io/badge/license-GPL%203.0-brightgreen.svg)
 ![[Python 3](https://github.com/r3nt0n)](http://img.shields.io/badge/python-3-blue.svg)
-![[Version 2.4.5](https://github.com/r3nt0n)](http://img.shields.io/badge/version-2.4.5-orange.svg)
+![[Version 2.4.6](https://github.com/r3nt0n)](http://img.shields.io/badge/version-2.4.6-orange.svg)
 
 
 
@@ -108,7 +108,7 @@ Thanks dude :)
 
 ### What's new
 
-**2.4.5 RELEASED**: Progress bar with ETA implemented
+**2.4.6 RELEASED** (30/08/2024): Speed and performance dramatically increased, real multiprocessing implementation.   
 
 [//]: # (<p align="center"><img src="https://github.com/r3nt0n/bopscrk/blob/master/img/progressbar_example1.gif" /></p>)
 
@@ -200,7 +200,6 @@ It will retrieve all lyrics from all songs which belongs to artists that you pro
 
 #### Customizing behaviour using .cfg file
 + In `bopscrk.cfg` file you can specify your own charsets and enable/disable options:
-  + **threads**: number of threads to use in multithreaded operations
   + **extra_combinations** (like `(john, doe) => 123john, john123, 123doe, doe123, john123doe doe123john`) are *enabled by default*. You can disable it in the configuration file in order to get more focused wordlists.  
   + **separators_chars**: characters to use in extra-combinations. *Can be a single char or a string of chars, e.g.: `!?-/&(`*  
   + **separators_strings**: strings  to use in extra-combinations. *Can be a single string or a list of strings space-separated, e.g.: `123` `34!@`*
@@ -214,7 +213,6 @@ It will retrieve all lyrics from all songs which belongs to artists that you pro
   + **lyric_space_replacement**: same with lyrics found
   + **space_replacement_chars**: characters to insert instead of spaces inside an artist name or a lyric phrase.  *Can be a single char or a string of chars, e.g.: `!?-/&(`*
   + **space_replacement_strings**: strings to insert instead of spaces inside an artist name or a lyric phrase.  *Can be a single string or a list of strings space-separated, e.g.: `123` `34!@`*
-+ Some transforms have **extensive charsets** preincluded. To use it instead of the basic ones, just **comment and uncomment** the corresponding lines (It's important to comment the original one, if you let two lines with the same keyname uncommented, it will throw an error: `AttributeError: 'bool' object has no attribute 'split'`).
 
 + **Parameters configuration examples**
   + Combine all the words using dots as separator, and same using commas  
@@ -232,11 +230,12 @@ It will retrieve all lyrics from all songs which belongs to artists that you pro
 - [ ] Improve **memory management**
     - [ ] Write wordlists into filesystem during execution and use it as cache (<a href="https://github.com/r3nt0n/bopscrk/issues">#12</a>)
 - [ ] Improve **performance**
-    - [ ] Refactor and improve threads and transforms logic
+    - [x] Improve parallelism logic
 - [ ] Extra features
     - [x] Implement **progress bar** to keep user informed of the execution state
     - [ ] Implement **session file** to keep track of the execution point and **be able to stop and resume sessions** (<a href="https://github.com/r3nt0n/bopscrk/issues">#12</a>)
     - [ ] Create **config options** for customized **case transforms** (e.g.: disable pair/odd transforms)
+    - [ ] Implement "pipable" output to allow integration with other tools (`-q` flag will just output final wordlist to stdout)
 
 See the [open issues](https://github.com/r3nt0n/bopscrk/issues) for a full list of proposed features (and known issues).
 
@@ -272,6 +271,11 @@ Thank you all!
 
 ## Changelist
 [//]: # (+ `last development version &#40;available on Github&#41;`)
++ `2.4.6 version notes (30/08/2024)`
+  + **Increasing parallelism performance** (real multiprocessing implementation)
+  + Better handling of config parser errors
+  + Fixing typos
+
 + `2.4.5 version notes (02/08/2022)`
   + **progress bar** implemented and working
   + `version` argument included

diff --git a/bopscrk/bopscrk.cfg b/bopscrk/bopscrk.cfg
@@ -10,24 +10,25 @@
 ###################################################################################
 
 [GENERAL]
-# Number of threads to use in multithreaded operations
-threads=32
+# Reserved for potential future uses
 
 [COMBINATIONS]
-# Enables extra combination and additions at begining and end of words
+# Enables extra combination and additions at beginning and end of words
 # example: (john, doe) => 123john, john123, 123doe, doe123, john123doe doe123john
 extra_combinations=true
 # SEPARATORS CHARSET - Characters to use in extra-combinations
 separators_chars=._-$%%&#@
-separators_strings=123 xXx !!
-# To get an extensive charset, comment the previous line and uncomment the next one (having both enabled could cause an error)
-# separators_chars=!"#$%%&'()*+,-./:;<=>?@[\]^_`{|}~
+separators_strings=!! 123 xXx
+# To get extensive charsets, uncomment the following lines:
+#separators_chars=!"#$%%&'()*+,-./:;<=>?@[\]^_`{|}~
+#separators_strings=!! ¡¡ !!! ¡¡¡ ¡!¡ !¡! 123 1234 xXx XxX WwW wWw
+
 
 [TRANSFORMS]
 # LEET REPLACEMENT CHARSET
 # characters to replace and correspondent substitute in leet transforms
 leet_charset=a:4 e:3 i:1 o:0 s:$
-# To get an extensive charset, comment the previous line and uncomment the next one (having both enabled could cause an error)
+# To get an extensive charset, uncomment the following line
 # leet_charset=a:4 a:@ e:3 i:1 i:! i:¡ l:1 o:0 s:$ s:5 b:8 t:7 c:(
 
 # RECURSIVE LEET TRANSFORMS - Enables a recursive call to leet_transforms() function
@@ -50,5 +51,5 @@ lyric_space_replacement=true
 # Comment two above lines or set it empty in order to don't replace spaces, just remove them
 space_replacement_chars=!@+._-
 space_replacement_strings=
-# To get an extensive charset, comment the previous line and uncomment the next one (having both enabled cause an error)
+# To get an extensive charset, uncomment the following line
 #space_replacement_chars=!"#$%%&'()*+,-./:;<=>?@[\]^_`{|}~
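
The new cfg comments only say "uncomment the following lines" because `config.py` now builds its parser with `ConfigParser(strict=False)`: a key repeated in one section no longer raises, and the value read last wins. Before this change the duplicate raised during parsing, `read_config` returned `False`, and that is most likely where the `'bool' object has no attribute 'split'` error quoted in the removed README line came from. A minimal standard-library illustration follows; the sample values come from the README examples, not from a real user config.

```python
import configparser

# The same option twice in one section, as happens if a user uncomments the
# "extensive" line in bopscrk.cfg without commenting out the basic one
CFG_TEXT = """
[COMBINATIONS]
separators_chars=._-$%%&#@
separators_chars=!?-/&(
"""

def load(text, strict):
    parser = configparser.ConfigParser(strict=strict)
    parser.read_string(text)
    return parser

# strict=True (the default) refuses the duplicate while parsing
try:
    load(CFG_TEXT, strict=True)
except configparser.DuplicateOptionError as exc:
    print('strict=True  ->', exc)

# strict=False accepts it, and the value read last wins
lenient = load(CFG_TEXT, strict=False)
print('strict=False ->', lenient.get('COMBINATIONS', 'separators_chars'))  # value: !?-/&(

# With the default BasicInterpolation, '%%' in the file comes back as a single '%'
single = load('[COMBINATIONS]\nseparators_chars=._-$%%&#@\n', strict=True)
print(single.get('COMBINATIONS', 'separators_chars'))  # ._-$%&#@
```

Since every extensive variant in `bopscrk.cfg` sits below its basic counterpart, last-wins should mean that uncommenting it is enough to switch charsets.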
diff --git a/bopscrk/bopscrk.py b/bopscrk/bopscrk.py
@@ -6,7 +6,7 @@
 
 name = 'bopscrk.py'
 desc = 'Generate smart and powerful wordlists'
-__version__ = '2.4.5'
+__version__ = '2.4.6'
 __author__ = 'r3nt0n'
 __status__ = 'Development'
 

diff --git a/bopscrk/modules/banners.py b/bopscrk/modules/banners.py
@@ -21,9 +21,9 @@ def banner(name, version, author="r3nt0n"):
         name_rand_leet = name
     name_rand_case = case_transforms(name)
     name_rand_case = name_rand_case[randint((len(name_rand_case) - 3), (len(name_rand_case) - 1))]
-    version = version[:3]
+    #version = version[:3]
     print('  ,----------------------------------------------------,   ,------------,');sleep(interval)
-    print('  | [][][][][]  [][][][][]  [][][][]  [][__]  [][][][] |   |    v{}{}{}    |'.format(color.BLUE, version, color.END));sleep(interval)
+    print('  | [][][][][]  [][][][][]  [][][][]  [][__]  [][][][] |   |   v{}{}{}   |'.format(color.BLUE, version, color.END));sleep(interval)
     print('  |                                                    |   |------------|');sleep(interval)
     print('  |  [][][][][][][][][][][][][][_]    [][][]  [][][][] |===| {}{}{} |'.format(color.RED, name_rand_leet, color.END));sleep(interval)
     print('  |  [_][][][]{}[]{}[][][][]{}[][]{}[][][ |   [][][]  [][][][] |===| {}{}{}{} |'.format(color.KEY_HIGHL, color.END, color.KEY_HIGHL, color.END, color.BOLD, color.RED, name, color.END));sleep(interval)
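
Dropping `version[:3]` means the full version string (`2.4.6`, five characters) now fills the version box, so the padding shrinks from four spaces per side to three to keep the 12-character inner width that matches the `|------------|` border. A minimal sketch, not part of the commit, of deriving that padding with `str.center` instead of hard-coding it; `BOX_WIDTH` and `version_cell` are hypothetical names. The colour codes are added after centring because ANSI escapes add string length without taking visible width.

```python
BOX_WIDTH = 12  # visible width between the '|' borders of the version box

def version_cell(version, start='', end=''):
    """Centre 'v<version>' in the box; start/end are optional ANSI colour codes."""
    label = 'v' + version
    padded = label.center(BOX_WIDTH)
    # Wrap only the label so the escape codes do not disturb the visible padding
    return padded.replace(label, start + label + end, 1)

if __name__ == '__main__':
    for v in ('2.4', '2.4.6'):
        print('|{}|'.format(version_cell(v)))
```

`str.center` returns the label unchanged when it is already as wide as the box, so a version string longer than 11 characters would still overflow the border.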

diff --git a/bopscrk/modules/config.py b/bopscrk/modules/config.py
@@ -10,13 +10,14 @@
 class Config:
     def __init__(self, cfg_file):
         self.CFG_FILE = cfg_file
+        self.cfg = configparser.ConfigParser(strict=False)
 
     def read_config(self, category, field):
-        cfg = configparser.ConfigParser()
         try:
-            cfg.read([self.CFG_FILE])
-            value = cfg.get(category, field)
-        except:
+            self.cfg.read([self.CFG_FILE])
+            value = self.cfg.get(category, field)
+        except Exception as e:
+            print(e)
             value = False
         return value
 
@@ -36,12 +37,7 @@ def parse_booleans(self, value):
         except AttributeError:
             return None
 
-    def parse_threads(self, value):
-        try: value = int(value); return value
-        except ValueError: return 4  # default number of threads if error in config provided
-
     def setup(self):
-        self.THREADS = self.parse_threads(self.read_config('GENERAL', 'threads'))
         self.EXTRA_COMBINATIONS = self.parse_booleans(self.read_config('COMBINATIONS', 'extra_combinations'))
         self.SEPARATORS_CHARSET = self.merge_settings(self.read_config('COMBINATIONS', 'separators_chars'),
                                                       self.read_config('COMBINATIONS', 'separators_strings'))
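
`read_config` now shares one `ConfigParser(strict=False)` instance, but it still calls `read()` on every lookup and prints any exception before returning `False`, including the routine case of a key that is simply absent. One possible refinement, sketched under the assumption that the file does not change while bopscrk runs: read once in `__init__`, let `get(..., fallback=False)` cover missing sections and options quietly, and keep the `except` for genuine problems such as a lone `%`, which the default `BasicInterpolation` rejects (the reason `bopscrk.cfg` writes `%%`). `ConfigSketch` is a hypothetical name, not the class in this commit.

```python
import configparser
import os
import tempfile

class ConfigSketch:
    def __init__(self, cfg_file):
        self.CFG_FILE = cfg_file
        self.cfg = configparser.ConfigParser(strict=False)
        # read() skips files it cannot open and returns the names it did parse
        if not self.cfg.read([cfg_file]):
            print('[!] Could not read config file: {}'.format(cfg_file))

    def read_config(self, category, field):
        try:
            # fallback covers both a missing section and a missing option
            return self.cfg.get(category, field, fallback=False)
        except configparser.Error as e:
            # e.g. InterpolationSyntaxError for a value containing a lone '%'
            print(e)
            return False

if __name__ == '__main__':
    with tempfile.NamedTemporaryFile('w', suffix='.cfg', delete=False) as fh:
        fh.write('[COMBINATIONS]\nextra_combinations=true\nbad=50%\n')
        path = fh.name
    try:
        cfg = ConfigSketch(path)
        print(cfg.read_config('COMBINATIONS', 'extra_combinations'))  # true
        print(cfg.read_config('COMBINATIONS', 'missing'))             # False
        print(cfg.read_config('COMBINATIONS', 'bad'))                 # error text, then False
    finally:
        os.remove(path)
```

The trade-off is that typos in key names become silent; keeping a print in the fallback path would preserve the current, noisier behaviour.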

diff --git a/bopscrk/modules/excluders.py b/bopscrk/modules/excluders.py
@@ -3,7 +3,7 @@
 # https://github.com/r3nt0n/bopscrk
 # bopscrk - transform functions module
 
-from multiprocessing.dummy import Pool as ThreadPool
+from multiprocessing import Pool, cpu_count
 from collections import OrderedDict
 
 from . import Config
@@ -15,7 +15,7 @@ def compare(word_to_exclude, word_in_wordlist):
 # Remove word to exclude from final_wordlist
 def multithread_exclude(word_to_exclude, wordlist):
     diff_wordlist = []
-    with ThreadPool(Config.THREADS) as pool:
+    with Pool(cpu_count()) as pool:
         #args = (word, words_to_exclude)
         diff_wordlist += pool.starmap(compare, [(word_to_exclude, word) for word in wordlist])
 

diff --git a/bopscrk/modules/main.py b/bopscrk/modules/main.py
@@ -10,7 +10,7 @@
 from .auxiliars import clear, remove_duplicates_from_file
 from . import banners
 from .color import color
-from .transforms import leet_transforms, case_transforms, artist_space_transforms, lyric_space_transforms, multithread_transforms, take_initials, transform_cached_wordlist_and_save
+from .transforms import leet_transforms, case_transforms, artist_space_transforms, lyric_space_transforms, multiprocess_transforms, take_initials, transform_cached_wordlist_and_save
 from .combinators import combinator, add_common_separators
 from .excluders import remove_by_lengths, remove_duplicates, multithread_exclude
 
@@ -24,7 +24,7 @@ def run(name, version):
     if args.print_version: print(name + '_' + version); sys.exit(0)
 
     try:
-        # setting args whter interactive or not
+        # setting args whether interactive or not
         if args.interactive:
             clear()
             banners.bopscrk_banner()
@@ -92,7 +92,7 @@ def run(name, version):
                     # Take just the initials on each phrase and add as a new word to FINAL wordlist
                     if Config.TAKE_INITIALS:
                         base_lyrics = lyrics[:]
-                        ly_initials_wordlist = multithread_transforms(take_initials, base_lyrics)
+                        ly_initials_wordlist = multiprocess_transforms(take_initials, base_lyrics)
                         final_wordlist += ly_initials_wordlist
 
                     # Make space transforms and add it too
@@ -102,7 +102,7 @@ def run(name, version):
                     elif Config.LYRIC_SPACE_REPLACEMENT:
                         print('  {}[+]{} Producing new words replacing spaces in {} phrases...'.format(color.BLUE, color.END, len(lyrics)))
                         base_lyrics = lyrics[:]
-                        space_transformed_lyrics = multithread_transforms(lyric_space_transforms, base_lyrics)
+                        space_transformed_lyrics = multiprocess_transforms(lyric_space_transforms, base_lyrics)
                         final_wordlist += space_transformed_lyrics
 
                 except ImportError:
@@ -121,16 +121,16 @@ def run(name, version):
         if Config.EXTRA_COMBINATIONS:
             if Config.SEPARATORS_CHARSET:
                 #print('  {}[+]{} Creating extra combinations (separators charset in {}{}{})...'.format(color.BLUE, color.END,color.CYAN, args.cfg_file,color.END))
-                print('  {}[+]{} Creating extra combinations with separators charset...'.format(color.BLUE,color.END))
+                print('  {}[+]{} Creating extra combinations using separators charset...'.format(color.BLUE,color.END))
                 final_wordlist += add_common_separators(base_wordlist)
                 print('  {}[*]{} Words produced: {}'.format(color.CYAN, color.END, len(final_wordlist)))
             else:
-                print('  {}[!]{} Any separators charset specified in {}{}'.format(color.ORANGE, color.END, args.cfg_file,color.END))
+                print('  {}[!]{} No separators charset specified in {}{}'.format(color.ORANGE, color.END, args.cfg_file,color.END))
 
         # Remove words by min-max length range established
         print('  {}[-]{} Removing words by min and max length provided ({}-{})...'.format(color.PURPLE, color.END,args.min_length,args.max_length))
         final_wordlist = remove_by_lengths(final_wordlist, args.min_length, args.max_length)
-        print('  {}[*]{} Words remained: {}'.format(color.CYAN, color.END, len(final_wordlist)))
+        print('  {}[*]{} Words remaining: {}'.format(color.CYAN, color.END, len(final_wordlist)))
         # (!) Check for duplicates (is checked before return in combinator() and add_common_separators())
         #final_wordlist = remove_duplicates(final_wordlist)
 
@@ -164,14 +164,14 @@ def run(name, version):
                     #       '      max-length configured (now is {}{}{}) and the size of your\n'
                     #       '      wordlist at this point (now contains {}{}{} words), this process\n'
                     #       '      could take a long time{}\n'.format(color.ORANGE,color.END,args.max_length,color.ORANGE,color.END,len(final_wordlist),color.ORANGE,color.END))
-                    recursive_msg = '{}recursive{} '.format(color.RED,color.END)
+                    recursive_msg = '{}recursive{} '.format(color.ORANGE,color.END)
                 print('  {}[+]{} Applying {}leet transforms to {} words...'.format(color.BLUE, color.END, recursive_msg,len(final_wordlist)))
 
                 #transform_cached_wordlist_and_save(leet_transforms, args.outfile)
                 #remove_duplicates_from_file(args.outfile)
 
                 temp_wordlist = []
-                temp_wordlist += multithread_transforms(leet_transforms, final_wordlist)
+                temp_wordlist += multiprocess_transforms(leet_transforms, final_wordlist)
                 final_wordlist += temp_wordlist
 
         # CASE TRANSFORMS
@@ -181,14 +181,14 @@ def run(name, version):
             # transform_cached_wordlist_and_save(case_transforms, args.outfile) # not working yet, infinite loop ?¿?¿
 
             temp_wordlist = []
-            temp_wordlist += multithread_transforms(case_transforms, final_wordlist)
+            temp_wordlist += multiprocess_transforms(case_transforms, final_wordlist)
             final_wordlist += temp_wordlist
 
         print('  {}[-]{} Removing duplicates...'.format(color.PURPLE, color.END))
         final_wordlist = remove_duplicates(final_wordlist)
-        print('  {}[*]{} Words remained: {}'.format(color.CYAN, color.END, len(final_wordlist)))
+        print('  {}[*]{} Words remaining: {}'.format(color.CYAN, color.END, len(final_wordlist)))
 
-        # EXCLUDE FROM OTHER WORDLISTS
+        # EXCLUDE FROM OTHER WORDLISTS (deprecated)
         #if args.exclude_wordlists:
             # For each path to wordlist provided
             # for wl_path in args.exclude_wordlists:
@@ -218,7 +218,7 @@ def run(name, version):
         # PRINT RESULTS
         ############################################################################
         print('\n  {}[+]{} Words generated:\t{}{}{}'.format(color.GREEN, color.END, color.RED, len(final_wordlist),color.END))
-        print('  {}[+]{} Time elapsed:\t{}'.format(color.GREEN, color.END, total_time))
+        print('  {}[+]{} Elapsed time:\t{}'.format(color.GREEN, color.END, total_time))
         print('  {}[+]{} Output file:\t{}{}{}{}'.format(color.GREEN, color.END, color.BOLD, color.BLUE, args.outfile, color.END))
         #print('  {}[+]{} Words generated:\t{}{}{}\n'.format(color.GREEN, color.END, color.RED, str(sum(1 for line in open(args.outfile))), color.END))
         sys.exit(0)

diff --git a/bopscrk/modules/transforms.py b/bopscrk/modules/transforms.py
@@ -3,9 +3,8 @@
 # https://github.com/r3nt0n/bopscrk
 # bopscrk - transform functions module
 
-from multiprocessing.dummy import Pool as ThreadPool
+from multiprocessing import cpu_count, Pool
 
-#from tqdm import tqdm
 from alive_progress import alive_bar
 
 from . import Config
@@ -136,10 +135,10 @@ def lyric_space_transforms(word):
     return new_wordlist
 
 
-def multithread_transforms(transform_type, wordlist):
+def multiprocess_transforms(transform_type, wordlist):
     # process each word in their own thread and return the results
     new_wordlists = []
-    with ThreadPool(Config.THREADS) as pool:
+    with Pool(cpu_count()) as pool:
         with alive_bar(bar=None,spinner='bubbles', monitor=False,elapsed=False,stats=False,receipt=False) as progressbar:
             new_wordlists += pool.map(transform_type, wordlist)
             progressbar()
@@ -173,7 +172,7 @@ def transform_cached_wordlist_and_save(transform_type, filepath):
                 counter += 1
                 last_position = f.tell()  # save last_position
 
-        new_wordlist += multithread_transforms(transform_type, cached_wordlist)
+        new_wordlist += multiprocess_transforms(transform_type, cached_wordlist)
         #cached_wordlist += new_wordlist
         append_wordlist_to_file(filepath, new_wordlist)
 

diff --git a/bopscrk/tests/transforms_tests.py b/bopscrk/tests/transforms_tests.py
@@ -9,7 +9,7 @@
 from os import path
 sys.path.append(path.dirname(path.dirname(path.abspath(__file__))))
 
-from ..modules.transforms import case_transforms, leet_transforms, multithread_transforms, \
+from ..modules.transforms import case_transforms, leet_transforms, multiprocess_transforms, \
                            take_initials, artist_space_transforms, lyric_space_transforms
 
 
@@ -29,8 +29,8 @@ def test_case_transform(self):
 
     def test_multithread_transform(self):
         wordlist = ['hello', 'world', 'lorem', 'ipsum']
-        self.assertEqual(33, len(multithread_transforms(case_transforms, wordlist)))
-        self.assertEqual(10, len(multithread_transforms(leet_transforms, wordlist)))
+        self.assertEqual(33, len(multiprocess_transforms(case_transforms, wordlist)))
+        self.assertEqual(10, len(multiprocess_transforms(leet_transforms, wordlist)))
 
     def test_take_initials(self):
         word = 'hello world lorem ipsum'