GSoC 2023 ‐ Code Refactoring and Parallelization

mshudrak edited this page Sep 12, 2023 · 2 revisions

GSoC 2023

Google Summer of Code '23 Final Report

  • Name - Harsh (@peb-peb)
  • Organisation - GCP Scanner
  • Project - Code Refactoring & Parallelization
  • Proposal - [link to proposal]

Summary

Parallelization

Before this project, GCP Scanner did not support parallel enumeration of GCP resources or parallel scanning of GCP targets.

To address this, I added project-based and resource-based parallelization using multithreading. To use it, pass -wc or --worker-count followed by an integer giving the number of workers to use while crawling.

Example command:

gcp-scanner -o <<output_dir>> -g - --worker-count 8

Note: Some of the issues I ran into while developing this solution, and the discussion around them, can be found here.
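
To illustrate the idea, here is a minimal sketch of project-based and resource-based parallelization with a thread pool. It is not GCP Scanner's actual code: `crawl_resource`, `scan`, the project IDs, and the resource names are all hypothetical, and the API call is simulated with a short sleep. Only the notion of a configurable worker count comes from the report.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_resource(project_id, resource_type):
    """Hypothetical stand-in for one crawler; a real one would call a GCP API."""
    time.sleep(0.01)  # simulate the network latency of an API call
    return project_id, resource_type, [f"{project_id}/{resource_type}/item-0"]


def scan(projects, resource_types, worker_count=4):
    """Crawl every (project, resource type) pair on a pool of worker threads."""
    # One task per pair, so both projects and resources run in parallel.
    tasks = [(p, r) for p in projects for r in resource_types]
    results = {}
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # pool.map yields results in the same order as the submitted tasks.
        for project_id, resource_type, items in pool.map(
            lambda task: crawl_resource(*task), tasks
        ):
            results.setdefault(project_id, {})[resource_type] = items
    return results


if __name__ == "__main__":
    found = scan(
        ["proj-a", "proj-b"],
        ["compute_instances", "storage_buckets"],
        worker_count=8,
    )
    print(found)
```

Threads fit this workload because each task spends most of its time waiting on network responses, so several requests can be in flight at once.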

Code Refactoring

GCP Scanner had one giant scanning loop from which it launched all of the GCP resource crawlers. The goal was to split each crawler into its own module with proper error handling, improving code readability and quality.

I solved this by implementing the factory design pattern for the crawlers. I also used Python classes to control the state of execution, parse the configuration, and enable or disable specific functionality in the scanner.
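
A minimal sketch of the factory pattern applied to crawlers follows. The class names, the registry, and the config shape are hypothetical and chosen only for illustration; they do not reproduce GCP Scanner's real modules.

```python
from abc import ABC, abstractmethod


class Crawler(ABC):
    """Common interface that every resource crawler implements."""

    @abstractmethod
    def crawl(self, project_id):
        """Return the resources found in the given project."""


class ComputeInstancesCrawler(Crawler):
    def crawl(self, project_id):
        return [f"{project_id}/instance-0"]  # placeholder data, no API call


class StorageBucketsCrawler(Crawler):
    def crawl(self, project_id):
        return [f"{project_id}/bucket-0"]  # placeholder data, no API call


class CrawlerFactory:
    """Maps a resource name to the crawler class that handles it."""

    _registry = {
        "compute_instances": ComputeInstancesCrawler,
        "storage_buckets": StorageBucketsCrawler,
    }

    @classmethod
    def create(cls, name):
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"unknown crawler: {name}") from None


def run_enabled(config, project_id):
    """Run only the crawlers enabled in config; one failure does not stop the rest."""
    results = {}
    for name, enabled in config.items():
        if not enabled:
            continue
        try:
            results[name] = CrawlerFactory.create(name).crawl(project_id)
        except Exception as exc:  # record the error and keep scanning
            results[name] = {"error": str(exc)}
    return results


if __name__ == "__main__":
    print(run_enabled({"compute_instances": True, "storage_buckets": False}, "proj-a"))
```

With this layout, adding a new resource type means writing one new class and registering it, rather than editing a central scanning loop.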

What I learned

  • Python parallelization and multiprocessing libraries: I learned about parallelization and multiprocessing in Python, including the different libraries available, such as multiprocessing, threading, and concurrent.futures. I also learned about the pros and cons of each library.
  • Multiprocessing vs. multithreading: I learned the difference between multiprocessing and multithreading, and when to use each one: multiprocessing suits CPU-bound tasks, while multithreading suits I/O-bound tasks.
  • Refactoring: I learned the art of refactoring, which is the process of improving the structure of the code without changing its functionality. Refactoring can help to make the code more readable, maintainable, and efficient.
  • Communication: The one skill I learned in this program that will help me throughout my career is communicating with my mentors. I am grateful to them for their support and for helping me learn this skill.
  • Time management: I never felt stressed at any point during the program, thanks to my awesome mentors. I also learned the importance of time management and task prioritization.

Tasks Achieved

Future Work

The tool has improved and changed a lot since I first started contributing. I plan to keep working on the project and contribute as much as I can. Some of the features I'd like to work on in the future are:

  1. Add local testing support for developers of the tool.
  2. Expand support for more GCP resources.
  3. Improve logging and CLI appearance of the tool.

I would like to thank Google and GCP Scanner for providing me with this wonderful opportunity, and my mentors, Maksim Shudrak and Calle Svensson, who guided me and taught me all sorts of things this summer.

I would also like to thank my fellow GSoCer Sudipto Baral and the GCP Scanner community for helping me during the program.