GSoC 2023 ‐ Expanding Support for GCP Resources with Modularized Crawler
This document describes the Google Summer of Code 2023 project titled "Expanding Support for GCP Resources with Modularized Crawler."
By leveraging Google Cloud APIs, the GCP Scanner has the capacity to query a range of GCP resources. However, the current version only supports scanning 13 resources, despite Google Cloud offering over 100 different products. To address this limitation, the project aims to enhance the scanning capabilities by incorporating additional GCP resources, thereby maximizing its utility.
Furthermore, the existing structure consolidates all crawlers into a single file, which poses challenges for scalability and maintainability as the number of crawlers grows. Consequently, a refactoring effort was needed to modularize the crawler, including adapting its unit tests, to ensure long-term maintainability.
In the previous iteration, the `crawl.py` file contained all the logic for crawling data from the GCP APIs. This project introduces a new crawl module that fundamentally transforms the approach: it modularizes the existing crawler, significantly improving maintainability, and it simplifies the process of implementing new crawlers. If you're intrigued by the intricacies of the implementation, feel free to explore Epic: Refactor the Crawler for modularity and better maintainability for more in-depth details.
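To illustrate the idea behind the modularization, here is a minimal sketch of a per-resource crawler interface with a registry. The class and module names below are hypothetical, chosen for illustration; they are not the project's actual API.

```python
# Hypothetical sketch of a modularized crawler layout; class and
# registry names are illustrative, not the scanner's real API.
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ICrawler(ABC):
    """Common interface that every resource crawler implements."""

    @abstractmethod
    def crawl(self, project_id: str, client: Any) -> List[Dict[str, Any]]:
        """Query one GCP resource type and return its metadata."""


class ComputeInstancesCrawler(ICrawler):
    def crawl(self, project_id, client):
        # The real scanner would page through the Compute API here;
        # a placeholder is returned so the sketch stays runnable.
        return [{"project": project_id, "resource": "compute.instances"}]


# Each crawler lives in its own module and is looked up by name, so
# adding a resource no longer means editing one monolithic crawl.py.
CRAWLERS: Dict[str, ICrawler] = {
    "compute_instances": ComputeInstancesCrawler(),
}

results = CRAWLERS["compute_instances"].crawl("demo-project", client=None)
print(results)
```

With this shape, adding a resource means writing one small class in its own file and adding a registry entry, and each crawler can be unit-tested in isolation.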
From the outset, I took a proactive approach by organizing tasks early on. I encountered a situation where the concept of code refactoring overlapped with the project of another GSoC contributor. To maintain a clear structure, I took the initiative to outline the plan and detailed all the corresponding subtasks. This proactive approach allowed me to work collaboratively with fellow participants, successfully addressing each subtask. Throughout the process, I consistently engaged in peer reviews and actively shared my insights.
- #168 Storage, Cloud SQL, Bigquery, and Pubsub Client factory
- #170 client factory for cloud function, bigtable, spanner and filestore
- #179 client factory for kms, service usage, sourcerepo, resource manager
Furthermore, while refactoring the crawler, it became apparent that we could improve `scanner.py` by eliminating its repetitive `if-else` statements. Issue link here.
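The idea behind removing the repetitive branching can be sketched with a dispatch table: instead of one `if-else` arm per resource, the scanner loops over a mapping from a config key to its crawl function. The flag and function names below are illustrative, not the project's actual ones.

```python
# Hedged sketch of replacing repetitive if-else dispatch with a table
# lookup; the config keys and crawl functions here are hypothetical.
from typing import Callable, Dict, List


def crawl_buckets(project_id: str) -> str:
    return f"buckets:{project_id}"


def crawl_bq(project_id: str) -> str:
    return f"bq:{project_id}"


# Before: if config.get("storage_buckets"): ... elif config.get("bq"): ...
# After: one loop over a dispatch table, one entry per resource.
DISPATCH: Dict[str, Callable[[str], str]] = {
    "storage_buckets": crawl_buckets,
    "bq": crawl_bq,
}


def scan(project_id: str, enabled: Dict[str, bool]) -> List[str]:
    return [fn(project_id) for key, fn in DISPATCH.items() if enabled.get(key)]


print(scan("demo", {"storage_buckets": True, "bq": False}))  # ['buckets:demo']
```

Each new resource then adds one dictionary entry instead of another branch, which keeps `scan` itself unchanged as the crawler roster grows.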
A strategy for incorporating additional crawlers has been outlined. Subsequently, I added support for the following crawlers. If you are interested in adding new crawlers, refer to Epic: add support for additional GCP resources.
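As a rough illustration of what contributing a new crawler involves under a modular design, the sketch below registers a crawl function for a new resource through a decorator. The registry, decorator, and resource names are hypothetical, chosen only to show the two-step flow of "implement, then register".

```python
# Hypothetical sketch of adding a new crawler: write one function for
# the resource, then register it. Names are illustrative, not the
# project's actual API.
from typing import Callable, Dict, List

CRAWLER_REGISTRY: Dict[str, Callable[[str], List[dict]]] = {}


def register(resource: str):
    """Decorator that adds a crawl function to the registry."""
    def wrap(fn):
        CRAWLER_REGISTRY[resource] = fn
        return fn
    return wrap


# Step 1: implement the crawler, ideally in its own module.
@register("dns_policies")
def crawl_dns_policies(project_id: str) -> List[dict]:
    # Real code would call the Cloud DNS API; stubbed for the sketch.
    return [{"project": project_id, "resource": "dns.policies"}]


# Step 2: the scanner picks the new crawler up via the registry,
# with no changes to any existing scanning code.
print(sorted(CRAWLER_REGISTRY))  # ['dns_policies']
```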