diff --git a/docs/2023/cyclonedx/updates/2023-07-13.md b/docs/2023/cyclonedx/updates/2023-07-13.md
new file mode 100644
index 000000000..e4affc745
--- /dev/null
+++ b/docs/2023/cyclonedx/updates/2023-07-13.md
@@ -0,0 +1,28 @@
+---
+title: Week 7
+author: Sushant Kumar
+---
+
+
+*(July 13, 2023)*
+
+### Updates:
+
+- This week I started working on improving the scanning speed of the
+  [scancode
+  agent](https://github.com/fossology/fossology/tree/master/src/scancode).
+- Explored different approaches for running ScanCode with varied parameters.
+- Successfully integrated changes to execute ScanCode through its API.
+- Found that execution through the API is notably faster (13 seconds) than
+  through the CLI (23 seconds).
+
+### Conclusion and further plans:
+
+- Refine the process of reading output from the Python script.
+- Work on enhancing the integration with the database to ensure accurate updates.
+- Also work on the changes requested on CDX PR
+  [#2507](https://github.com/fossology/fossology/pull/2507)
\ No newline at end of file
diff --git a/docs/2023/cyclonedx/updates/2023-07-20.md b/docs/2023/cyclonedx/updates/2023-07-20.md
new file mode 100644
index 000000000..5a83b454b
--- /dev/null
+++ b/docs/2023/cyclonedx/updates/2023-07-20.md
@@ -0,0 +1,31 @@
+---
+title: Week 8
+author: Sushant Kumar
+---
+
+
+*(July 20, 2023)*
+
+### Updates:
+
+- This week, I was mainly working on the modifications requested by
+  mentors regarding Pull Request
+  [#2507](https://github.com/fossology/fossology/pull/2507).
+- Major changes include:
+  - Emitted LicenseRefs as license expressions, since the CycloneDX schema
+    does not accept an SPDX LicenseRef as a valid license identifier.
+  - Refactored the SPDX agent code, eliminating the duplicate
+    implementations of the same functions used by both the CycloneDX and
+    SPDX agents.
+  - Resolved the failing test cases within the SPDX agent for the pull
+    request.
+  - Added an option to download the report from the UI.
+
+
+### Conclusion and further plans:
+
+- In upcoming weeks, I will continue working on improving the scancode agent
+  in FOSSology.
\ No newline at end of file
diff --git a/docs/2023/cyclonedx/updates/2023-07-27.md b/docs/2023/cyclonedx/updates/2023-07-27.md
new file mode 100644
index 000000000..81d671e80
--- /dev/null
+++ b/docs/2023/cyclonedx/updates/2023-07-27.md
@@ -0,0 +1,33 @@
+---
+title: Week 9
+author: Sushant Kumar
+---
+
+
+*(July 27, 2023)*
+
+### Updates:
+
+- During this week, my focus was on leveraging the [ScanCode
+  API](https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/api.py)
+  within the ScanCode agent to efficiently retrieve license and copyright
+  information from files. The API has demonstrated faster results compared to
+  the command line interface (CLI).
+- I've also improved how the output from Python scripts, invoked by
+  the ScanCode agent, is processed and utilized.
+- Notably, when ScanCode is invoked via the CLI, the current process does not
+  record the emails and URLs identified in a file. To address this, I have
+  made changes to add the missing information to the database for each file.
+- A compilation of all the changes made this week and in preceding weeks
+  regarding the scancode agent can be reviewed
+  [here](https://github.com/its-sushant/fossology/commit/649807f54f02453850c7043f53af7cea4c0fb250).
+
+
+### Conclusion and further plans:
+
+- In the coming weeks, I will continue to try different approaches to improve
+  the ScanCode agent in FOSSology.
\ No newline at end of file
diff --git a/docs/2023/cyclonedx/updates/2023-08-03.md b/docs/2023/cyclonedx/updates/2023-08-03.md
new file mode 100644
index 000000000..fe010759d
--- /dev/null
+++ b/docs/2023/cyclonedx/updates/2023-08-03.md
@@ -0,0 +1,35 @@
+---
+title: Week 10
+author: Sushant Kumar
+---
+
+
+*(August 03, 2023)*
+
+### Updates:
+
+- Throughout this week, my primary focus remained on enhancing the ScanCode agent.
+- A significant concern with the agent is its current practice of invoking
+  ScanCode through the command line interface (CLI) for each individual file,
+  leading to a file-by-file scanning process. As a result, a considerable
+  amount of time is spent on bootstrapping ScanCode for each file.
+- As a solution to this inefficiency, I explored a different approach. I
+  attempted to leverage the [ScanCode
+  API](https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/api.py)
+  to scan all files in a single call, consolidating the results into a unified
+  location, potentially a JSON file.
+- The intended workflow involves storing the outcomes from the API call in a
+  centralized JSON file. Subsequently, the data extracted from the JSON results
+  will be populated into the database for each file during the upload
+  process.
+
+
+### Conclusion and further plans:
+
+- In the coming weeks, I will try to implement the aforementioned workflow in
+  FOSSology.
\ No newline at end of file
diff --git a/docs/2023/cyclonedx/updates/2023-08-10.md b/docs/2023/cyclonedx/updates/2023-08-10.md
new file mode 100644
index 000000000..837fe066e
--- /dev/null
+++ b/docs/2023/cyclonedx/updates/2023-08-10.md
@@ -0,0 +1,52 @@
+---
+title: Week 11
+author: Sushant Kumar
+---
+
+
+*(August 10, 2023)*
+
+### Updates:
+
+- This past week, my focus was on implementing a method to scan all files in
+  one run, streamlining the scanning process.
+- The following procedure was followed to achieve this objective:
+  - **File Location Retrieval:**
+    - Utilized the FOSSology ScanCode agent to gather the file locations for
+      each individual file.
+    - Stored these file locations in a temporary text file.
+  - **Python Script Integration:**
+    - Passed the path of the temporary text file containing file locations to a
+      dedicated Python script, which scans the files using the ScanCode API.
+  - **Parallel Scanning Script:**
+    - Developed a Python script responsible for the concurrent scanning of
+      files.
+    - Employed a loop to iterate through each file location stored in the text
+      file.
+    - For each file location, invoked the ScanCode API to initiate scanning.
+    - Captured the resulting output and appended it to a JSON file.
+  - **Updating Results in the Database:**
+    - Following the script's completion, extracted the data from the generated
+      JSON file.
+    - Leveraged the ScanCode agent to retrieve the data and subsequently saved
+      it to the FOSSology database.
+  - **Clean-up Process:**
+    - Concluded the process by erasing both the temporary text file and the
+      generated JSON file.
+- This change offers notable advantages:
+  - Drastically reduces the time spent on ScanCode's bootstrapping process.
+  - Makes better use of the ScanCode toolkit within the FOSSology
+    framework.
+- Raised a [pull request](https://github.com/fossology/fossology/pull/2569)
+  after making all these changes.
+
+### Conclusion and further plans:
+
+- In the coming weeks, I will start preparing my final report for the final
+  evaluation.
+- I will also work on this [PR](https://github.com/fossology/fossology/pull/2569)
+  if any changes are requested.
\ No newline at end of file
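
The Week 11 procedure (list file in, one loop over all locations, one JSON file out, then clean-up) can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `scan_listed_files`, `fake_scan`, and `run_demo` are hypothetical names, and `fake_scan` is a stand-in so the sketch runs without ScanCode installed. In the real agent the scanner callable would presumably wrap functions from `scancode.api` such as `get_licenses` and `get_copyrights`. The sketch also collects results in memory and writes the JSON once, rather than appending per file.

```python
import json
import os
import shutil
import tempfile
from typing import Any, Callable, Dict, List


def scan_listed_files(list_path: str, json_path: str,
                      scan_one: Callable[[str], Dict[str, Any]]) -> int:
    """Scan every location listed in list_path and write one JSON file."""
    results: List[Dict[str, Any]] = []
    with open(list_path, encoding="utf-8") as listing:
        for line in listing:
            location = line.strip()
            if not location:
                continue  # ignore blank lines in the list file
            # One in-process call per file: the scanner is loaded only once.
            results.append({"file": location, **scan_one(location)})
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(results, out)
    return len(results)


def fake_scan(location: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a ScanCode-backed scanner (assumption)."""
    with open(location, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    return {"licenses": ["MIT"] if "MIT" in text else []}


def run_demo() -> List[Dict[str, Any]]:
    """Create two sample files, scan them, and clean up the temp files."""
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "a.txt")
    second = os.path.join(workdir, "b.txt")
    list_path = os.path.join(workdir, "locations.txt")
    json_path = os.path.join(workdir, "results.json")
    try:
        with open(first, "w", encoding="utf-8") as handle:
            handle.write("Licensed under the MIT license")
        with open(second, "w", encoding="utf-8") as handle:
            handle.write("no license here")
        with open(list_path, "w", encoding="utf-8") as handle:
            handle.write(first + "\n" + second + "\n")
        scan_listed_files(list_path, json_path, fake_scan)
        with open(json_path, encoding="utf-8") as handle:
            return json.load(handle)
    finally:
        # Mirrors the clean-up step: remove the list file and the JSON file.
        shutil.rmtree(workdir)


if __name__ == "__main__":
    for entry in run_demo():
        print(entry["file"], entry["licenses"])
```

Passing the scanner in as a callable keeps the loop independent of ScanCode itself, so the file handling and clean-up can be exercised separately from the actual license detection.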