Add updates for week 7 to week 11 #181

Merged
merged 1 commit into from
Aug 11, 2023
28 changes: 28 additions & 0 deletions docs/2023/cyclonedx/updates/2023-07-13.md
@@ -0,0 +1,28 @@
---
title: Week 7
author: Sushant Kumar
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2023 Sushant Kumar <[email protected]>
-->

*(July 13, 2023)*

### Updates:

- This week I started working on improving the scanning speed of the
[ScanCode
agent](https://github.com/fossology/fossology/tree/master/src/scancode).
- Explored different approaches for running ScanCode with varied parameters.
- Integrated changes to execute ScanCode through its API instead of the CLI.
- Found that the API run is notably faster (13 seconds) than the CLI run
(23 seconds).
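
A comparison like this can be reproduced by timing both paths on the same input. The sketch below is only an illustration of how such a measurement could be set up, not the code used for the numbers above: `time_call` and `speedup` are hypothetical helpers, and the ScanCode invocations appear only in the commented usage because they need scancode-toolkit installed.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def time_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def speedup(cli_seconds: float, api_seconds: float) -> float:
    """Return how many times faster the API path is than the CLI path."""
    if api_seconds <= 0:
        raise ValueError("api_seconds must be positive")
    return cli_seconds / api_seconds


# Usage sketch (assumes scancode-toolkit is installed; `path` is a file to scan):
#   import subprocess
#   from scancode import api
#   _, cli_t = time_call(lambda: subprocess.run(
#       ["scancode", "--license", "--copyright", "--json", "out.json", path],
#       check=True))
#   _, api_t = time_call(lambda: (api.get_licenses(path), api.get_copyrights(path)))
#   print(f"API is {speedup(cli_t, api_t):.2f}x faster")
```

With the timings reported above, `speedup(23, 13)` works out to roughly 1.8.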

### Conclusion and further plans:

- Refine the process of reading output from the Python script.
- Improve the integration with the database to ensure accurate updates.
- Work on the changes requested on the CycloneDX (CDX) PR
[#2507](https://github.com/fossology/fossology/pull/2507).
31 changes: 31 additions & 0 deletions docs/2023/cyclonedx/updates/2023-07-20.md
@@ -0,0 +1,31 @@
---
title: Week 8
author: Sushant Kumar
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2023 Sushant Kumar <[email protected]>
-->

*(July 20, 2023)*

### Updates:

- This week, I mainly worked on the modifications requested by the
mentors on pull request
[#2507](https://github.com/fossology/fossology/pull/2507).
- Major changes include:
  - Emitted LicenseRefs as license expressions, since the CycloneDX schema does
    not accept an SPDX LicenseRef as a valid license identifier.
  - Refactored the SPDX agent code to remove the duplicate implementations of
    functions used by both the CycloneDX and SPDX agents.
  - Resolved the failing test cases in the SPDX agent for the pull request.
  - Added an option to download the report from the UI.
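
To make the LicenseRef point concrete, here is a minimal sketch of how a license entry could be chosen for a CycloneDX JSON report. It is an assumption for illustration, not the code from the pull request: `cyclonedx_license_entry` is a hypothetical helper, and `spdx_ids` stands in for the list of official SPDX license identifiers that the schema accepts in `license.id`.

```python
from typing import Dict, Set


def cyclonedx_license_entry(identifier: str, spdx_ids: Set[str]) -> Dict[str, object]:
    """Build one item of a CycloneDX `licenses` array for the given identifier."""
    if identifier in spdx_ids:
        # Official SPDX identifiers are allowed in `license.id`.
        return {"license": {"id": identifier}}
    if identifier.startswith("LicenseRef-"):
        # A LicenseRef is not a valid `license.id`, but it is a valid
        # (single-term) SPDX license expression.
        return {"expression": identifier}
    # Anything else is treated as a free-text license name.
    return {"license": {"name": identifier}}
```

For example, with `spdx_ids = {"MIT"}`, the identifier `"MIT"` maps to `{"license": {"id": "MIT"}}`, while a made-up `"LicenseRef-Example"` maps to `{"expression": "LicenseRef-Example"}`.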


### Conclusion and further plans:

- In the upcoming weeks, I will continue working on improving the ScanCode
agent in FOSSology.
33 changes: 33 additions & 0 deletions docs/2023/cyclonedx/updates/2023-07-27.md
@@ -0,0 +1,33 @@
---
title: Week 9
author: Sushant Kumar
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2023 Sushant Kumar <[email protected]>
-->

*(July 27, 2023)*

### Updates:

- This week, my focus was on using the [ScanCode
API](https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/api.py)
within the ScanCode agent to retrieve license and copyright
information from files. The API has been faster than
the command line interface (CLI).
- I also improved how the output from the Python scripts invoked by
the ScanCode agent is processed and used.
- The current process, which invokes ScanCode via the CLI, does not record the
emails and URLs identified in a file. To address this, I made changes
to add the missing information to the database for each file.
- All the changes made to the ScanCode agent this week and in preceding weeks
can be reviewed
[here](https://github.com/its-sushant/fossology/commit/649807f54f02453850c7043f53af7cea4c0fb250).
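
As a rough sketch of what collecting licenses, copyrights, emails, and URLs through the API can look like, the snippet below merges the result of each scanner into one record per file. It is not the code from the linked commit: `load_scancode_scanners` and `scan_file` are hypothetical helpers, and the exact keys in the returned mappings depend on the installed ScanCode version.

```python
from typing import Any, Callable, Dict, Mapping

# A scanner takes a file location and returns a mapping of findings.
Scanner = Callable[[str], Mapping[str, Any]]


def load_scancode_scanners() -> Dict[str, Scanner]:
    """Import ScanCode lazily so the import cost is paid once per process."""
    from scancode import api  # requires scancode-toolkit to be installed

    return {
        "licenses": api.get_licenses,
        "copyrights": api.get_copyrights,
        "emails": api.get_emails,
        "urls": api.get_urls,
    }


def scan_file(location: str, scanners: Mapping[str, Scanner]) -> Dict[str, Any]:
    """Merge the mapping returned by each scanner into one record for the file."""
    record: Dict[str, Any] = {"file": location}
    for scanner in scanners.values():
        record.update(scanner(location))
    return record


# Usage sketch (assumes scancode-toolkit is installed):
#   scanners = load_scancode_scanners()
#   record = scan_file("/path/to/file.c", scanners)
```

Passing the scanners in as a mapping keeps `scan_file` easy to test with stand-in functions and makes it simple to switch individual scanners on or off.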


### Conclusion and further plans:

- In the coming weeks, I will continue to try different approaches to improve
the ScanCode agent in FOSSology.
35 changes: 35 additions & 0 deletions docs/2023/cyclonedx/updates/2023-08-03.md
@@ -0,0 +1,35 @@
---
title: Week 10
author: Sushant Kumar
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2023 Sushant Kumar <[email protected]>
-->

*(August 3, 2023)*

### Updates:

- Throughout this week, my primary focus remained on improving the ScanCode agent.
- A significant problem with the agent is that it invokes
ScanCode through the command line interface (CLI) once for each individual
file. As a result, a considerable amount of time is spent bootstrapping
ScanCode for every file.
- To address this inefficiency, I explored a different approach: using the
[ScanCode
API](https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/api.py)
to scan all files in a single call and consolidate the results in one
location, potentially a JSON file.
- The intended workflow stores the results of the API call in a
single JSON file. The data extracted from the JSON results
is then written to the database for each file during the upload
process.
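
The second half of this workflow, reading the consolidated JSON and writing one row per finding, could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the JSON layout (a list of per-file records), the `scan_finding` table, and the use of `sqlite3` as a stand-in for the FOSSology database.

```python
import json
import sqlite3
from typing import Iterator, Tuple

KINDS = ("licenses", "copyrights", "emails", "urls")


def rows_from_results(results_json: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (file_path, kind, value) for every finding in the JSON results."""
    for record in json.loads(results_json):
        path = record["file"]
        for kind in KINDS:
            for value in record.get(kind, []):
                yield path, kind, str(value)


def store_results(conn: sqlite3.Connection, results_json: str) -> int:
    """Insert all findings in one transaction and return the number of rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scan_finding "
        "(file_path TEXT, kind TEXT, value TEXT)"
    )
    rows = list(rows_from_results(results_json))
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO scan_finding VALUES (?, ?, ?)", rows)
    return len(rows)
```

Writing all rows in a single transaction keeps the database consistent if the import fails partway through.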


### Conclusion and further plans:

- In the coming weeks, I will try to implement the aforementioned workflow in
FOSSology.
52 changes: 52 additions & 0 deletions docs/2023/cyclonedx/updates/2023-08-10.md
@@ -0,0 +1,52 @@
---
title: Week 11
author: Sushant Kumar
---
<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2023 Sushant Kumar <[email protected]>
-->

*(August 10, 2023)*

### Updates:

- This past week, my focus was on implementing a way to scan all files in a
single run, streamlining the scanning process.
- The following procedure was used:
  - **File Location Retrieval:**
    - Used the FOSSology ScanCode agent to gather the location of each
      individual file.
    - Stored these file locations in a temporary text file.
  - **Python Script Integration:**
    - Passed the path of the temporary text file to a dedicated Python script,
      which scans the files using the ScanCode API.
  - **Parallel Scanning Script:**
    - Developed a Python script responsible for scanning all the files in one
      invocation.
    - Used a loop to iterate through each file location stored in the text
      file.
    - For each file location, invoked the ScanCode API to run the scan.
    - Captured the resulting output and appended it to a JSON file.
  - **Updating results to database:**
    - After the script completed, extracted the data from the generated
      JSON file.
    - Used the ScanCode agent to read the data and save it to the FOSSology
      database.
  - **Clean-up Process:**
    - Deleted both the temporary text file and the generated JSON file.
- This change offers notable advantages:
  - Drastically reduces the time spent on ScanCode's bootstrapping process.
  - Makes better use of the ScanCode toolkit within FOSSology.
- Raised a [pull request](https://github.com/fossology/fossology/pull/2569)
with all these changes.
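
The procedure above can be sketched in a few lines of Python. This is a simplified illustration, not the script from the pull request: `read_locations`, `scan_all`, and `clean_up` are hypothetical names, and `scan_one` is a placeholder for whatever function wraps the ScanCode API calls for a single file. Unlike the description above, the sketch collects the results in memory and writes the JSON file once at the end.

```python
import json
import os
from typing import Any, Callable, Dict, List


def read_locations(list_path: str) -> List[str]:
    """Read one file location per line, skipping blank lines."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def scan_all(
    list_path: str,
    output_path: str,
    scan_one: Callable[[str], Dict[str, Any]],
) -> int:
    """Scan every listed file in this one process and write the results as JSON."""
    results = []
    for location in read_locations(list_path):
        try:
            results.append(scan_one(location))
        except Exception as error:  # one bad file should not stop the batch
            results.append({"file": location, "error": str(error)})
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(results, handle)
    return len(results)


def clean_up(*paths: str) -> None:
    """Remove the temporary list and result files once they are no longer needed."""
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
```

Because the loop runs inside a single Python process, ScanCode is imported and initialized once rather than once per file, which is where the time saving comes from.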

### Conclusion and further plans:

- In the coming weeks, I will start writing my final report for the final
evaluation.
- I will also work on this [PR](https://github.com/fossology/fossology/pull/2569)
if any changes are requested.