-
Notifications
You must be signed in to change notification settings - Fork 200
MeetingMinutes2021
We meet online on Tuesday at 16:00 UTC as a reference. See https://www.timeanddate.com/worldclock/meeting.html to get the time in your timezone.
Join us at https://meet.jit.si/VulnerableCode
The current meeting notes is at:
Here are the running meeting notes:
Agenda:
- Improver review
- Speeding up the importer-improver structure migration
- GitHub externships
Participants:
- Philippe (@pombredanne)
- Hritik (@hritik14)
Quick reference to review is available at https://gist.github.com/Hritik14/d02a2c24a50e0afcaa219cc4bf8abef9
Hritik:
With the framework for importers nearly ready in this structure we can move forward with rewriting other importers like nvd, ubuntu etc. We can also, in parallel, start development for OvalDataSource and GitDataSource
Philippe:
Mail sent to github for inquiry. We'll try to participate as an org.
Agenda:
- Review improvers
- Hacktoberfest twitter VulnerableCode logo
- Univers
- GSoC guest invite - Why do open source
Participants:
- Philippe (@pombredanne)
- Hritik (@hritik14)
Do we want OSV Design in the database as well ? How to solve nginx multiple branch problem No, we'll carry on with the current model and use the qualifiers in PackageURL to specify the branches, in that way a package in a different branch will essentially be a different package and be displayed accordingly as a fix for that branch type.
More detailed info at https://github.com/nexB/univers/blob/386eb32468c75ecac25ec872ea004b3257962946/VERSION-RANGE-SPEC.rst
Invitation accepted: https://www.youtube.com/watch?v=VNL8eO6Phj8
Agenda:
- Hacktoberfest
- VulnerableCode
- Univers
Participants:
- Philippe (@pombredanne)
- Hritik (@hritik14)
Hritik and Tushar will be setting up the ground for hacktoberfest. Looking for good-first-issues is mostly the task.
Nothing significant has changed since the last call. Hritik will carry on with the importer-improver framework.
Philippe is refactoring the codebase. Commits yet to be pushed.
Agenda:
- Preview RTD before merging (local RTD via docker)
- Merging new importer-improver model
Participants:
- Philippe (@pombredanne)
- Hritik (@hritik14)
Hritik is going on a 2 week holiday thus the new model needs to be merged ASAP. We decided to achieve a checkpoint by today and push the latest version for Philippe to improve upon for the time being.
Pinged Ayan to look into it
Agenda:
- Review improvers
- Advisory data structure
- Support http proxies,
ticket <https://github.com/nexB/vulnerablecode/issues/559>
_ - Preview RTD before merging (local RTD via docker)
Participants:
- Hritik (@hritik14)
- Philippe (@pombredanne)
Batch processing of advisories needs to be avoided. Philippe:
each Improver has a method to process a single Advisory model instance such as Improver.get_inferences(self, advisory): -> Inference or something like that.
the framework then iterates on an Improver-provided query set such as Improver.get_interesting_advisories(self): -> QuerySet that has the Advisories it is interested in.
in the framework, there is an atomic transaction that updates both the Advisory (e.g. date and later a log of improvements with select for update) and whatever is updated or create from the Inference
More: https://github.com/nexB/vulnerablecode/pull/525#issuecomment-921722348
Also, some basic refactoring was discussed.
This is not our prime objective right now. We could deal with it after a brief thought. Let's take this up in next session as well.
Agenda:
- Nginx version notations, ticket: https://github.com/nexB/vulnerablecode/issues/553
- Review improvers
- VULCOIDS, ticket: https://github.com/nexB/vulnerablecode/issues/552
- Naming: VersionSpecifer and VersionRange, ticket: https://github.com/nexB/univers/issues/8
- Preview RTD before merging (local RTD via docker)
Participants:
- Hritik (@hritik14)
- Philippe (@pombredanne)
Hritik contacted Nginx over their mailing list to clarify their version notations like 1.21.0+
. We got the following reply:
The 1.21.0+ notation means "1.21.0 and newer", or, more
formally, "1.21.0 and derived versions". This includes all
future nginx versions on the mainline branch, and all future
stable branches (which aren't yet created).
More: https://mailman.nginx.org/pipermail/nginx/2021-September/061039.html
Further, according to Nginx <https://www.nginx.com/blog/nginx-1-16-1-17-released/>
_
- Mainline is the active development branch where the latest features and bug fixes get added. It is denoted by an odd number in the second part of the version number, for example 1.17.
- Stable receives fixes for high‑severity bugs, but is not updated with new features. It is denoted by an even number in the second part of the version number, for example 1.16.0.
These information need to be accounted for in the nginx importer.
The function to check if there is an existing VULCOID
for an advisory with an alloted CVE is missing. We need to implement that i.e. effectively move VULCOID
to from vulnerability_id
to old_vulnerability_id
.
The currently named VersionRange
should be renamed to VersionConstraint
and a VersionRange
should be implemented to represent an actual range with an upper and lower bounds. The canonical string representation for this needs some discussion and an RFC needs to be drafted for the same. The current proposed representations are in the ticket <https://github.com/nexB/univers/issues/8>
_.
Agenda:
- Review improvers
- Preview RTD before merging (local RTD via docker)
Reviewed at: https://github.com/nexB/vulnerablecode/pull/525#pullrequestreview-745189344
Agenda:
- Review improvers
- Consider adopting the OSV API as alternative output, ticket: https://github.com/nexB/vulnerablecode/issues/540
- drf-spectacular vs redoc, ticket: https://github.com/nexB/vulnerablecode/pull/542
- Decide on a uniform regular time for meetings
- Preview RTD before merging (local RTD via docker)
Partial review done, more to come next day
Marked as low priority. Let's do it at some later stage.
redoc licensing is a huge mess due to the webpack (for eg: some licenses are not mentioned or enforced properly, http://tartarus.org/~martin/PorterStemmer/js.txt, feross). A detailed review needs to be done and all licenses should be mentioned and enforced explicitly. We should create a ticket about all unknown/unenforced packges and propagate that to upstream. Ticket: https://github.com/nexB/vulnerablecode/issues/549
Phlippe is comfortable with - before 3pm CET on Tuesday
Agenda:
- Decide on the structure of improvers
Participants:
- @Hritik14
- @pombredanne
- @sbs2001
The importers would now directly insert the Advisories into the database in the following format. Creating database relationships and populating Vulnerability
, Packages
, PackageRelatedVulnerabiliy
etc would be the job of a default improver.
class Advisory(models.Model):
date_published = models.DateField()
date_collected = models.DateField()
source = models.CharField()
improved_on = models.DateTimeField()
# data would contain a data_source.Advisory
data = JSONField()
@dataclasses.dataclass(order=True)
class AdvisoryData:
summary: str
vulnerability_id: Optional[str] = None
fix_packages: List[AffectedPackage] = dataclasses.field(default_factory=list)
affected_packages: List[AffectedPackage] = dataclasses.field(default_factory=list)
references: List[Reference] = dataclasses.field(default_factory=list)
@dataclasses.dataclass(order=True, frozen=True)
class AffectedPackage:
# this package MUST NOT have a version
package: PackageURL
# the version specifier contains the version scheme as is: semver:>=1,3,4
version_specifier: VersionSpecifier
There would be two kinds of improvers - Default improver - Populates the tables and creates concrete relationships mentioned above - Inference generating improvers - These improvers take metrics like time travel and create new relationships with respective confidence scores.
The Advisory model above must never be written by an improver (as of now). It would be a true upstream data that is populated by the importers and used by the improvers to generate appropriate relationships.
Agenda:
- Deployment
- Importer restructure
- Hand written migrations (not generated by
makemigrations
, eg: package) - Dumping import_yielder
- Any update on nix tests?
Let's take this next week
PackageRelatedVulnerability
- Have data source - could be a text field of one message per line or a JSON field with list of strings showing where it came from
- if a new information comes, add that information and then change that confidence, if a new advisory comes which is similar to an improvement then overwrite the improvement confidence to max
- add confidence here
- Higher confidence overrides lower confidence
Properly comment the fields in a dataclass
Not hand written, find in from packageurl.contrib.django.models import PackageURLMixin
- Simply store data source as concrete lists
- Don't store configuration or last run, maybe we can have better
Log Importer Runs
later. Ticket: https://github.com/nexB/vulnerablecode/issues/526
No
Participants:
- @pombredanne
- @Hritik14
- @Divya
Agenda:
- Refactor / Redesign importers to strictly import
- Dump set_api from DataSource
- Knowing all the existing versions is not necessary to collect Advisories
- Strictly no inference
- Revisit updated_advisories vs added_advisories
- Consistent naming for fixed / vulnerable packages. No other package type should exist in a DataSource
- Should a container be used instead of
make postgres
in scancodeIO ? - https://github.com/nexB/scancode.io/blob/4bebd7ae88ecaa00eb526bd831530c497903faf8/.github/workflows/ci.yml#L15-L28 - Should Dockerfiles create virtualenvs as well ?
- ScancodeIO uses
FROM python:3.9
in Dockerflie - Docker upstream says: This is the defacto image. If you are unsure about what your needs are, you probably want to use this one. It is designed to be used both as a throw away container (mount your source code and start the container to start your app), as well as the base to build other images off of. - Docker upstream also recommends not to use-slim
version unless there are space constraints - Docker images are reusable - Using a-slim
requires to explicitly installbuild-essential
,libpq-dev
,git
and in futuresvn
- Most of the aforementioned packages come preinstalled in buster - Drastically increases Docker build time as now our image doesn't reuse already installed PYTHON image
Dump set_api from DataSource: Yes. Separate them. We cannot not import a vulnerability when we don't have version info. We need advisories however we get. Clean organization of code in separate modules/functions should be there.
Dump this. We can have data related to importer inside importers themselves. Problems with import_yielder
:
- Doesn't make sure importer class is actually a class
- No type checking
- This is a setting thing. It could appear as a Django setting (perhaps), which would enable/activate importers with fully qualified class names with module prefixes. Let's discuss this later in detail.
- Do not return sets in both of the functions (TBD at later stage)
- This should be handled by the framework before doing an update to the advisory. Something that could exist at the insertion time of importer (process_advisory).
- Incremental update should be attribute of an specific importer which has that capability.
- Importers have to return via one single function with the Advisory with some marks if they are new updated advisories. Importers with incremental supports will have different frequency of run, different entry points etc
Inspired from https://tinyurl.com/vuln-json we'll use affects and fixed.
Let's create an issue about control of versions (here postgres version) in both places. Then fix them at a later stage with similar fixes.
Yes. Even ScancodeIO should do that. TODO: Create ticket : https://github.com/nexB/scancode.io/blob/4bebd7ae88ecaa00eb526bd831530c497903faf8/Dockerfile#L52
In CI tests: Be explicit- 3.8.11-buster
In Docker: Let's support 3.8
A test for docker-build and run
Participants:
- @pombredanne
- @Hritik14
Minutes:
- Revert virtualenv caching in GitHub actions
- We don't want any surprises
- Ticket: https://github.com/nexB/vulnerablecode/issues/515
- Introduce labels in Dockerfile
- We need to decide on which labels would be there.
- Needs to be done in all AbouteCode projects, currently verified absence in VulnerableCode, ScancodeIO
- Ticket: https://github.com/nexB/vulnerablecode/issues/513
- TravisCI
- Get rid of
webhooks
- Get rid of TravisCI totally
- This is redundant after https://github.com/nexB/vulnerablecode/pull/295
- Get rid of
- Decide on python versions VulnerableCode supports.
- Currently move forward with
3.8.11-slim-buster
- Ticket: https://github.com/nexB/vulnerablecode/issues/512
- Currently move forward with
Participants:
- @pombredanne
- @Hritik14
- @sbs2001
- @Divya
The proposed agenda is:
- security review status
- server / infrastructure
- SVN for git tags: which approach or dep using
- Zip file in testcases: https://github.com/nexB/vulnerablecode/pull/393#discussion_r668006209
- Add API rate limiting and auth https://github.com/nexB/vulnerablecode/issues/460
- Revert time traveling - how to go about it ?
- In flight PR merge status
- nix tests are failing: We should ping for help nix folks
- All tickets are entered
- Delayed for now, will deploy on provisioned server. Philippe to work on it sometimes between this and next week
- No more Django migration reset
- using `svn --xml ls /tags/ should be enough
These take a lot of space and cannot be diffed. Text files are better. See https://github.com/nexB/vulnerablecode/pull/393#discussion_r668006209 The code in rust and npm importers likely refactoring to use a different approach, but feel free to refactor.
Conclusion: tests and code refactoring may make the zip file usage moot.
- https://github.com/nexB/vulnerablecode/issues/460
- Using the DRF AnonRateThrottle should be good enough for a start
Definition: finding the set of versions that existed for a package at a point in the past when the advisory was published.
-
Advisory without correct/actionable package/version info
- goes into a review queue, or error log
- especially if we are missing which version of a package the bug is fixed in
-
Advisory with well defined set of impacted and fixed packages
- there no problem and we should import this alright
-
Advisory with only one of impacted or fixed packages, both well defined (e.g. no open ranges)
-
we should import this alright
-
Case1: only fix and no impacted:
- do we want to time travel to find the version(s) that were impacted just before this?
- there could be version marked as impacted and that's the one prior to the fixed version This may be often ambiguous... and this is inferred: should store this stating this is inferred/concluded and not a hard fact and we should have a log or queue entry so this issue can be revisited.
- may be best done outside of importers proper, as some batch inference job See https://github.com/nexB/vulnerablecode/issues/500
-
Case2: only impacted and no fix:
- we should import this alright, this is normal
- we should "time travel" and create relationships with the set of versions that matched the impacted closed/well defined range at
-
it is OK to have no fix
-
it is OK to have knowledge of when the bug was introduced:
- do we want to time travel to fin
-
-
Advisory with impacted or fixed packages using open ranges
- If we have concrete enumerated exact versions, then we can store Vuln<>Package relationship as a fact.
- Otherwise this is some version range and its resolution is always an inference.
- With time travel we provide a high degree of confidence in the inference, other the inference is weak.
- All inferences may need some level of review (guided the confidence of the inference)
- Each importer should have an explicit separation between the code to import and the code to resolve version ranges (and time travel).
-
review usage of
def test_foo(self, _)
with an underscaore in tests See https://github.com/nexB/vulnerablecode/blob/05fcc64f2b48b186b7f742a2adf869b82293b024/vulnerabilities/tests/test_npm.py#L88 -
review #nopep8 and #NOQA tags
-
the npm importer design could be improved
-
we should add template for Github issues
Participants:
- @pombredanne
- @Hritik14
The proposed agenda is:
- documentation of new importers
- alpine package versions
- sorting imports
- issue #494
These two items were not discussed:
- security review status
- server and infra
- we should progressively adopt the same documentation structure as scancode.io Docker files: we should progressively copy the same structure and approach as in scancode.io
- Also README.rst needs to be beefed up and
- add license to alpine advisory
- diff yaml and JSON from alpine secdb
- consider using alpine distro main/arch/version in alpine purls
- switch to using json fo alpine secbd instead of YAML
- Most of https://github.com/nexB/vulnerablecode/pull/436 need to be revisited as we are now losing data
- https://github.com/nexB/vulnerablecode/pull/449
- create ticket to refactor importer_yielder.py
- revisit time travel. It should not be part of importer but something that is about data improvement, inference and refinements
- and something that is about data improvement, inference and refinements should be a separate tool
Let's close the ones that are not from Hritik14
We pinged:
- https://github.com/nexB/vulnerablecode/pull/391 : will need to be closed or finished by us
- https://github.com/nexB/vulnerablecode/pull/408 : pinged
- https://github.com/nexB/vulnerablecode/pull/400 : pinged
- https://github.com/nexB/vulnerablecode/pull/464 : pinged
This is also related to https://github.com/nexB/vulnerablecode/pull/436 Eventually we need a common way to obtain a range of version for a given importer (or rather for job that's fixing data after imports)
-
Isn't it a standard linux convention to have local configs take precedence over ones in /etc ? https://github.com/nexB/scancode.io/blob/a249402fbbdebd84e0f298d15cb4c3702bf41e63/scancodeio/settings/base.py#L33-L35 appears to do it in reverse
-
sorting imports should be done only when we have no or few in flight PRs
-
commit messages are not informative at all and we should make this explicit Possibly we should use the same approach as the kernel and reject commits without a message body There may be a bot for this
-
https://xenproject.org/developers/security-policy/ license is not clear
-
updated_advisories should be the same API for every importer and it is a mess now
Participants:
- Shivam @sbs2001
- Philippe @pombredanne
Agenda:
- Code review of https://github.com/nexB/vulnerablecode/pulls/467 (Shivam)
- Integration for significant volume of vulnerability lookups (Shivam)
- Getting to clean slate to avoid upcoming merge conflicts (Philippe)
code review of https://github.com/nexB/vulnerablecode/pulls/467
"Time travel" is to find about the range of affected package versions "at the time" of publication of an advisory.
Follow ups:
- https://github.com/nexB/vulnerablecode/issues/481
- https://github.com/nexB/vulnerablecode/issues/482
- https://github.com/nexB/vulnerablecode/issues/483
- https://github.com/nexB/vulnerablecode/issues/484
To integrate VulnerableCode (VC) in ScanCode.io and other tools that can make intensive lookups on may packages, we would need to have a way to efficiently partially mirror VC data though API and reuse the VC models on the client application side (when based on the same stack like it is the case with ScanCode.io). Other things to consider:
- using pub/sub or webhooks?
We need an issue for this. Design is needed
- was not discussed.