
Extend CLI for automated collaboration analyses #302

Open
nicolehoess opened this issue Jun 12, 2024 · 0 comments

Comments

@nicolehoess
Collaborator

Additional CLIs for git entity parsing and network construction would be helpful to automate large-scale developer collaboration analyses.

Additional interfaces for git log entity parsing (including a parallelized version for speed-up) could be added to exec/git.R. For flexibility, the parallelized version should also allow time windows to be specified explicitly by date.

For developer collaboration analysis, two new interfaces for network construction (bipartite projection and temporal collaboration networks) could be added to exec/graph.R.
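The first of these interfaces can be illustrated with a rough Python sketch. It projects a bipartite developer-file network onto developers: two developers are linked if they changed the same file, and the edge weight is the number of files they share. The actual CLI is an R script (exec/graph.R). The function name and the shared-file weighting are hypothetical choices, not Kaiaulu's implementation.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_developers(edges):
    """Project (developer, file) edges onto a developer-developer network.

    Hypothetical helper: two developers are connected if they changed at
    least one common file; the weight counts how many files they share.
    Keys are sorted (dev_a, dev_b) tuples, so each undirected edge appears once.
    """
    devs_per_file = defaultdict(set)
    for dev, path in edges:
        devs_per_file[path].add(dev)

    weights = defaultdict(int)
    for devs in devs_per_file.values():
        # Every pair of developers who touched this file shares one more file.
        for dev_a, dev_b in combinations(sorted(devs), 2):
            weights[(dev_a, dev_b)] += 1
    return dict(weights)


if __name__ == "__main__":
    edges = [("alice", "a.R"), ("bob", "a.R"), ("alice", "b.R"),
             ("bob", "b.R"), ("carol", "b.R"), ("carol", "c.R")]
    print(project_onto_developers(edges))
```

The weight could just as well count commits instead of files. Shared files are used here only because they keep the example small.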

nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jun 12, 2024
Adds CLI to parse entities (functions, classes, etc.) from a previously
parsed gitlog for the entire timespan of this log.

Adds CLI to parse git entities from a previously parsed gitlog for
multiple time windows in parallel. Time windows can be configured by
explicitly defined dates or by the number of days (see configuration
example kaiaulu_analysis.yml).

Git interfaces also perform identity matching and file filtering as
specified in the configuration file.

Signed-off-by: Nicole Hoess <[email protected]>
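This commit describes two ways of defining time windows: explicit dates or a fixed number of days. Both are sketched below in Python. The function names are hypothetical, and the actual configuration keys in kaiaulu_analysis.yml are not reproduced here. Adjacent windows deliberately share their boundary date.

```python
from datetime import date, timedelta


def windows_by_days(start, end, days):
    """Split [start, end] into consecutive windows of `days` days.

    The last window may be shorter; adjacent windows share a boundary date.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    windows = []
    current = start
    while current < end:
        nxt = min(current + timedelta(days=days), end)
        windows.append((current, nxt))
        current = nxt
    return windows


def windows_by_dates(boundaries):
    """Turn explicitly configured boundary dates into (start, end) pairs."""
    ordered = sorted(boundaries)
    return list(zip(ordered[:-1], ordered[1:]))


if __name__ == "__main__":
    print(windows_by_days(date(2024, 1, 1), date(2024, 1, 10), 4))
    print(windows_by_dates([date(2024, 3, 1), date(2024, 1, 1), date(2024, 2, 1)]))
```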
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jun 12, 2024
Passing the argument "sorted" to merge() results in an "unknown
argument" warning. Replace "sorted" with the correct argument name,
"sort", to fix this.

Signed-off-by: Nicole Hoess <[email protected]>
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jun 12, 2024
Adds interfaces to create bipartite projections and temporal
collaboration networks from a previously parsed gitlog or from gitlog
entities.

An additional configuration file is added to keep track of the CLI
parameter choices.

Signed-off-by: Nicole Hoess <[email protected]>
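One common way to build a temporal collaboration network is to add a directed edge from the developer who last changed a file to the next developer who changes it. The Python sketch below follows that formulation. Whether the interface in exec/graph.R uses the same edge definition is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def temporal_edges(changes):
    """Build directed, weighted edges from (timestamp, developer, file) changes.

    Changes are processed in timestamp order. Whenever a file is changed by a
    developer other than its previous changer, the edge previous -> current
    is incremented.
    """
    last_changer = {}
    edges = Counter()
    for _ts, dev, path in sorted(changes):
        prev = last_changer.get(path)
        if prev is not None and prev != dev:
            edges[(prev, dev)] += 1
        last_changer[path] = dev
    return dict(edges)


if __name__ == "__main__":
    changes = [(1, "alice", "a.R"), (2, "bob", "a.R"), (3, "alice", "a.R"),
               (4, "alice", "b.R"), (5, "carol", "b.R")]
    print(temporal_edges(changes))
```

Unlike the bipartite projection, edges here are directed and depend on the order of changes. Time windows therefore influence the result.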
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jun 13, 2024
Replace absolute path to git repository by relative path.

Signed-off-by: Nicole Hoess <[email protected]>
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jun 17, 2024
Users may configure only a subset of the possible filter options.

If a filter option was missing, it could corrupt the git log. For
instance, not specifying any file path substrings to remove
(remove_filepaths_containing) caused all file paths to be removed,
resulting in an empty git log.

Also, the commit size filter option is now respected.

Signed-off-by: Nicole Hoess <[email protected]>
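Regular-expression-based filters make this failure mode easy to reproduce. In the minimal Python sketch below, each git log row is assumed to be a dict with hypothetical "commit" and "file" keys. The commit size filter is also assumed to count distinct files per commit, which is not necessarily Kaiaulu's definition. Both function names are hypothetical.

```python
import re


def filter_by_path_substrings(rows, substrings):
    """Drop rows whose file path contains any of the given substrings.

    A missing or empty list means "remove nothing". Without this guard,
    "|".join([]) yields the empty pattern "", which matches every path and
    would wipe out the entire git log.
    """
    rows = list(rows)
    if not substrings:
        return rows
    pattern = re.compile("|".join(re.escape(s) for s in substrings))
    return [r for r in rows if not pattern.search(r["file"])]


def filter_by_commit_size(rows, max_files):
    """Drop all rows of commits that touch more than max_files distinct files.

    max_files=None leaves the log untouched (option not configured).
    """
    rows = list(rows)
    if max_files is None:
        return rows
    files_per_commit = {}
    for r in rows:
        files_per_commit.setdefault(r["commit"], set()).add(r["file"])
    return [r for r in rows if len(files_per_commit[r["commit"]]) <= max_files]
```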
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jun 17, 2024
The CLI configuration (e.g. kaiaulu_cli.yml) now has a section for git
exec. This section allows users to specify whether developer identities
should be matched. It also offers a configuration option to match
identities by names only.

Signed-off-by: Nicole Hoess <[email protected]>
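Matching by names and e-mail addresses, as opposed to names only, can be pictured as grouping identities that share a value in any selected field. The Python sketch below does this with a union-find structure. The function name and the case-insensitive comparison are hypothetical choices, and this is not Kaiaulu's identity matching algorithm.

```python
def match_identities(identities, fields=("name", "email")):
    """Assign a shared integer id to identities that share a value in `fields`.

    identities: list of dicts such as {"name": ..., "email": ...}.
    Matching is transitive and case-insensitive; empty values never match.
    Ids are numbered in order of first appearance.
    """
    parent = list(range(len(identities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_seen = {}
    for i, ident in enumerate(identities):
        for field in fields:
            value = (ident.get(field) or "").strip().lower()
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # union the two groups
            else:
                first_seen[key] = i

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(identities))]
```

Passing fields=("name",) corresponds to the names-only option.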
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jul 9, 2024
The author timestamp was accidentally overwritten by the committer
timestamp, causing the git log to be split according to the committer
timestamp instead of the author timestamp as suggested by the vignettes.

Also, make sure that commits lying exactly on a range boundary are
included in both adjacent ranges.

Signed-off-by: Nicole Hoess <[email protected]>
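Splitting by author timestamp, with boundary commits included in both adjacent ranges, amounts to using closed intervals. The minimal Python sketch below assumes commits are dicts with a hypothetical author_timestamp field holding comparable values (e.g. Unix seconds). The function name is hypothetical.

```python
def split_into_ranges(commits, boundaries):
    """Assign commits to ranges [b_i, b_(i+1)] defined by sorted boundaries.

    Both ends are inclusive, so a commit whose author timestamp equals a
    boundary appears in both adjacent ranges. The committer timestamp is
    deliberately ignored here.
    """
    ranges = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        ranges.append([c for c in commits if start <= c["author_timestamp"] <= end])
    return ranges
```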
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jul 9, 2024
In evolutionary analyses, users can generate time windows either based
on the author timestamp or the committer timestamp. Adds an option to
choose the desired timestamp.

Also removes the filtering options from the tabulation CLI, as we are
interested in the entire git log here.

Signed-off-by: Nicole Hoess <[email protected]>
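The new option comes down to selecting which timestamp column drives window generation. The two timestamps typically differ for rebased or cherry-picked commits, which keep their author date but receive a new committer date. In the Python sketch below, the option values "author" and "committer", the column names, and the function names are all hypothetical.

```python
TIMESTAMP_COLUMNS = {"author": "author_timestamp", "committer": "committer_timestamp"}


def timestamp_column(option):
    """Map the (hypothetical) CLI option value to the column used for windows."""
    if option not in TIMESTAMP_COLUMNS:
        raise ValueError(
            f"unknown timestamp option {option!r}; "
            f"expected one of {sorted(TIMESTAMP_COLUMNS)}"
        )
    return TIMESTAMP_COLUMNS[option]


def log_span(commits, option="author"):
    """Return the (earliest, latest) timestamp of the log for the chosen column."""
    column = timestamp_column(option)
    values = [c[column] for c in commits]
    return min(values), max(values)
```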
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jul 10, 2024
Until now, identity matching in the git CLI was limited to author names
and e-mail addresses. Now, committer names and e-mail addresses are
matched as well.

Signed-off-by: Nicole Hoess <[email protected]>
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jul 16, 2024
During entity analysis, we save an empty data frame in case no entities
were found in the respective time window. This indicates that a specific
range has not been skipped accidentally, but did not contain any changed
entities. Change the header of this data frame to match the standard
format to facilitate subsequent analyses.

Signed-off-by: Nicole Hoess <[email protected]>
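The idea of always emitting the standard header, even for a window without changed entities, can be sketched with Python's csv module. The column names below are hypothetical, and the real entity table layout may differ.

```python
import csv
import io

# Hypothetical column names; the real entity table layout may differ.
ENTITY_COLUMNS = ["commit_hash", "author_name", "file_pathname", "entity_definition"]


def window_to_csv(rows):
    """Serialize one time window's entity rows as CSV text.

    The header is always written. A window without changed entities therefore
    still yields the standard columns, which distinguishes "no entities
    changed" from "window accidentally skipped".
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ENTITY_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```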
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Jul 16, 2024
Allows the CLI users to choose whether to include the time window
boundaries (start and end time) in the parallel entity analysis.

Allows the CLI users to choose which columns to include in identity
matching.

Signed-off-by: Nicole Hoess <[email protected]>
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Aug 7, 2024
In the project configuration file, we can specify options such as file
filters which can be applied to file-based or entity-based analysis
modes or both.

Until now, the application of these options was hard-coded in both git
CLIs.

Now, users may specify the desired options and their application to
file-based and entity-based analysis modes separately in the CLI
configuration file. This gives users more flexibility in their analyses.

Signed-off-by: Nicole Hoess <[email protected]>
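Separating project-level options from their per-mode application can be pictured as a small dispatch step. In the Python sketch below, the configuration layout, the filter names, and the function name are all hypothetical. They do not reflect the actual keys of the CLI configuration file.

```python
def apply_mode_filters(rows, cli_config, mode, registry):
    """Apply the filters enabled for one analysis mode, in configured order.

    cli_config is a hypothetical parsed CLI configuration such as
    {"file": {"filters": ["remove_tests"]}, "entity": {"filters": []}};
    registry maps filter names to functions that take and return a list of rows.
    A mode without a configuration section applies no filters.
    """
    if mode not in ("file", "entity"):
        raise ValueError(f"unknown analysis mode {mode!r}")
    for name in cli_config.get(mode, {}).get("filters", []):
        rows = registry[name](rows)
    return rows
```

With this layout the same filter definitions can be switched on for file-based analysis and off for entity-based analysis without touching the project configuration.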
nicolehoess added a commit to nicolehoess/kaiaulu that referenced this issue Aug 7, 2024
Similar to the git CLI, users may want to choose different
configurations for file and entity network construction. Thus, add
separate options to the CLI configuration file.

Signed-off-by: Nicole Hoess <[email protected]>