Version 3.7 #190

bockthom · 2020-12-02T15:47:24Z

3.7

Added

Add a new file util-tensor.R containing the class FourthOrderTensor to create (author x relation x author x relation) tensors from a list of networks (with each network having a different relation) and its corresponding utility function get.author.networks.for.multiple.relations (PR Add tensor representation and EDCPTD centrality calculation #173, c136b1f, e4ee0dc, 051a5f0)
Add function calculate.EDCPTD.centrality for calculating the EDCPTD centrality for a fourth-order tensor in the above described form (c136b1f, e4ee0dc, 051a5f0)
Add new file util-networks-misc.R which contains miscellaneous functions for processing network data and creating and converting various kinds of adjacency matrices: get.author.names.from.networks, get.author.names.from.data, get.expanded.adjacency, get.expanded.adjacency.matrices, get.expanded.adjacency.matrices.cumulated, convert.adjacency.matrix.list.to.array (051a5f0)
Add tests for sliding-window functionality and make parameterized tests possible (a3ad0a8, 2ed84ac, PR Fix sliding-window creation in various situations & add tests for sliding-window functionality #184)
Add function cleanup.pasta.data to remove wrong commit hashes and message ids from the PaStA data (1797e03, PR Fix PaStA bugs and minor fixes #189)

Changed/Improved

Adjust the function get.authors.by.data.source: Rename its single parameter to data.sources and change the function so that it can extract the authors for multiple data sources at once. The default value of the parameter is a vector containing all the available data sources (commits, mails, issues) (051a5f0)
Adjust recommended R version to 3.6.3 in README (92be262)
Add R version 4.0 to test suite and adjust package installation in install.R to improve compatibility with Travis CI (40aa0d8, 1ba0367, R versions to support and which to cover in test suite #161)

Fixed

Fix sliding-window creation in various splitting functions (split.network.time.based, split.networks.time.based, split.data.time.based, split.data.activity.based, split.network.activity.based) and also fix the computation of overlapping ranges in the function construct.overlapping.ranges to make sure that the last and the second-last range do not cover the same range) (1abc1b8, c34c42a, 097cebc, 9a1b651, 0fc179e, cad28bf, 7602af2, PR Fix sliding-window creation in various situations & add tests for sliding-window functionality #184)
Fix off-by-1 error in the function get.data.cut.to.same.date (f0744c0)
Fix missing or wrongly set layout when plotting networks ("How-to" example in the README.md currently leads to an error when plotting the created network #186, 720cc7b, 877931b)
Fix reading of the PaStA data since the file format has changed (712bbaf, PR Fix PaStA bugs and minor fixes #189)
Fix bug that duplicates revision set ids in the mail and commit data when merging the PaStA data and also copy-paste error when merging PaStA data to commit data (1797e03, PR Fix PaStA bugs and minor fixes #189)
Fix bug that results in an error when there is a variable called 'c' in the R environment (de42eb2, PR Fix PaStA bugs and minor fixes #189)
Fix bug that when applying filter.patchstack.mails() to an environment with no mail data, the mail data gets set to NULL (8261475, PR Fix PaStA bugs and minor fixes #189)

Methods that enable to calculate the EDCPTD centrality for a project are added to a provisional file 'util-tensor.R'. This include methods that retrive the needed single-layer developer networks, a method that builds a forth-order tensor using this single-layer networks and e method that calculates the EDCPTD centrality using this tensor. Signed-off-by: Anselm Fehnker <[email protected]>

@clhunsen

The process of how EDCPTD centrality is computed is changed. In the first step the author network for each relation is build. In the second step the author networks are converted to a forth-order tensor and in the third step EDCPTD centrality is computed. Changes are based on the reviews from @clhunsen, @bockthom and @ecklbarb. Signed-off-by: Signed-off-by: Anselm Fehnker <[email protected]>

The use of the library rTensor is added to the 'README.md' file. Signed-off-by: Anselm Fehnker <[email protected]>

@bockthom

Apply the Review from @bockthom and @clhunsen on the previous changes. This includes compliance of coding conventions, update of copyright headers and improvement of documentation. Move the functions for 'get.author.names.from.networks' and 'get.expanded.adjacency' to new file 'util-networks-misc.R'. Also add two functions 'get.author.names.from.data' and 'convert.adjacency.matrix.list.to.array' from the 'dev-network-growth' project to the new file. Signed-off-by: Anselm Fehnker <[email protected]>

The changelog file 'News.md' is adjusted. The section '## Unversioned' is added above the latest release version. The new features added in this Pull-Request are described in the subscetions '### Added' and '### Changed/Improved'. Signed-off-by: Signed-off-by: Anselm Fehnker <[email protected]>

Add tensor representation and EDCPTD centrality calculation Reviewed-by: Thomas Bock <[email protected]> Reviewed-by: Claus Hunsen <[email protected]>

@clhunsen

Travis builds currently fail (for a few weeks) due to two different reasons: On the one hand, the log output exceeds the maximum log length of travis, resulting in failing builds. This is prevented by setting the parameters `verbose` and `quiet` to `TRUE`, as then the package compilation log is not printed to the log but the result of package installation is clearly perceptible in the log. On the other hand, travis installs lots of packages which are not necessary as they are just suggestions of other packages but not necessary dependencies. Installing such unnecessary packages my lead to problems. This is circumvented by setting the `dependencies` parameter to `NA` as this results in installing only necessary dependencies and imports, whereas the former `TRUE` value of this parameter also led to installing suggested but unnecessary packages. Those two fixes hopefully makes travis builds successful again. Props to @clhunsen for experimenting with the parameters and proposing the first part of this commit regarding the `verbose` and `quiet` parameters of the `install.packages` function. This commit addresses parts of #161. Signed-off-by: Thomas Bock <[email protected]>

This is related to issue #161. Signed-off-by: Thomas Bock <[email protected]>

As R version 3.3.1 is quite outdated, update the corresponding statement in our README and recommend R version 3.6.3. Signed-off-by: Thomas Bock <[email protected]>

Signed-off-by: Thomas Bock <[email protected]>

Travis fixes Merged-by: Christian Hechtl <[email protected]>

The repository of coronet was transfered from the se-passau organization to the se-sic organization. Therefore, links in README.md and CONTRIBUTING.md are also updated. Signed-off-by: Thomas Bock <[email protected]>

When computing overlapping ranges, sometimes we ended up in having one more range in the end than needed. This led to the having ranges covering almost the same time span. Consider the following example: When constructing overlapping ranges of three month ranges which overlap fifty-fifty, we might end up in the following scenario: 2020-06-27 ——— 2020-09-26 — 2020-11-11 (2020-12-25) 2020-08-12 ————————— 2020-11-11 There is one range from 2020-06-27 to 2020-09-26, one range from 2020-08-12 to 2020-11-11, and one range from 2020-09-26 to 2020-12-25. However, when assuming that the last action in the data takes place on 2020-11-11, then we would have two ranges covering the last range: one complete ranges and half a range: 2020-08-12 to 2020-11-11 and 2020-09-26 to 2020-11-11. As can be clearly seen (and also above in the visual example), the range 2020-09-26 to 2020-11-11 is superfluous as it does not even represent a full range and it fully overlapps with the previous range. Hence we don't need this half range and can just end up with the following ranges: 2020-06-27 ——— 2020-09-26 2020-08-12 ————————— 2020-11-11 To achieve this, determine the number of complete ranges by rounding downwards instead of rounding to the nearest integer. However, if the complete time span does not even include one complete range, set the number of complete ranges to 1 because multiplying with 0 further down in the function will end up having no range at all (even not an incomplete one). However, the implementation of this corner case works correctly as there already exist some extra checks for this special case. Signed-off-by: Thomas Bock <[email protected]>

The function `get.data.cut.to.same.date` cuts the data of multiple data sources to a common date range for all the specified data sources. However, in the previous implementation, there was an off-by-one error, caused by excluding end dates in data splitting: If a range is specified, the end date of the range is exclusive (while the start of the range is inclusive). As the data cutting functionality uses the split functionality, the specified end date was excluded whereas it should be included, as the following example shows: Assume there are two data sources d1 and d2. Let d1 contain the time stamps t1 and t4, and let d2 contain the time stamps t2 and t3, with t1 < t2 < t3 < t4. Then a visualization of this example on a time line could be as follows: d1: | | d2: | | t1 t2 t3 t4 According to the proceedure of data cutting, we look for a common data range. It can be easily seen that the interval [t2,t3] completely covers data from both data sources d1 and d2. Hence, all the data needs to be cut to this interval [t2,t3]. However, what the former implementation did was the following: t2 and t3 are determined as start and end of the range the data has to be cut to. As the cutting functionality uses the splitting functionality, t2 is used as start of the range and t3 as used as the end of the range. In here the error comes in: As the end of ranges is exclusive, t3 is not included in the cut data, resulting in interval [t2, t3). To include t3 into the cut data, we need to use t3+1 as the end of the range to cut, as this leads to [t2,t3] then. In this commit, this off-by-one error at the end of the data cutting range is fixed by adding 1 second to the end. In addition, all the corresponding tests in the test suite (i.e., data cutting and network cutting) are adjusted to incorporate the end of the cutting range. Signed-off-by: Thomas Bock <[email protected]>

@clhunsen

When calling the function `split.networks.time.based` (which splits several networks at the same time), the `sliding.window` parameter was useless: This function, in turn, calls `split.network.time.based` for each network, specifying fixed bins and passing the `sliding.window` parameter to it. However, if bins are given to this function, the `sliding.window` parameter is ignored. To fix this, create the overlapping ranges need to get sliding windows and call `split.network.time.based.by.ranges` to get the networks split using sliding windows. With this fix, it is possible to use sliding windows when splitting multiple networks simultaneously. Props to @clhunsen for suggesting to use `construct.overlapping.ranges` and `split.network.time.based.by.ranges` for this fix. Signed-off-by: Thomas Bock <[email protected]> Signed-off-by: Claus Hunsen <[email protected]>

Create the sliding windows in `split.network.time.based` exactly in the same way as in `split.networks.time.based`. Signed-off-by: Thomas Bock <[email protected]>

Signed-off-by: Thomas Bock <[email protected]>

For some of the functions in the split module, some default values of parameters have been missing in the function documentation. This commit fixes that. Signed-off-by: Thomas Bock <[email protected]>

Create the sliding windows in `split.data.time.based` exactly in the same way as in `split.network.time.based` and `split.networks.time.based`. In addition, also stream line the code in `split.data.time.based` (e.g., don't execute steps that are only needed when no sliding windows are used). Signed-off-by: Thomas Bock <[email protected]>

Create the sliding windows in `split.data.activity.based` in a similar way as in `split.data.time.based`. In particular, fix some bugs in the sliding-window creation for activity-based data splitting and handle some special corner cases with respect to the last range. Signed-off-by: Thomas Bock <[email protected]>

Create the sliding windows in `split.network.activity.based` in a similar way as in `split.data.activity.based`. In particular, fix some bugs in the sliding-window creation for activity-based network splitting and handle some special corner cases with respect to the last range. Signed-off-by: Thomas Bock <[email protected]>

Add a completely new test module 'test-split-sliding-window.R', which is based on the exisiting test module 'test-split.R'. The new test module contains almost the same test setups as the exisiting test module, but it tests all the splitting functions only using sliding windows. In addition, some new test scenarios are added to test certain corner cases with respect to the last range of sliding-window splitting. In particular, the following functions are tested with sliding windows: - split.data.time.based - split.data.activity.based - split.network.time.based - split.network.activity.based Signed-off-by: Thomas Bock <[email protected]>

In this commit, additional tests for sliding-window splitting are added. For example, tests for the function 'split.networks.time.based'. As these tests don't need their expected outcomes to be determined manually, but rely on the outcomes of other, already tested functions, it is not necessary to completely copy existing tests and have them duplicated for different values of `sliding.window`. Hence, this is a good example for having parameterized tests: Parameterized tests should succeed independently of which value certain parameters take. Hence, we can just implement the test once and run it twice, for example, the first run with `sliding.window = TRUE` and the second run with `spliding.window = FALSE`. However, es our standard test library 'testthat' does not provide the possiblity to run parameterized tests, we need to add an additiona library to be able to run parameterized tests: the library 'patrick'. For more details on 'patrick', see: https://github.com/google/patrick Signed-off-by: Thomas Bock <[email protected]>

For unknown reasons, setting the layout for ggraph, which is used for plotting networks, has changed. To avoid errors due to a missing or wrongly set layout, the corresponding implementation in function 'plot.get.plot.for.network' is adjusted in to ways: 1) When creating a ggraph object, we already need to explicitly specify the layout we want to use. As we rely on igraph and its layouts, we use 'layout = "igraph"' and then we have to specify the concrete layout algorithm from igraph (as 'algorithm' paramater). 2) Opposed to previous layout specification, now a layout name has to be stated instead of naming a layout function. Unfortunately, there is no official list of supported igraph layouts. Hence, we need to rely on the following table (which comes from python and not from R): https://igraph.org/python/doc/tutorial/tutorial.html#layout-algorithms Notice that the layout algorithm needs to be given as string in form of the 'short name' given in this table. However, if there are multiple short names specified in the table, only one of them might actually work for ggraph. Now as before, the default layout is 'kk' (kamada kawai). If one wants to use another igraph layout, just set the graph attribute 'layout' to the short name of the layout and then this one will be used for creating the ggraph object. This commit fixes issue #186. Signed-off-by: Thomas Bock <[email protected]>

Signed-off-by: Thomas Bock <[email protected]>

In the function 'convert.adjacency.matrix.list.to.array', there was a typo in a variable name which led to an error when calling the function. Signed-off-by: Thomas Bock <[email protected]>

Add more documentation to activity-based data splitting annd add a `unique` call to make sure that there is only one date even if there could be several identical dates in `end.event.date`. In addition, fix some minor inconsistencies. Signed-off-by: Thomas Bock <[email protected]>

This is a follow-up commit for commit d9e604e. In more recent package versions of igraph and ggraph, there is no layout "igraph". Instead, the layout algorithm has to be set directly to the "layout" parameter. This fixes #186. Signed-off-by: Claus Hunsen <[email protected]> Signed-off-by: Thomas Bock <[email protected]>

Signed-off-by: Thomas Bock <[email protected]>

Fix sliding-window creation in various situations & add tests for sliding-window functionality Reviewed-by: Christian Hechtl <[email protected]> Reviewed-by: Claus Hunsen <[email protected]>

The PaStA data is now called 'patch-groups'. On top of that the format of the file changed to include all unmatched commits. This leads to a lot of log outputs, which is why we delete the logwarn. Signed-off-by: Christian Hechtl <[email protected]>

Calling 'do.call(c, ...)' leads to an error if there is any variable called 'c' in the R environment. This is why we change it to 'base::c'. Signed-off-by: Christian Hechtl <[email protected]>

First, fix the copy-paste error where the existence of mail data was checked when adding the PaStA data to the mail data. Second, fix the duplication of revision.set.ids that happens when merging the PaStA data to the commit or mail data. Last, throw out PaStA items that contain message ids or commit hashes that do not appear in the mail or commit data. Signed-off-by: Christian Hechtl <[email protected]>

@clhunsen

As the edge attributes and node attributes `edge.type.char` and `node.type.char` do not exist anymore, remove the outdated '.char' suffixes. Then the corresponding plotting examples should work again. Props to @clhunsen for pointing this out! Signed-off-by: Thomas Bock <[email protected]> Signed-off-by: Claus Hunsen <[email protected]>

Signed-off-by: Christian Hechtl <[email protected]>

If there are no mails (e.g., after splitting the data to a certain range), the function `filter.patchstack.mails()` prints warnings and sets the mail data to NULL, whereas the mail data should stay as created by `create.empty.mails.list()`. When then accessing certain columns of the mail data, this leads to an error. To prevent this, don't perform the patchstack mail filtering if there are no mails - just return the mail data as they have been before. Signed-off-by: Thomas Bock <[email protected]> Signed-off-by: Christian Hechtl <[email protected]>

Signed-off-by: Christian Hechtl <[email protected]>

Fix PaStA bugs and minor fixes Reviewed-by: Thomas Bock <[email protected]>

Signed-off-by: Thomas Bock <[email protected]>

bockthom · 2020-12-02T15:52:44Z

As everything has been reviewed already, I will merge right away.

Notice that the TravisCI tests do currently not run here due to the changed pricing model of TravisCI.
As compensation, I have run TravisCI tests for the latest commit af4eaa6 on my own fork (luckily, I currently still have a few free credits of TravisCI on my personal account) and all tests pass there: https://github.com/bockthom/coronet/runs/1487219481

Anselm Fehnker and others added 30 commits September 23, 2020 18:28

Add rTensor to 'README.md'

7acf76e

The use of the library rTensor is added to the 'README.md' file. Signed-off-by: Anselm Fehnker <[email protected]>

Merge pull request #173 from fehnkera/tensor

5e3baa8

Add tensor representation and EDCPTD centrality calculation Reviewed-by: Thomas Bock <[email protected]> Reviewed-by: Claus Hunsen <[email protected]>

Add R version 4.0 to test suite

40aa0d8

This is related to issue #161. Signed-off-by: Thomas Bock <[email protected]>

Adjust recommended R version in README

92be262

As R version 3.3.1 is quite outdated, update the corresponding statement in our README and recommend R version 3.6.3. Signed-off-by: Thomas Bock <[email protected]>

Update changelog

df367d2

Signed-off-by: Thomas Bock <[email protected]>

Merge pull request #183 from bockthom/travis-fixes

bd1ab32

Travis fixes Merged-by: Christian Hechtl <[email protected]>

Transfer repository from se-passau to se-sic

f8c7cd2

The repository of coronet was transfered from the se-passau organization to the se-sic organization. Therefore, links in README.md and CONTRIBUTING.md are also updated. Signed-off-by: Thomas Bock <[email protected]>

Adjust sliding-window creation in 'split.network.time.based'

a3a8e6d

Create the sliding windows in `split.network.time.based` exactly in the same way as in `split.networks.time.based`. Signed-off-by: Thomas Bock <[email protected]>

Fix wrong logging statements in 'split.network.by.bins'

820f850

Signed-off-by: Thomas Bock <[email protected]>

Add missing default values to function documentations in util-split.R

606963a

For some of the functions in the split module, some default values of parameters have been missing in the function documentation. This commit fixes that. Signed-off-by: Thomas Bock <[email protected]>

Fix minor issues from review in PR #184

f0b3c10

Signed-off-by: Thomas Bock <[email protected]>

Update changelog file

1ba5551

Signed-off-by: Thomas Bock <[email protected]>

Fix typo in variable name 'adjacency.list'

63382b4

In the function 'convert.adjacency.matrix.list.to.array', there was a typo in a variable name which led to an error when calling the function. Signed-off-by: Thomas Bock <[email protected]>

Update changelog file again

4afe8b8

Signed-off-by: Thomas Bock <[email protected]>

hechtlC and others added 11 commits December 1, 2020 13:16

Merge pull request #184 from bockthom/thomas-updates

78e99a1

Fix sliding-window creation in various situations & add tests for sliding-window functionality Reviewed-by: Christian Hechtl <[email protected]> Reviewed-by: Claus Hunsen <[email protected]>

Fix bug with using 'do.call(c, ...)'

de42eb2

Calling 'do.call(c, ...)' leads to an error if there is any variable called 'c' in the R environment. This is why we change it to 'base::c'. Signed-off-by: Christian Hechtl <[email protected]>

Add copyright headers

9cd35a7

Signed-off-by: Christian Hechtl <[email protected]>

Update changelog file

474ed76

Signed-off-by: Christian Hechtl <[email protected]>

Update changelog file again

cd40097

Signed-off-by: Christian Hechtl <[email protected]>

Merge pull request #189 from hechtlC/master

bb30e20

Fix PaStA bugs and minor fixes Reviewed-by: Thomas Bock <[email protected]>

Version v3.7

af4eaa6

Signed-off-by: Thomas Bock <[email protected]>

bockthom added the versioning label Dec 2, 2020

bockthom added this to the v3.7 milestone Dec 2, 2020

bockthom merged commit a7f4123 into master Dec 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Version 3.7 #190

Version 3.7 #190

bockthom commented Dec 2, 2020

bockthom commented Dec 2, 2020 •

edited

Loading

Version 3.7 #190

Version 3.7 #190

Conversation

bockthom commented Dec 2, 2020

3.7

Added

Changed/Improved

Fixed

bockthom commented Dec 2, 2020 • edited Loading

bockthom commented Dec 2, 2020 •

edited

Loading