chore(docs): Add docs about scoring
rhiaro committed May 24, 2024
1 parent 9e5a0e3 commit 486f4e4
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions docs/howto.md
@@ -127,6 +127,19 @@ You can adjust some thresholds based on the accuracy and completeness of the dat

### Scoring

The tool first compares all pairs of nodes in the two networks that lie within the [node match radius](#settings) of each other. It then updates the span data to use the consolidated nodes, and compares all pairs of spans that have the same start and end nodes.
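The two matching passes can be sketched as follows. This is a minimal illustration, not the tool's own code: it assumes nodes are dicts with hypothetical `id` and `coordinates` (longitude, latitude) keys, spans are dicts with hypothetical `id`, `start` and `end` keys, and the radius is in metres. `match_nodes`, `candidate_span_pairs` and `haversine_m` are hypothetical names, and one-to-many node matches are not resolved.

```python
import math
from itertools import product

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def match_nodes(nodes_a, nodes_b, radius_m):
    """Return (id_a, id_b) for every node pair within radius_m of each other."""
    return [
        (a["id"], b["id"])
        for a, b in product(nodes_a, nodes_b)
        if haversine_m(a["coordinates"], b["coordinates"]) <= radius_m
    ]


def candidate_span_pairs(spans_a, spans_b, node_pairs):
    """Pair spans whose start and end nodes map to the same consolidated nodes."""
    # Map each network-B node id onto its matched network-A node id.
    # Assumption: if a B node matched several A nodes, the last match wins.
    b_to_a = {b: a for a, b in node_pairs}
    pairs = []
    for sa, sb in product(spans_a, spans_b):
        start_b = b_to_a.get(sb["start"])
        end_b = b_to_a.get(sb["end"])
        # Spans with an unmatched endpoint cannot share nodes with network A.
        if start_b is None or end_b is None:
            continue
        if (sa["start"], sa["end"]) == (start_b, end_b):
            pairs.append((sa["id"], sb["id"]))
    return pairs
```

For example, with network B's nodes offset about 11 m north of network A's and a 20 m radius, `match_nodes` pairs `a1` with `b1` and `a2` with `b2`, so the spans `a1 → a2` and `b1 → b2` become a candidate pair for field-by-field scoring.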

The **overall confidence score** for the similarity between two features is calculated by comparing the values of corresponding fields in each feature. A confidence score is generated for each pair of fields, and these field scores are then combined into the overall score.
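The documentation does not say how field scores are combined, so the sketch below assumes a weighted mean purely for illustration. `overall_confidence` and the `weights` mapping are hypothetical; every per-field score is assumed to lie between 0.0 and 1.0.

```python
def overall_confidence(field_scores, weights=None):
    """Combine per-field scores (each 0.0-1.0) into one overall score.

    Assumption: a weighted mean, where any field without an explicit
    weight counts with weight 1.0.
    """
    if not field_scores:
        return 0.0
    weights = weights or {}
    total_weight = sum(weights.get(field, 1.0) for field in field_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * weights.get(field, 1.0) for field, score in field_scores.items())
    return weighted_sum / total_weight
```

With equal weights, `{"name": 1.0, "status": 0.5}` combines to 0.75; giving `name` a weight of 3 raises the result to 0.875. A weighted mean keeps the overall score on the same 0 to 1 scale as the field scores, which makes the breakdown shown during manual comparison easy to relate to the total.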

The scoring is based on heuristics, which are derived from:

* the purpose of the field, according to the Open Fibre Data Standard, and
* the type of data the field holds

We use a combination of exact matching, string similarity metrics, list overlaps, and geographical distance to calculate the scores.
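The four kinds of comparison could look roughly like this. It is an illustrative sketch, not the tool's scoring code: the string metric here is the standard library's `difflib.SequenceMatcher` ratio, list overlap is measured as Jaccard similarity, and the distance score decays linearly to zero at an assumed radius. `EXACT_MATCH_FIELDS` and `field_score` are hypothetical, and treating `status` as an exact-match field is an assumption about how a field's purpose might steer the choice of metric.

```python
import json
from difflib import SequenceMatcher

# Hypothetical: fields whose purpose (e.g. a codelist value) calls for exact matching.
EXACT_MATCH_FIELDS = {"status"}


def exact_score(a, b):
    """1.0 for identical values, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def string_score(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def list_score(a, b):
    """Jaccard overlap: shared items divided by all distinct items."""
    # Serialise items so that lists of objects (dicts) can be compared as sets.
    set_a = {json.dumps(item, sort_keys=True) for item in a}
    set_b = {json.dumps(item, sort_keys=True) for item in b}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def distance_score(metres, radius_m):
    """Decays linearly from 1.0 at 0 m to 0.0 at radius_m and beyond."""
    return max(0.0, 1.0 - metres / radius_m)


def field_score(field, a, b):
    """Pick a metric from the field's purpose first, then from its data type."""
    if field in EXACT_MATCH_FIELDS:
        return exact_score(a, b)
    if isinstance(a, str) and isinstance(b, str):
        return string_score(a, b)
    if isinstance(a, list) and isinstance(b, list):
        return list_score(a, b)
    return exact_score(a, b)
```

Checking the field's purpose before its type means a codelist value such as `"operational"` versus `"planned"` scores 0.0 rather than receiving partial credit for shared characters.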

When you do a manual comparison, the interface shows the overall confidence score and the breakdown of field scores it was derived from, alongside maps displaying the features being compared. You can use this information to make the final decision about whether the two features are the same (and should be consolidated into one) or different (and should both be kept).

### Output

The final output of the tool is a set of GeoJSON files saved locally on your computer.
