Commit

typos
SamueleSoraggi committed Mar 22, 2024
1 parent 03d093a commit f27aa83
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion datasets/datapolicy.qmd
@@ -23,7 +23,7 @@ There are large public de-identified EHR datasets that serve as benchmark resour

## Synthetic data

-![]("../images/Tradeoff_base.svg"){width="400px" fig-align="right"}
+![]("../images/Tradeoff_base.svg"){width=400 fig-align="right"}

<!---
![tradeoff](../assets/images/Tradeoff_base.svg#right)
2 changes: 1 addition & 1 deletion datasets/synthdata.qmd
@@ -31,7 +31,7 @@ Recently, a few interesting libraries / pipelines have been released that enable
There are 3 key principles to consider when judging the overall quality of a synthetic dataset: fidelity to the original dataset, risk to privacy, and prediction utility. Fidelity and utility are often grouped together as similarity to the original data which exists in a trade-off with privacy - the more similar your synthetic dataset to the original, the higher your risk to patient privacy. However, the distinction between them is important as they can be achieved independently of each other depending on the project frame. Fidelity refers to reproduction of the multivariate shape and structure of the original data (including complex nonlinear relationships) while utility refers to how well the synthetic dataset matches the predictive accuracy of the original dataset. Risk to privacy includes both risk of patient reidentification and risk of sensitive information disclosure about a patient. There are many proposed evaluation metrics for measuring different aspects of these three qualities. We are actively investigating the performance of these metrics against our different datasets.


-![](../assets/images/SynthDataQualities.png)
+![](../images/SynthDataQualities.png)


We should point out that while using quantitative metrics to assess privacy preservation is a critical step in creating a synthetic dataset, positive results do not absolve us from any concerns regarding risk to privacy in the synthetic data. Regulatory guidelines regarding the safety of synthetic data and the ability to openly share it are extremely unclear. No authorities have specified quantitative cut-offs using these metrics that enable open release, for example. For this reason, we have developed our own internal guidelines for how to handle this aspect of the project, which are based on a comprehensive examination of relevant EU and Danish legislation (i.e. the GDPR, the Artificial Intelligence Act, the Danish Health Law, and the Danish Data Protection Act). We continue work on synthesis with hope that new legislation such as the development of the European Health Data Space will provide further guidance in the future.
2 changes: 1 addition & 1 deletion news.qmd
@@ -1,7 +1,7 @@
---
title: "News"
listing:
-  fields: [image, date, title, author, categories]
+  fields: [date, title, author]
  contents: news/*.qmd
  type: table
  sort: