
Diagnosis for encoding #105

Open · JasonLo opened this issue Jan 23, 2024 · 2 comments

JasonLo (Collaborator) commented Jan 23, 2024

Problematic examples (a quick character-level check on these excerpts is sketched after the list):

5fc4fd44d76fca4a3f0cd224:

  • 'Pearson correlation (R) is a statistical measure of goodness of fit between M number of model function values (y) and COVID-19 data (z) as follows [17]: ������������������1 = ������ ������ ∑ ������������ − ∑ ������ ∑ ������ = √(������ ∑ ������2 − (∑ ������)2)(������ ∑ ������2 − (∑ ������)2)\nThe regression coefficient (R2) is another statistical measure for goodness of fit between M number of projected values (y) and COVID-19 data (z), although we used it as follows [18-19]: = − =\nIn equation (15), ������̅ is the average of COVID-19 data and better fits provide the regression coefficient (R2) values close to unity and Obj2 to zero value'

5f2d7604a58f1dfd5210abe7:

  • E [D(41 )] = pd (1 ) · N (1 ) · P (41 ) + pd (2 ) · N (2 ) · P (40 ) + · · · + pd (34 ) · N (34 ) · P (8 )\nE [D(42 )] = pd (2 ) · N (2 ) · P (41 ) + pd (3 ) · N (3 ) · P (40 ) + · · · + pd (35 ) · N (35 ) · P (8 ) ..\nE [D(103 )] = pd (63 ) · N (63 ) · P (41 ) + pd (64 ) · N (64 ) · P (40 )',

5ed7bd5b768935d2be5cd1df:

  • being susceptible to be infected by the SARS-Cov2 virus, the fraction having been exposed to it, the fraction infected, and the fraction removed (including recoveries and deceases), the epidemic is assumed to obey the following continuous-time dynamics: \uf8f1 \uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f2 ds dt (t) de dt (t) = −β(t)i(t)s(t) = β(t)i(t)s(t) − γe(t) \uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f4\uf8f3 di dt (t) dr dt (t) = γe(t) − δi(t) = δi(t) s(t) + e(t) + i(t) + r(t) = 1 where: • β(t), t ∈ R, represents the time-varying virus transmission rate; • γ denotes the rate at which the exposed subject develops the disease (this includes people presenting symptoms and asymptomatics).'

5b192c78cf58f11cd52c690c:

  • S ~ ~ sCynEtax into GIS for niodelling urban spaces,l r ~ k r r i n r i ~ i i d [6] Jin, G. J.. 2001, Introduction to Urbnn Design Guidelines in USA, Ur*DirriPlrvrriiirig O!wrerrs, 2: 6-10.',

5d5090dd0b45c76cafa47d80:

  • Thus, the rate of data-driven behaviour change p(t) can be defined as p(t) = news(t) = news(t) , max{news(t), t ∈ NT} t ∈ NT = {1, … , 38}, then we obtain the rate of behaviour change in the Shaanxi province of China during the 2009 A/H1N1 influenza epidemic from September 3 to October 10 (the first wave), which together with new hospital notifications has been used to estimate all unknown parameters (OR0, OR1, OR2, OR3, OR4, ������, ������, ������, ������, ������) of Equation 6. We took account of agent heterogeneity by allowing each individual parameter to vary across a distribution (ie, the same mean but different standard deviation), from which agents sampled their values using log-normal distributions for the ORk(k = 0, 1, … , 4) (ie, ORk ∼ exp{N(������, ������2)}) and normal distributions for ������, ������, ������, ������, and ������.'
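To make the failure modes above concrete, here is a minimal character-level check (a sketch added for diagnosis, not part of the original report or pipeline; the strings passed in are just abbreviated versions of the excerpts quoted above). It separates U+FFFD replacement characters, which mean the original bytes were already lost when the text was decoded, from private-use-area code points such as the \uf8f1–\uf8f4 sequences in 5ed7bd5b768935d2be5cd1df, which are font-specific glyphs (likely pieces of the large brace around the ODE system) that survived extraction.

```python
import unicodedata

def classify_bad_chars(text: str) -> dict:
    """Count suspicious code points in an extracted snippet (illustrative helper)."""
    counts = {"replacement": 0, "private_use": 0, "control": 0}
    for ch in text:
        if ch == "\ufffd":
            # U+FFFD: the decoder already replaced bytes it could not interpret.
            counts["replacement"] += 1
        elif unicodedata.category(ch) == "Co":
            # Private Use Area, e.g. the \uf8f1-\uf8f4 brace pieces from the PDF font.
            counts["private_use"] += 1
        elif unicodedata.category(ch) == "Cc" and ch not in "\n\r\t":
            counts["control"] += 1
    return counts

# Abbreviated strings from the excerpts above:
print(classify_bad_chars("\ufffd\ufffd\ufffd1 = \ufffd \ufffd"))   # 5fc4fd44...: replacement characters
print(classify_bad_chars("\uf8f1\uf8f4\uf8f2 ds dt (t) \uf8f3"))   # 5ed7bd5b...: private-use glyphs
```

On this reading, replacement characters can only be repaired by re-extracting from the source PDF, while private-use glyphs could in principle be mapped through the originating font or stripped.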
JasonLo (Collaborator, Author) commented Jan 30, 2024

Identified as an Elasticsearch problem.

iross (Collaborator) commented Feb 14, 2024

I need to think about the best path forward here. It looks like some of these are valid but unfortunately poor-quality extractions (5b192c78cf58f11cd52c690c, 5f2d7604a58f1dfd5210abe7). Others look like extraction issues that are fixable (5d5090dd0b45c76cafa47d80, 5fc4fd44d76fca4a3f0cd224). But 5ed7bd5b768935d2be5cd1df is an odd case: it looks like it just has invalid (private-use area) Unicode characters included in the text.

So the real question is how to systematically detect problematic documents and fix them. Probably I'll end up having to do a full sweep, but that sounds painful.
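One possible shape for that sweep (a sketch only; the index name, text field, and endpoint below are placeholder assumptions, not the actual production configuration): scroll over every document, flag any whose stored text contains U+FFFD or private-use-area characters, and queue the flagged IDs for re-extraction.

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

# Placeholder index and field names, for illustration only.
INDEX = "documents"
TEXT_FIELD = "contents"

def has_bad_chars(text: str) -> bool:
    # U+FFFD: bytes already lost at decode time; U+E000-U+F8FF: private-use-area glyphs.
    return any(ch == "\ufffd" or "\ue000" <= ch <= "\uf8ff" for ch in text)

def sweep(es: Elasticsearch) -> list[str]:
    flagged = []
    # Scroll through every document, fetching only the text field.
    for hit in scan(es, index=INDEX, query={"query": {"match_all": {}}},
                    _source=[TEXT_FIELD]):
        text = hit["_source"].get(TEXT_FIELD) or ""
        if has_bad_chars(text):
            flagged.append(hit["_id"])
    return flagged

if __name__ == "__main__":
    es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
    ids = sweep(es)
    print(f"{len(ids)} documents flagged for re-extraction")
```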
