Scores.md

Ngram_range = (1,1)

Samples = 40000

Distribution thingy

Clusters left = 16219

Score samples = 5000

  • Silhouette Score: 0.3714598392780462
  • Calinski-Harabasz Index: 531.2487680337874
  • Davies-Bouldin Index: 0.3510776700047736
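
The three metrics are read in different directions: Silhouette Score lies in [-1, 1] and higher is better, Calinski-Harabasz Index is unbounded and higher is better, and Davies-Bouldin Index is non-negative and lower is better. The sketch below is a minimal pure-Python illustration of the textbook definitions on toy 2-D points with Euclidean distance. It is not the code that produced the numbers in this file; the helper names (`silhouette`, `calinski_harabasz`, `davies_bouldin`) and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def _group(points, labels):
    """Map each cluster label to the list of points carrying it."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def _centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(axis) / len(points) for axis in zip(*points)]


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; higher is better."""
    clusters = _group(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion; higher is better."""
    clusters = _group(points, labels)
    n, k = len(points), len(clusters)
    overall = _centroid(points)
    between = sum(
        len(members) * math.dist(_centroid(members), overall) ** 2
        for members in clusters.values()
    )
    within = sum(
        math.dist(p, _centroid(members)) ** 2
        for members in clusters.values()
        for p in members
    )
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Mean worst-case (scatter_i + scatter_j) / centroid distance; lower is better."""
    clusters = _group(points, labels)
    cents = {key: _centroid(members) for key, members in clusters.items()}
    scatter = {
        key: sum(math.dist(p, cents[key]) for p in members) / len(members)
        for key, members in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Toy data: two tight, well-separated squares of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # matches the real structure
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # mixes the two squares together

for name, labels in (("good", good_labels), ("bad", bad_labels)):
    print(
        name,
        silhouette(points, labels),
        calinski_harabasz(points, labels),
        davies_bouldin(points, labels),
    )
```

On the toy data the well-separated labelling wins on all three metrics at once, which is the pattern to look for when comparing the runs in this file.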

thingy2

Ngram_range = (3,4)

Samples = 40000

Clusters left = 20780

  • Silhouette Score: 0.3731614464962901
  • Calinski-Harabasz Index: 1710.8685446622549
  • Davies-Bouldin Index: 0.26170137702074264
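
`Ngram_range = (min_n, max_n)` reads most naturally as the inclusive range of n-gram lengths used as features, so (3,4) keeps only 3-grams and 4-grams while (1,4) also keeps the shorter ones. A minimal sketch of that expansion follows, assuming word-level tokens (the file does not say whether words or characters were used) and a hypothetical helper named `ngrams`:

```python
def ngrams(tokens, ngram_range):
    """Return all contiguous n-grams with min_n <= n <= max_n (both inclusive)."""
    min_n, max_n = ngram_range
    result = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the token list.
        for start in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[start:start + n]))
    return result


tokens = "the quick brown fox".split()

unigrams_only = ngrams(tokens, (1, 1))  # single words only
three_to_four = ngrams(tokens, (3, 4))  # no unigrams or bigrams at all
one_to_four = ngrams(tokens, (1, 4))    # everything from 1 up to 4 words

print(len(unigrams_only), len(three_to_four), len(one_to_four))
```

A wider range produces more features per document, which may be part of why the cluster counts and scores shift between the configurations listed here.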

Ngram_range = (1,4)

Samples = 40000

Clusters left = 21078

  • Silhouette Score: 0.394332678120356
  • Calinski-Harabasz Index: 2121.956261477304
  • Davies-Bouldin Index: 0.2418388814956972

Ngram_range = (1,6)

Samples = 40000

Clusters left = 20959

  • Silhouette Score: 0.3873160908743396
  • Calinski-Harabasz Index: 1844.4039985962393
  • Davies-Bouldin Index: 0.25929544332685023

Scores after introducing reproducible random sample selection

Ngram_range = (1,6)

  • Silhouette Score: 0.3760824861012863
  • Calinski-Harabasz Index: 1314.3402354925827
  • Davies-Bouldin Index: 0.30394380049564845

Ngram_range = (1,5)

  • Silhouette Score: 0.375777584363468
  • Calinski-Harabasz Index: 1134.77627081612
  • Davies-Bouldin Index: 0.3099807140343013
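
One common way to make random sample selection reproducible is to draw from a seeded generator, so that every run scores the same subset and differences between runs reflect the configuration rather than the draw. A minimal sketch, assuming the selection picks 5000 of 40000 item indices (mirroring "Score samples" and "Samples" above); the function name `sample_indices` and the seed value are hypothetical and not taken from the project:

```python
import random


def sample_indices(n_items, n_samples, seed=42):
    """Pick n_samples distinct indices from range(n_items), repeatably."""
    # A private generator keeps the draw independent of any other code
    # that touches the module-level random state.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_items), n_samples))


first_run = sample_indices(40000, 5000)
second_run = sample_indices(40000, 5000)
other_seed = sample_indices(40000, 5000, seed=7)

print(first_run == second_run)  # same seed, same subset
print(first_run == other_seed)  # a different seed gives a different subset
```

With the subset fixed, two `Ngram_range` settings can be compared on identical samples, which makes small score differences easier to trust.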