A simple Python repository for perceptron-based text mining. It covers linguistic preprocessing of a dataset, text classification, and retrieval of texts similar to a given query.
New implementation: added PyTorch-based optimization that handles the buggy loading of a sparse 'csr_matrix' into a CUDA tensor.
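The README does not show how the sparse matrix is loaded, so the following is only a minimal sketch of one common approach: convert the SciPy 'csr_matrix' to COO format and build a `torch.sparse_coo_tensor` from its index and value arrays. The helper name `csr_to_torch_sparse` and the toy matrix are assumptions for illustration, not code from this repository.

```python
import numpy as np
import torch
from scipy.sparse import csr_matrix


def csr_to_torch_sparse(matrix: csr_matrix, device: torch.device) -> torch.Tensor:
    """Convert a SciPy CSR matrix to a torch sparse COO tensor on `device`.

    Hypothetical helper, not part of the repository.
    """
    coo = matrix.tocoo()  # COO exposes explicit row/column index arrays
    indices = torch.from_numpy(np.vstack((coo.row, coo.col)).astype(np.int64))
    values = torch.from_numpy(coo.data.astype(np.float32))
    sparse = torch.sparse_coo_tensor(indices, values, size=coo.shape)
    # Merge duplicate entries, then move the tensor to the target device.
    return sparse.coalesce().to(device)


# Fall back to the CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy 2x3 feature matrix, made up for illustration.
features = csr_matrix(np.array([[0.0, 2.0, 0.0], [1.0, 0.0, 3.0]]))
tensor = csr_to_torch_sparse(features, device)
```

Going through COO avoids densifying the matrix on the host, which matters when the bag-of-words vocabulary is large.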
- NumPy implementation,

  Figures: Vanilla Optimization, Optimization with L2-Regularization.

  Top 5 weighted terms,

  | Terms | Weights | Terms: L2 | Weights: L2 |
  |---|---|---|---|
  | langeweile | 7.094 | top | 5.8911 |
  | geilo | 7.0535 | langeweile | 5.8396 |
  | best | 6.7828 | geilo | 5.7615 |
  | love | 6.376 | perfekt | 5.6325 |
  | exzellent | 6.3534 | super | 5.6279 |
- PyTorch implementation,

  Figures: Vanilla Optimization, Optimization with L2-Regularization, Histogram: Weights, Penalized Weights.

  Top 5 weighted terms,

  | Terms | Weights | Terms: L2 | Weights: L2 |
  |---|---|---|---|
  | erfolgreichen | 20.5452 | cool | 8.8814 |
  | anmeldungen | 20.0064 | geil | 8.0933 |
  | angemessene | 19.658 | super | 6.7332 |
  | eonfach | 19.5906 | top | 5.4004 |
  | verarbeitung | 19.5136 | gut | 4.8924 |
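In both result sets above, the L2 columns show smaller top weights than the vanilla columns. The sketch below illustrates how an L2 penalty can produce that shrinkage and how the top-weighted terms can be read off a trained weight vector. It assumes a classic single-layer perceptron update with weight decay, which may differ from the optimization used in this repository. The functions `train_perceptron` and `top_terms`, the toy vocabulary, and the hyperparameters are all made up for illustration.

```python
import numpy as np


def train_perceptron(X, y, epochs=20, lr=0.1, l2=0.0):
    """Train a single-layer perceptron; `l2` > 0 adds an L2 weight penalty."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            prediction = 1.0 if x_i @ weights + bias > 0 else 0.0
            error = y_i - prediction
            # Perceptron update plus the gradient of (l2 / 2) * ||w||^2.
            # The penalty term shrinks the weights on every step, even
            # when the sample is classified correctly.
            weights += lr * (error * x_i - l2 * weights)
            bias += lr * error
    return weights, bias


def top_terms(weights, vocabulary, k=5):
    """Return up to k (term, weight) pairs with the largest weights first."""
    order = np.argsort(weights)[::-1][:k]
    return [(vocabulary[i], float(weights[i])) for i in order]


# Toy bag-of-words counts; vocabulary and labels are invented for this sketch.
vocabulary = ["super", "top", "bad", "slow"]
X = np.array(
    [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]], dtype=float
)
y = np.array([1, 1, 0, 0], dtype=float)

plain_w, _ = train_perceptron(X, y)
l2_w, _ = train_perceptron(X, y, l2=0.1)

print(top_terms(plain_w, vocabulary, k=2))
print(top_terms(l2_w, vocabulary, k=2))
```

On this toy data both runs rank the same positive terms highest, while the penalized weight vector has a smaller norm.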
Install dependencies using:
pip3 install -r requirements.txt
- Email: [email protected]
- Website: https://kanishknavale.github.io/