POC: Huggingface Demo Model #77

Closed
wants to merge 5 commits into from

Conversation

dieko95
Member

@dieko95 dieko95 commented May 7, 2021

Google Colab Link

Context

In this POC I will focus on binary classification. To do so, I will pick the most frequent label in the tipo_de_evento column.

The classifier will distinguish between:

  • Denuncia Falta de Servicio, i.e. a complaint about a service outage (label = 1)
  • Not Denuncia Falta de Servicio (label = 2)
| tipo_de_evento | count |
| --- | --- |
| DENUNCIA FALTA DEL SERVICIO | 1134 |
| FALTA DE SERVICIO | 525 |
| PROTESTA FALTA DEL SERVICIO | 324 |
| DENUNCIA SERVICIO MALA CALIDAD | 146 |
| DENUNCIA ALTO COSTO SERVICIO | 96 |
| SERVICIO MALA CALIDAD | 69 |
| DENUNCIA COBRO EN DIVISAS | 37 |
| ALTO COSTO SERVICIO | 21 |
| COBRO EN DIVISAS | 20 |
| PROTESTA ALTO COSTO SERVICIO | 6 |
| DENUNCIA | 5 |
| PROTESTA | 4 |
| ENEFERMEDAD ASOCIADA | 3 |
| PROTESTA COBRO EN DIVISAS | 3 |
| METODO ALTERNATIVO PARA COCINAR | 2 |
| DENUNCIA TIEMPO PARA ADQUIRIR SERVICIO | 2 |
| ENFERMEDAD ASOCIADA | 1 |
| DENUNCIA RETARDO EN LA ENTREGA | 1 |
| PROTESTA RETARDO EN LA ENTREGA | 1 |
| PROTESTA TIEMPO PARA ADQUIRIR SERVICIO | 1 |
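
The labeling rule described above (take the most frequent tipo_de_evento value as the positive class, everything else as the negative class) could be sketched roughly as follows. This is only an illustration, not the code from the POC script: the sample list and the helper names `most_frequent` and `to_binary_labels` are hypothetical, and the 1/2 label values simply mirror the ones stated in the description.

```python
from collections import Counter

# Hypothetical sample of the tipo_de_evento column; the real values
# would be read from elpitazo_positivelabels_devdataset.csv.
tipo_de_evento = [
    "DENUNCIA FALTA DEL SERVICIO",
    "FALTA DE SERVICIO",
    "DENUNCIA FALTA DEL SERVICIO",
    "PROTESTA FALTA DEL SERVICIO",
    "DENUNCIA SERVICIO MALA CALIDAD",
]


def most_frequent(values):
    """Return the value that occurs most often in the list."""
    return Counter(values).most_common(1)[0][0]


def to_binary_labels(values, positive, pos_label=1, neg_label=2):
    """Map the positive class to pos_label and every other value to neg_label."""
    return [pos_label if value == positive else neg_label for value in values]


positive_class = most_frequent(tipo_de_evento)
labels = to_binary_labels(tipo_de_evento, positive_class)
print(positive_class, labels)
```

One thing that may be worth checking before training: PyTorch's cross-entropy loss expects class indices in the range 0 to C-1, so labels of 1 and 2 would probably need to be remapped to 0 and 1 for a two-class head.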

Steps to run the script - link to script

  1. Open Google Colab and select a GPU runtime.
  2. Upload elpitazo_positivelabels_devdataset.csv to Google Colab.
  3. Copy the name of the GitHub branch you're working on and the experiment you're running into line 190 of the script.
  4. Copy the entire script into Google Colab.
  5. Run the script in Google Colab.

@dieko95 dieko95 self-assigned this May 7, 2021
@dieko95 dieko95 requested review from Edilmo and asciidiego May 7, 2021 17:05
@dieko95 dieko95 added good first issue Good for newcomers help wanted Extra attention is needed labels May 7, 2021
@dieko95 dieko95 linked an issue May 7, 2021 that may be closed by this pull request
1 task
@dieko95
Member Author

dieko95 commented May 7, 2021

@Edilmo I'm blocked when I execute the training cell. We can talk about this on Saturday's call.

trainer.train()

TypeError                                 Traceback (most recent call last)
<ipython-input-26-1a10a9109f73> in <module>()
      1 # training with custom dataset
----> 2 trainer.train()

15 frames
/usr/local/lib/python3.7/dist-packages/datasets/formatting/torch_formatter.py in _tensorize(self, value)
     42             default_dtype = {"dtype": torch.float32}
     43 
---> 44         return torch.tensor(value, **{**default_dtype, **self.torch_tensor_kwargs})
     45 
     46     def _recursive_tensorize(self, data_struct: dict):

TypeError: new(): invalid data type 'numpy.str_'
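
One possible reading of this traceback, offered as a guess rather than a confirmed diagnosis: `torch.tensor` cannot be built from string values, so the error would appear if a string column (for example the raw text or an unencoded label) is still included when the dataset is formatted as torch tensors. The sketch below, in plain Python, shows the two usual remedies under that assumption: encode string labels as integers, and find which columns are safe to tensorize. The rows, the token ids, and the helper names are all hypothetical.

```python
def build_label_map(labels):
    """Assign a stable integer id to each distinct string label."""
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}


def is_tensorizable(value):
    """True for numbers, or (nested) lists/tuples made only of numbers."""
    if isinstance(value, (bool, int, float)):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_tensorizable(item) for item in value)
    return False


def tensorizable_columns(rows):
    """Return the column names whose values are numeric in every row."""
    return [col for col in rows[0] if all(is_tensorizable(row[col]) for row in rows)]


# Hypothetical tokenized rows; the input_ids values are made up.
rows = [
    {"text": "No hay gas", "label": "DENUNCIA FALTA DEL SERVICIO", "input_ids": [101, 7, 102]},
    {"text": "Sin agua", "label": "FALTA DE SERVICIO", "input_ids": [101, 9, 102]},
]

print(tensorizable_columns(rows))  # only the numeric columns survive

# Encode the string labels so the label column becomes tensorizable too.
label_map = build_label_map(row["label"] for row in rows)
encoded_rows = [{**row, "label": label_map[row["label"]]} for row in rows]
print(tensorizable_columns(encoded_rows))
```

With the `datasets` library, the resulting column list could then be passed to `Dataset.set_format(type="torch", columns=[...])` so that string columns such as the raw text are left out of the tensors handed to the trainer.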

@dieko95
Member Author

dieko95 commented Jul 17, 2021

@Edilmo @LDiazN you can check the POC here :) Feel free to dive directly into the POC script.

@dieko95 dieko95 requested a review from LDiazN July 17, 2021 22:47
@LDiazN LDiazN mentioned this pull request Aug 3, 2021
@dieko95 dieko95 closed this Sep 10, 2021
Development

Successfully merging this pull request may close these issues.

[POC] Create classification example with PSCDD
1 participant