Define training set schema #49

dieko95 · 2021-03-24T15:50:29Z

Problem

We have not yet defined the schema of the flattened dataset that the Hugging Face transformer will consume.

Proposed Solution

Define the schema of the training dataset that will be used to train the Hugging Face transformer.

For example:
- Column names: text, news_title, location, issue, source_type, author, etc.
- Whether each column is nullable
- Variable type (varchar, int, float, etc.)
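As a rough illustration of what such a schema could look like once written down, here is a minimal sketch in plain Python. The column names are taken from the list above; every type and nullable flag is an assumption made for the example, not a decision, and `ColumnSpec`, `TRAINING_SCHEMA` and `validate_row` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class ColumnSpec:
    """One column of the flattened training dataset."""
    name: str
    dtype: type
    nullable: bool


# Column names come from the issue text; the dtypes and nullable flags
# below are placeholders (assumptions) until the schema is agreed on.
TRAINING_SCHEMA = (
    ColumnSpec("text", str, nullable=False),
    ColumnSpec("news_title", str, nullable=True),
    ColumnSpec("location", str, nullable=True),
    ColumnSpec("issue", str, nullable=True),
    ColumnSpec("source_type", str, nullable=True),
    ColumnSpec("author", str, nullable=True),
)


def validate_row(row: Mapping[str, Any]) -> List[str]:
    """Return a list of schema violations for one row (empty if valid)."""
    errors = []
    for col in TRAINING_SCHEMA:
        value = row.get(col.name)
        if value is None:
            # A missing key and an explicit None are both treated as null.
            if not col.nullable:
                errors.append(f"{col.name}: missing value in non-nullable column")
        elif not isinstance(value, col.dtype):
            errors.append(
                f"{col.name}: expected {col.dtype.__name__}, got {type(value).__name__}"
            )
    return errors
```

Calling `validate_row({"text": "some article body"})` returns an empty list under these assumed rules, while a row without `text` reports one violation. The same table of name, type and nullability could be copied into the readme as is.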

Deliverable

A readme.md documenting the dataset's schema.

marianelamin · 2021-04-20T22:01:29Z

From the first PoC with El Pitazo, it looks like we can get:
title, content, date, author, categories and tags.
It would be good to check whether the same fields can also be extracted from our next sources.

In the meantime, for a VP we are counting on just the content of the post. If the design changes, we will post it here.
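To make the field list concrete, here is a minimal sketch of how one scraped post could be represented, assuming only `content` is required for now and everything else is optional. `ScrapedPost` and `training_text` are hypothetical names, and the field types (for example, keeping `date` as a raw string) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScrapedPost:
    """One post as extracted from a source such as El Pitazo."""
    content: str                      # the only field relied on for now
    title: Optional[str] = None
    date: Optional[str] = None        # assumption: kept as the raw string
    author: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def training_text(post: ScrapedPost) -> str:
    """Return the text used for training: just the post content."""
    return post.content.strip()
```

Keeping the other fields optional means a source that cannot provide them still produces a usable record.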

dieko95 · 2021-04-21T06:26:53Z

@marianelamin Gotcha! Thanks a lot for the update 🙌

dieko95 self-assigned this Mar 24, 2021

dieko95 added the documentation label Mar 24, 2021

dieko95 assigned marianelamin, Edilmo and VKorelsky Mar 24, 2021
