Cannot create and train a new trait entity using HTTP API #811

Closed
Boyko-Karadzhov opened this issue Oct 5, 2017 · 19 comments

@Boyko-Karadzhov

Do you want to request a feature, report a bug, or ask a question about wit?
bug

What is the current behavior?

  • Entity is not recognized using expressions that were previously given as samples
  • For some time after submitting the samples, the values do not appear in the entity while training status remains clean
  • After the values appear, there are duplicate values (probably the same problem as Duplicate trait values for intent #743)

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.
Create an entity:

curl -X POST \
  https://api.wit.ai/entities \
  -H 'authorization: Bearer xxxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
   "id":"Conversation"
}'

Add samples for two values:

curl -X POST \
  'https://api.wit.ai/samples?v=20170307' \
  -H 'authorization: Bearer xxxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '[{"text":"Book a doctor","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"I would like to book a doctor","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Can I book a doctor for this Tuesday?","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Is doctor Burke available this Tuesday?","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Can I check doctor Burke'\''s schedule for this week?","entities":[{"entity":"Conversation","value":"bookDoctor"}]}]'

curl -X POST \
  'https://api.wit.ai/samples?v=20170307' \
  -H 'authorization: Bearer xxxxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '[{"text":"Contact support","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Can I talk to an operator?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Would you get me in touch with a human?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"It will be great if I can talk to a person.","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Can you get me in touch with an operator?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Forward me to an operator","entities":[{"entity":"Conversation","value":"contactOperator"}]}]'

Wait for the training status to become clean, then test the application's understanding of one of the expressions, such as: Book a doctor.
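
The last step can also be checked from the command line. The following is a minimal sketch, not part of the original report. It assumes the GET /message endpoint with the same v=20170307 version parameter, a WIT_TOKEN environment variable holding the token (a placeholder name), and a response in which each recognized entity carries a "value" field. The helper names wit_message and has_value are made up for illustration, and the grep match is deliberately crude.

```shell
# Sketch only: WIT_TOKEN, wit_message and has_value are illustrative names.

# Ask Wit.ai what it understands for the message given as $1.
# -G turns the --data-urlencode pairs into a URL-encoded query string.
wit_message() {
  curl -sS -G 'https://api.wit.ai/message' \
    -H "authorization: Bearer ${WIT_TOKEN:-xxxxxxx}" \
    --data-urlencode 'v=20170307' \
    --data-urlencode "q=$1"
}

# Return 0 if the response body ($1) contains "value":"<$2>".
# Assumption: the value shows up in the JSON in exactly that form.
has_value() {
  printf '%s' "$1" | grep -Eq "\"value\"[[:space:]]*:[[:space:]]*\"$2\""
}
```

Usage, once the token is exported: `resp=$(wit_message 'Book a doctor') && has_value "$resp" bookDoctor && echo recognized`.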

What is the expected behavior?
When training status is clean:

  • the entity should have only 2 values: bookDoctor and contactOperator
  • messages that match expressions (or are similar) should be recognized as one of the entity's values. In this case, Book a doctor should be recognized as a Conversation entity with the value bookDoctor

What is the App ID where you are experiencing this issue (if applicable)?
59d6256d-faae-4935-a4ba-7ff546707d4d (easily reproduced in a new app)

@l5t

l5t commented Oct 5, 2017

Thanks for reporting. We identified the bug and are working on it

@patapizza patapizza added the bug label Oct 5, 2017
@patapizza patapizza self-assigned this Oct 5, 2017
@patapizza
Member

@Boyko-Karadzhov I removed the duplicates for Conversation. We are still working on a fix.

@blandinw
Contributor

@Boyko-Karadzhov this is now fixed. We had a bug in our normalization code, causing the uppercase letter in your value to not be properly handled. Apologies for the inconvenience and thanks for your patience.

@Boyko-Karadzhov
Author

Hello again,

The values are no longer duplicated, but recognition still fails. I have made a new app to demonstrate: 59e47043-64f9-4daf-acf2-5c2710a999dc

It has the entity, values and samples persisted. I have trained it using this bash script (same as before):

curl -X POST \
  https://api.wit.ai/entities \
  -H 'authorization: Bearer xxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
   "id":"Conversation"
}'

curl -X POST \
  'https://api.wit.ai/samples?v=20170307' \
  -H 'authorization: Bearer xxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '[{"text":"Book a doctor","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"I would like to book a doctor","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Can I book a doctor for this Tuesday?","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Is doctor Burke available this Tuesday?","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Can I check doctor Burke'\''s schedule for this week?","entities":[{"entity":"Conversation","value":"bookDoctor"}]}]'

curl -X POST \
  'https://api.wit.ai/samples?v=20170307' \
  -H 'authorization: Bearer xxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '[{"text":"Contact support","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Can I talk to an operator?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Would you get me in touch with a human?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"It will be great if I can talk to a person.","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Can you get me in touch with an operator?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Forward me to an operator","entities":[{"entity":"Conversation","value":"contactOperator"}]}]'

After waiting for the status to become "clean" I have tried recognition of a message "Book a doctor" and it is not recognized.

I don't think it is a matter of waiting and status. At this point, it is stuck and it will not learn to recognize Conversation entities until I add another value (I have experimented).

Is it possible that the actual training is queued for the first request (create entity, no samples), and the subsequent requests don't trigger a new training run because one is already queued, but with an outdated model? I'm trying to explain why adding a new value after a while results in correct training of all values.

If you make the requests manually, with a significant delay between them, training works. The problem is that I'm doing it programmatically, and adding constant delays still leaves it to chance.

@Boyko-Karadzhov
Author

@blandinw Would you reopen the issue?
I'm not sure if it is reaching you since it is already closed.

@blandinw blandinw reopened this Oct 17, 2017
@blandinw
Contributor

@Boyko-Karadzhov I'm looking into it.
Also, as the creator of the issue, are you not able to reopen it yourself?

@blandinw
Contributor

@Boyko-Karadzhov I used your script and was not able to reproduce your issue.

Please keep in mind that POST /samples trains your app asynchronously: getting a 200 back does not mean your app is trained. During normal operations, your app should be trained within a few seconds of the POST /samples request. However, at the moment, during peak traffic (like we had a few times these past few weeks), it may take a few minutes or even a few hours. We are working on a new dataset infra that should make the worst-case scenario much faster (max 1 min), but it has not been released yet.
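
Because a 200 from POST /samples does not mean training has finished, a script can poll recognition instead of sleeping for a fixed time. Below is a rough sketch of that idea, not an official recipe. It assumes GET /message with v=20170307, a WIT_TOKEN environment variable (placeholder name), and a response that contains "value":"<name>" once the entity is recognized. The function name wait_for_value and its default retry count and delay are arbitrary choices for illustration.

```shell
# Sketch only: wait_for_value, WIT_TOKEN and the defaults are illustrative.
# Usage: wait_for_value TEXT VALUE [MAX_TRIES] [FIRST_DELAY_SECONDS]
# Polls GET /message until VALUE shows up, doubling the delay each time.
wait_for_value() {
  text=$1; value=$2; tries=${3:-8}; delay=${4:-2}
  i=1
  while [ "$i" -le "$tries" ]; do
    # A failed request is treated like an empty (unrecognized) response.
    body=$(curl -sS -G 'https://api.wit.ai/message' \
      -H "authorization: Bearer ${WIT_TOKEN:-xxxxxx}" \
      --data-urlencode 'v=20170307' \
      --data-urlencode "q=$text") || body=''
    # Crude match; assumes the value appears as "value":"<name>".
    if printf '%s' "$body" | grep -Eq "\"value\"[[:space:]]*:[[:space:]]*\"$value\""; then
      echo "recognized after $i attempt(s)"
      return 0
    fi
    # Back off before the next attempt, but not after the last one.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
    i=$((i + 1))
  done
  echo "not recognized after $tries attempts" >&2
  return 1
}
```

For example, `wait_for_value 'Book a doctor' bookDoctor 8 2` gives up after eight attempts. A non-zero exit status only says the value was not seen within that window; it cannot tell a slow training run from one that never started.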

I'm going to close this, please comment back if it still does not work.

@Boyko-Karadzhov
Author

@blandinw The app, 59e47043-64f9-4daf-acf2-5c2710a999dc that I used to demonstrate, was created 3 days ago and it is still not trained. I doubt it will start recognizing entities no matter how much we wait. I have just tried it again with a new app - 59e854b0-264f-4825-87bc-de62686bd9a6. Again - no recognition. Asynchronicity aside, I think there is a bug because the apps will be trained if I force them with one extra sample after a while. They just don't train on the first run. I reproduce it consistently.

There is also the issue of uncertainty about the outcome. There is no way to know whether training is done, whether we should wait longer, or whether something went wrong along the way and it will never finish.

Can we get a status endpoint to tell us if training is queued, in progress, ready or failed? I'm using the /status endpoint that the wit.ai UI uses, but it seems to return clean before the samples are processed, so it cannot be used as a reliable indicator.
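
Until something like that exists, the workaround described above (forcing training with one extra sample after a while) can be scripted. This is a rough sketch of that observation only, not a fix confirmed by the maintainers. It reuses the POST /samples request from the script above; nudge_training and WIT_TOKEN are made-up names, and the text and value must not contain double quotes or backslashes because no JSON escaping is done.

```shell
# Sketch only: nudge_training and WIT_TOKEN are illustrative names.
# Usage: nudge_training TEXT VALUE
# Posts one extra sample for the Conversation entity, which in the
# reporter's experience causes all values to be trained.
# Assumes TEXT and VALUE need no JSON escaping.
nudge_training() {
  payload=$(printf '[{"text":"%s","entities":[{"entity":"Conversation","value":"%s"}]}]' "$1" "$2")
  curl -sS -X POST 'https://api.wit.ai/samples?v=20170307' \
    -H "authorization: Bearer ${WIT_TOKEN:-xxxxxx}" \
    -H 'content-type: application/json' \
    -d "$payload"
}
```

For example: `nudge_training 'I need to book a doctor' bookDoctor`. Note that this adds a real sample to the app, so the text should be something that belongs in the training data anyway.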

@grinono

grinono commented Oct 23, 2017

We have the same issue. Pushing keywords to an entity via the HTTP REST API works, but once they are there, they are never recognized.

@hristoborisov

hristoborisov commented Oct 25, 2017

@blandinw we are continuing to experience this problem. Can you please help us resolve it before you close the issue? We are relying on your API to train chatbots on the go, and it simply doesn't work.

@stopachka stopachka reopened this Oct 25, 2017
@stopachka
Contributor

stopachka commented Oct 25, 2017

Hey Hristo,

There are quite a few different issues here.

  1. Training not stopping
  2. Training never starting
  3. Duplicate keywords

Are you experiencing issues with 1, 2, or 3?

If you could tell me your app ID and give me a repro of your issue, I'm happy to look into it.

(Also, 3 does not relate to the initial issue, as trait entities do not have keywords. If that is the issue, we will consolidate it into a different item so we can be on the same page.)

@hristoborisov

hristoborisov commented Oct 26, 2017

Hey @stopachka,

I am referring to the only issue that wasn't resolved in this thread: training is done (clean status), but there is no understanding. Let me copy/paste @Boyko-Karadzhov's steps to reproduce here.

I have made a new app to demonstrate: 59e47043-64f9-4daf-acf2-5c2710a999dc. It has the entity, values and samples persisted. I have trained it using this bash script (same as before):

curl -X POST \
  https://api.wit.ai/entities \
  -H 'authorization: Bearer xxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
   "id":"Conversation"
}'

curl -X POST \
  'https://api.wit.ai/samples?v=20170307' \
  -H 'authorization: Bearer xxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '[{"text":"Book a doctor","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"I would like to book a doctor","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Can I book a doctor for this Tuesday?","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Is doctor Burke available this Tuesday?","entities":[{"entity":"Conversation","value":"bookDoctor"}]},{"text":"Can I check doctor Burke'\''s schedule for this week?","entities":[{"entity":"Conversation","value":"bookDoctor"}]}]'

curl -X POST \
  'https://api.wit.ai/samples?v=20170307' \
  -H 'authorization: Bearer xxxxxx' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '[{"text":"Contact support","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Can I talk to an operator?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Would you get me in touch with a human?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"It will be great if I can talk to a person.","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Can you get me in touch with an operator?","entities":[{"entity":"Conversation","value":"contactOperator"}]},{"text":"Forward me to an operator","entities":[{"entity":"Conversation","value":"contactOperator"}]}]'

After waiting for the status to become "clean" I have tried recognition of a message "Book a doctor" and it is not recognized.

I don't think it is a matter of waiting and status. At this point, it is stuck and it will not learn to recognize Conversation entities until I add another value (I have experimented).

Is it possible that the actual training is queued for the first request (create entity, no samples), and the subsequent requests don't trigger a new training run because one is already queued, but with an outdated model? I'm trying to explain why adding a new value after a while results in correct training of all values.

If you make the requests manually, with a significant delay between them, training works. The problem is that I'm doing it programmatically, and adding constant delays still leaves it to chance.

@patapizza
Member

Hey @hristoborisov, just to make sure, does it work after a while without adding in a new value?

@darvinai

darvinai commented Nov 2, 2017

@patapizza no, it doesn't work. We have projects created a month ago that we haven't touched and are still not working. If you touch them with new values, they start to work. I am writing from our system github account.

-Hristo Borisov

@blandinw
Contributor

blandinw commented Nov 6, 2017

We'll look into it again, thanks for your patience

@bpleao

bpleao commented Nov 29, 2017

Any news on this topic? I'm facing similar issues. Thanks

@stopachka
Contributor

Update on this: #876
tl;dr: reproing this is quite hard, but we have 2 action items to get to a solution. Moving the convo to that thread.

@mohit2494

any update on the above points?
is anybody still facing the issue?

I wanted to know how to create a spanless entity (trait), which was formerly called intent (now deprecated).
Can anyone help me out with an example where I can create a spanless entity and train it using some expressions? Thanks for your help.

@patapizza
Member

@mohit2494 Seems like you got your answers in #231? Please create a new issue instead of bubbling up old ones; it makes things easier to track.

Thanks.
