Problems with parsing quotes #2

NikhilPr95 · 2016-07-04T09:36:48Z

There are four types of problems that come up when parsing quotes:

1. It very frequently divides the quote and the rest of the sentence into two separate sentences.

E.g. - "So what?" said Harry.

Here ' "So what?" ' and ' said Harry. ' are parsed as two separate sentences, rather than one.
2. Similar to the first, it divides a quote and the rest of the sentence into two sentences, but here the first word after the quote is a character identified by a character id.

E.g. "What is?" George demanded.

This is parsed as two sentences, ' "What is?" ' and ' George demanded. '
3. It concatenates two separate quotes which belong in different sentences into the same sentence.

E.g. "How are you?" "I'm fine, thank you", he replied.

Here, although ' "How are you?" ' is a separate sentence, it is treated as part of the second sentence.
4. It treats the opening quote ' " ' of a line of dialogue as the last token of the previous sentence.

E.g. There was a big blue shape in the sky. " What is it? " Asked Beth.

It parses these two individual sentences as ' There was a big blue shape in the sky. " ' and
' What is it ? " Asked Beth. '

However, the 'in quotes' value for 'What' here is 'true', making these cases easy to discover.

I found these errors and corrected them through hard coding in my own program:

For 1 and 2 - checking if the first word of a sentence is either in lower case or a character, and concatenating the sentences accordingly.
For 3 - checking for every instance of consecutive quotes and dividing the sentence there.
For 4 - checking if the first word of a sentence is 'in quotes', the word before it in the previous sentence is a double quote, and the word before that is a period, and correcting accordingly.
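The hard-coded corrections described above could look roughly like the following. This is only a minimal Python sketch under stated assumptions, not the code from this repository or the exact program mentioned here: the `Token` class and its `in_quote` and `character_id` fields are hypothetical stand-ins for the 'in quotes' value and the character id in the tokens document, and the extra check that the previous sentence ends with a double quote before concatenating is an added assumption to avoid merging unrelated sentences.

```python
from dataclasses import dataclass
from typing import List, Optional

QUOTE = '"'


@dataclass
class Token:
    # Hypothetical token record; the field names are illustrative only.
    word: str
    in_quote: bool = False               # stand-in for the 'in quotes' value
    character_id: Optional[int] = None   # stand-in for the character id


Sentence = List[Token]


def move_stray_opening_quotes(sentences: List[Sentence]) -> List[Sentence]:
    """Move an opening quote stranded after a period to the next sentence."""
    fixed = [list(s) for s in sentences]
    for prev, cur in zip(fixed, fixed[1:]):
        if (cur and cur[0].in_quote and len(prev) >= 2
                and prev[-1].word == QUOTE and prev[-2].word == "."):
            cur.insert(0, prev.pop())
    return fixed


def split_consecutive_quotes(sentences: List[Sentence]) -> List[Sentence]:
    """Split a sentence where a closing quote is directly followed by an opening quote."""
    result: List[Sentence] = []
    for sent in sentences:
        current: Sentence = []
        quote_open = False
        for tok in sent:
            if tok.word == QUOTE:
                if not quote_open and current and current[-1].word == QUOTE:
                    # The previous token closed a quote and this one opens a new one.
                    result.append(current)
                    current = []
                quote_open = not quote_open
            current.append(tok)
        if current:
            result.append(current)
    return result


def merge_split_attributions(sentences: List[Sentence]) -> List[Sentence]:
    """Re-attach 'said Harry.' or 'George demanded.' to the quote before it."""
    merged: List[Sentence] = []
    for sent in sentences:
        if not sent:
            continue
        first = sent[0]
        continues = (
            bool(merged)
            and merged[-1][-1].word == QUOTE  # assumption: previous sentence ends a quote
            and (first.word[:1].islower() or first.character_id is not None)
        )
        if continues:
            merged[-1] = merged[-1] + list(sent)
        else:
            merged.append(list(sent))
    return merged


def repair_sentences(sentences: List[Sentence]) -> List[Sentence]:
    """Apply the three corrections in order: stray quotes, consecutive quotes, merges."""
    return merge_split_attributions(
        split_consecutive_quotes(move_stray_opening_quotes(sentences)))


if __name__ == "__main__":
    def toks(text: str) -> Sentence:
        return [Token(w) for w in text.split()]

    broken = [toks('" So what ? "'), toks("said Harry .")]
    for sent in repair_sentences(broken):
        print(" ".join(t.word for t in sent))
```

The stray-quote fix runs first so that the later steps see each quote attached to the sentence it opens. Note that this only repairs the sentence boundaries in the token list; it does nothing about dependency trees that were already built on the original sentences.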

I was pleased with the results UNTIL I realised that the parser which constructs dependency trees does so on the original 'wrong' sentences and not on mine.
This left me trying to use the actual MaltParser for these affected sentences but I found that the parsing is not exactly the same - I assume that your code does not use the MaltParser directly and uses extra information as well.

I would really like this fixed as I am otherwise using only the tokens document that I got from implementing your code and this complicates things a lot.

If you could tell me a quick fix to this, it would be appreciated as well. In the meanwhile, I'll try to see if it is possible for me to make the necessary changes in your code myself.

P. S.

I am very grateful for this repository without which a project I am working on analyzing novels would have been much much more difficult. Thanks!

NikhilPr95 · 2016-07-04T13:00:53Z

EDIT:

I realised that issues 1 and 2 may have been left by design; however, issues 3 and 4 remain legitimate.

NikhilPr95 · 2016-07-04T13:01:48Z

EDIT 2:

Upon examining the code, it seems that at least some of these issues are caused by problems in the StanfordCoreNLP API and its parser, rather than anything in this repository.
