I need just a Boolean yes/no answer or some accuracy score on the grammatical correctness of an English sentence. Do any parsers or tools do that? For example, "I drink water" is correct, but "I water drink" is not! The Stanford parser, for example, parses this sentence without catching the grammatical incorrectness (its documentation says that it does not check for grammatical correctness).

asked Dec 12 '10 at 21:36

Golam Kawsar

I'm by no means an NLP expert, but this sounds like a very difficult problem. Grammatical correctness isn't really defined, and is mostly a subjective judgment. I'm sure there are tons of cases where human experts would disagree with each other.

(Dec 12 '10 at 22:58) Kevin Canini
Your example is particularly tough because "water" can be a verb and "drink" can be a noun.

(Dec 13 '10 at 16:04) rm999
@rm999 is exactly right. "I water drink" is syntactically valid, but semantically very strange. The implication would be something like: you water (as in watering plants) drink (a group form of "drinks", as in beverages). The more coverage your parser has, the more likely you are to find these sorts of "false" ambiguities. It's essentially a precision/recall trade-off. You can either parse a limited set of syntactic constructions, or you can have a tremendous amount of ambiguity in the parse output.

Regarding the Stanford parser (the PCFG parser, anyway), you should be able to break into it to get a confidence measure. With the confidence measure, you could then threshold the score to your liking.

(Dec 13 '10 at 16:35) Andrew Rosenberg

OK, maybe I should have changed the sentence to "I am drinking water" or something like that. The point was to ask if there are parsers in existence that can determine whether a "fairly unambiguous" English sentence is grammatically correct or not.

(Dec 13 '10 at 21:57) Golam Kawsar

Another example came to mind that may better illustrate how bad the situation already is: "I water plant" and "I plant water" are both syntactically correct, and the only reason we can give for calling either one wrong is that it doesn't make sense. +1 to @rm999

(Dec 21 '10 at 23:51) cherhan

3 Answers:

Lots of research is going on at the University of Edinburgh. They have a good NLP parser. Check if it suits your needs. http://www.inf.ed.ac.uk/resources/nlp/

answered Dec 13 '10 at 00:34

kapil_dalwani

The comment by rm999 goes right to the point, I think. A lot of the research in statistical parsing has shown that:

  1. really weird sentences can be generated by sensible grammars and can make sense in a twisted way

  2. ordinary sentences can be considered grammatical or ungrammatical depending on how you look at them

So it's really hard to do a pure test of grammaticality. As rm999's comment says, in "I water drink" "drink" can be a noun (let's say I have a violet affectionately named "drink") and the sentence is then parseable as me stating that I usually water the plant. However, most probabilistic parsers can also produce a log-likelihood of a given parse tree for a given sentence, and you can roughly use that as a proxy for the grammaticality of a sentence. I think any parser would give the maximum likelihood parse tree for "I drink water" a much higher likelihood than the one for "I water drink", if only because tagging "water" as a verb and "drink" as a noun is really rare. The one thing you must be careful about is that longer sentences will tend to have lower log-likelihoods, so you should control for that if you want to set a hard threshold.
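One simple way to control for length is to divide the log-likelihood by the number of tokens before thresholding. The following is only a minimal sketch of that idea: the function names, the threshold of -8.0, and the example scores are all made up for illustration, and the log-probability is assumed to come from whatever parser you are using.

```python
def per_token_log_likelihood(log_prob: float, num_tokens: int) -> float:
    """Average log-probability per token, so that sentences of
    different lengths can be compared against the same threshold."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return log_prob / num_tokens


def passes_threshold(log_prob: float, num_tokens: int,
                     threshold: float = -8.0) -> bool:
    """Rough yes/no proxy: True if the length-normalized score is at
    least `threshold`. The default threshold is an arbitrary placeholder
    and would need tuning on sentences with known labels."""
    return per_token_log_likelihood(log_prob, num_tokens) >= threshold


if __name__ == "__main__":
    # Hypothetical parser log-probabilities, not real parser output.
    print(per_token_log_likelihood(-21.0, 3))   # short sentence
    print(per_token_log_likelihood(-70.0, 10))  # longer sentence
    print(passes_threshold(-36.0, 3))
```

Without the normalization, the 10-token sentence above would look far worse than the 3-token one even though both have the same average score per token.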

answered Dec 13 '10 at 16:37

Alexandre Passos ♦

Thanks Alexandre. I am using the Stanford parser. Will have to figure out how to get and use this likelihood score from that.

(Dec 13 '10 at 22:00) Golam Kawsar

I would be careful when interpreting the log-probabilities coming out of treebank-based generative parsers (e.g. the Stanford Parser) as a confidence measure, and be even more careful when attempting to interpret this score as indicative of grammaticality. While there might be some weak correlation between the log-score and grammaticality, there are tons of other factors that are likely to affect the score much more than grammaticality does (e.g. sentence length, the specific words used in the sentence, the specific syntactic constructions used in the sentence, etc.). Keep in mind that these kinds of parsers were specifically designed to be "robust" in the sense that they should assign the best possible structure to ANY sentence. They were never designed to assess grammaticality, which is a completely different task.

Joachim Wagner did some work specifically on grammaticality assessment, and how it relates to parser scores, which you might find interesting.
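As a toy illustration of how word choice alone can move a generative score, here is a sketch using a unigram model rather than a parser. This is a deliberate simplification, not how the Stanford Parser scores sentences, and the probabilities in `UNIGRAM_PROB` are invented for the example. A real parser score also depends on tags and rules, but the lexical terms contribute in a similar way.

```python
import math

# Hypothetical word probabilities, made up for illustration only.
UNIGRAM_PROB = {
    "i": 0.05,
    "drink": 0.001,
    "water": 0.002,
    "absinthe": 0.000001,
}


def unigram_log_prob(sentence: str) -> float:
    """Sum of log word probabilities; word order is ignored entirely."""
    return sum(math.log(UNIGRAM_PROB[w]) for w in sentence.lower().split())


if __name__ == "__main__":
    for s in ("I drink water", "I water drink", "I drink absinthe"):
        print(f"{s!r}: {unigram_log_prob(s):.2f}")
```

Under this toy model the scrambled sentence scores the same as the well-formed one, while the perfectly grammatical sentence with a rare word scores much lower, which is the kind of confound to watch for when thresholding.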

answered Dec 22 '10 at 09:26

yoavg


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.