1. Natural language processing
  2. Generation
  3. Understanding
  4. Syntactic Analysis
  5. What is a language?
  6. What should a grammar tell us?
  7. A more complex grammar
  8. Components of a syntactic analysis program
  9. Additional grammar issues
  10. Results of syntactic analysis
  11. Semantic Analysis
  12. Reference
  13. Lexical Meaning
  14. Relationship among entities

Natural language processing (NLP)

Why do it? Note that NLP is concerned with both generation and understanding of language.

Generation

Natural language generation is a planning problem.

Need to:

  1. decide that a speech act is called for
  2. decide which speech act is the right one
  3. decide what information you want to convey
  4. make choices about vocabulary
  5. create language that is appropriate (i.e., syntactically correct and unambiguous)
Speech acts are as follows:

Understanding

Goal: to determine what the speaker/writer is trying to say. Note that we will consider only written text.

Need to:

  1. understand syntax - i.e., the structure of the text.
  2. understand semantics - i.e., a partial representation of the meaning of the text
  3. understand pragmatics - i.e., the complete meaning of the text, determined by using contextual information.
Note that we will focus on understanding, rather than generation.

Syntactic Analysis

Why bother with syntactic analysis?

Because it helps us understand the roles played by different words in a body of text. The words themselves are not enough. Consider

Innocent peacefully children sleep little

vs.

Innocent little children sleep peacefully

There is some evidence that human understanding of language is, in part, based on structural analysis. Consider

"'Twas brillig, and the slithy toves did gyre and gimble in the wabe." [Lewis Carroll]

Here we can understand the sentence on some level, even though most of the words make no sense to us.

Colorless green ideas sleep furiously makes more sense to us than

Ideas green furiously colorless sleep

Finally, consider the sentence

The old dog the footsteps of the young.

In reading this sentence, you might have found yourself having to re-examine the first part. In all likelihood, you thought that the word "dog" was being used as a noun, when, in fact, it is a verb.


What is a language?

The most basic building blocks of language are words. Every language has a large, but finite, set of words.

Words are formed into sentences. A sentence, then, is a well-formed sequence of words.

The language is the set of all sentences that are well-formed, i.e., that follow a set of rules.

This set of rules is called a grammar.

To do syntactic analysis, we build a parser, i.e., a software system that checks whether the rules are followed and that provides an analysis based on the grammar.


What should a grammar tell us?

As much information as possible about the structure of sentences.

For example, the grammar might tell us that some legal sentence structures are as follows:

  1. noun verb noun (i.e., a noun followed by a verb followed by a noun), as in

    Dogs chase cats.

  2. determiner noun verb determiner noun, as in

    The cat ate the fish.

  3. det noun prep adj noun verb prep det noun, as in

    The dog with one eye ran from the cat.

But none of these helps us determine the relationships between the words.
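
A minimal sketch of how the three flat structures above could be checked, assuming a hypothetical toy lexicon with exactly one part of speech per word. The names `LEXICON`, `PATTERNS`, `tag`, and `is_legal` are illustrative and not part of these notes. The point is that a flat pattern can only answer yes or no; it says nothing about how the words group together.

```python
# Hypothetical toy lexicon: each word has exactly one part of speech.
LEXICON = {
    "dogs": "noun", "cats": "noun", "cat": "noun", "fish": "noun",
    "dog": "noun", "eye": "noun",
    "chase": "verb", "ate": "verb", "ran": "verb",
    "the": "det", "one": "adj", "with": "prep", "from": "prep",
}

# The three legal sentence structures listed above, as tag sequences.
PATTERNS = [
    ("noun", "verb", "noun"),
    ("det", "noun", "verb", "det", "noun"),
    ("det", "noun", "prep", "adj", "noun", "verb", "prep", "det", "noun"),
]


def tag(sentence):
    """Return the part of speech of each word (KeyError on unknown words)."""
    words = sentence.lower().rstrip(".").split()
    return tuple(LEXICON[word] for word in words)


def is_legal(sentence):
    """True if the sentence's tag sequence matches one of the flat patterns."""
    return tag(sentence) in PATTERNS


print(is_legal("Dogs chase cats."))
print(is_legal("The dog with one eye ran from the cat."))
print(is_legal("Cats the chase."))
```

Every new sentence shape needs its own pattern, and a match tells us nothing about which words belong together, e.g., that "with one eye" describes the dog.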

We want a grammar to give us more structural information. So, for example, some better grammar rules for sentences might be:

  1. Sentence -> NP VP (i.e., a sentence can be formed from a noun phrase followed by a verb phrase).
  2. NP -> noun (a noun phrase can be just a single noun)
  3. NP -> det noun (a noun phrase can be a determiner followed by a noun)
  4. VP -> verb (a verb phrase can be just a single verb)
  5. VP -> verb NP (a verb phrase can be a verb followed by a noun phrase)
Now if we were given the sentence

The cat ate the fish

our grammar could help us determine the following structure, which we call a parse tree:

This might be represented textually as:

(Sentence
   (NP
      (det  the)
      (noun cat))
   (VP
      (verb ate)
      (NP
         (det  the)
         (noun fish))))
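
The five rules above map naturally onto a small hand-written recursive-descent parser. The sketch below assumes a toy lexicon with a single part of speech per word; the function names and the nested-tuple tree format are illustrative choices, not something these notes prescribe.

```python
# Hypothetical toy lexicon: one part of speech per word.
LEXICON = {
    "the": "det", "cat": "noun", "fish": "noun", "dogs": "noun",
    "cats": "noun", "ate": "verb", "chase": "verb", "sleep": "verb",
}


def parse_np(tags, words, i):
    """NP -> det noun | noun. Return (tree, next_index) or None."""
    if i + 1 < len(tags) and tags[i] == "det" and tags[i + 1] == "noun":
        return ("NP", ("det", words[i]), ("noun", words[i + 1])), i + 2
    if i < len(tags) and tags[i] == "noun":
        return ("NP", ("noun", words[i])), i + 1
    return None


def parse_vp(tags, words, i):
    """VP -> verb NP | verb. Return (tree, next_index) or None."""
    if i < len(tags) and tags[i] == "verb":
        verb = ("verb", words[i])
        np = parse_np(tags, words, i + 1)
        if np is not None:
            np_tree, j = np
            return ("VP", verb, np_tree), j
        return ("VP", verb), i + 1
    return None


def parse_sentence(sentence):
    """Sentence -> NP VP. Return a parse tree, or None if the rules fail."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(word) for word in words]  # None for unknown words
    np = parse_np(tags, words, 0)
    if np is None:
        return None
    np_tree, i = np
    vp = parse_vp(tags, words, i)
    if vp is None:
        return None
    vp_tree, j = vp
    # Every word must be consumed for the sentence to be well-formed.
    return ("Sentence", np_tree, vp_tree) if j == len(words) else None


print(parse_sentence("The cat ate the fish"))
```

For "The cat ate the fish" the nested tuples mirror the textual parse tree shown above; a word order the rules do not allow, such as "cat the ate", gives `None`.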

A more complex grammar

We could add complexity to the grammar to allow for prepositional phrases:

  1. Sentence -> NP VP
  2. NP -> noun
  3. NP -> det noun
  4. NP -> adj noun
  5. NP -> det noun PP (where PP means "prepositional phrase")
  6. VP -> verb
  7. VP -> verb NP
  8. VP -> verb PP
  9. PP -> prep NP
Now if we had the sentence

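The dog with one eye ran from the cat.

a parser could produce the following parse tree, shown here in the same textual form as above:

(Sentence
   (NP
      (det  the)
      (noun dog)
      (PP
         (prep with)
         (NP
            (adj  one)
            (noun eye))))
   (VP
      (verb ran)
      (PP
         (prep from)
         (NP
            (det  the)
            (noun cat)))))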
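
One way to handle the larger grammar is to store the nine rules as data and let a generic top-down, backtracking parser try each rule in turn. The sketch below assumes a hypothetical single-tag lexicon that covers only the example sentence; `GRAMMAR`, `LEXICON`, `expand`, `expand_seq`, and `parse` are illustrative names. It relies on the grammar having no left-recursive rules, which holds for the rules above.

```python
# The nine rules above, stored as data: symbol -> list of right-hand sides.
GRAMMAR = {
    "Sentence": [["NP", "VP"]],
    "NP": [["noun"], ["det", "noun"], ["adj", "noun"], ["det", "noun", "PP"]],
    "VP": [["verb"], ["verb", "NP"], ["verb", "PP"]],
    "PP": [["prep", "NP"]],
}

# Hypothetical toy lexicon: one part of speech per word.
LEXICON = {
    "the": "det", "dog": "noun", "eye": "noun", "cat": "noun",
    "one": "adj", "with": "prep", "from": "prep", "ran": "verb",
}


def expand(symbol, words, i):
    """Yield (tree, next_index) for each way `symbol` can start at words[i]."""
    if symbol in GRAMMAR:
        # Phrase symbol: try every rule for it.
        for rhs in GRAMMAR[symbol]:
            for children, j in expand_seq(rhs, words, i):
                yield (symbol, *children), j
    elif i < len(words) and LEXICON.get(words[i]) == symbol:
        # Part of speech: it must match the next word.
        yield (symbol, words[i]), i + 1


def expand_seq(symbols, words, i):
    """Yield (list_of_trees, next_index) for a sequence of symbols."""
    if not symbols:
        yield [], i
        return
    for tree, j in expand(symbols[0], words, i):
        for rest, k in expand_seq(symbols[1:], words, j):
            yield [tree] + rest, k


def parse(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return [t for t, j in expand("Sentence", words, 0) if j == len(words)]


for tree in parse("The dog with one eye ran from the cat."):
    print(tree)
```

For the example sentence this finds a single tree: the subject uses rule 5 (det noun PP), with "one eye" parsed by rule 4 inside the prepositional phrase, and the verb phrase uses rule 8 (verb PP). Adding a rule only requires adding an entry to `GRAMMAR`.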


Components of a syntactic analysis program

In order to perform syntactic analysis, we need
  1. a parser - i.e., a program that takes as input a sentence and produces the analysis.
  2. a grammar - i.e., a set of rules that the parser can use.
  3. a lexicon - i.e., a dictionary of legal words and their parts of speech
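
As a rough sketch of how the three components fit together, the code below keeps the lexicon, the grammar, and the parser separate. Here the lexicon lists every part of speech a word may have, and the parser uses the grammar to pick the tags that fit. The lexicon entries and the grammar are assumptions made for this illustration; in particular the rule NP -> det adj noun is not among the rules listed earlier. The test sentence is the earlier example "The old dog the footsteps of the young."

```python
# Lexicon: each word maps to the set of parts of speech it may have.
LEXICON = {
    "the": {"det"}, "old": {"adj", "noun"}, "dog": {"noun", "verb"},
    "footsteps": {"noun"}, "of": {"prep"}, "young": {"adj", "noun"},
}

# Grammar (hypothetical variant for this example).
GRAMMAR = {
    "Sentence": [["NP", "VP"]],
    "NP": [["det", "adj", "noun"], ["det", "noun", "PP"], ["det", "noun"]],
    "VP": [["verb", "NP"], ["verb"]],
    "PP": [["prep", "NP"]],
}


def derive(symbols, words, i):
    """Parser core: yield (tagged_words, next_index) for each way the
    pending list of symbols can be matched starting at words[i]."""
    if not symbols:
        yield [], i
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Phrase symbol: replace it by each of its right-hand sides.
        for rhs in GRAMMAR[first]:
            yield from derive(rhs + rest, words, i)
    elif i < len(words) and first in LEXICON.get(words[i], ()):
        # Part of speech: accept it if the lexicon allows it for this word.
        for tagged, j in derive(rest, words, i + 1):
            yield [(words[i], first)] + tagged, j


def tag_sentence(sentence):
    """Return every tagging of the whole sentence that the grammar allows."""
    words = sentence.lower().rstrip(".").split()
    return [t for t, j in derive(["Sentence"], words, 0) if j == len(words)]


for tagging in tag_sentence("The old dog the footsteps of the young."):
    print(tagging)
```

The parser first tries "the old dog" as det adj noun, fails to find a verb, and backtracks; the only tagging that survives treats "old" as a noun and "dog" as a verb.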
Note that semantic analysis is limited in the following ways:

Additional grammar issues

There's more to grammar than determining parts of speech and overall sentence structure.

We can augment the grammar and the lexicon to help us deal with these additional issues.

Results of syntactic analysis

Syntactic analysis helps us to identify sentence structure and relationships between entities. This gives us
  1. clues on word meaning:

    The nurses hand the doctors the scalpels.

    The nurse's hand was bandaged.

  2. clues on overall phrase meaning:

    Flying planes is dangerous.

    Flying planes are dangerous.


Semantic Analysis

Semantic analysis includes reference, lexical meaning, and the relationships among entities, each discussed below.

Reference

Here the goal is to determine what words and phrases refer to in the real world. Issues include

Determining the type of thing referred to. For example, noun phrases can refer to:

while verbs can refer to:

Definite vs indefinite reference. For example,

Your parser should be able to handle 10 sentences. (indefinite)

Your parser should be able to handle the 10 sentences I gave you. (definite)

Generic vs instance. For example,

Of all the breeds of dogs, the Dalmatian is my favorite.

My friend Jane has two dogs. My favorite is the Dalmatian.

Anaphora, i.e., pronouns and definite reference. For example,

The lizard's tail fell off and three days later it had grown a new one.

Quantification. For example,

Jane bought every girl a yellow T-shirt.
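
A sentence like this is commonly read in two ways, depending on the scope of "every" and "a": either each girl got some yellow T-shirt (possibly a different one each), or there is one particular yellow T-shirt that was bought for all of them. Assuming that scope ambiguity is the point of the example, the sketch below checks both readings against a hypothetical toy world; all names and data are made up for illustration.

```python
# Hypothetical toy world.
girls = {"ann", "beth"}
yellow_tshirts = {"t1", "t2"}


def each_girl_gets_some_shirt(bought):
    """Reading 1: for every girl there is some yellow T-shirt Jane bought her."""
    return all(any((g, t) in bought for t in yellow_tshirts) for g in girls)


def one_shirt_for_all_girls(bought):
    """Reading 2: there is one yellow T-shirt that Jane bought for every girl."""
    return any(all((g, t) in bought for g in girls) for t in yellow_tshirts)


# Each pair (girl, shirt) records that Jane bought that shirt for that girl.
separate_shirts = {("ann", "t1"), ("beth", "t2")}
shared_shirt = {("ann", "t1"), ("beth", "t1")}

print(each_girl_gets_some_shirt(separate_shirts), one_shirt_for_all_girls(separate_shirts))
print(each_girl_gets_some_shirt(shared_shirt), one_shirt_for_all_girls(shared_shirt))
```

With separate shirts only the first reading holds; with a shared shirt both do. A semantic analyzer has to decide which reading the writer intended.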


Lexical Meaning

A single word can have multiple meanings. For example, "fly" can mean: (a) a winged insect, (b) a fish hook, (c) the action of flying, (d) a baseball hit, (e) motion (as in "on the fly").

Kathy was so busy she ate her lunch on the fly.

Kathy was distracted and put her sandwich on the fly.

How can we do lexical disambiguation?


Relationship among entities

Here the issue is determining the role played by each entity in a sentence. Roles include:

For example, I gave my students an assignment.

Syntax can sometimes signal a role.