CS 334: Lab 8: Random Sentence Generator

Overview

In this lab, you will practice writing class hierarchies in Scala, utilizing the Composite Design Pattern to build a random sentence generator, and exploring generics and traits to build flexible set abstractions.

Partner

You are encouraged to work with a partner on this lab. As always, please send email if you would like help finding a partner.

Getting Started

Setting Up Your Repository

You will receive an email with an invitation link to the lab8 assignment on GitHub Classroom. You can follow the same instructions as on Lab 2 for accessing and cloning your repository. See the GitHub reference for instructions to add a partner. You should answer the following in the appropriate files in your repository.

Scala Setup

See instructions here for setting up Scala.

The scala command will give you a "read-eval-print" loop, as in Lisp and ML.

You can also compile and run a whole file as follows. Suppose file A.scala contains:

object A {
    def main(args : Array[String]) : Unit = {
        println(args(0));
    }
}

You then compile and run the program with "scala A.scala hello".

Programming

1. Random Sentence Generator (60 points)

The goals of this problem are to:

  1. write a class hierarchy in Scala, and

  2. utilize the Composite Design Pattern.

Random Sentence Generator

Long before ChatGPT, there was the "Random Sentence Generator". It creates random sentences from a grammar. With the right grammar you could, for example, use this program to generate homework extension requests:

  • Wear down the Professor's patience: I need an extension because I used up all my paper and then my dorm burned down and then I didn't know I was in this class and then I lost my mind and then my karma wasn't good last week and on top of that my dog ate my notes and as if that wasn't enough I had to finish my doctoral thesis this week and then I had to do laundry and on top of that my karma wasn't good last week and on top of that I just didn't feel like working and then I skied into a tree and then I got stuck in a blizzard on Mt. Greylock and as if that wasn't enough I thought I already graduated and as if that wasn't enough I lost my mind and in addition I spent all weekend hung-over and then I had to go to the Winter Olympics this week and on top of that all my pencils broke.

  • Plead innocence: I need an extension because I forgot it would require work and then I didn't know I was in this class.

  • Honesty: I need an extension because I just didn't feel like working.

Grammars

The program reads in grammars written in a form illustrated by this simple grammar file to generate poems:

<start> = The <object> <verb> tonight   
;

<object> =
  waves
| big yellow flowers    
| slugs
;

<verb> =
  sigh <adverb> 
| portend like <object>
| die <adverb>
;

<adverb> =
  warily    
| grumpily
;

The strings in brackets (<>) are the non-terminals. Each non-terminal definition is followed by a sequence of productions, separated by '|' characters, and with a ';' at the end. Each production consists of a sequence of white-space separated terminals and non-terminals. A production may be empty so that a non-terminal can expand to nothing. There will always be whitespace surrounding the '|', '=', and ';' characters to make parsing easy.

Here are two possible poems produced by random derivations from this grammar:

The big yellow flowers sigh warily tonight

The slugs portend like waves tonight

Your program will create a data structure to represent a grammar it reads in and then produce random derivations from it. Derivations will always begin with the non-terminal <start>. To expand a non-terminal, simply choose one of its productions from the grammar at random and then recursively expand each word in the production. For example:

<start>
-> The <object> <verb> tonight 
-> The big yellow flowers <verb> tonight            
-> The big yellow flowers sigh <adverb> tonight     
-> The big yellow flowers sigh warily tonight
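The expansion rule above can be sketched independently of any class design. The following is only an illustration under stated assumptions: the grammar is held in a plain Map from non-terminal names to lists of productions, and the names DerivationSketch, poem, and expand are hypothetical, not part of the starter code. Unlike the lab's requirements, this sketch treats any word without a definition as a terminal.

```scala
import scala.util.Random

// Hypothetical sketch: the grammar is a plain Map from non-terminal names
// to their productions, and each production is a list of words.
object DerivationSketch {
  type Productions = List[List[String]]

  // The poem grammar from the handout, written out by hand.
  val poem: Map[String, Productions] = Map(
    "<start>"  -> List(List("The", "<object>", "<verb>", "tonight")),
    "<object>" -> List(List("waves"), List("big", "yellow", "flowers"), List("slugs")),
    "<verb>"   -> List(List("sigh", "<adverb>"), List("portend", "like", "<object>"),
                       List("die", "<adverb>")),
    "<adverb>" -> List(List("warily"), List("grumpily"))
  )

  // Expand one word. A word with a definition picks one production at random
  // and expands each of its words in turn; any other word stands for itself.
  def expand(word: String, grammar: Map[String, Productions], rng: Random): List[String] =
    grammar.get(word) match {
      case Some(productions) =>
        val chosen = productions(rng.nextInt(productions.length))
        chosen.flatMap(w => expand(w, grammar, rng))
      case None => List(word)
    }

  def main(args: Array[String]): Unit =
    println(expand("<start>", poem, new Random()).mkString(" "))
}
```

Each run prints one random poem. Passing the Random instance in as a parameter makes it possible to supply a seeded generator when debugging.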

System Architecture

A grammar consists of terminals, non-terminals, productions, and definitions. These four items have one thing in common: they can all be expanded into a random derivation for that part of a grammar. Thus, we will create classes organized in the following class hierarchy to store a grammar:

The abstract class GrammarElement provides the general interface to all pieces of a grammar. It is defined as follows:

abstract class GrammarElement {

    /**
    * Expand the grammar element as part of a random 
    * derivation.  Use grammar to look up the definitions
    * of any non-terminals encountered during expansion.
    */
    def expand(grammar : Grammar) : String;

    /**
    * Return a string representation of this grammar element.
    * This is useful for debugging.  (Even though we inherit a
    * default version of toString() from the Object superclass, 
    * I include it as an abstract method here to ensure that 
    * all subclasses provide their own implementation.)
    */
    def toString() : String;    
}

The Grammar object passed into expand is used to look up the definitions for non-terminals during the expansion process, as described next.

The Grammar Class

A Grammar object maps non-terminal names to their definitions. At a minimum, your Grammar class should implement the following:

class Grammar {

    // add a new non-terminal, with the given definition
    def addNonTerminal(nt : String, defn : Definition)

    // look up a non-terminal, and return the definition, or null
    // if no definition exists.
    def apply(nt : String) : Definition

    // Expand the start symbol for the grammar.
    def expand() : String

    // return a String representation of this object.
    override def toString() : String
}

The toString method is useful for debugging.

Subclasses

The four subclasses of GrammarElement represent the different pieces of the grammar and describe how each part is expanded:

  • Terminal: A terminal just stores a terminal string (like "slugs"), and a terminal expands to itself.

  • NonTerminal: A non-terminal stores a non-terminal string (like "<start>"). When a non-terminal expands, it looks up the definition for its string and recursively expands that definition.

  • Production: A production stores a list of GrammarElements. To expand, a production simply expands each one of these elements.

  • Definition: A definition stores a list of Productions. A definition is expanded by picking a random Production from its list and expanding that Production.

This design is an example of the Composite Design Pattern. The hierarchy of classes leads to an extensible design where no single expand method is more than a few lines long.
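As a rough illustration of how the pattern plays out, here is a minimal sketch that uses the class names from this handout. It is one possible shape, not the expected solution. Several details are assumptions: everything is wrapped in a hypothetical CompositeSketch object, GrammarElement is simplified to declare only expand, expansions are joined with single spaces, and sys.error stands in for the starter code's error reporting.

```scala
import scala.collection.mutable
import scala.util.Random

object CompositeSketch {
  // Simplified stand-in for the handout's GrammarElement (expand only).
  abstract class GrammarElement {
    def expand(grammar: Grammar): String
  }

  // Leaf: a terminal expands to its own text.
  class Terminal(text: String) extends GrammarElement {
    def expand(grammar: Grammar): String = text
    override def toString: String = text
  }

  // Leaf that defers to the grammar: look up the definition and expand it.
  class NonTerminal(name: String) extends GrammarElement {
    def expand(grammar: Grammar): String = {
      val defn = grammar(name)
      // Assumption: sys.error stands in for the starter code's fail method.
      if (defn == null) sys.error("undefined non-terminal: " + name)
      defn.expand(grammar)
    }
    override def toString: String = name
  }

  // Composite: expands every child in order, skipping empty expansions.
  class Production(elements: List[GrammarElement]) extends GrammarElement {
    def expand(grammar: Grammar): String =
      elements.map(_.expand(grammar)).filter(_.nonEmpty).mkString(" ")
    override def toString: String = elements.mkString(" ")
  }

  // Composite: expands one randomly chosen child.
  class Definition(productions: List[Production]) extends GrammarElement {
    def expand(grammar: Grammar): String =
      productions(Random.nextInt(productions.length)).expand(grammar)
    override def toString: String = productions.mkString(" | ")
  }

  // Maps non-terminal names to definitions, as described in the handout.
  class Grammar {
    private val definitions = mutable.Map[String, Definition]()
    def addNonTerminal(nt: String, defn: Definition): Unit = {
      definitions(nt) = defn
    }
    def apply(nt: String): Definition = definitions.getOrElse(nt, null)
    def expand(): String = new NonTerminal("<start>").expand(this)
    override def toString: String =
      definitions.map { case (nt, d) => nt + " = " + d }.mkString("\n")
  }
}
```

Client code only ever calls expand on a GrammarElement; whether that element is a leaf or a composite is decided by dynamic dispatch, which is what keeps each method short.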

Implementation Steps

  1. Download the starter code from the handouts web page. Once compiled with scalac (or fsc), you will run the program with a command like

    scala RandomSentenceGenerator < Poem.g

    You will need to use Scala's generic library classes. In particular, you will probably want to use both Lists and Maps from the standard Scala packages. The full documentation for these classes is accessible from the cs334 links web page.

  2. Begin by implementing the four subclasses of GrammarElement. Do not write expand yet, but complete the rest of the classes so that you can create and call toString() on them.

  3. The next step is to parse the input to your program and build the data structure representing the grammar in RandomSentenceGenerator.scala. The grammar will be stored in the instance variable grammar.

    I have provided a skeleton of the parsing code. The parser uses a java.util.Scanner object to perform lexical analysis and break the input into individual tokens. I use the following two Scanner methods:

    1. next(): String: Removes the next token from the input stream and returns it.

    2. hasNext(pattern: String): boolean: Returns true if and only if the next token in the input matches the given pattern. (If pattern is missing, this will return true if there are any tokens left in the input.)

    When parsing the input, it is useful to keep in mind what form the input will have. In particular, we can write an EBNF grammar for the input to your program as follows:

    <Grammar>     ::= [ Non-Terminal '=' <Definition> ';' ]*
    <Definition>  ::= <Production> [ '|' <Production> ]*
    <Production>  ::= [ <Word> ]*
    

    where Non-Terminal is a non-terminal from the grammar being read and Word is any terminal or non-terminal from the grammar being read. Recall that the syntax [ Word ]* matches zero or more Words.

    The parsing code follows this definition with the following three methods:

    protected def readGrammar(in : Scanner): Grammar
    protected def readDefinition(in : Scanner): Definition
    protected def readProduction(in : Scanner): Production
    

    Modify these methods to create appropriate Terminal, NonTerminal, Production, Definition, and Grammar objects for the input. You may wish to print the objects you are creating as you go to ensure the grammar is being represented properly. You will need to complete the definition of Grammar at this point as well.

  4. Once the grammar can be properly created and printed, implement the expand methods for your GrammarElements. Scala provides a random number generator that can be used as follows:

    import scala.util.Random
    val number = Random.nextInt(N);  // number is in range [0,N-1].
    

    Change RandomSentenceGenerator to create and print three random derivations after printing the grammar.

  5. You may also submit a new grammar of your own. It can be as simple or as complicated as you like.
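To see how the two Scanner methods from step 3 line up with the EBNF rules, the following sketch parses a grammar into nested lists of strings rather than into the lab's classes. It is an illustration only and makes several assumptions: the ScannerSketch object and its return types are hypothetical, the method names merely echo the skeleton's, and the input is assumed to be well formed, so there is no error checking.

```scala
import java.util.Scanner

// Hypothetical sketch: parses into nested lists, not the lab's classes.
object ScannerSketch {
  // <Production> ::= [ <Word> ]*
  // Reads words until the next token is '|' or ';' (or input runs out).
  def readProduction(in: Scanner): List[String] = {
    var words = List[String]()
    while (in.hasNext() && !in.hasNext("\\|") && !in.hasNext(";")) {
      words = words :+ in.next()
    }
    words
  }

  // <Definition> ::= <Production> [ '|' <Production> ]*
  def readDefinition(in: Scanner): List[List[String]] = {
    var productions = List(readProduction(in))
    while (in.hasNext("\\|")) {
      in.next() // consume '|'
      productions = productions :+ readProduction(in)
    }
    productions
  }

  // <Grammar> ::= [ Non-Terminal '=' <Definition> ';' ]*
  def readGrammar(in: Scanner): Map[String, List[List[String]]] = {
    var grammar = Map[String, List[List[String]]]()
    while (in.hasNext()) {
      val name = in.next()
      in.next() // consume '='
      grammar = grammar + (name -> readDefinition(in))
      in.next() // consume ';'
    }
    grammar
  }
}
```

Note that the pattern passed to hasNext is a regular expression, so the '|' token has to be escaped as "\\|". An empty production shows up as an empty list.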

A few details about producing derivations:

  • The grammar will always contain a <start> non-terminal to begin the expansion. It will not necessarily be the first definition in the file, but it will always be defined eventually. I have provided some error checking in the parsing code, but you may assume that the grammar files are otherwise syntactically correct.

  • The one error condition you should catch reasonably is the case where a non-terminal is used but not defined. It is fine to catch this when expanding the grammar and encountering the undefined non-terminal rather than attempting to check the consistency of the entire grammar while reading it. The starter code contains a

    RandomSentenceGenerator.fail(String msg)
    

    method that you can call to report an error and stop.

  • When generating the output, just print the terminals as you expand. Each terminal should be preceded by a space when printed, except terminals that begin with punctuation such as periods, commas, and dashes. You can use the Character.isLetterOrDigit method to check for this: it returns false for punctuation marks. This rule about leading spaces is just a rough heuristic, because some punctuation (quotes, for example) might look better with spaces. Don't worry about the minor details: we're looking for something simple that is right most of the time, and it's okay if it is a little off in some cases.

  • There is some latitude in how parts of this are implemented, so we will not have any autograder tests for this question.
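The leading-space heuristic described above can be sketched as follows. This is only one way to apply it: the SpacingSketch object and its method names are hypothetical, and the sketch joins a list of already-expanded terminals into a string instead of printing during expansion.

```scala
object SpacingSketch {
  // True when the terminal should be preceded by a space, that is,
  // when its first character is a letter or digit rather than punctuation.
  def needsLeadingSpace(terminal: String): Boolean =
    terminal.nonEmpty && Character.isLetterOrDigit(terminal.charAt(0))

  // Join already-expanded terminals using the heuristic above.
  def join(terminals: List[String]): String = {
    val out = new StringBuilder
    for (t <- terminals) {
      if (needsLeadingSpace(t)) out.append(' ')
      out.append(t)
    }
    out.toString
  }
}
```

Because every non-punctuation terminal gets a leading space, the result itself begins with a space, which matches the rule as stated.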

Submitting Your Work

Submit your code to the Gradescope assignment named, for example, "Lab 1". You can submit in one of two ways:

  • Upload files: Click "Upload" and select all of your source files, or
  • Link GitHub: Click "GitHub" and select your repository and branch.

Please do not change the names of the starter files. Also:

  • If you worked with a partner, only one of each pair needs to submit the code.
  • Indicate who your partner is when you submit. Specifically, after you upload your files, there will be an "Add Group Member" button on the right of the Gradescope webpage -- click that and add your partner.

Autograding: Gradescope will run an autograder on your code that performs some simple tests. Be sure to look at the autograder output to verify your code works as expected. We will run more extensive tests on your code after the deadline.