Machine Problem: Monadic Parsing

Overview

In this assignment you will use the monadic parsing primitives discussed in class to implement a parser for a simple subset of the C language.

The subset of C you will be parsing consists of properly formatted (but not necessarily semantically correct!) function definitions of the following form:

type func_name(type param, type param, ...) {
    type local_var, local_var, ...;
    type local_var, local_var, ...;
    ...
    var = VAR_OR_VALUE;
    var = VAR_OR_VALUE;
    ...
    return VAR_OR_VALUE;
}

Permitted types are int and char, permitted strings for param, local_var, and var are identifiers which start with a lowercase character and are followed by alphanumeric characters, and VAR_OR_VALUE can either be an identifier as previously described or an integer.

The following are examples of valid input for your parser:

char foo1() { }

int foo2(char param1) { }

char foo3(char param1) {
  return param1;
}

char foo4(char param1) {
  return -1;
}

char foo5(char p1, char p2, int p3) {
  return 0;
}

char foo6(char p1, char p2, int p3) {
  char local1;
  return local1;
}

char foo7(char p1, char p2, int p3) {
  char l1, l2, l3;
  int l4, l5, l6;
  return -1;
}

char foo8(char p1, char p2, int p3) {
  char buf1;
  int n, m;
  char buf2;
  n = m;
  buf1 = 10;
  buf2 = -20;
  return 100;
}

Our language subset requires that all local variable declarations occur before any assignments, and that the return statement appears at most once, and only as the last statement in the function. Whitespace is not important, as in C.

Working on the assignment

All your code for this assignment should go into a file named "MP4.hs" within the "src/mps" subdirectory in your repository.

Implementation details

You will implement the parser funcDef :: Parser (String,[String],[String],String), which, when used to parse a string containing a valid function definition, will return a tuple containing:

  1. The name of the function
  2. A list of the names of the parameters of the function (if any)
  3. A list of the names of the local variables declared within the function (if any)
  4. The variable name or integer value returned by the function (as a string), or the empty string, if there is no return statement.

Invoking parse funcDef on the examples above, for instance, will return:

[(("foo1",[],[],""),"")]

[(("foo2",["param1"],[],""),"")]

[(("foo3",["param1"],[],"param1"),""])

[(("foo4",["param1"],[],"-1"),"")]

[(("foo5",["p1","p2","p3"],[],"0"),"")]

[(("foo6",["p1","p2","p3"],["local1"],"local1"),"")]

[(("foo7",["p1","p2","p3"],["l1","l2","l3","l4","l5","l6"],"-1"),"")]

[(("foo8",["p1","p2","p3"],["buf1","n","m","buf2"],"100"),"")]

For invalid functions, the parser will of course return the empty list. Examples of these are given below.

We highly recommend that you implement parsers for different parts of a function definition, which you can then sequence (inside a do block) to build your function parser. Here are some of the parsers you might want to consider building:

typeName :: Parser String -- a parser that recognizes "int" or "char"
paramList :: Parser [String] -- a parser for a parameter list
assignment :: Parser () -- a parser for a single assignment statement
varDecls :: Parser [String] -- a parser for variable declarations of a given type

Test Cases

Paste the following pre-formatted test cases into your source file to test your work. The test cases are worth 3 points each.

testCase1 = "int foo1() { }"
testCase2 = "int foo2(char param1) { }"
testCase3 = "char foo3(char param1) { \
           \   return param1; \
           \ }"
testCase4 = "char foo4(char param1) { \
           \   return -1; \
           \ }"
testCase5 = "char foo5(char p1, char p2, int p3) { \
           \   return 0; \
           \ }"
testCase6 = "char foo6(char p1, char p2, int p3) { \
           \   char local1; \ 
           \   return local1; \
           \ }"
testCase7 = "char foo7(char p1, char p2, int p3) { \
           \   char l1, l2, l3; \ 
           \   int l4, l5, l6;  \ 
           \   return -1; \
           \ }"
testCase8 = "char foo8(int p1, int p2, int p3) { \
           \   char buf1; \ 
           \   int n, m; \
           \   char buf2; \
           \   n = m; \ 
           \   buf1 = 10; \
           \   buf2 = -20; \
           \   return 100; \
           \ }"

failCase1 = "foo() { }"
failCase2 = "void foo() { }"
failCase3 = "int foo(int) { }"
failCase4 = "int foo(int i,) { }"
failCase5 = "int foo(char i) { \
           \   return i \
           \ }"
failCase6 = "int foo(char i) { \
           \   char l \
           \   return l; \
           \ }"
failCase7 = "int foo(char i) { \
           \   return l; \
           \   char l; \
           \ }"
failCase8 = "int foo(char i) { \
           \   char l; \
           \   return l; \
           \   l = 10; \
           \ }"
failCase9 = "int foo(char i) { \
           \   l = 10; \
           \   char l; \
           \   return l; \
           \ }"