Commit 02545e55 authored by gyuri

Fix some typos/grammar in the paper.

parent 88b61f4d
@@ -52,7 +52,7 @@ used to represent the lexical rules. I used \texttt{flex}~\cite{flex} to
generate my lexer.
Fortunately, the lexical rules for C are quite simple. (The preprocessor is not
-implemented here, the lexer expects a preprocessed C source.) We need to
+implemented here, my lexer expects a preprocessed C source.) We need to
recognize comments, string literals, numeric literals, keywords, identifiers,
and operators. Comments and whitespace are ignored (we simply don't emit any
tokens). (Whitespace is only used to separate otherwise ambiguous tokens.)
@@ -105,11 +105,7 @@ unary_operator : AND { $$ = AST_UNARY_REF; }
\end{center}
These are then used in higher level rules. Here, we recognize unary
-expressions. Note, that increment and decrement operators are handled
-separately, since for these, there is no direct correspondence between a token,
-and the operation it denotes. A \texttt{++} token can denote either a pre, or a
-post increment operator. This grammar rule only recognizes prefix operators.
-(The postfix variants are handled by a different rule.)
+expressions:
\begin{center}
\begin{BVerbatim}
unary_expression : postfix_expression { $$ = $1; }
@@ -119,9 +115,18 @@ unary_expression : postfix_expression { $$ = $1; }
;
\end{BVerbatim}
\end{center}
+Note, that increment and decrement operators are handled separately, since for
+these, there is no direct correspondence between a token, and the operation it
+denotes. A \texttt{++} token can denote either a pre, or a post increment
+operator. This grammar rule only recognizes prefix operators. (The postfix
+variants are handled by a different rule.)
At the top, we arrive at the \texttt{translation\_unit} rule. It simply says
-that a C file is a list of function definitions, and declarations.
+that a C file is a list of function definitions, and declarations. (The actions
+for constructing the syntax tree can seem a bit complicated at first, because
+of recursive definition the grammar uses for lists. Basically, at the first
+\texttt{external\_declaration} we initialize the list with a single item, and
+for each subsequent item, we append it to the end.)
\begin{center}
\begin{BVerbatim}
translation_unit : external_declaration { $$ = ast_translation_unit($1); }
@@ -147,7 +152,7 @@ a * b;
\end{BVerbatim}
\end{center}
-Whether this is a multiplication experssion, or a declaration depends on
+Whether this is a multiplication expression, or a declaration depends on
whether \texttt{a} is a typedef name.
To make matters worse, typedef names are also have to adhere to scoping:
@@ -173,7 +178,11 @@ For an in-depth discussion about \emph{correct} C parsing, see
\begin{figure}
\centering
-\begin{tikzpicture}[ scale=0.6, level 2/.style={sibling distance=80mm} ]
+\begin{tikzpicture}[
+scale=0.6,
+level 2/.style={sibling distance=75mm},
+level 3/.style={sibling distance=50mm}
+]
\node (a) {translation\_unit}
child {
node (b) {function\_definition}
@@ -224,8 +233,9 @@ The syntax tree is made up of polymorphic nodes. The leaves of the tree are
usually identifiers or literals. (We can also have for example an empty
expression statement, but this is unusual.)
-Let's look at the following program.~\footnote{This program is only
-syntactically correct. \texttt{a} and \texttt{b} are undeclared identifiers.}
+Let's look at the following program.~\footnote{This program is syntactically
+correct, but it would not compile: \texttt{a} and \texttt{b} are undeclared
+identifiers.}
(See Figure~\ref{fig:ast} for the syntax tree.)
\begin{center}
\begin{BVerbatim}
@@ -307,7 +317,8 @@ mov dword [rbp-8], eax ; store
If the expression is an identifier, we look up the corresponding variable in
our current stack of scopes, and return its value handle.
-If the expression is a unary or binary operator, the generally following scheme is followed:
+If the expression is a unary or binary operator, generally the following scheme
+is followed:
\begin{enumerate}
\item Call the expression code generator for each operand.
\item Emit code for performing the operation.
@@ -352,7 +363,7 @@ mov dword [rbp-8], eax ; store
Along with the data flow of the values, we also track types.
Some operations don't change the type of their operands, and the resulting
-value simply hase the same type as the operand.
+value simply has the same type as the operand.
Some operations can change the type. The three main ones are:
\begin{itemize}
@@ -451,8 +462,8 @@ jmp label\_0 \\
label\_1:
}
-For compound statements, we iterate its children, and call either the statement
-or the declaration code generator.
+For compound statements, we iterate through its children, and call either the
+statement or the declaration code generator.
\subsection{Declarations}
Every declaration consists of a set of declaration specifiers (such as storage
......