cprover
goto-programs/README.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \defgroup goto-programs goto-programs
3 
4 # Folder goto-programs
5 
6 \author Kareem Khazem, Martin Brain
7 
8 \section overview Overview
9 Goto programs are the intermediate representation of the CPROVER tool
10 chain. They are language independent and similar to many of the compiler
11 intermediate languages. Section \ref goto-programs describes the
12 `goto_programt` and `goto_functionst` data structures in detail. However
13 it useful to understand some of the basic concepts. Each function is a
14 list of instructions, each of which has a type (one of 18 kinds of
15 instruction), a code expression, a guard expression and potentially some
16 targets for the next instruction. They are not natively in static
17 single-assign (SSA) form. Transitions are nondeterministic (although in
18 practise the guards on the transitions normally cover form a disjoint
19 cover of all possibilities). Local variables have non-deterministic
20 values if they are not initialised. Variables and data within the
21 program is commonly one of three types (parameterised by width):
22 `unsignedbv_typet`, `signedbv_typet` and `floatbv_typet`, see
23 `util/std_types.h` for more information. Goto programs can be serialised
24 in a binary (wrapped in ELF headers) format or in XML (see the various
25 `_serialization` files).
26 
27 The `cbmc` option `–show-goto-programs` is often a good starting point
28 as it outputs goto-programs in a human readable form. However there are
29 a few things to be aware of. Functions have an internal name (for
30 example `c::f00`) and a ‘pretty name’ (for example `f00`) and which is
31 used depends on whether it is internal or being presented to the user.
32 The `main` method is the ‘logical’ main which is not necessarily the
33 main method from the code. In the output `NONDET` is use to represent a
34 nondeterministic assignment to a variable. Likewise `IF` as a beautified
35 `GOTO` instruction where the guard expression is used as the condition.
36 `RETURN` instructions may be dropped if they precede an `END_FUNCTION`
37 instruction. The comment lines are generated from the `locationt` field
38 of the `instructiont` structure.
39 
40 `goto-programs/` is one of the few places in the CPROVER codebase that
41 templates are used. The intention is to allow the general architecture
42 of program and functions to be used for other formalisms. At the moment
43 most of the templates have a single instantiation; for example
44 `goto_functionst` and `goto_function_templatet` and `goto_programt` and
45 `goto_program_templatet`.
46 
47 \section data_structures Data Structures
48 
49 FIXME: This text is partially outdated.
50 
51 The common starting point for working with goto-programs is the
52 `read_goto_binary` function which populates an object of
53 `goto_functionst` type. This is defined in `goto_functions.h` and is an
54 instantiation of the template `goto_functions_templatet` which is
55 contained in `goto_functions_template.h`. They are wrappers around a map
56 from strings to `goto_programt`’s and iteration macros are provided.
57 Note that `goto_function_templatet` (no `s`) is defined in the same
58 header as `goto_functions_templatet` and is gives the C type for the
59 function and Boolean which indicates whether the body is available
60 (before linking this might not always be true). Also note the slightly
61 counter-intuitive naming; `goto_functionst` instances are the top level
62 structure representing the program and contain `goto_programt` instances
63 which represent the individual functions. At the time of writing
64 `goto_functionst` is the only instantiation of the template
65 `goto_functions_templatet` but other could be produced if a different
66 data-structures / kinds of models were needed for functions.
67 
68 `goto_programt` is also an instantiation of a template. In a similar
69 fashion it is `goto_program_templatet` and allows the types of the guard
70 and expression used in instructions to be parameterised. Again, this is
71 currently the only use of the template. As such there are only really
72 helper functions in `goto_program.h` and thus `goto_program_template.h`
73 is probably the key file that describes the representation of (C)
74 functions in the goto-program format. It is reasonably stable and
75 reasonably documented and thus is a good place to start looking at the
76 code.
77 
78 An instance of `goto_program_templatet` is effectively a list of
79 instructions (and inner template called `instructiont`). It is important
80 to use the copy and insertion functions that are provided as iterators
81 are used to link instructions to their predecessors and targets and
82 careless manipulation of the list could break these. Likewise there are
83 helper macros for iterating over the instructions in an instance of
84 `goto_program_templatet` and the use of these is good style and strongly
85 encouraged.
86 
87 Individual instructions are instances of type `instructiont`. They
88 represent one step in the function. Each has a type, an instance of
89 `goto_program_instruction_typet` which denotes what kind of instruction
90 it is. They can be computational (such as `ASSIGN` or `FUNCTION_CALL`),
91 logical (such as `ASSUME` and `ASSERT`) or informational (such as
92 `LOCATION` and `DEAD`). At the time of writing there are 18 possible
93 values for `goto_program_instruction_typet` / kinds of instruction.
94 Instructions also have a guard field (the condition under which it is
95 executed) and a code field (what the instruction does). These may be
96 empty depending on the kind of instruction. In the default
97 instantiations these are of type `exprt` and `codet` respectively and
98 thus covered by the previous discussion of `irept` and its descendents.
99 The next instructions (remembering that transitions are guarded by
100 non-deterministic) are given by the list `targets` (with the
101 corresponding list of labels `labels`) and the corresponding set of
102 previous instructions is get by `incoming_edges`. Finally `instructiont`
103 have informational `function` and `location` fields that indicate where
104 they are in the code.
105 
106 \section goto-conversion Goto Conversion
107 
108 In the \ref goto-programs directory.
109 
110 **Key classes:**
111 * goto_programt
112 * goto_functionst
113 * \ref goto_programt::instructiont
114 
115 \dot
116 digraph G {
117  node [shape=box];
118  rankdir="LR";
119  1 [shape=none, label=""];
120  2 [label="goto conversion"];
121  3 [shape=none, label=""];
122  1 -> 2 [label="Symbol table"];
123  2 -> 3 [label="goto-programs, goto-functions, symbol table"];
124 }
125 \enddot
126 
127 At this stage, CBMC constructs a goto-program from a symbol table. It
128 does not use the parse tree or the source file at all for this step. This
129 may seem surprising, because the symbols are stored in a hash table and
130 therefore have no intrinsic order; nevertheless, every \ref symbolt is
131 associated with a \ref source_locationt, allowing CBMC to figure out the
132 lexical order.
133 
134 The structure of what is informally called a goto-program follows. The
135 entire target program is converted to a single \ref goto_functionst
136 object. The goto functions contains a set of \ref goto_programt objects;
137 each of these correspond to a "function" or "method" in the target
138 program. Each goto_program contains a list of
139 \ref goto_programt::instructiont "instructions"; each
140 instruction contains an associated statement---these are subtypes of
141 \ref codet. Each instruction also contains a "target", which will be
142 empty for now.
143 
144 \dot
145 digraph G{
146  graph [nojustify=true];
147  node [shape=box];
148  compound=true;
149 
150  subgraph cluster_src {
151  1 [shape="none", label="source files"];
152  2 [label="file1.c\n-----------\nint main(){
153  int x = 5;
154  if(x > 7){
155  x = 9; } }
156 
157 void foo(){}"];
158  1 -> 2 [color=white];
159 
160  100 [label="file2.c\n--------\nchar bar(){ }"];
161  2 -> 100 [color=white];
162  }
163 
164  1 -> 3 [label="corresponds to", lhead=cluster_goto,
165  ltail=cluster_src];
166 
167  subgraph cluster_goto {
168  3 [label="a\ngoto_functionst", URL="\ref goto_functionst", shape=none];
169  4 [label="function_map\n(a map from irep_idt\nto goto_programt)"];
170  }
171  4 -> 5 [lhead=cluster_funmap, label="value:"];
172 
173  subgraph cluster_funmap {
174  9 [label="bar\n(an irep_idt)", URL="\ref irep_idt"];
175  10 [label="a goto_programt", URL="\ref goto_programt"];
176  9->10 [label="maps to"];
177 
178  5 [label="main\n(an irep_idt)", URL="\ref irep_idt"];
179 
180  7 [label="foo\n(an irep_idt)", URL="\ref irep_idt"];
181  8 [label="a goto_programt", URL="\ref goto_programt"];
182  7->8 [label="maps to"];
183 
184  subgraph cluster_goto_program {
185  11 [shape=none, label="a\ngoto_programt", URL="\ref goto_programt"];
186  12 [label="instructions\n(a list of instructiont)"];
187  }
188  5 -> 11 [lhead=cluster_goto_program, label="maps to:"];
189 
190  }
191 
192  12 -> target1 [lhead=cluster_instructions];
193 
194  subgraph cluster_instructions {
195  subgraph cluster_ins1{
196  code1 [label="code", URL="\ref codet"];
197  target1 [label="target"];
198  }
199 
200  target1 -> target2 [color=white,lhead=cluster_ins2,
201  ltail=cluster_ins1];
202 
203  subgraph cluster_ins2{
204  code2 [label="code", URL="\ref codet"];
205  target2 [label="target"];
206  }
207 
208  target2 -> target3 [color=white,lhead=cluster_ins3,
209  ltail=cluster_ins2];
210 
211  subgraph cluster_ins3{
212  code3 [label="code", URL="\ref codet"];
213  target3 [label="target"];
214  }
215 
216  target3 -> target4 [color=white,lhead=cluster_ins4,
217  ltail=cluster_ins3];
218 
219  subgraph cluster_ins4{
220  code4 [label="code", URL="\ref codet"];
221  target4 [label="target"];
222  }
223 
224  }
225 
226  subgraph cluster_decl {
227  decl1 [label="type:\ncode_declt", URL="\ref code_declt",
228  shape=none];
229  subgraph cluster_decl_in{
230  cluster_decl_in_1 [label="symbol()", shape=none];
231  cluster_decl_in_2 [label="x"];
232  cluster_decl_in_1 -> cluster_decl_in_2 [color=white];
233  }
234  decl1 -> cluster_decl_in_1 [lhead="cluster_decl_in", color=white];
235  }
236  code1 -> decl1 [lhead=cluster_decl];
237 
238  subgraph cluster_assign1 {
239  assign1 [label="type:\ncode_assignt", URL="\ref code_assignt",
240  shape=none];
241  subgraph cluster_assign1_in{
242  cluster_assign1_in_1 [label="lhs()", shape=none];
243  cluster_assign1_in_2 [label="x"];
244  cluster_assign1_in_1 -> cluster_assign1_in_2 [color=white];
245 
246  cluster_assign1_in_3 [label="rhs()", shape=none];
247  cluster_assign1_in_4 [label="5"];
248  cluster_assign1_in_3 -> cluster_assign1_in_4 [color=white];
249  }
250  assign1 -> cluster_assign1_in_1 [lhead="cluster_assign1_in", color=white];
251  }
252  code2 -> assign1 [lhead=cluster_assign1];
253 
254  subgraph cluster_if {
255  if [label="type:\ncode_ifthenelset", URL="\ref code_ifthenelset",
256  shape=none];
257  if_body [label="..."];
258  if -> if_body [color=white];
259  }
260  code3 -> if [lhead=cluster_if];
261 
262  subgraph cluster_assign2 {
263  assign2 [label="type:\ncode_assignt", URL="\ref code_assignt",
264  shape=none];
265  subgraph cluster_assign2_in{
266  cluster_assign2_in_1 [label="lhs()", shape=none];
267  cluster_assign2_in_2 [label="x"];
268  cluster_assign2_in_1 -> cluster_assign2_in_2 [color=white];
269 
270  cluster_assign2_in_3 [label="rhs()", shape=none];
271  cluster_assign2_in_4 [label="9"];
272  cluster_assign2_in_3 -> cluster_assign2_in_4 [color=white];
273  }
274  assign2 -> cluster_assign2_in_1 [lhead="cluster_assign2_in", color=white];
275  }
276  code4 -> assign2 [lhead=cluster_assign2];
277 
278 }
279 \enddot
280 
281 This is not the final form of the goto-functions, since the lists of
282 instructions will be 'normalized' in the next step (Instrumentation),
283 which removes some instructions and adds targets to others.
284 
285 Note that goto_programt and goto_functionst are each template
286 instantiations; they are currently the *only* specialization of
287 goto_program_templatet and goto_functions_templatet, respectively. This
288 means that the generated Doxygen documentation can be somewhat obtuse
289 about the actual types of things, and is unable to generate links to the
290 correct classes. Note that the
291 \ref goto_programt::instructiont::code "code" member of a
292 goto_programt's instruction has type \ref codet (its type in the
293 goto_program_templatet documentation is given as "codeT", as this is the
294 name of the template's type parameter); similarly, the type of a guard
295 of an instruction is \ref guardt.
296 
297 ---
298 \section instrumentation Instrumentation
299 
300 In the \ref goto-programs directory.
301 
302 **Key classes:**
303 * goto_programt
304 * goto_functionst
305 * \ref goto_programt::instructiont
306 
307 \dot
308 digraph G {
309  node [shape=box];
310  rankdir="LR";
311  1 [shape=none, label=""];
312  2 [label="goto conversion"];
313  3 [shape=none, label=""];
314  1 -> 2 [label="goto-programs, goto-functions, symbol table"];
315  2 -> 3 [label="transformed goto-programs"];
316 }
317 \enddot
318 
319 This stage applies several transformations to the goto-programs from the
320 previous stage:
321 
322 * The diagram in the previous stage showed a goto_programt with four
323  instructions, but real programs usually yield instruction lists that
324  are littered with \ref code_skipt "skip" statements. The
325  instrumentation stage removes the majority of these.
326 
327 * Function pointers are removed. They are turned into switch statements
328  (but see the next point; switch statements are further transformed).
329 
330 * Compound blocks are eliminated. There are several subclasses of
331  \ref codet that count as 'compound blocks;' therefore, later stages in
332  the CBMC pipeline that switch over the codet subtype of a particular
333  instruction should not need to consider these types. In particular:
334 
335  * code_ifthenelset is turned into GOTOs. In particular, the bodies of
336  the conditionals are flattened into lists of instructions, inline
337  with the rest of the instruction in the goto_programt. The guard of
338  the conditional is placed onto the
339  \ref goto_programt::instructiont::guard "guard" member of
340  an instruction whose code member is of type \ref code_gotot. The
341  \ref goto_programt::instructiont::targets "targets" member
342  of that instruction points to the appropriate branch of the
343  conditional. (Note that although instructions have a list of
344  targets, in practice an instruction should only ever have at most
345  one target; you should check this invariant with an assertion if you
346  rely on it).
347 
348  The order of instructions in a list of instructions---as well as the
349  targets of GOTOs---are both displayed as arrows when viewing a
350  goto-program as a Graphviz DOT file with `goto-instrument --dot`.
351  The semantics of a goto-program is: the next instruction is the next
352  instruction in the list, unless the current instruction has a
353  target; in that case, check the guard of the instruction, and jump
354  to the target if the guard evaluates to true.
355 
356  * switch statements, for and while loops, and try-catches are also
357  transformed into lists of instructions guarded by GOTOs.
358 
359  * \ref code_blockt "code blocks" are transformed into lists of
360  instructions.
361 
362 * \ref code_returnt "return statements" are transformed into
363  (unconditional) GOTOs whose target is the \ref END_FUNCTION
364  instruction. Each goto_programt should have precisely one such
365  instruction. Note the presence of \ref code_deadt, which has a
366  \ref code_deadt::symbol() "symbol()" member. Deads mark symbols that
367  have just gone out of scope; typically, a GOTO that jumps to an
368  END_FUNCTION instruction is preceded by a series of deads. Deads also
369  follow sequences of instructions that were part of the body of a
370  block (loop, conditional etc.) if there were symbols declared in that
371  block.
372 
373 This stage concludes the *analysis-independent* program transformations.