PL
1. The K summarizer
[PL, K] Last update: 8/4/2022, 3:15:07 PM
The vision of the current K framework is that it is a unifying formal semantics framework for all programming languages. Given any programming language L, we can define its formal semantics in K, from which all the language tools (like a parser, an interpreter, a deductive verifier, a model checker, etc.) of L will be automatically generated. This way, we only need to implement language tools generically, once and for all, and then instantiate them by a formal programming language semantics.
The K Summarizer pushes the above idea even further. It allows us to extract from a given language semantics and a given program in that language, a new semantics that is the K semantics of the program in language . In other words, is how we would implement the same program directly in K if we didn't have . The key idea is to summarize statically all possible known (symbolic) behaviors of program and leave for runtime only what cannot be inferred statically. The result of that is a control-flow graph of the program , where all the basic blocks (i.e., straight-line executions/code) are pre-computed and decisions only need to be made at certain choice points such as conditional statements, while-loops, etc.
Compared to the original semantics of , the new semantics is much simpler. Indeed, the entire C semantics can have thousands of semantic rules but the semantics of the following C program sum.c
that computes the sum from 1 to the input n
while(n >= 0) {
s += n;
n--;
}
can be implemented using the following two rules:
rule <k> while(n >= 0) { s += n; n-- } => .K </k>
<state> s |-> S,
n |-> N
</state>
requires notBool N >=Int 0
rule <k> while(n >= 0) { s += n; n-- } => while(n >= 0) { s += n; n-- } </k>
<state> s |-> (S => S +Int N),
n |-> (N => N -Int 1)
</state>
requires N >=Int 0
If we only care about the </state>
cell, we can simplify the configurations even further and have the entire semantics of sum.c
implemented in one rule:
rule <n, sum> => <n -Int 1, sum +Int n> requires n >Int 0
At a high level, the new semantics implements the control-flow graph of where each edge is an all-path reachability rule that summarizes a basic block. When branching occurs, we may split the state on the branch condition and continue execution from there.
This generalizes what compilers do, and also the classic "partial evaluation" approach, where a compiler is seen as a partial evaluation of an interpreter with the program (but not program input). The critical insight in our context is to do this at the level of a formal semantics, instead of a particular ad-hoc implementation of an interpreter.
The K summarize is currently under active development. At this point it is an umbrella project for three different things (all important):
- A web interface for the K prover and for K symbolic execution that allows visualizing K executions and manipulating them.
- A new interactive approach to using the K prover, where we expose a command-line tool that brings us some light interactivity to the K prover.
- The semantics-based compilation (SBC) project [link], which has the goal that we stated above to take a programming language , a program in that language, and produce a new programming language for that one program.
For anyone who wants to learn more about the K summarize we recommend the 3-hour K summarizer workshop recording where we covered the history, the current status, the theoretical foundation, and the future plans about the project.
Resources
[video] The 3-hour K summarizer workshop recording
[related project] Semantic-based compilation
Contact
Everett Hildenbrandt [github]
Nishant Rodrigues (nishant2@illinois.edu)
Xiaohong Chen (xc3@illinois.edu)
2. Semantics-based compilation
[PL, Logic, K] Last update: 8/3/2022, 8:21:37 PM
(The following project description is based on an old article Open Problems and Challenges in 2017. See The K Summarizer for the latest development of the project.)
An ideal language framework should allow us to generate compilers from language definitions. Specifically, we hope that soon the K framework will have the capability to take a language semantics and a program in that language, and generate an efficient binary for the program.
Conceptually, K already provides the ingredients needed for such a semantics-based compiler. One way to approach the problem is to use symbolic execution on all non-recursive non-iterative program fragments and calculate the mathematical summary of the fragment as a (potentially large) K rule; the left-hand side of that rule will contain the fragment. For example, consider the trivial IMP language and a while loop whose body contains no other while loops. Then we can conceptually replace the while loop body with a new statement, say stmt17093
, whose semantics is given with a K rule that symbolically accumulates the semantics of the individual statements that composed stmt17093
. We can do the same for all non-loop fragments of code, until we eventually obtain a program containing only while statements whose bodies are special statement constants like stmt17093
, each with its own K semantics.
Once the above is achieved, there are two more important components left. One is to translate the while statements into jump statements in the backend language, say LLVM. In the long term, this should be inferred automatically from the K semantics of the while statement. In the short term, we can get the user involved by asking them to provide a translation for the while loop (yes, this is not nice, but hey, getting a compiler for your language is a big deal). The other component is to translate the K rules for statements like stmt17093
into efficient target language code. We believe this is an orthogonal issue, which we want to have efficiently implemented in K anyway, as part of our effort on providing a fast execution engine.
Resources
[related project] K summarizer
Contact
Everett Hildenbrandt [github]
Nishant Rodrigues (nishant2@illinois.edu)
Xiaohong Chen (xc3@illinois.edu)
3. Specifying medical guidelines in K
[PL, K] Last update: 8/5/2022, 1:21:45 AM
MediK is a K-based domain-specific language (DSL) for expressing medical guidelines as concurrently executing finite state machines (FSMs). Hospitals and medical associations often publish flowchart based Clinical best-Practice Guidelines to codify standard treatment for conditions such as Cardiac Arrest. In MediK, we systematically expresses these guidelines, along with other relevant data, such as the state of the patient's organ systems, and instructions to/from Healthcare Professionals, as FSMs that interact via passing events. Since MediK guidelines are executable, they can be used in a hospital setting as correct-by-construction guidance systems that assist Healthcare Professionals. For example, consider this simple model of a patient's cardiovascular system, and some treatment:
machine CardiovascularSystem {
init state CardiovascularLoop {
// Code to execute on entry
entry(heartRateSensor) {
// Read sensor data
if (heartRateSensor.rate < 10) {
// declare an emergency
broadcast CardioEmergency;
}
sleep(10);
goto CardiovascularLoop;
}
}
machine Treatment receives CardioEmergency {
...
on CardioEmergency do {
// handle emergency
}
}
interface HeartRateSensor {
var rate;
}
In the example above, the CardiovascularSystem
FSM starts in the state CardiovascularLoop
, and uses a reference to a HeartRateSensor
FSM to obtain the patient's heart rate. If the obtained heart rate is too low, the CardiovascularSystem
FSM broadcasts a CardioEmergency
event. Note that the heart rate sensor itself is not implemented in MediK. In medical guidance systems, it is common to interact with various external agents such as sensors and databases. We handle such actors by treating them as FSMs with transition systems external to MediK. In our example, the HeartRateSensor is declared as interface
instead of a machine
, signifying that the transition system is implemented elsewhere. This allows us to use the existing event passing mechanism to interact with external components.
While K suppports I/O through "stdin" and "stdout" attributes on cells, the I/O mechanism itself is synchronous, as reading from or writing to an I/O cell is blocking, i.e., no other rule can apply until the read or write is complete. Synchronous I/O can be a limitation for languages like MediK, where rules responsible for execution must continue to apply while waiting for I/O to be completed. This necessitates the need for a generic asynchronous I/O mechanism in K that allows rules for I/O to not block the application of other rules.
Resources
[project] MediK [link]
Contact
Manasvi Saxena [github] (msaxena2@illinois.edu)
4. Intermediate verification languages (IVL) in K
[PL, K] Last update: 8/18/2022, 1:28:52 AM
Many programming languages provide verification tools via translation to the Boogie [link] intermediate verification language (IVL). Boogie provides powerful and efficient tools that allow the development of high-quality, easy-to-use annotation-based verifiers, and allows for modular verification of large software.
However, the translation from the target programming language to Boogie is essentially another compiler for the language, and, besides the additional manpower needed for developing this tranlation, it also introduces scope for divergence between the actual language behaviour, severely reducing the confidence in their correctness.
We believe that a solution to this extra complexity introduced by the translation is to use a semantics-based approach to language development. More specifically, we define the formal semantics of Boogie in K and let K fulfil many of the use cases of Boogie within this semantic-based approach. To achieve that, we need to define the following Boogie-powered features in K:
- loop invariants and function contracts specified in the programming language's concrete syntax,
- quantification over expressions,
- ghost variables,
- clear and easy-to-read error messages.
The goal of the project is a formal semantics of Boogie in K that supports all the above features.
Resources
[project] Boogie semantics in K [github]
Contact
Nishant Rodrigues (nishantjr@gmail.com)
5. ExeBench for RV-Match
[PL] Last update: 8/29/2022, 3:53:52 PM
Summary
ExeBench is the largest known database of compilable and executable C functions. By using a type-inferencer, the authors are able to "fill in" missing dependencies and make individual functions compile in isolation. Additionally, the dataset contains input-output pairs that specify how each program should behave (i.e. a set of unit tests).
Ideas
- Use the UB checker in RV-Match to try and quantify how frequent it is (by category etc.) in "real" code.
- Their idea is to use ExeBench as a training set for ML applications; cleaning out UB from that dataset (and vice-versa) would be an interesting task.
- Construction of a large correctness test suite for
kcc
.
Contacts
@Baltoli is in contact with Jordi Armengol.
Dataset
The dataset is hosted on HuggingFace.
Paper
Jordi Armengol-Estapé, Jackson Woodruff, Alexander Brauckmann, José Wesley de Souza Magalhães, and Michael F. P. O'Boyle. 2022. ExeBench: an ML-scale dataset of executable C functions. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming (MAPS 2022). Association for Computing Machinery, New York, NY, USA, 50–59. https://doi.org/10.1145/3520312.3534867
Solved
None