Commit 8c4290c

committed
Updated readme
1 parent 327755e commit 8c4290c

3 files changed

+94
-37
lines changed

README.md

Lines changed: 94 additions & 37 deletions
@@ -1,51 +1,108 @@
1-
# Universal Symbolic Virtual Machine
1+
# What is USVM?
22

3-
**Universal Symbolic Virtual Machine**, or **USVM**, is an ultimately powerful _language-agnostic_ core for implementing
4-
custom symbolic execution based products.
3+
USVM is a powerful symbolic execution core designed for analyzing programs written in multiple programming languages. Symbolic execution is known to be a very precise but resource-demanding technique. USVM makes it faster, more stable, and more versatile.
54

6-
## How we got here
5+
The main features of USVM include:
76

8-
USVM is the result of years of experience designing symbolic execution engines for various programming languages. Many
9-
engines have a lot in common — so why don't we extract and polish it? Many of them still feature different advantages
10-
— so why don't we unify them to get the most out of symbolic execution?
7+
* extensible and highly optimized symbolic memory model
8+
* optimized constraint management to reduce SMT solver workload
9+
* improved symbolic models for containers (`Map`, `Set`, `List`, etc.)
10+
* targeted symbolic execution
11+
* solving type constraints with no SMT solver involved
12+
* bit-precise reasoning
13+
* forward and backward symbolic exploration
1114

12-
USVM abstracts language primitives into the generic ones, so, with a simple DSL provided, you are free to implement an
13-
interpreter for your language. USVM integrates the particular symbolic execution enhancements into one compact form
14-
— an
15-
efficient and configurable
16-
core with a unified API.
15+
# How can I use it?
1716

18-
## Key features
17+
With USVM, you can fully automate
18+
* [static code analysis](#taint-analysis-with-usvm),
19+
* [unit test case generation](#usvm-for-unit-test-generation),
20+
* [targeted fuzzing and other symbolic-execution-based solutions](#using-usvm-to-confirm-sarif-reports).
1921

20-
What does it mean to be an _efficient_ core?
22+
Right now, we have a ready-to-use implementation for [Java](https://github.com/UnitTestBot/usvm/tree/main/usvm-jvm) and an experimental implementation for [Python](https://github.com/UnitTestBot/usvm/tree/tochilinak/python/usvm-python).
2123

22-
The USVM core redefines language primitives, so they become symbolic and language-independent, and provides us
23-
with a range of benefits:
24-
* optimized constraint management to reduce SMT solver workload;
25-
* extensible symbolic memory model;
26-
* forward and backward symbolic exploration;
27-
* improved symbolic models for containers (`map`, `set`, `list`, etc.);
28-
* solving type constraints with no SMT solver involved;
29-
* bit-precise reasoning.
24+
# Taint analysis with USVM
3025

31-
## Language-specific implementations
26+
USVM supports interprocedural condition-sensitive taint analysis of JVM bytecode. For instance, it is able to automatically find the following SQL injection:
27+
![True positive sample](./docs/assets/images/injection.png)
3228

33-
We are now developing a user-friendly USVM API, so that you can easily adapt the core to analyzing programs
34-
in the required language.
29+
By default, USVM can also find other problems:
30+
* null reference exceptions,
31+
* out-of-bounds access for collections,
32+
* integer overflows,
33+
* division by zero.
3534

36-
For now, we have already implemented symbolic interpreters for _Java_ and _Python_ (the latter is an experimental but
37-
working example). And the interpreters for _Go_ and _JavaScript_ are under development, so stay tuned.
35+
You can also extend its analysis rules by [writing custom checkers](#writing-custom-checkers).
3836

39-
These ready-to-use symbolic interpreters are for the managed languages mostly, but you have everything you need to
40-
analyze programs in whatever language — whether it is _C++_ or an exotic (or even custom) one.
4137

42-
## Applicable scope
38+
You can run USVM in your repo CI by configuring the [ByteFlow](https://github.com/UnitTestBot/byteflow) runner.
4339

44-
Symbolic execution is known to be a powerful but slow and demanding technique. USVM makes it faster, more stable, and
45-
versatile.
40+
## About condition-sensitive analysis
4641

47-
You can build custom symbolic execution engines with the USVM core inside to show better performance in
48-
* test generation,
49-
* static analysis,
50-
* verification
51-
* targeted fuzzing,</br>and more symbolic execution based solutions.
42+
If we modify the above program a little, things change drastically:
43+
![False positive sample](./docs/assets/images/injection_fp.png)
44+
45+
All interprocedural dataflow analysers we've tried report a similar warning for this program. However, this is a false alarm: untrusted data is read only in development mode (that is, when the `production` field is false), but the real database query happens only in production mode.
46+
47+
The existing analysers get this wrong because they lack condition-sensitive analysis: they simply do not understand that untrusted data is emitted only under conditions that prevent the program from reaching the `checkUserIsAdminProd` method.
48+
49+
Most analysers omit condition-sensitive analysis because it is complex and expensive. USVM makes it robust and scalable. In particular, USVM does not report a warning for this program.
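The pattern described above can be sketched in a few lines of Kotlin. This is not the program from the screenshot: apart from the `production` field and the `checkUserIsAdminProd` method mentioned in the text, every name here (`AdminCheck`, `readUntrustedName`, `executedQueries`, the query string) is an assumption made for illustration.

```kotlin
class AdminCheck(private val production: Boolean) {
    // Records every query that reaches the simulated database sink.
    val executedQueries = mutableListOf<String>()

    // Hypothetical taint source: stands in for reading attacker-controlled input.
    private fun readUntrustedName(): String = "x' OR '1'='1"

    // Sink: builds the query by string concatenation, so tainted input would be an injection.
    private fun checkUserIsAdminProd(name: String): Boolean {
        executedQueries += "SELECT is_admin FROM users WHERE name = '$name'"
        return false
    }

    fun isAdmin(): Boolean {
        var name = "guest"
        if (!production) {
            name = readUntrustedName() // tainted only when production == false
        }
        if (production) {
            return checkUserIsAdminProd(name) // sink reached only when production == true
        }
        return true // development mode: no database access
    }
}

fun main() {
    for (mode in listOf(true, false)) {
        val check = AdminCheck(production = mode)
        check.isAdmin()
        println("production=$mode queries=${check.executedQueries}")
    }
}
```

A dataflow analysis that ignores branch conditions sees a path from `readUntrustedName` to `checkUserIsAdminProd`, yet no single execution takes both branches, because they are guarded by contradictory conditions on `production`.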
50+
51+
You can run this example online in our [demo repository](https://github.com/unitTestBot/byteflow/security/code-scanning).
52+
53+
# Writing custom checkers
54+
55+
USVM lets you customize its behaviour by writing custom analysis rules. To make this possible, USVM shares its internal analysis states with attached *interpreter observers*. So the first step in writing a custom checker is to implement the [`JcInterpreterObserver`](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/machine/JcInterpreterObserver.kt) interface. The observer can be attached to the symbolic machine [in its constructor](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/machine/JcMachine.kt#L35C17-L35C36).
56+
57+
Now, before each instruction is symbolically executed, the symbolic engine notifies the observer about it. For instance, when the engine reaches a `throw` instruction, the attached observer receives the corresponding event:
58+
```kotlin
59+
fun onThrowStatement(simpleValueResolver: JcSimpleValueResolver, stmt: JcThrowInst, stepScope: JcStepScope)
60+
```
61+
62+
Here, `stmt` represents the instruction that was proven to be reachable. If, for example, your analysis looks for reachability of `HorrificException`, you can inspect the type of `stmt.throwable`.
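The observer idea can be illustrated with a self-contained sketch. The types below (`ThrowInst`, `ThrowObserver`, `HorrificExceptionChecker`) are hypothetical stand-ins, not the USVM or JacoDB classes; the real callback also receives a value resolver and a step scope, which are left out here.

```kotlin
// Hypothetical stand-in for a reachable `throw` instruction (not the real JcThrowInst).
data class ThrowInst(val throwableTypeName: String, val location: String)

// Hypothetical, simplified observer interface with a single callback.
interface ThrowObserver {
    fun onThrowStatement(stmt: ThrowInst)
}

// A checker that records every reachable throw of `HorrificException`.
class HorrificExceptionChecker : ThrowObserver {
    val findings = mutableListOf<String>()

    override fun onThrowStatement(stmt: ThrowInst) {
        // Compare the simple class name of the thrown type.
        if (stmt.throwableTypeName.substringAfterLast('.') == "HorrificException") {
            findings += "HorrificException is reachable at ${stmt.location}"
        }
    }
}

fun main() {
    val checker = HorrificExceptionChecker()
    // Simulate the engine notifying the observer about two reachable throw instructions.
    checker.onThrowStatement(ThrowInst("java.lang.IllegalStateException", "Foo.bar:10"))
    checker.onThrowStatement(ThrowInst("com.example.HorrificException", "Foo.baz:42"))
    checker.findings.forEach(::println)
}
```

In a real checker, the same decision would be made inside the `JcInterpreterObserver` implementation, using the information carried by `stmt.throwable`.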
63+
64+
The internal state of the analysis is stored in `stepScope`. In fact, your checker gets a representation of the whole program state in the branch that reaches `stmt`. You can query arbitrary data stored in the state or even modify it (allocate new information or modify existing information) using `stepScope.calcOnState`. For example, to allocate a new memory fragment in the program state, you can write:
65+
```kotlin
66+
stepScope.calcOnState { memory.allocateConcreteRef() }
67+
```
68+
69+
You can check whether arbitrary logical predicates are satisfiable in the state. Warning: this can trigger queries to the SMT solver, which can be time- and memory-demanding. To run such a check, you can write:
70+
```kotlin
71+
stepScope.calcOnState {
72+
clone().assert(condition)?.let { // `condition` is a placeholder for your own predicate
73+
// handle the case when the condition is satisfiable
74+
}
75+
}
76+
```
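The clone-then-assert shape used above can be shown on a toy model. In this sketch, `ToyState`, its `DOMAIN`, and the brute-force search are all assumptions that replace the real symbolic state and the SMT solver; only the shape of the call (`clone().assert(...)` returning a state or `null`) mirrors the snippet above.

```kotlin
// Toy stand-in for a symbolic state: one integer input `x` constrained by predicates.
// None of these names come from USVM.
class ToyState(private val constraints: List<(Int) -> Boolean> = emptyList()) {
    // Copy the state so the original path constraints stay untouched.
    fun clone(): ToyState = ToyState(constraints.toList())

    // Add a constraint. Returns the extended state if some x in DOMAIN satisfies
    // all constraints, or null if the conjunction is unsatisfiable.
    fun assert(condition: (Int) -> Boolean): ToyState? {
        val extended = constraints + condition
        val satisfiable = DOMAIN.any { x -> extended.all { it(x) } }
        return if (satisfiable) ToyState(extended) else null
    }

    companion object {
        // Brute-force search domain that replaces an SMT solver in this sketch.
        val DOMAIN = -100..100
    }
}

fun main() {
    // Path condition of the current branch: x > 10.
    val state = checkNotNull(ToyState().assert { it > 10 })

    // Query satisfiability on a clone, leaving `state` unchanged.
    val canBeZero = state.clone().assert { it == 0 } != null
    val canBeFortyTwo = state.clone().assert { it == 42 } != null
    println("x == 0 satisfiable: $canBeZero, x == 42 satisfiable: $canBeFortyTwo")
}
```

Working on a clone matters because asserting a condition narrows the state; querying on the original would constrain the branch the engine is still exploring.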
77+
78+
To form and report warnings in SARIF format, use [built-in reporters](https://github.com/UnitTestBot/jacodb/blob/e61c2fa41533a2f0a39fad0beb220c3350987345/jacodb-analysis/src/main/kotlin/org/jacodb/analysis/sarif/DataClasses.kt#L137).
79+
80+
You can browse the [existing checkers](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt) for more examples and details.
81+
82+
# Using USVM to confirm SARIF reports
83+
84+
In many cases, existing static code analysers [report false alarms](#about-condition-sensitive-analysis). USVM can confirm or reject the reported warnings.
85+
86+
To run USVM in trace reproduction mode, configure one of the [analyses](https://github.com/UnitTestBot/usvm/blob/saloed/usvm-demo/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt) and [pass a set of traces](https://github.com/UnitTestBot/usvm/blob/a1e931e5e51f463ed4a33009cee1ffa01cd375bd/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt#L29) into [JcMachine](https://github.com/UnitTestBot/usvm/blob/main/usvm-jvm/src/main/kotlin/org/usvm/machine/JcMachine.kt).
87+
88+
Also, this process can be customized by a [rich set of options](https://github.com/UnitTestBot/usvm/blob/main/usvm-util/src/main/kotlin/org/usvm/UMachineOptions.kt).
89+
90+
# USVM for unit test generation
91+
92+
USVM has the ability to discover all possible behaviours of a program. This is a key feature used in white-box test generation engines. In the future, USVM will be the default code analysis engine in [UnitTestBot for Java](https://github.com/UnitTestBot/utbotjava).
93+
94+
95+
# Other languages support in USVM
96+
97+
[USVM.Core](https://github.com/UnitTestBot/usvm/tree/main/usvm-core) is a framework which provides highly optimized primitives for symbolic execution:
98+
* construction and manipulation of symbolic expressions (based on the [KSMT](https://github.com/UnitTestBot/ksmt) platform);
99+
* advanced modeling of [memory operations](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/memory);
100+
* efficient [constraint management](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/constraints) and [constraint solving](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/solver);
101+
* a rich set of [search strategies](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/ps) for giant branching spaces;
102+
* special support of [standard collections](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/collection);
103+
* special support of [type system](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/types) constraints;
104+
* collection and reporting of [statistics](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/statistics).
105+
106+
Thus, USVM.Core implements common primitives used in programming languages. This makes it much easier to instantiate USVM for a new programming language: in fact, to support a programming language, you only need to write its interpreter in terms of the operations provided by USVM.Core.
107+
108+
If you want to support a new language, please take a look at [sample language support](https://github.com/UnitTestBot/usvm/tree/main/usvm-sample-language) in USVM.

docs/assets/images/injection.png

121 KB

docs/assets/images/injection_fp.png

149 KB