|
1 |
| -# Universal Symbolic Virtual Machine |
| 1 | +# What is USVM? |
2 | 2 |
|
3 |
| -**Universal Symbolic Virtual Machine**, or **USVM**, is an ultimately powerful _language-agnostic_ core for implementing |
4 |
| -custom symbolic execution based products. |
| 3 | +USVM is a powerful symbolic execution core designed for analysing programs in multiple programming languages. Symbolic execution is known to be a very precise but resource-demanding technique. USVM makes it faster, more stable, and more versatile.
5 | 4 |
|
6 |
| -## How we got here |
| 5 | +The main features of USVM include:
7 | 6 |
|
8 |
| -USVM is the result of years of experience designing symbolic execution engines for various programming languages. Many |
9 |
| -engines have a lot in common — so why don't we extract and polish it? Many of them still feature different advantages |
10 |
| -— so why don't we unify them to get the most out of symbolic execution? |
| 7 | +* extensible and highly optimized symbolic memory model |
| 8 | +* optimized constraint management to reduce SMT solver workload |
| 9 | +* improved symbolic models for containers (`Map`, `Set`, `List`, etc.) |
| 10 | +* targeted symbolic execution |
| 11 | +* solving type constraints with no SMT solver involved |
| 12 | +* bit-precise reasoning |
| 13 | +* forward and backward symbolic exploration |
11 | 14 |
|
12 |
| -USVM abstracts language primitives into the generic ones, so, with a simple DSL provided, you are free to implement an |
13 |
| -interpreter for your language. USVM integrates the particular symbolic execution enhancements into one compact form |
14 |
| -— an |
15 |
| -efficient and configurable |
16 |
| -core with a unified API. |
| 15 | +# How can I use it? |
17 | 16 |
|
18 |
| -## Key features |
| 17 | +With USVM, you can achieve completely automated |
| 18 | +* [static code analysis](#taint-analysis-with-usvm), |
| 19 | +* [unit test case generation](#usvm-for-unit-test-generation),
| 20 | +* [targeted fuzzing and other symbolic-execution-based solutions](#using-usvm-to-confirm-sarif-reports).
19 | 21 |
|
20 |
| -What does it mean to be an _efficient_ core? |
| 22 | +Right now, we have a ready-to-use implementation for [Java](https://github.com/UnitTestBot/usvm/tree/main/usvm-jvm) and an experimental implementation for [Python](https://github.com/UnitTestBot/usvm/tree/tochilinak/python/usvm-python).
21 | 23 |
|
22 |
| -The USVM core redefines language primitives, so they become symbolic and language-independent, and provides us |
23 |
| -with a range of benefits: |
24 |
| -* optimized constraint management to reduce SMT solver workload; |
25 |
| -* extensible symbolic memory model; |
26 |
| -* forward and backward symbolic exploration; |
27 |
| -* improved symbolic models for containers (`map`, `set`, `list`, etc.); |
28 |
| -* solving type constraints with no SMT solver involved; |
29 |
| -* bit-precise reasoning. |
| 24 | +# Taint analysis with USVM |
30 | 25 |
|
31 |
| -## Language-specific implementations |
| 26 | +USVM supports interprocedural condition-sensitive taint analysis of JVM bytecode. For instance, it is able to automatically find the following SQL injection: |
| 27 | + |
32 | 28 |
|
33 |
| -We are now developing a user-friendly USVM API, so that you can easily adapt the core to analyzing programs |
34 |
| -in the required language. |
| 29 | +By default, USVM can also find other problems:
| 30 | +* null reference exceptions, |
| 31 | +* out-of-bounds access for collections, |
| 32 | +* integer overflows, |
| 33 | +* division by zero. |
35 | 34 |
|
36 |
| -For now, we have already implemented symbolic interpreters for _Java_ and _Python_ (the latter is an experimental but |
37 |
| -working example). And the interpreters for _Go_ and _JavaScript_ are under development, so stay tuned. |
| 35 | +You can also extend its analysis rules by [writing custom checkers](#writing-custom-checkers). |
38 | 36 |
|
39 |
| -These ready-to-use symbolic interpreters are for the managed languages mostly, but you have everything you need to |
40 |
| -analyze programs in whatever language — whether it is _C++_ or an exotic (or even custom) one. |
41 | 37 |
|
42 |
| -## Applicable scope |
| 38 | +You can run USVM in your repo CI by configuring the [ByteFlow](https://github.com/UnitTestBot/byteflow) runner. |
43 | 39 |
|
44 |
| -Symbolic execution is known to be a powerful but slow and demanding technique. USVM makes it faster, more stable, and |
45 |
| -versatile. |
| 40 | +## About condition-sensitive analysis |
46 | 41 |
|
47 |
| -You can build custom symbolic execution engines with the USVM core inside to show better performance in |
48 |
| -* test generation, |
49 |
| -* static analysis, |
50 |
| -* verification |
51 |
| -* targeted fuzzing,</br>and more symbolic execution based solutions. |
| 42 | +If we modify the above program a little, things change drastically: |
| 43 | + |
| 44 | + |
| 45 | +All interprocedural dataflow analysers we've tried report a similar warning for this program. However, this is a false alarm: untrusted data is read only in development mode (that is, when the `production` field is false), but the real database query happens only in production mode.
| 46 | + |
| 47 | +The existing analysers are wrong because they lack condition-sensitive analysis: they simply do not understand that untrusted data is emitted only under conditions that prevent the program from reaching the `checkUserIsAdminProd` method.
| 48 | + |
| 49 | +The main reason for this gap is that condition-sensitive analysis is complex and expensive. USVM makes condition-sensitive analysis robust and scalable. In particular, USVM does not report a warning for this program.
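
A minimal sketch of the idea, assuming a toy model rather than USVM itself: a source-to-sink trace carries the branch guards that must hold along the way, and a warning is reported only if some value of the `production` flag satisfies all of them. The names `Guard`, `Trace`, and `reportsSensitive` are hypothetical, and brute-force enumeration stands in for a real constraint solver.

```kotlin
// Toy model, not USVM code: every name here is hypothetical.
// A guard is a predicate over the single boolean field `production`.
typealias Guard = (production: Boolean) -> Boolean

// A dataflow trace from a taint source to a sink, together with the
// branch conditions that must hold along the way.
data class Trace(val description: String, val guards: List<Guard>)

// Condition-sensitive check: report only if some value of `production`
// satisfies every guard on the trace (brute force instead of an SMT solver).
fun reportsSensitive(trace: Trace): Boolean =
    listOf(true, false).any { p -> trace.guards.all { g -> g(p) } }

fun main() {
    // Untrusted data is read only in development mode,
    // but the query runs only in production mode.
    val guarded = Trace(
        "read input (dev only) -> checkUserIsAdminProd (prod only)",
        listOf<Guard>({ p -> !p }, { p -> p })
    )
    // The same flow with no mode checks at all.
    val unguarded = Trace("read input -> query", emptyList())

    println(reportsSensitive(guarded))   // prints false: the guards contradict each other
    println(reportsSensitive(unguarded)) // prints true: nothing rules the flow out
}
```

A condition-insensitive analyser would, in effect, ignore `guards` and report both traces.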
| 50 | + |
| 51 | +You can run this example online in our [demo repository](https://github.com/unitTestBot/byteflow/security/code-scanning).
| 52 | + |
| 53 | +# Writing custom checkers |
| 54 | + |
| 55 | +USVM lets you customize its behaviour by writing custom analysis rules. To achieve this, USVM can share its internal analysis states with the attached *interpreter observers*. So the first step in writing a custom checker is to implement the [`JcInterpreterObserver`](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/machine/JcInterpreterObserver.kt) interface. This observer can be attached to the symbolic machine [in its constructor](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/machine/JcMachine.kt#L35C17-L35C36).
| 56 | + |
| 57 | +Now, before every instruction gets symbolically executed, the symbolic engine will notify the observer about the next instruction. For instance, if the engine has reached the `throw` instruction, the attached observer will receive the corresponding event:
| 58 | +```kotlin |
| 59 | +fun onThrowStatement(simpleValueResolver: JcSimpleValueResolver, stmt: JcThrowInst, stepScope: JcStepScope) |
| 60 | +``` |
| 61 | + |
| 62 | +Here, `stmt` represents the instruction which was proven to be reachable. If, for example, your analysis looks for reachability of `HorrificException`, you can inspect the type of `stmt.throwable`.
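
A minimal sketch of such a check, using stand-in types rather than the real USVM and JacoDB classes. `ThrowEvent`, `ThrowObserver`, and `HorrificExceptionChecker` are hypothetical names; in an actual checker the callback would be `onThrowStatement`, and the type would be read from `stmt.throwable`.

```kotlin
// Stand-in for the information a checker would read from `stmt.throwable`.
// Hypothetical type: the real callback receives a JcThrowInst.
data class ThrowEvent(val throwableTypeName: String, val location: String)

// Hypothetical, simplified observer interface with a single callback.
interface ThrowObserver {
    fun onThrow(event: ThrowEvent)
}

// Collects the locations where an exception of the target type is thrown.
class HorrificExceptionChecker(private val targetTypeName: String) : ThrowObserver {
    private val _findings = mutableListOf<String>()
    val findings: List<String> get() = _findings

    override fun onThrow(event: ThrowEvent) {
        if (event.throwableTypeName == targetTypeName) {
            _findings += event.location
        }
    }
}

fun main() {
    val checker = HorrificExceptionChecker("com.example.HorrificException")
    // In a real run, the engine would deliver these events
    // for `throw` instructions it has proven reachable.
    checker.onThrow(ThrowEvent("java.lang.IllegalStateException", "Foo.bar:12"))
    checker.onThrow(ThrowEvent("com.example.HorrificException", "Foo.baz:40"))
    println(checker.findings) // prints [Foo.baz:40]
}
```

Because the engine only calls the observer for reachable instructions, each recorded location here would correspond to a feasible execution path in a real analysis.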
| 63 | + |
| 64 | +The internal state of the analysis is stored in `stepScope`. In fact, your checker gets a representation of the whole program state in the branch that reaches `stmt`. You can query arbitrary data stored in the state or even modify it (allocate new information or modify existing information) using `stepScope.calcOnState`. For example, to allocate a new memory fragment in the program state, you can write
| 65 | +```kotlin |
| 66 | +stepScope.calcOnState { memory.allocateConcreteRef() } |
| 67 | +``` |
| 68 | + |
| 69 | +You can check whether arbitrary logical predicates are satisfiable in the state. Warning: this can trigger queries to the SMT solver, which can be time- and memory-demanding. To run such a check, you can write
| 70 | +```kotlin |
| 71 | +stepScope.calcOnState { |
| 72 | +    clone().assert(condition)?.let { // `condition` is a placeholder for your predicate
| 73 | +        // handle the case when the condition is satisfiable
| 74 | +    }
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +To form and report warnings in SARIF format, use [built-in reporters](https://github.com/UnitTestBot/jacodb/blob/e61c2fa41533a2f0a39fad0beb220c3350987345/jacodb-analysis/src/main/kotlin/org/jacodb/analysis/sarif/DataClasses.kt#L137). |
| 79 | + |
| 80 | +You can browse the [existing checkers](https://github.com/UnitTestBot/usvm/blob/b6ed4682063f1ff6008b3f3c8aa15be663706c74/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt) for more examples and details. |
| 81 | + |
| 82 | +# Using USVM to confirm SARIF reports |
| 83 | + |
| 84 | +In many cases, existing static code analysers [report false alarms](#about-condition-sensitive-analysis). USVM can confirm or reject the reported warnings.
| 85 | + |
| 86 | +To run USVM in trace reproduction mode, configure one of the [analyses](https://github.com/UnitTestBot/usvm/blob/saloed/usvm-demo/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt) and [pass a set of traces](https://github.com/UnitTestBot/usvm/blob/a1e931e5e51f463ed4a33009cee1ffa01cd375bd/usvm-jvm/src/main/kotlin/org/usvm/api/targets/TaintAnalysis.kt#L29) into [JcMachine](https://github.com/UnitTestBot/usvm/blob/main/usvm-jvm/src/main/kotlin/org/usvm/machine/JcMachine.kt). |
| 87 | + |
| 88 | +Also, this process can be customized by a [rich set of options](https://github.com/UnitTestBot/usvm/blob/main/usvm-util/src/main/kotlin/org/usvm/UMachineOptions.kt). |
| 89 | + |
| 90 | +# USVM for unit test generation |
| 91 | + |
| 92 | +USVM can discover all possible behaviours of a program. This is a key feature used in white-box test generation engines. In the future, USVM will be the default code analysis engine in [UnitTestBot for Java](https://github.com/UnitTestBot/utbotjava).
| 93 | + |
| 94 | + |
| 95 | +# Other languages support in USVM |
| 96 | + |
| 97 | +[USVM.Core](https://github.com/UnitTestBot/usvm/tree/main/usvm-core) is a framework which provides highly optimized primitives for symbolic execution: |
| 98 | +* construction and manipulation of symbolic expressions (based on the [KSMT](https://github.com/UnitTestBot/ksmt) platform);
| 99 | +* advanced modeling of [memory operations](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/memory); |
| 100 | +* efficient [constraint management](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/constraints) and [constraint solving](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/solver);
| 101 | +* a rich set of [search strategies](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/ps) for giant branching spaces;
| 102 | +* special support of [standard collections](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/collection); |
| 103 | +* special support of [type system](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/types) constraints; |
| 104 | +* collection and reporting of [statistics](https://github.com/UnitTestBot/usvm/tree/main/usvm-core/src/main/kotlin/org/usvm/statistics). |
| 105 | + |
| 106 | +Thus, USVM.Core implements common primitives used in programming languages. This makes it much easier to instantiate USVM for a new programming language: in fact, to support a programming language, you only need to write its interpreter in terms of operations provided by USVM.Core.
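
As a rough illustration of this split, the toy below separates a language-specific `step` function from a language-independent exploration loop. It does not use the USVM.Core API: `Stmt`, `State`, `step`, and `explore` are hypothetical names, constraints are kept as plain strings, and no solver prunes infeasible states.

```kotlin
// Toy model, not the USVM.Core API: all names are hypothetical.
// A tiny language with one symbolic integer input `x`.
sealed interface Stmt
data class IfLess(val bound: Int, val thenTarget: Int, val elseTarget: Int) : Stmt // if (x < bound)
data class Return(val label: String) : Stmt

// A symbolic state: the current position and the constraints on `x` that led here.
data class State(val pc: Int, val pathConstraints: List<String>)

// Language-specific part: interpret one statement and produce successor states.
// A branch forks the state and records the condition taken on each side.
fun step(program: List<Stmt>, state: State): List<State> =
    when (val stmt = program[state.pc]) {
        is IfLess -> listOf(
            State(stmt.thenTarget, state.pathConstraints + "x < ${stmt.bound}"),
            State(stmt.elseTarget, state.pathConstraints + "x >= ${stmt.bound}")
        )
        is Return -> emptyList()
    }

// Language-independent part: a breadth-first worklist loop over states.
// Returns the path constraints under which each `Return` label is reached.
fun explore(program: List<Stmt>): Map<String, List<String>> {
    val results = mutableMapOf<String, List<String>>()
    val worklist = ArrayDeque(listOf(State(0, emptyList())))
    while (worklist.isNotEmpty()) {
        val state = worklist.removeFirst()
        val stmt = program[state.pc]
        if (stmt is Return) results[stmt.label] = state.pathConstraints
        worklist.addAll(step(program, state))
    }
    return results
}

fun main() {
    val program = listOf(
        IfLess(0, thenTarget = 1, elseTarget = 2),
        Return("negative"),
        Return("non-negative")
    )
    println(explore(program))
}
```

In this picture, supporting another language would mean writing a different `step`, while the exploration loop, state representation, and constraint handling stay shared.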
| 107 | + |
| 108 | +If you want to support a new language, please take a look at the [sample language support](https://github.com/UnitTestBot/usvm/tree/main/usvm-sample-language) in USVM.