Commit 3482b12

Add few more pages about compiler plugin
1 parent e0a25a5 commit 3482b12

3 files changed

+179
-0
lines changed

docs/StardustDocs/d.tree

Lines changed: 2 additions & 0 deletions
@@ -44,7 +44,9 @@
 <toc-element topic="DataRow.md"/>
 </toc-element>
 <toc-element topic="Compiler-Plugin.md">
+<toc-element topic="staticInterpretation.md"/>
 <toc-element topic="dataSchema.md"/>
+<toc-element topic="compilerPluginExamples.md"/>
 </toc-element>
 <toc-element topic="nanAndNa.md"/>
 <toc-element topic="numberUnification.md"/>
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
[//]: # (title: Compiler plugin examples)

This page provides a few examples that you can copy directly into your project.
[Schema info](staticInterpretation.md#schema-info) is a convenient way to inspect the result of each operation.

### Example 1

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Income values start out as strings and are parsed below
    val df = dataFrameOf("location", "income")(
        "mall", "2.49",
        "university", "2.99",
        "university", "1.49",
        "school", "0.99",
        "hospital", "2.99",
        "university", "0.49",
        "hospital", "1.49",
        "mall", "0.99",
        "hospital", "0.49",
    )

    df
        // Replace the String column with parsed Double values
        .convert { income }.with { it.toDouble() }
        .groupBy { location }.aggregate {
            income.toList() into "allTransactions"
            sumOf { income } into "totalIncome"
        }.forEach {
            // `totalIncome` is a property of the aggregated schema
            println(location)
            println("totalIncome = $totalIncome")
        }
}
```

### Example 2

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

enum class State {
    Idle, Productive, Maintenance
}

class Event(val toolId: String, val state: State, val timestamp: Long)

fun main() {
    val tool1 = "tool_1"
    val tool2 = "tool_2"
    val tool3 = "tool_3"

    val events = listOf(
        Event(tool1, State.Idle, 0),
        Event(tool1, State.Productive, 5),
        Event(tool2, State.Idle, 0),
        Event(tool2, State.Maintenance, 10),
        Event(tool2, State.Idle, 20),
        Event(tool3, State.Idle, 0),
        Event(tool3, State.Productive, 25),
    ).toDataFrame()

    val lastTimestamp = events.maxOf { timestamp }

    // Duration of each state: time until the next event, or until the last timestamp overall
    val groupBy = events
        .groupBy { toolId }
        .sortBy { timestamp }
        .add("stateDuration") {
            (next()?.timestamp ?: lastTimestamp) - timestamp
        }

    groupBy.updateGroups {
        val allStates = State.entries.toDataFrame {
            "state" from { it }
        }

        // States missing from this group get a duration of -1
        val df = allStates.leftJoin(it) { state }
            .fillNulls { stateDuration }
            .with { -1 }

        df.groupBy { state }.sumFor { stateDuration }
    }
        .toDataFrame()
        .toStandaloneHtml()
        .openInBrowser()
}
```
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Static interpretation of DataFrame API

The plugin evaluates dataframe operations at compile time, using arguments that are known at that point: constant strings, resolved types, and property access calls.
It updates the return type of each function call to provide properties that match the column names and types.
The goal is to reflect the result of the operations you apply to a dataframe in its type, so that you get a convenient typed API.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

val weatherData = dataFrameOf(
    "time" to columnOf(0, 1, 2, 4, 5, 7, 8, 9),
    "temperature" to columnOf(12.0, 14.2, 15.1, 15.9, 17.9, 15.6, 14.2, 24.3),
    "humidity" to columnOf(0.5, 0.32, 0.11, 0.89, 0.68, 0.57, 0.56, 0.5)
)

// `temperature` is a typed property derived from the column name
weatherData.filter { temperature > 15.0 }.print()
```
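
The following is a minimal sketch of how this plays out when an operation changes the schema, assuming the compiler plugin is enabled in the project. The `fahrenheit` column and the `hotHoursFahrenheit` function are hypothetical names chosen for illustration, and the data mirrors the example above with the `humidity` column omitted.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: returns the hours at which it was warmer than 60 °F.
fun hotHoursFahrenheit(): List<Int> {
    val weatherData = dataFrameOf(
        "time" to columnOf(0, 1, 2, 4, 5, 7, 8, 9),
        "temperature" to columnOf(12.0, 14.2, 15.1, 15.9, 17.9, 15.6, 14.2, 24.3),
    )

    // The column name is a constant String, so the plugin can add
    // a `fahrenheit` property to the resulting type (assumes the plugin is enabled).
    val withFahrenheit = weatherData.add("fahrenheit") { temperature * 9 / 5 + 32 }

    // The new property is usable in the very next call of the chain.
    return withFahrenheit.filter { fahrenheit > 60.0 }.time.toList()
}
```

Without the plugin, `temperature`, `fahrenheit`, and `time` would not resolve, and the same pipeline would have to use the String API or a declared data schema.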

## Schema info

The schema of a DataFrame, as the compiler plugin sees it,
is displayed when you hover over an expression or variable:

![image.png](schema_info.png)

This shows which properties are available.
For expressions that chain several operations, you can see how the DataFrame changes at each step.

## Visibility of the generated code

The generated code is similar in nature to `@DataSchema` declarations.
Take this expression as an example:

```kotlin
fun main() {
    // The inferred type is shown as a comment
    val df /* : DataFrame<DataFrameOf_39> */ = dataFrameOf("col" to columnOf(42))
}
```

It produces two additional local classes:

```kotlin
// Simplified view of the generated declarations

// Represents data schema
class DataFrameOf_39 {
    val col: Int
}

// Injected to implicit receiver scope of `main` function
class Scope {
    val DataRow<DataFrameOf_39>.col: Int
    val ColumnsScope<DataFrameOf_39>.col: DataColumn<Int>
}
```

The code transformation pipeline is described in more detail in [KT-65859](https://youtrack.jetbrains.com/issue/KT-65859).

Because the generated classes are anonymous local types, they are visible only within the private scope of the file.
This means you can do the following:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// The return type is inferred and stays private to this file
private fun create(i: Int) = dataFrameOf("number" to columnOf(i))
    .first()

fun main() {
    val row = create(42)
    println(row.number)
}
```

However, you cannot refer to these classes in your code: they cannot appear in the explicit type of a variable or as the type of a function parameter.
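
One way to pass such a dataframe across a function boundary is to declare the schema explicitly and cast to it. The sketch below assumes the compiler plugin is enabled, so that properties are available for the `@DataSchema` declaration (see [data schema](dataSchema.md)). `Numbers`, `total`, and `sample` are hypothetical names used only for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical explicit schema that can be named in signatures
@DataSchema
interface Numbers {
    val number: Int
}

// A public function can mention `Numbers`, unlike the generated local types
fun total(df: DataFrame<Numbers>): Int = df.number.toList().sum()

// Builds a dataframe and casts it from its inferred type to the declared schema
fun sample(): DataFrame<Numbers> =
    dataFrameOf("number" to columnOf(1, 2, 3)).cast<Numbers>()
```

Here `cast` only changes the compile-time type, so the declared schema has to match the actual columns.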

## Scope of compiler plugin

The compiler plugin aims to cover all functions where the result of the operation depends only on the input schema and on arguments that can be resolved at compile time.
In the library, such functions are annotated with `@Refine` or `@Interpretable`.

Some functions are not supported:
- `pivot`, `parse`, `read`, `ColumnSelectionDsl.filter`, etc.: the resulting schema depends on the data, so these operations are out of scope.
- `gather`, `split`, `implode`, and some Columns Selection DSL functions: these will be supported in a future release.

In Gradle projects, this means you sometimes need to provide a [data schema](dataSchema.md) or fall back to the String API.
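
A minimal sketch of the String API fallback follows: columns are addressed by name, and their types are stated at the call site. The in-memory `dataFrameOf` call stands in for data whose schema is only known at runtime, and `hotHours` is a hypothetical name.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: returns the hours at which the temperature exceeded 15.0
fun hotHours(): List<Int> {
    val df = dataFrameOf(
        "time" to columnOf(0, 1, 2, 4, 5, 7, 8, 9),
        "temperature" to columnOf(12.0, 14.2, 15.1, 15.9, 17.9, 15.6, 14.2, 24.3),
    )

    return df
        // Column name and type are given explicitly instead of using a generated property
        .filter { "temperature"<Double>() > 15.0 }
        // Untyped access by name, followed by an explicit cast of the column type
        .get("time").cast<Int>()
        .toList()
}
```

The trade-off is that a misspelled column name or a wrong type argument is reported only at runtime.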

In Kotlin Notebook, the compiler plugin complements the built-in code generator, which updates the types of variables after a cell is executed.

```kotlin
val df = DataFrame.read("...")
```

In the next cell, you can add, convert, remove, or aggregate columns, and the schema is updated accordingly,
without having to split your pipeline into multiple steps to trigger notebook code generation.
