
Commit 56eb565

Merge pull request #556 from joemfb/performance-sample
adds performance sample
2 parents e172e5d + adf0a51 commit 56eb565


2 files changed

+416
-0
lines changed

examples/performance-sample/README.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
## data-hub performance sample

This example automates the entire setup and scaffolding of a data-hub,
complete with:

- entity creation
- input flow creation
- sample-data retrieval
- data ingestion
- harmonization flow creation
- harmonization

Using the geonames cities5000 dataset (world cities with a population above
5,000) as our data source, we create `input-json` and `input-xml` entities,
with input flows for each, and ingest the sample data using MLCP and our input
flows, creating JSON and XML instances of every city. We then create four
harmonization flows representing the Cartesian product of data formats (XML,
JSON) and code formats (XQY, SJS), and run each one.

This lets us easily analyze and compare the performance of the default,
scaffolded harmonization flows across data formats and code formats. This
example can also serve as a reference for data-hub build automation.

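The four harmonization flows are simply every pairing of a data format with a code format. As a rough sketch, the nested loop below enumerates those pairings. The task-name pattern (a prefix followed by the data format and the code format) is inferred from the example output at the end of this README, and `flow_tasks` is a hypothetical helper, not something defined in `build.gradle`.

```shell
# Hypothetical helper: print one task name per (data format, code format) pair.
# $1 is a task-name prefix such as createHarmonize or runHarmonize.
flow_tasks() {
  prefix="$1"
  for data in Json Xml; do
    for code in Sjs Xqy; do
      printf '%s%s%s\n' "$prefix" "$data" "$code"
    done
  done
}

flow_tasks runHarmonize
```

Two formats on each axis give the four task names, one per flow.
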
### getting started

To get started, copy `build.gradle` into an empty directory and set up a new
data-hub:

```
gradle hubInit
```

In `gradle.properties`, set `mlUsername` and `mlPassword` to the credentials
of your MarkLogic admin account, and check that the other settings are
appropriate for your environment.

Alternatively, you can set environment-specific properties in
`gradle-$ENV.properties`, and invoke `gradle` with `-PenvironmentName=$ENV`.

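As an illustration of the environment-specific setup, the sketch below writes a properties file for a hypothetical environment named `local`. The environment name and both values are placeholders, and `ENV_NAME` is used instead of `ENV` only because some shells treat `ENV` specially.

```shell
# Hypothetical environment name; substitute your own.
ENV_NAME=local

# Placeholder credentials: replace with your MarkLogic admin account.
cat > "gradle-$ENV_NAME.properties" <<'EOF'
mlUsername=admin
mlPassword=changeme
EOF

# Then select that environment when invoking gradle (not run here):
#   gradle -PenvironmentName="$ENV_NAME" doAll
```

Properties that are not overridden in the environment-specific file presumably still come from `gradle.properties`, so only the settings that differ need to be listed.
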
### scaffold, ingest, and harmonize

There's an uber-task that handles creating entities and input flows,
retrieving and ingesting data, and creating and running the harmonization
flows:

```
gradle doAll
```

Alternatively, you can run these steps separately:

```
gradle mlDeploy
gradle createEntityInput
gradle loadInputData
gradle allHarmonizeFlows
```

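When running the steps separately from a script, it can help to stop at the first failure so that later steps do not run against a half-built hub. A minimal sketch, assuming the four tasks above are run in the order listed; `run_steps` is a hypothetical wrapper, and the command to run is passed in so the sequence can be dry-run with `echo`.

```shell
# Hypothetical wrapper: run each step in order, stopping at the first failure.
# "$@" is the command prefix, e.g. `gradle` for a real run or `echo gradle`
# for a dry run.
run_steps() {
  for task in mlDeploy createEntityInput loadInputData allHarmonizeFlows; do
    "$@" "$task" || { echo "step failed: $task" >&2; return 1; }
  done
}

# Dry run: prints the four commands without invoking gradle.
run_steps echo gradle

# Real run (requires gradle and a configured data-hub):
#   run_steps gradle
```
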
### profile

There are two profiling mechanisms available in this project. The first is the
built-in gradle profiler:

```
gradle --profile doAll
```

This will write an HTML profile report to
`./build/reports/profile/profile-$DATETIME.html`.

There's also a custom profiling class that prints per-task execution time to
the terminal:

```
gradle -Pprofile doAll
```

Example output:

```
BUILD SUCCESSFUL in 6m 8s
21 actionable tasks: 21 executed
Task timings:
2.742s :hubPreInstallCheck
0.001s :mlDeleteModuleTimestampsFile
0.004s :mlPrepareRestApiDependencies
88.791s :mlDeployApp
0.000s :mlPostDeploy
0.000s :mlDeploy
0.004s :createJsonEntity
0.002s :createInputJsonFlow
0.001s :createXmlEntity
0.002s :createInputXmlFlow
0.000s :createEntityInput
0.005s :createHarmonizeJsonSjs
0.003s :createHarmonizeJsonXqy
0.002s :createHarmonizeXmlSjs
0.002s :createHarmonizeXmlXqy
0.000s :createHarmonizeFlows
0.418s :getInputData
3.595s :mlLoadModules
52.553s :loadJson
52.685s :loadXml
0.003s :loadInputData
51.452s :runHarmonizeJsonSjs
29.802s :runHarmonizeJsonXqy
55.021s :runHarmonizeXmlSjs
30.660s :runHarmonizeXmlXqy
0.000s :allHarmonizeFlows
0.000s :doAll
```
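
To compare runs, the per-task timings can be pulled out of the terminal output and ranked. The sketch below assumes each timing line has the shape shown in the example output (a duration ending in `s`, then a task path starting with `:`); `slowest_tasks` is a hypothetical helper, not part of this project.

```shell
# Hypothetical helper: read gradle output on stdin and print the N slowest
# tasks as "<seconds> <task>", slowest first. $1 is N.
slowest_tasks() {
  awk '$1 ~ /^[0-9]+\.[0-9]+s$/ && $2 ~ /^:/ {
         sub(/s$/, "", $1)   # drop the unit so sort can compare numerically
         print $1, $2
       }' |
    sort -rn |
    head -n "$1"
}

# Example with a few lines taken from the output above.
slowest_tasks 2 <<'EOF'
Task timings:
2.742s :hubPreInstallCheck
88.791s :mlDeployApp
29.802s :runHarmonizeJsonXqy
55.021s :runHarmonizeXmlSjs
EOF
```

In practice the input would come from a pipe, for example `gradle -Pprofile doAll | slowest_tasks 5`.
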
