Commit 7b544b3

Adding setup instructions

Author: Drew Kerrigan

1 parent 824abc1, commit 7b544b3

4 files changed, +107 -2 lines changed

README.md

Lines changed: 107 additions & 2 deletions
@@ -4,11 +4,116 @@ This project contains a custom processor for [Apache Nifi](https://nifi.apache.o

## Installation

-TODO: Add Instructions.

Create a directory for custom NiFi processors if it doesn't exist:

```
mkdir -p $NIFI_HOME/nars/lib1
```

Copy `nifi-stanfordcorenlp-nar-1.0.nar` to `$NIFI_HOME/nars/lib1`.

Update your `$NIFI_HOME/conf/nifi.properties` to include the following:

```
nifi.nar.library.directory.lib1=./nars/lib1
```
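
The three steps above can be wrapped in one rerunnable shell function. This is only a sketch: the function name `install_nar` and its arguments are made up for illustration. It writes the absolute path of the new directory into `nifi.properties` so the entry does not depend on NiFi's working directory, and it assumes the file ends with a newline.

```shell
# Hypothetical helper: install_nar <nifi_home> <path-to-nar>
install_nar() {
    nifi_home=$1
    nar_file=$2
    nar_dir="$nifi_home/nars/lib1"
    props="$nifi_home/conf/nifi.properties"
    prop_line="nifi.nar.library.directory.lib1=$nar_dir"

    # Create the custom NAR directory and copy the NAR into it
    mkdir -p "$nar_dir" || return 1
    cp "$nar_file" "$nar_dir/" || return 1

    # Append the property only if that exact line is missing,
    # so running the function twice does not duplicate it
    grep -qxF "$prop_line" "$props" 2>/dev/null ||
        printf '%s\n' "$prop_line" >> "$props"
}

# Example call (paths are assumptions):
#   install_nar "$NIFI_HOME" ./nifi-stanfordcorenlp-nar-1.0.nar
```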

Update your `$NIFI_HOME/conf/bootstrap.conf` to give the JVM the extra memory that CoreNLP needs:

```
# JVM memory settings
#java.arg.2=-Xms512m
#java.arg.3=-Xmx512m
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```
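
If you'd rather script the memory change than edit the file by hand, something like the following should work. It is a sketch that assumes `java.arg.2` and `java.arg.3` are the heap settings in your `bootstrap.conf`, as in the snippet above; the function name `set_nifi_heap` is hypothetical. Unlike the snippet, it rewrites the active lines in place and does not keep the old values as comments.

```shell
# Hypothetical helper: set_nifi_heap <path-to-bootstrap.conf> <size, e.g. 4g>
set_nifi_heap() {
    conf=$1
    size=$2
    tmp="$conf.tmp.$$"

    # Rewrite only the uncommented java.arg.2 / java.arg.3 lines;
    # a temp file is used because `sed -i` is not portable
    sed -e "s/^java\.arg\.2=.*/java.arg.2=-Xms$size/" \
        -e "s/^java\.arg\.3=.*/java.arg.3=-Xmx$size/" \
        "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Example call (path is an assumption):
#   set_nifi_heap "$NIFI_HOME/conf/bootstrap.conf" 4g
```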

Start NiFi as usual:

```
nifi start
```

For more information on how to configure custom NAR files, see the [NiFi Documentation](https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#core-properties-br).

## Usage

-TODO: Add Instructions.

### Create Input File

Create an input folder in a location that NiFi has read access to, and add a file called `1.json` there with the contents:

```
{
  "title": "Worldwide film production company opens ABQ facility",
  "content": "ALBUQUERQUE, N.M. — A worldwide film production company is expanding to Albuquerque, according to Albuquerque Business First. The company, Production Resource Group, has worked on various movie productions including \"House of Cards.\" They plan to move into a 6,000-square-foot warehouse space in northeast Albuquerque, located at 5821 Midway Park Blvd. NE.\n\nFor more information, click here."
}
```
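
To create the folder and the sample file in one go, a quoted heredoc keeps the backslash escapes in the JSON untouched. This is a sketch; the function name `write_sample_input` and the folder you pass it are assumptions. The file is written as strictly valid JSON, with no comma after the last field.

```shell
# Hypothetical helper: write_sample_input <input-dir>
write_sample_input() {
    dir=$1
    mkdir -p "$dir" || return 1

    # 'EOF' is quoted so the shell leaves \" and \n exactly as written
    cat > "$dir/1.json" <<'EOF'
{
  "title": "Worldwide film production company opens ABQ facility",
  "content": "ALBUQUERQUE, N.M. — A worldwide film production company is expanding to Albuquerque, according to Albuquerque Business First. The company, Production Resource Group, has worked on various movie productions including \"House of Cards.\" They plan to move into a 6,000-square-foot warehouse space in northeast Albuquerque, located at 5821 Midway Park Blvd. NE.\n\nFor more information, click here."
}
EOF
}

# Example call (folder is an assumption):
#   write_sample_input /tmp/nifi-input
```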

### Add Input / Output Processors

In this example I'm using the standard `GetFile` and `PutFile` processors. Configure them with input and output folders that NiFi can read from and write to.

### Add Custom Processor

![Add Processor](./doc/add_processor.png "Add Processor")

Assuming the NAR file was configured properly in the installation step, you should be able to search for `Core` or `NLP` to find the `StanfordCoreNLPProcessor`.

### Configure Properties

![Configure Properties](./doc/configure_local.png "Configure Properties")

In the Properties tab of the processor configuration, you will need to configure the entity types that you want to extract from the input as a comma-separated list. Here is a brief summary of the configuration options:

* `Entity Types`
  * Description: Lowercase, comma-separated list of NER tags to extract from text, such as: `location,organization`.
  * Valid Values: `person,location,organization,misc,money,number,ordinal,percent,date,time,duration,set,email,url,city,state_or_province,country,nationality,religion,title,ideology,criminal_charge,cause_of_death`.
  * Notes: When using the `location` entity type, the `city`, `country`, and `state_or_province` types are automatically grouped under it as well. For more information about NER tags, see the [CoreNLP NER Docs](https://stanfordnlp.github.io/CoreNLP/ner.html#description).
* `JSONPath`
  * Description: The [JSON Path](https://github.com/json-path) to extract from the incoming flow file for analysis, such as: `$.['title','content']`.
  * Notes: If left blank, the flow file will be treated as plain text.
* `StanfordCoreNLP Props as JSON`
  * Description: Properties to configure the StanfordCoreNLP object or StanfordCoreNLPClient object as JSON, such as: `{"threads": 1}`.
* `StanfordCoreNLPClient Host`
  * Description: StanfordCoreNLPClient host address, such as: `http://localhost`.
  * Notes:
    * If left blank, all processing will be performed locally. This requires NiFi to be configured with additional memory as noted in the installation section.
    * An external Stanford CoreNLP server can be run with Docker: `docker run -p 9000:9000 --name coreNLP --rm -i -t isslab/corenlp:2018-10-05`.
* `StanfordCoreNLPClient Port`
  * Description: StanfordCoreNLPClient port, such as: `9000`.
* `StanfordCoreNLPClient API Key`
  * Description: StanfordCoreNLPClient API Key, for servers that have authentication configured. Not required.
* `StanfordCoreNLPClient API Secret`
  * Description: StanfordCoreNLPClient API Secret, for servers that have authentication configured. Not required.
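
Since `Entity Types` has to be a lowercase, comma-separated list drawn from the valid values above, it can help to check a value before pasting it into the processor. The function below is a hypothetical pre-check, not part of the processor. The list is copied from the `Valid Values` entry, and how the processor itself reacts to a bad value isn't covered here.

```shell
# Copied from the "Valid Values" entry above
VALID_ENTITY_TYPES="person,location,organization,misc,money,number,ordinal,percent,date,time,duration,set,email,url,city,state_or_province,country,nationality,religion,title,ideology,criminal_charge,cause_of_death"

# Hypothetical helper: validate_entity_types <comma-separated list>
# Returns 0 when every item is in VALID_ENTITY_TYPES, 1 otherwise.
validate_entity_types() {
    [ -n "$1" ] || return 1
    status=0
    old_ifs=$IFS
    IFS=,
    for t in $1; do
        # Exact, case-sensitive match against the comma-delimited list
        case ",$VALID_ENTITY_TYPES," in
            *",$t,"*) ;;
            *) echo "invalid entity type: $t" >&2; status=1 ;;
        esac
    done
    IFS=$old_ifs
    return $status
}

# Example call:
#   validate_entity_types "location,organization" && echo "ok"
```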

### Add Connections

![Add Connections](./doc/running.png "Add Connections")

After the `StanfordCoreNLPProcessor` is configured, connect it with inputs and outputs. The processor routes each flow file to either the `success` or the `failure` relationship.

### Verify Output

Start all of the processors in your NiFi flow. In the output folder configured for the `PutFile` processor, you should see the updated JSON file:

```
{
  "title": "Worldwide film production company opens ABQ facility",
  "content": "ALBUQUERQUE, N.M. — A worldwide film production company is expanding to Albuquerque, according to Albuquerque Business First. The company, Production Resource Group, has worked on various movie productions including \"House of Cards.\" They plan to move into a 6,000-square-foot warehouse space in northeast Albuquerque, located at 5821 Midway Park Blvd. NE.\n\nFor more information, click here.",
  "organization": [
    "Albuquerque Business First",
    "Production Resource Group"
  ],
  "location": [
    "ALBUQUERQUE",
    "N.M.",
    "Albuquerque",
    "Albuquerque",
    "Midway Park Blvd.",
    "NE"
  ]
}
```

## Build

doc/add_processor.png (854 KB)

doc/configure_local.png (917 KB)

doc/running.png (1.03 MB)
