@@ -4,11 +4,116 @@ This project contains a custom processor for [Apache Nifi](https://nifi.apache.o
## Installation
Create a directory for custom NiFi processors if it doesn't exist:
```
mkdir -p "$NIFI_HOME/nars/lib1"
```
Copy `nifi-stanfordcorenlp-nar-1.0.nar` to `$NIFI_HOME/nars/lib1`.

Update your `$NIFI_HOME/conf/nifi.properties` to include the following:
```
# Relative paths are resolved against NiFi's working directory, normally $NIFI_HOME.
nifi.nar.library.directory.lib1=./nars/lib1
```
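The directory, copy, and property steps above could be scripted roughly as follows. This is only a sketch: `install_nar` is a hypothetical helper, not part of NiFi, and it assumes the extra library directory is registered by its absolute path. Calling it as `install_nar nifi-stanfordcorenlp-nar-1.0.nar "$NIFI_HOME"` would apply it to your installation.

```shell
#!/usr/bin/env bash
# Sketch only: install_nar is a hypothetical helper, not part of NiFi.
# Usage: install_nar <path-to-nar> <nifi-home>
install_nar() {
  local nar_file="$1" nifi_home="$2"
  local lib_dir="$nifi_home/nars/lib1"
  local props="$nifi_home/conf/nifi.properties"

  # Create the extra library directory and copy the NAR into it.
  mkdir -p "$lib_dir" || return 1
  cp "$nar_file" "$lib_dir/" || return 1

  # Register the directory once; skip if the property is already set.
  if ! grep -q '^nifi\.nar\.library\.directory\.lib1=' "$props"; then
    # Make sure the file ends with a newline before appending.
    if [ -s "$props" ] && [ -n "$(tail -c 1 "$props")" ]; then
      echo >> "$props"
    fi
    # Assumption: the absolute path of the directory is written here.
    printf 'nifi.nar.library.directory.lib1=%s\n' "$lib_dir" >> "$props"
  fi
}
```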
Update your `$NIFI_HOME/conf/bootstrap.conf` to give the JVM the extra memory that CoreNLP needs:
```
# JVM memory settings
#java.arg.2=-Xms512m
#java.arg.3=-Xmx512m
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```
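If you prefer to script the memory change, a minimal sketch might look like the one below. `set_nifi_heap` is a hypothetical helper, not a NiFi command. It assumes `java.arg.2` holds `-Xms` and `java.arg.3` holds `-Xmx`, as in the snippet above, and it rewrites the active lines in place instead of commenting out the old values. For example, `set_nifi_heap "$NIFI_HOME/conf/bootstrap.conf" 4g` would set both values to 4 GB.

```shell
#!/usr/bin/env bash
# Sketch only: set_nifi_heap is a hypothetical helper, not a NiFi command.
# Usage: set_nifi_heap <path-to-bootstrap.conf> <size, e.g. 4g>
set_nifi_heap() {
  local conf="$1" size="$2"
  local tmp="${conf}.tmp"

  # Only uncommented java.arg.2 / java.arg.3 lines are rewritten;
  # commented-out lines (starting with '#') are left untouched.
  sed -e "s/^java\.arg\.2=-Xms.*/java.arg.2=-Xms${size}/" \
      -e "s/^java\.arg\.3=-Xmx.*/java.arg.3=-Xmx${size}/" \
      "$conf" > "$tmp" && mv "$tmp" "$conf"
}
```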
Start NiFi as usual:
```
nifi start
```
For more information on how to configure custom NAR files, see the [NiFi Documentation](https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#core-properties-br).
## Usage
### Create Input File
Create an input folder in a location that NiFi has read access to, and add a file called `1.json` there with the following contents:
```
{
  "title": "Worldwide film production company opens ABQ facility",
  "content": "ALBUQUERQUE, N.M. — A worldwide film production company is expanding to Albuquerque, according to Albuquerque Business First. The company, Production Resource Group, has worked on various movie productions including \"House of Cards.\" They plan to move into a 6,000-square-foot warehouse space in northeast Albuquerque, located at 5821 Midway Park Blvd. NE.\n\nFor more information, click here."
}
```
### Add Input / Output Processors
In this example I'm using the standard `GetFile` and `PutFile` processors. Configure them with input and output folders that NiFi can read from and write to.

Assuming the NAR file was set up correctly in the installation step, you should be able to search for `Core` or `NLP` to find the `StanfordCoreNLPProcessor`.

In the Properties tab of the processor configuration, set the entity types that you want to extract from the input as a comma-separated list. Here is a brief summary of the configuration options:
* `Entity Types`
    * Description: Lowercase, comma-separated list of NER tags to extract from the text, such as: `location,organization`.
    * Notes: When using the `location` entity type, the following types are automatically grouped under it as well: `city,country,state_or_province`. For more information about NER tags, visit the [CoreNLP NER Docs](https://stanfordnlp.github.io/CoreNLP/ner.html#description).
* `JSONPath`
    * Description: The [JSON Path](https://github.com/json-path) expression that selects the parts of the incoming flow file to analyze, such as: `$.['title','content']`.
    * Notes: If left blank, the flow file will be treated as plain text.
* `StanfordCoreNLP Props as JSON`
    * Description: Properties used to configure the StanfordCoreNLP object or StanfordCoreNLPClient object, given as JSON, such as: `{"threads": 1}`.
* `StanfordCoreNLPClient Host`
    * Description: StanfordCoreNLPClient host address, such as: `http://localhost`.
    * Notes:
        * If left blank, all processing will be performed locally. This requires NiFi to be configured with additional memory, as noted in the installation section.
        * An external Stanford CoreNLP server can be run with Docker: `docker run -p 9000:9000 --name coreNLP --rm -i -t isslab/corenlp:2018-10-05`.
* `StanfordCoreNLPClient Port`
    * Description: StanfordCoreNLPClient port, such as: `9000`.
* `StanfordCoreNLPClient API Key`
    * Description: StanfordCoreNLPClient API key for servers that have authentication configured. Not required.
* `StanfordCoreNLPClient API Secret`
    * Description: StanfordCoreNLPClient API secret for servers that have authentication configured. Not required.

After the `StanfordCoreNLPProcessor` is configured, connect it to the input and output processors. The processor exposes `success` and `failure` relationships.
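The conventions described for `Entity Types` (a lowercase, comma-separated list, with `city`, `country`, and `state_or_province` reported under `location`) can be illustrated with a small sketch. Both functions are hypothetical and only mirror the behavior described in the list above; they are not the processor's implementation.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers that mirror the documented behavior.

# Lowercase a list of entity types and remove all whitespace from it.
normalize_entity_types() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]'
}

# Map a single NER tag to the entity type it is grouped under.
group_entity_type() {
  case "$1" in
    city|country|state_or_province) printf 'location\n' ;;
    *) printf '%s\n' "$1" ;;
  esac
}
```

For instance, `normalize_entity_types "Location, Organization"` prints `location,organization`, which is the form the `Entity Types` property expects, and `group_entity_type city` prints `location`.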
### Verify Output
Start all of the processors in your NiFi flow. In the output folder configured for the `PutFile` processor, you should see the updated JSON file:
```
{
  "title": "Worldwide film production company opens ABQ facility",
  "content": "ALBUQUERQUE, N.M. — A worldwide film production company is expanding to Albuquerque, according to Albuquerque Business First. The company, Production Resource Group, has worked on various movie productions including \"House of Cards.\" They plan to move into a 6,000-square-foot warehouse space in northeast Albuquerque, located at 5821 Midway Park Blvd. NE.\n\nFor more information, click here.",