pipeline: filters: lookup added new filter page #1953

Open: wants to merge 21 commits into base: master

1 change: 1 addition & 0 deletions SUMMARY.md
@@ -160,6 +160,7 @@
* [Grep](pipeline/filters/grep.md)
* [Kubernetes](pipeline/filters/kubernetes.md)
* [Log to Metrics](pipeline/filters/log_to_metrics.md)
* [Lookup](pipeline/filters/lookup.md)
* [Lua](pipeline/filters/lua.md)
* [Parser](pipeline/filters/parser.md)
* [Record Modifier](pipeline/filters/record-modifier.md)
135 changes: 135 additions & 0 deletions pipeline/filters/lookup.md
@@ -0,0 +1,135 @@
# Lookup

The _Lookup_ filter looks up the value of a specified record key in the first column of a CSV file. If it finds a match, it adds a new key to the record. The value of that new key comes from the second column of the matching row.

## Configuration parameters

The plugin supports the following configuration parameters:

| Key | Description | Default |
| :-- | :---------- | :------ |
| `file` | Path to the CSV file that the Lookup filter uses as a lookup table. The file must contain at least two columns, with keys in the first column and values in the second. The Lookup filter treats the first row as a header and ignores it. Quoted fields and escaped quotes are supported. | _none_ |
| `lookup_key` | The record key whose value is searched for in the first column of the CSV file. Supports [record accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md) syntax. | _none_ |
| `result_key` | The name of the key to add to the output record when the value of `lookup_key` matches an entry in the first column of the CSV file. The new key's value is taken from the second column of the matching row. | _none_ |
| `ignore_case` | Specifies whether to ignore case when matching the value of `lookup_key` against the CSV keys. If `true`, matching is case-insensitive. If `false`, matching is case-sensitive. Possible values: `true`, `false`. | `false` |

## Example configuration

{% tabs %}
{% tab title="fluent-bit.yaml" %}

```yaml
parsers:
  - name: json
    format: json

pipeline:
  inputs:
    - name: tail
      tag: test
      path: devices.log
      read_from_head: true
      parser: json

  filters:
    - name: lookup
      match: test
      file: device-bu.csv
      lookup_key: $hostname
      result_key: business_line
      ignore_case: true

  outputs:
    - name: stdout
      match: test
```

{% endtab %}
{% tab title="fluent-bit.conf" %}

```text
[PARSER]
    Name   json
    Format json

[INPUT]
    Name           tail
    Tag            test
    Path           devices.log
    Read_from_head On
    Parser         json

[FILTER]
    Name        lookup
    Match       test
    File        device-bu.csv
    Lookup_key  $hostname
    Result_key  business_line
    Ignore_case On

[OUTPUT]
    Name  stdout
    Match test
```

{% endtab %}
{% endtabs %}

The previous configuration reads log records from `devices.log`, which contains the following records:

```text
{"hostname": "server-prod-001"}
{"hostname": "Server-Prod-001"}
{"hostname": "db-test-abc"}
{"hostname": 123}
{"hostname": true}
{"hostname": " host with space "}
{"hostname": "quoted \"host\""}
{"hostname": "unknown-host"}
{}
{"hostname": [1,2,3]}
{"hostname": {"sub": "val"}}
{"hostname": " "}
```

Because `lookup_key` is set to `$hostname`, the Lookup filter uses the value of each record's `hostname` key to search for a match in the first column of `device-bu.csv`. That file contains the following:

```text
host,department
server-prod-001,Finance
db-test-abc,Engineering
db-test-abc,Marketing
web-frontend-xyz,Marketing
app-backend-123,Operations
"legacy-system true","Legacy IT"
" host with space ","Infrastructure"
"quoted ""host""", "R&D"
no-match-host,Should Not Appear
```

When the filter finds a match, it adds a new key named by `result_key`. The value of that key is taken from the second column of the matching CSV row. Records without a match are left unchanged.

This results in output that resembles the following:

```text
{"hostname"=>"server-prod-001", "business_line"=>"Finance"}
{"hostname"=>"Server-Prod-001", "business_line"=>"Finance"}
{"hostname"=>"db-test-abc", "business_line"=>"Marketing"}
{"hostname"=>123}
{"hostname"=>true}
{"hostname"=>" host with space ", "business_line"=>"Infrastructure"}
{"hostname"=>"quoted "host"", "business_line"=>"R&D"}
{"hostname"=>"unknown-host"}
{}
{"hostname"=>[1, 2, 3]}
{"hostname"=>{"sub"=>"val"}}
```
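The following Python sketch approximates the matching rules in this example to make them more concrete. It isn't Fluent Bit's implementation, and the function names `build_table` and `apply_lookup` are hypothetical. The sketch makes three assumptions:

- `lookup_key` is a plain top-level key rather than full record accessor syntax.
- Only string values are looked up. The example doesn't show whether non-string values are converted before matching.
- A later CSV row with the same key overwrites an earlier one. This is consistent with `db-test-abc` resolving to `Marketing` in the output above.

```python
from typing import Any, Dict, Iterable, Tuple


def build_table(rows: Iterable[Tuple[str, str]], ignore_case: bool) -> Dict[str, str]:
    """Map each key to its value; a later row with the same key overwrites an earlier one."""
    table: Dict[str, str] = {}
    for key, value in rows:
        table[key.lower() if ignore_case else key] = value
    return table


def apply_lookup(record: Dict[str, Any], table: Dict[str, str], lookup_key: str,
                 result_key: str, ignore_case: bool) -> Dict[str, Any]:
    """Return a copy of `record`, adding `result_key` if the lookup value matches."""
    out = dict(record)
    value = record.get(lookup_key)
    # Assumption: only string values are looked up; other types pass through unchanged.
    if not isinstance(value, str):
        return out
    needle = value.lower() if ignore_case else value
    if needle in table:
        out[result_key] = table[needle]
    return out


# A subset of the example CSV rows (header already removed).
rows = [
    ("server-prod-001", "Finance"),
    ("db-test-abc", "Engineering"),
    ("db-test-abc", "Marketing"),
    (" host with space ", "Infrastructure"),
]
table = build_table(rows, ignore_case=True)

for rec in [{"hostname": "Server-Prod-001"}, {"hostname": "db-test-abc"},
            {"hostname": 123}, {"hostname": "unknown-host"}, {}]:
    print(apply_lookup(rec, table, "hostname", "business_line", ignore_case=True))
```

With `ignore_case=True`, both the table keys and the looked-up value are lowercased. As a result, `Server-Prod-001` matches the `server-prod-001` row, just as it does in the example output.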

## CSV import

Fluent Bit creates an in-memory key/value lookup table from the CSV file that you provide. The first column of this CSV is always treated as a key, and its second column as a value. Any other columns are ignored.
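
The following minimal Python sketch shows one way to build this kind of in-memory table with the standard `csv` module. It follows the rules described here: the header row is skipped, the first column becomes the key, the second column becomes the value, and extra columns are ignored. The sketch doesn't reproduce Fluent Bit's own CSV parser. The `load_lookup_table` name and the handling of short rows are assumptions. `skipinitialspace=True` is also an assumption, chosen so that the `"quoted ""host""", "R&D"` row from the example yields `R&D` as it does in the example output.

```python
import csv
import io
from typing import Dict


def load_lookup_table(csv_text: str, ignore_case: bool = False) -> Dict[str, str]:
    """Build an in-memory key -> value table from CSV text.

    The first row is treated as a header and skipped. Column 1 is the key,
    column 2 is the value, and any further columns are ignored.
    """
    # skipinitialspace=True lets a quoted field that follows ", " still parse as
    # quoted, so `"quoted ""host""", "R&D"` yields the value R&D (assumed to
    # match the filter's behavior, based on the example output).
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader, None)  # skip the header row
    table: Dict[str, str] = {}
    for row in reader:
        if len(row) < 2:
            continue  # assumption: blank or one-column rows are skipped
        key = row[0].lower() if ignore_case else row[0]
        table[key] = row[1]  # later duplicates overwrite earlier ones
    return table


SAMPLE = '''host,department
server-prod-001,Finance
db-test-abc,Engineering
db-test-abc,Marketing
" host with space ","Infrastructure"
"quoted ""host""", "R&D"
'''

table = load_lookup_table(SAMPLE, ignore_case=True)
print(table["db-test-abc"])    # prints Marketing
print(table['quoted "host"'])  # prints R&D
```

Python's `csv` reader treats a doubled quote (`""`) inside a quoted field as a literal quote. This matches how the example CSV escapes quotes.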

This filter is intended for static datasets. After Fluent Bit loads the CSV file, it won't reload that file, which means the filter's lookup table won't update to reflect any changes.

This filter doesn't support multiline values in CSV files.
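
The filter loads the CSV file only once and doesn't support multiline values, so it can help to check a file before Fluent Bit starts. The following Python sketch is a hypothetical pre-flight check and isn't part of Fluent Bit. It flags data rows containing quoted fields that span several lines. It also flags rows with fewer than the two columns the filter requires.

```python
import csv
import io
from typing import List


def find_unsupported_rows(csv_text: str) -> List[str]:
    """Hypothetical pre-flight check for a Lookup filter CSV file.

    Flags data rows whose quoted fields span several lines (multiline values
    aren't supported) and rows with fewer than the two required columns.
    """
    problems: List[str] = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # the filter ignores the header row
    for row in reader:
        line = reader.line_num  # physical line on which this record ends
        if not row:
            continue  # blank line
        if any("\n" in field or "\r" in field for field in row):
            problems.append(f"line {line}: multiline value")
        elif len(row) < 2:
            problems.append(f"line {line}: fewer than two columns")
    return problems


example = 'host,department\nok-host,Finance\n"multi\nline host",Ops\nlonely-host\n'
for problem in find_unsupported_rows(example):
    print(problem)
```

To check a real file, read it with `open(path, newline="")` and pass the contents to `find_unsupported_rows`. Opening the file this way stops Python from translating line endings before the `csv` module sees them.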