
[Feature] Document based ingest routing #63798

Closed
@ruflin

Description


Problem

On the Ingest side, many different datasets are available. Today, our story of building integrations assumes that a single source only produces one data stream. One log file contains the access logs for nginx and another one contains the error logs for nginx, and they are not mixed. This allows us to send each to a different data stream with a specific ingest pipeline which knows how to process the events.

Unfortunately this is the ideal case, and there are multiple scenarios where a single source can contain a mix of datasets. This happens in both the logs and the metrics use cases. A few examples:

  • Docker logs: Docker only allows logging to stderr or stdout. If a service produces more than two log files, the data will end up mixed together.
  • Prometheus metrics: A single Prometheus collector might collect metrics from many Prometheus endpoints.
  • Kinesis: Getting data out of Kinesis mixes many different datasets.
  • Syslog: Many services send their data to a single syslog input.

The routing of an event to the right destination could be done on the edge or centrally. This feature request is about centrally managed routing, as it would work well in the context of the integrations installed by Fleet. Managing routing centrally has the advantage that when new integrations are installed or updated, no updates to the Agents are needed to add new routing information, and it also works well in the case of standalone agents.

Story

In the following, I'm going through a story of how I could imagine document based ingest routing to work. This is NOT a proposal for the API or how it should be designed, but it should help to better explain how it would be used from Fleet.

The following story uses syslog as the input, and all data is sent through syslog.

Add routing rules

A user gets started with Elastic Agent by enrolling it into Fleet. The user starts collecting logs from syslog, and all the logs go into logs-generic-default. The services that send logs to syslog are nginx and mysql. Based on some ML detection rules, Fleet recognises that the logs being shipped to logs-generic-default contain logs for nginx and mysql. It suggests that the user install these two packages, which the user happily does.

On the Fleet side, this installs all the assets like dashboards, ingest pipelines and templates for nginx and mysql. But on the config side nothing changes, as the Agent still just pulls in data from syslog. Each package contains some ingest routing rules. These routing rules are added to logs-generic-* and are applied to all data which lands there. Each package can contain a list of conditions. For nginx the following happens:

PUT _routing/logs-generic-*
{
  "id": "nginx",
  "condition": "ctx.input.type == 'syslog' && contains('file.path', 'nginx')",
  // Data should be routed to the nginx data stream within the same namespace
  "target": "logs-nginx-${data_stream.namespace}"
}
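
To prototype what such a condition could look like with what exists today, the simulate pipeline API can be used together with a conditional set processor that rewrites _index. Everything below (field values, sample document, target naming) is only an illustrative sketch, not part of the proposal:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          // Painless condition roughly matching the routing rule above
          "if": "ctx.input?.type == 'syslog' && ctx.file?.path != null && ctx.file.path.contains('nginx')",
          "field": "_index",
          "value": "logs-nginx-{{data_stream.namespace}}"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "input": { "type": "syslog" },
        "file": { "path": "/var/log/nginx/access.log" },
        "data_stream": { "namespace": "default" },
        "message": "example syslog line"
      }
    }
  ]
}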

As soon as the above routing rule is added, all documents coming into logs-generic-* are matched against this new rule. If no match is found, the documents are ingested into logs-generic-default as before. If a match is found, the documents are routed to the target data stream. If the user now also installs the mysql package, one more condition is added to the routing table for logs-generic-*:

PUT _routing/logs-generic-*
{
  "id": "mysql",
  "condition": "ctx.input.type == 'syslog' && contains('file.path', 'mysql')",
  "target": "logs-mysql-${data_stream.namespace}"
}

The nginx package not only contained routing rules for logs-generic-* but also for logs-nginx-*, to split up access and error logs:

PUT _routing/logs-nginx-*
{
  "id": "nginx.access",
  // This is a condition that a certain grok pattern matches; the example is just made up and not "correct" syntax
  "condition": "grok(ctx.message, '%{TIMESTAMP_ISO8601}') == true",
  "target": "logs-nginx.access-${data_stream.namespace}"
}

PUT _routing/logs-nginx-*
{
  "id": "nginx.error",
  "condition": "grok(ctx.message, '%{IP}') == true",
  "target": "logs-nginx.error-${data_stream.namespace}"
}
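
To experiment with grok-based conditions like the two above, the existing simulate pipeline API together with a grok processor can serve as a stand-in; the pattern and sample message below are only assumptions for illustration:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          // Pattern mirrors the made-up condition above; a real rule would need proper patterns
          "patterns": ["%{TIMESTAMP_ISO8601:tmp.timestamp} %{GREEDYDATA:tmp.rest}"],
          "ignore_failure": true
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "2020-10-19 13:45:12 [error] 1234#0: connection refused" } }
  ]
}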

All the data that is forwarded to logs-nginx-* now goes through these additional routing conditions. If no condition matches, the data stays in logs-nginx-*. As logs-nginx.access-default and logs-nginx.error-default already have an ingest pipeline configured through their index templates, the processing of the event will happen as expected. This works exactly as if the data had been sent to the logs-nginx.access-* data streams directly.

In some cases, it might be needed that the logs-nginx-* data streams already contain an ingest pipeline to do some preprocessing on all events. This preprocessing could be needed to simplify the conditions. Another use case for pipelines is that the logs-nginx-* data streams may want to add a final pipeline that should run at the end.
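
For reference, attaching such pipelines to all logs-nginx-* backing indices is already possible today via index settings in an index template; the template and pipeline names below are placeholders, not part of any package:

PUT _index_template/logs-nginx
{
  "index_patterns": ["logs-nginx-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      // Runs for every document that does not specify a pipeline explicitly (placeholder name)
      "index.default_pipeline": "logs-nginx-preprocess",
      // Always runs after any other pipeline (placeholder name)
      "index.final_pipeline": "logs-nginx-final"
    }
  }
}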

Remove routing rules

The user now has fully processed nginx and mysql logs. At some stage, the user decides to remove the nginx package as the nginx service is not used anymore. Removing a package means removing all ingest pipelines, templates and, in this case, also routing rules:

DELETE _routing/logs-generic-*
{
  "id": "nginx"
}
DELETE _routing/logs-nginx-*
{
  "id": "nginx.access"
}
DELETE _routing/logs-nginx-*
{
  "id": "nginx.error"
}

This deletes the routing rules for nginx. If any future data for nginx comes in, it will just be routed to logs-generic-default again, as no rule matches.

Additional notes

It might be that some routing rules will need a priority to make sure they are applied before others. If two rules match a single document, either the first or the last one should win.
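
Purely as an illustration of how that could look on the made-up API above, a priority field could be added to each rule; this is not a concrete proposal:

PUT _routing/logs-generic-*
{
  "id": "nginx",
  // Hypothetical field: rules with lower values are evaluated first
  "priority": 10,
  "condition": "ctx.input.type == 'syslog' && contains('file.path', 'nginx')",
  "target": "logs-nginx-${data_stream.namespace}"
}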

It is already possible today to do some basic routing by rewriting the _index field in an ingest pipeline. But this requires modifying ingest pipelines every time a package is installed or removed. An initial idea was to simplify this by potentially supporting multiple ingest pipelines per data stream, so pipelines would only need to be added or removed instead of modified, but this seems more like a workaround.
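
A minimal sketch of that existing workaround, assuming the logs-generic-* index templates point at a pipeline named logs-generic-routing (the name and conditions are assumptions): a single shared pipeline with one conditional set processor per installed package rewrites _index, and unmatched documents simply stay in logs-generic-default.

PUT _ingest/pipeline/logs-generic-routing
{
  "description": "Sketch: route syslog events to per-package data streams by rewriting _index",
  "processors": [
    {
      "set": {
        "if": "ctx.input?.type == 'syslog' && ctx.file?.path != null && ctx.file.path.contains('nginx')",
        "field": "_index",
        "value": "logs-nginx-{{data_stream.namespace}}"
      }
    },
    {
      "set": {
        "if": "ctx.input?.type == 'syslog' && ctx.file?.path != null && ctx.file.path.contains('mysql')",
        "field": "_index",
        "value": "logs-mysql-{{data_stream.namespace}}"
      }
    }
  ]
}

Installing or removing a package would then mean editing this shared pipeline, which is exactly the maintenance burden the routing rules above try to avoid.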
