
Restrict Indexing To Child Streams When Streams Is Enabled #132011


Open

lukewhiting wants to merge 16 commits into main from es-11941-streams-logs-bulk-transport-changes

Conversation

lukewhiting (Contributor)

This PR prevents indexing into child streams when streams mode is enabled.

Here, a child stream is defined as any index matching logs.*. The restrictions apply both to direct indexing via the index and bulk APIs and to indirect indexing attempts via pipelines that use reroute, script, or other processors to change the target index or routing.

Deletes from these child streams are still permitted, but updates such as _update_by_query are prevented.

Example

Input

  • logs redirects to logs.abc.def via a reroute processor on the default pipeline
  • bad-index redirects to logs.abc via a script processor changing ctx._index
PUT {{host}}/_bulk
Content-Type: application/json

{ "create":{"_index": "logs" } } 
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg1.jpg HTTP/1.0\" 200 24736" }
{ "create":{"_index": "logs.abc" } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
{ "create":{"_index": "bad-index" } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }

Output

{
  "errors": true,
  "took": 200,
  "ingest_took": 0,
  "items": [
    {
      "create": {
        "_index": "logs.abc.def",
        "_id": "wmsjUZgBpF-FKxj59Ma4",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "create": {
        "_index": "logs.abc",
        "_id": "auto-generated",
        "status": 400,
        "failure_store": "not_enabled",
        "error": {
          "type": "illegal_argument_exception",
          "reason": "Direct writes to child streams are prohibited. Index directly into the [logs] stream instead"
        }
      }
    },
    {
      "create": {
        "_index": "logs.abc",
        "_id": "auto-generated",
        "status": 400,
        "error": {
          "type": "illegal_argument_exception",
          "reason": "Pipelines can't re-route documents to child streams, but pipeline [pipeline1] tried to reroute this document from index [bad-index] to index [logs.abc]. Reroute history: bad-index"
        }
      }
    }
  ]
}

Fixes ES-11941

@elasticsearchmachine added the Team:Data Management (Meta label for data/management team) label on Jul 28, 2025
@elasticsearchmachine (Collaborator)

Hi @lukewhiting, I've created a changelog YAML for you.

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@lukewhiting (Contributor, Author)

Requested review from @jbaiera as this touches the failure store


@lukewhiting force-pushed the es-11941-streams-logs-bulk-transport-changes branch from 5936fd2 to 1a861cd on July 28, 2025
@szybia (Contributor) left a comment:

I guess I don't have a lot of higher-level context to approve, but the code LGTM!

I left a few small suggestions and questions for learning.

lukewhiting and others added 3 commits July 29, 2025 09:41
Co-authored-by: Szymon Bialkowski <szybia@tuta.io>
Co-authored-by: Szymon Bialkowski <szybia@tuta.io>
@lukewhiting requested a review from Copilot on July 29, 2025
@Copilot (Copilot AI) left a comment:

Pull Request Overview

This PR implements restrictions on direct indexing to child streams when streams mode is enabled, specifically preventing writes to indices matching the logs.* pattern while allowing operations through the parent logs stream.

  • Adds validation logic to prevent direct writes to child streams via bulk operations and single document indexing
  • Introduces pipeline-level validation to prevent rerouting documents to child streams through ingest processors
  • Allows delete operations on child streams while blocking create/update operations (a rough sketch of this check follows)

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Changed files:
  • StreamType.java: New enum defining stream types, with validation methods for enabled streams and child-stream matching
  • TransportAbstractBulkAction.java: Adds pre-pipeline validation to reject direct writes to child streams
  • IngestService.java: Implements pipeline validation to prevent rerouting documents to child streams
  • BulkRequestModifier.java: Refactors listener-wrapping methods to support flexible ingest-time calculation
  • BulkResponse.java: Adds equals and hashCode methods for proper response comparison
  • TransportBulkActionIngestTests.java: Updates a test to use a concrete BulkResponse instead of a mock for equality testing
  • 20_substream_restrictions.yml: Comprehensive integration tests covering various restriction scenarios
  • StreamsYamlTestSuiteIT.java: Adds required modules for integration testing
  • build.gradle: Includes additional REST API endpoints and test dependencies

@szybia (Contributor) left a comment:

lgtm!

But someone more experienced should give the final approval 🚀


@@ -1238,6 +1239,28 @@ private void executePipelines(
return; // document failed!
}

for (StreamType streamType : StreamType.getEnabledStreamTypesForProject(project)) {
    if (streamType.matchesStreamPrefix(newIndex)
        && ingestDocument.getIndexHistory().contains(streamType.getStreamName()) == false) {
Member:

I think this would let me through if I rerouted from, say, logs to logs.abc.def. That is, it allows me to write to any descendant stream, not just a direct child stream. I assume that's OK, right? The real goal is to prevent things from outside the stream writing to child streams.

@lukewhiting (Contributor, Author):

Correct, but I don't think it's in scope to enforce the hierarchy in ES, at least not at this stage.

@masseyke (Member) left a comment:
I left a couple of comments, but LGTM.

@jbaiera (Member) left a comment:
Left some comments. There are a couple of easy-to-miss things that might need to be addressed for the failure store, and I left a few small questions and suggestions, but otherwise it's looking good. Marking as approved for once the important things are addressed.

BulkRequestModifier bulkRequestModifier = new BulkRequestModifier(bulkRequest);

for (StreamType streamType : StreamType.getEnabledStreamTypesForProject(projectMetadata)) {
    for (int i = 0; i < bulkRequest.requests.size(); i++) {
Member:

You're iterating using the locally available bulk request, but you're dereferencing the bulk request from the request modifier to get the documents. This assumes the request modifier will never change its internal state. I think we can avoid that kind of snag if we iterate over the request items once and check each stream type per document, instead of iterating over the request items multiple times, once per enabled stream type.

Member:

I looked around at where we use BulkRequestModifier in other places, and it's actually an Iterable itself. In the IngestService we just iterate over it like a regular iterable and maintain a slot counter separately. It might read more clearly if we do that here too.

@lukewhiting (Contributor, Author):

I've switched this to use the bulk modifier as an iterator and moved the stream types to be the inner loop, so best of both worlds :-)

+ streamType.getStreamName()
+ "] stream instead"
);
Boolean failureStoreEnabled = resolveFailureStore(req.index(), projectMetadata, threadPool.absoluteTimeInMillis());
Member:

Another thing that may need to be checked here is whether or not the failure store feature is present on every node. We check that in IngestService.wrapResolverWithFeatureCheck and in TransportBulkAction.executeBulk.

@lukewhiting (Contributor, Author):

I'm not 100% sure what you mean here, but I think this isn't needed, as at this level we are just marking items for the failure store, and anything marked at this stage will still go through those checks in TransportBulkAction?

Member:

Those checks later on are only made when deciding if a failed document should be sent to the failure store. Once a document has been marked for failure store (like what we're doing here) we don't actually run the check anymore.

I'm not 100% sure what you mean here

I would look into those linked methods to see how they make use of the feature service to ensure every node in the cluster knows what a failure store is before trying to mark a document to be sent to it.

if (Boolean.TRUE.equals(failureStoreEnabled)) {
    bulkRequestModifier.markItemForFailureStore(i, req.index(), e);
} else {
    bulkRequestModifier.markItemAsFailed(i, e, IndexDocFailureStoreStatus.NOT_ENABLED);
Member:

Is the failure store status of NOT_ENABLED always correct here? A null value in the failureStoreEnabled variable means the document "doesn't correspond to a data stream", in which case the status should be NA. I recognize that streams might make that impossible; if that is the case, let's add an assert statement here to make sure.

@lukewhiting (Contributor, Author):

I'm not sure we can actually make a definitive determination here... For now I have switched it to "Unknown", which I think is a better option?

Member:

I'm not sure we can actually make a definitive determination here.

Looking at the doc for the resolveFailureStore method:

return true if this is not a simulation, and the given index name corresponds to a data stream with a failure store, or if it matches a template that has a data stream failure store enabled, or if it matches a data stream template with no failure store option specified and the name matches the cluster setting to enable the failure store. Returns false if the index name corresponds to a data stream, but it doesn't have the failure store enabled by one of those conditions. Returns null when it doesn't correspond to a data stream.

Based on that I think the logic should be:

if (feature enabled on all nodes):
    if (resolveFailureStore returned true):
        mark for failure store
    else if (resolveFailureStore returned false):
        fail document - mark as NOT_ENABLED status
    else if (resolveFailureStore returned null):
        fail document - mark as NA status
else:
    fail document - mark as NA status

- response.getTook().getMillis(),
- ingestTookInMillis,
+ response.getTookInMillis(),
+ ingestTimeProviderFunction.apply(response),
Member:

Nit (and this is easily something we can ignore for the sake of readability): if we use the variant where the time provider function just passes along the original time, we're effectively wrapping the listener just to reconstruct the same response. I wonder if there's a way we could refactor this to avoid that. Though if it's messy, maybe we just move on with our lives without doing so.

@lukewhiting (Contributor, Author):

So I have refactored this to have short-circuit logic in each of the overloaded methods. This means we return early, with the original unwrapped listener, if no items are modified.

@@ -166,4 +168,19 @@ public Iterator<? extends ToXContent> toXContentChunked(ToXContent.Params params
return builder.startArray(ITEMS);
}), Iterators.forArray(responses), Iterators.<ToXContent>single((builder, p) -> builder.endArray().endObject()));
}

@Override
public boolean equals(Object o) {
Member:

Are we adding this for completeness' sake, or are we using it somewhere? Checking bulk response equality seems like something that could be unintentionally expensive for a large or complicated request.

@lukewhiting (Contributor, Author):

Ahh, this was needed to fix a unit test that relied on a sameInstance assertion, which became invalid after we started wrapping everything at the higher level. However, it's no longer required with the short-circuit wrapping logic added in #132011 (comment), so I have reverted the change, as the response goes back to being the same instance when wrapped with no modifications.

Comment on lines 1250 to 1251
"Pipelines can't re-route documents to child streams, but pipeline [%s] tried to reroute "
+ "this document from index [%s] to index [%s]. Reroute history: %s",
@jbaiera (Member), Aug 1, 2025:

Nit: the term "reroute" reads to me as meaning the reroute processor, but I think you can get here via any method of changing the index name. Also, maybe we should elaborate on what we mean by "child stream".

A rough suggestion like

Suggested change:
- "Pipelines can't re-route documents to child streams, but pipeline [%s] tried to reroute "
-     + "this document from index [%s] to index [%s]. Reroute history: %s",
+ "pipeline [%s] can't change the target index (from [%s] to [%s] child stream [%s]) for document [%s]. History: [%s]",

Or along those lines? e.g. (from [my-index-name] to [logs] child stream [logs.nginx.prod])

@lukewhiting (Contributor, Author):

Updated the message, although I omitted the document ID, as this error is rendered inline with the document ID / slot.

pipelineId,
originalIndex,
newIndex,
String.join(" -> ", ingestDocument.getIndexHistory())
Member:

Nit: I like the style of the arrow separator, but I think a comma-separated list is more aligned with our log message style, and perhaps a tad easier to parse if ever needed.

Suggested change:
- String.join(" -> ", ingestDocument.getIndexHistory())
+ String.join(", ", ingestDocument.getIndexHistory())

@lukewhiting (Contributor, Author):

Switched. On a personal level I much prefer the -> for readability, but you're right, consistency is more important here :-)

exceptionHandler.accept(
    new IngestPipelineException(
        pipelineId,
        new IllegalArgumentException(
Member:

The other exceptions like this are all IllegalStateException; should we follow suit?

@lukewhiting (Contributor, Author):

I don't think so... IllegalStateException would cause a 500 status code, which isn't reflective of the situation here. The user has made a correctable error (index into the parent, not the child), so I think IllegalArgumentException, and the 400 code it returns, is the better option.

Labels
:Data Management/Data streams (Data streams and their lifecycles), >enhancement, Team:Data Management (Meta label for data/management team), v9.2.0