Commit 7be000d

Address to review comments. Specify the scheme supported by native-s3-fs.
1 parent 026ab11 commit 7be000d

1 file changed

Lines changed: 18 additions & 4 deletions

docs/content/docs/deployment/filesystems/s3.md

@@ -68,7 +68,7 @@ Note that these examples are *not* exhaustive and you can use S3 in other places

 Flink provides three independent S3 filesystem implementations, each with different trade-offs:

-- **Native S3 FileSystem** (`flink-s3-fs-native`): A drop-in replacement built on AWS SDK v2 with minimal dependencies. **Experimental** in Flink 2.3.
+- **Native S3 FileSystem** (`flink-s3-fs-native`): Built directly on AWS SDK v2 with async I/O and parallel transfers, this implementation supports both checkpointing and the FileSystem sink. [Benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396) show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to the Presto implementation at state sizes up to 15 GB. **Experimental** in Flink 2.3; the API and behavior may change in future releases.
 - **Presto S3 FileSystem** (`flink-s3-fs-presto`): Based on Presto project code, recommended for checkpointing.
 - **Hadoop S3 FileSystem** (`flink-s3-fs-hadoop`): Based on Hadoop project code, has FileSystem sink support.

@@ -108,7 +108,7 @@ presto.s3.credentials-provider: org.apache.flink.fs.s3.common.token.DynamicTempo

 ### Configure Non-S3 Endpoint

-The S3 filesystems also support using S3 compliant object stores such as [IBM's Cloud Object Storage](https://www.ibm.com/cloud/object-storage) and [MinIO](https://min.io/).
+The S3 filesystems also support using S3 compliant object stores such as [IBM's Cloud Object Storage](https://www.ibm.com/cloud/object-storage) and [Cloudflare R2](https://developers.cloudflare.com/r2/).
 To do so, configure your endpoint in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):

 ```yaml
@@ -117,7 +117,7 @@ s3.endpoint: your-endpoint-hostname

 ### Configure Path Style Access

-Some S3 compliant object stores might not have virtual host style addressing enabled by default, for example when using Standalone MinIO for testing purpose. In such cases, you will have to provide the property to enable path style access in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
+Some S3 compliant object stores might not have virtual host style addressing enabled by default. In such cases, you will have to provide the property to enable path style access in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):

 ```yaml
 s3.path-style-access: true
@@ -131,7 +131,7 @@ s3.path-style-access: true
 **Experimental**: The Native S3 FileSystem implementation is experimental in Flink 2.3. While functionally complete, it should not yet be used in production environments. Please use Presto or Hadoop implementations for production deployments.
 {{< /hint >}}

-The Native S3 FileSystem is a pure-Java implementation built on the AWS SDK v2. It requires no additional dependencies and provides a drop-in replacement for the Presto and Hadoop implementations.
+The Native S3 FileSystem is a pure-Java implementation built on the AWS SDK v2. It is registered under the schemes *s3://* and *s3a://*. It requires no additional dependencies and provides a drop-in replacement for the Presto and Hadoop implementations.

 #### Setup

@@ -295,11 +295,25 @@ Hadoop configuration keys are automatically translated. For example, `fs.s3a.con

 ## Using Multiple S3 Implementations

+All three S3 implementations register as handlers for the *s3://* scheme. Additionally, each implementation supports alternative schemes:
+
+| Implementation | Schemes |
+|---------------|---------|
+| Native S3 | *s3://*, *s3a://* |
+| Presto | *s3://*, *s3p://* |
+| Hadoop | *s3://*, *s3a://* |
+
+Only one implementation can handle a given scheme at a time. The Native S3 implementation has the lowest priority, so when another implementation is present, it will take precedence for the *s3://* scheme.
+
 You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink (Hadoop-only) but Presto for checkpointing:

 - Use *s3a://* scheme for the sink (Hadoop)
 - Use *s3p://* scheme for checkpointing (Presto)

+{{< hint info >}}
+The Native S3 implementation does not introduce a new URI scheme. It reuses the existing *s3://* and *s3a://* schemes. To use it alongside the Hadoop implementation, ensure only the Native S3 plugin JAR is in the `plugins` directory (i.e., do not have both `flink-s3-fs-native` and `flink-s3-fs-hadoop` plugins loaded simultaneously for the same scheme).
+{{< /hint >}}
+
 ---

 ## Advanced Features
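
For illustration, the Hadoop-sink-plus-Presto-checkpointing setup described under "Using Multiple S3 Implementations" in the last hunk could look roughly like the fragment below in the Flink configuration file. This is a minimal sketch, not part of the commit. It assumes both the `flink-s3-fs-presto` and `flink-s3-fs-hadoop` plugins are installed, and that checkpoint storage is set through the `execution.checkpointing.dir` key (older Flink releases use `state.checkpoints.dir`). The bucket name and prefix are hypothetical.

```yaml
# Hypothetical bucket and prefix; replace with your own.
# Assumption: flink-s3-fs-presto and flink-s3-fs-hadoop are both present as plugins.
# Per the scheme table in the diff, only the Presto implementation registers s3p://,
# so checkpoints written to this path go through Presto.
execution.checkpointing.dir: s3p://my-bucket/checkpoints
```

The sink path is normally passed in job code rather than in this file. Giving the sink an *s3a://* URI there routes it to the Hadoop implementation, provided the Native S3 plugin is not also loaded for that scheme.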
