docs/content/docs/deployment/filesystems/s3.md (18 additions & 4 deletions)
@@ -68,7 +68,7 @@ Note that these examples are *not* exhaustive and you can use S3 in other places
 
 Flink provides three independent S3 filesystem implementations, each with different trade-offs:
 
-- **Native S3 FileSystem** (`flink-s3-fs-native`): A drop-in replacement built on AWS SDK v2 with minimal dependencies. **Experimental** in Flink 2.3.
+- **Native S3 FileSystem** (`flink-s3-fs-native`): Built directly on AWS SDK v2 with async I/O and parallel transfers, this implementation supports both checkpointing and the FileSystem sink. [Benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396) show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to the Presto implementation at state sizes up to 15 GB. **Experimental** in Flink 2.3; the API and behavior may change in future releases.
 - **Presto S3 FileSystem** (`flink-s3-fs-presto`): Based on Presto project code, recommended for checkpointing.
 - **Hadoop S3 FileSystem** (`flink-s3-fs-hadoop`): Based on Hadoop project code, has FileSystem sink support.
-The S3 filesystems also support using S3 compliant object stores such as [IBM's Cloud Object Storage](https://www.ibm.com/cloud/object-storage) and [MinIO](https://min.io/).
+The S3 filesystems also support using S3 compliant object stores such as [IBM's Cloud Object Storage](https://www.ibm.com/cloud/object-storage) and [Cloudflare R2](https://developers.cloudflare.com/r2/).
 To do so, configure your endpoint in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
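
For orientation while reading this hunk (this is not part of the diff), the endpoint setting that the context line refers to might look like the following minimal sketch. It assumes the `s3.endpoint` option of the Flink S3 filesystems. The hostname is a placeholder, and the path-style line applies only to stores without virtual-host-style addressing.

```yaml
# Placeholder hostname (an assumption); replace with the endpoint of your S3-compatible store.
s3.endpoint: your-endpoint-hostname
# Optional: only for stores that lack virtual-host-style addressing.
s3.path-style-access: true
```
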
-Some S3 compliant object stores might not have virtual host style addressing enabled by default, for example when using Standalone MinIO for testing purpose. In such cases, you will have to provide the property to enable path style access in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
+Some S3 compliant object stores might not have virtual host style addressing enabled by default. In such cases, you will have to provide the property to enable path style access in [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
 
 ```yaml
 s3.path-style-access: true
@@ -131,7 +131,7 @@ s3.path-style-access: true
 **Experimental**: The Native S3 FileSystem implementation is experimental in Flink 2.3. While functionally complete, it should not yet be used in production environments. Please use Presto or Hadoop implementations for production deployments.
 {{< /hint >}}
 
-The Native S3 FileSystem is a pure-Java implementation built on the AWS SDK v2. It requires no additional dependencies and provides a drop-in replacement for the Presto and Hadoop implementations.
+The Native S3 FileSystem is a pure-Java implementation built on the AWS SDK v2. It is registered under the schemes *s3://* and *s3a://*. It requires no additional dependencies and provides a drop-in replacement for the Presto and Hadoop implementations.
 
 #### Setup
 
@@ -295,11 +295,25 @@ Hadoop configuration keys are automatically translated. For example, `fs.s3a.con
 
 ## Using Multiple S3 Implementations
 
+All three S3 implementations register as handlers for the *s3://* scheme. Additionally, each implementation supports alternative schemes:
+
+| Implementation | Schemes |
+|---------------|---------|
+| Native S3 | *s3://*, *s3a://* |
+| Presto | *s3://*, *s3p://* |
+| Hadoop | *s3://*, *s3a://* |
+
+Only one implementation can handle a given scheme at a time. The Native S3 implementation has the lowest priority, so when another implementation is present, it will take precedence for the *s3://* scheme.
+
 You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink (Hadoop-only) but Presto for checkpointing:
 
 - Use *s3a://* scheme for the sink (Hadoop)
 - Use *s3p://* scheme for checkpointing (Presto)
 
+{{< hint info >}}
+The Native S3 implementation does not introduce a new URI scheme. It reuses the existing *s3://* and *s3a://* schemes. To use it alongside the Hadoop implementation, ensure only the Native S3 plugin JAR is in the `plugins` directory (i.e., do not have both `flink-s3-fs-native` and `flink-s3-fs-hadoop` plugins loaded simultaneously for the same scheme).
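
To illustrate the scheme split described in this hunk (Presto for checkpointing, Hadoop for the sink), a minimal configuration sketch follows. It is not part of the diff. It assumes both the `flink-s3-fs-presto` and `flink-s3-fs-hadoop` plugins are installed, and that `execution.checkpointing.dir` is the checkpoint directory option in the Flink version in use. The bucket name is hypothetical.

```yaml
# Hypothetical bucket name. The s3p:// scheme is handled only by the Presto
# implementation, so checkpoint I/O goes through that plugin.
execution.checkpointing.dir: s3p://example-bucket/checkpoints
```

Under the same assumptions, the sink path would be set in job code with an *s3a://* URI (for example `s3a://example-bucket/output`) so that the Hadoop implementation handles it.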