client-s3/README.md
# client-s3

Backfila client backend implementation to backfill all S3 objects in a bucket that match a prefix.
You specify a bucket via `getBucket`.
You specify the prefix either statically via `staticPrefix` or by overriding `getPrefix`.
Each S3 object is a separate partition.
Note that this means each S3 object is run in parallel.
You must define a record strategy to divide each S3 object into batches.
* Object names must be less than 45 characters after the prefix.
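The naming rule above can be sketched as follows. This helper is illustrative only, not the actual client API: the partition name is just the object key with the prefix stripped, and it must stay under the 45-character limit.

```kotlin
// Hypothetical helper illustrating the rule above; not the real Backfila API.
// The partition name is the object key with the prefix removed, and must be
// less than 45 characters after the prefix.
fun partitionName(key: String, prefix: String): String {
    val name = key.removePrefix(prefix)
    require(name.length < 45) {
        "Object name after prefix must be less than 45 characters: $name"
    }
    return name
}
```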

You must ensure that the bucket is available to the service using this client.
The S3 calls are made from this service.

The partition name will be the object name after the prefix.
The range values reported are byte offsets into the S3 object.
Record counts are likewise reported in bytes rather than strict records.
Scan size is also in bytes.
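Because ranges are byte offsets, a record strategy typically cuts batches at record boundaries so no record is split across two batches. A minimal sketch, assuming newline-delimited records (`batchRanges` is illustrative, not the client's API):

```kotlin
// Hypothetical record strategy sketch: split an object's bytes into batches
// of at most batchSize bytes, backing up to the last newline so each batch
// ends on a record boundary. Ranges are byte offsets, matching how this
// client reports progress.
fun batchRanges(bytes: ByteArray, batchSize: Int): List<IntRange> {
    require(batchSize > 0)
    val ranges = mutableListOf<IntRange>()
    var start = 0
    while (start < bytes.size) {
        var end = minOf(start + batchSize, bytes.size)
        if (end < bytes.size) {
            // Back up to the previous newline; if a single record exceeds
            // batchSize (no newline in the window), take batchSize bytes as-is.
            val nl = (end - 1 downTo start).firstOrNull { bytes[it] == '\n'.code.toByte() }
            if (nl != null) end = nl + 1
        }
        ranges += start until end
        start = end
    }
    return ranges
}
```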

In tests install the `FakeS3Module` and fill `FakeS3Service` with your test files.
You have two options: either add each file in code, or point `FakeS3Service` at a resource path and it will load everything under that path.
Use `S3CdnModule` in real instances and provide an `AmazonS3` annotated with `@ForS3Backend`.
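The idea behind `FakeS3Service` can be sketched as an in-memory store that tests fill directly. This class is illustrative only; the real `FakeS3Service` API may differ.

```kotlin
// Illustrative in-memory stand-in for an S3 fake; not the real FakeS3Service.
class InMemoryS3 {
    private val objects = mutableMapOf<String, ByteArray>()

    // Mirrors option 1 above: add each test file in code.
    fun put(bucket: String, key: String, bytes: ByteArray) {
        objects["$bucket/$key"] = bytes
    }

    // Lists object keys in a bucket that match a prefix, as the client does
    // when it enumerates partitions.
    fun listKeys(bucket: String, prefix: String): List<String> =
        objects.keys
            .filter { it.startsWith("$bucket/$prefix") }
            .map { it.removePrefix("$bucket/") }
            .sorted()
}
```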

The code is the source of truth for this client.
Keep that in mind.
Always refer to the code for implementation details.
From `S3DatasourceBackfillOperator<R : Any, P : Any>`:
require(fileKeys.isNotEmpty()) {
  "No files found for bucket:${backfill.getBucket(config)} prefix:$pathPrefix. At least one file must exist."
}
// We limit to 100 files since each file is run in parallel.
require(fileKeys.size <= 100) {
  "Listing files matching the prefix contains ${fileKeys.size} files, which is more than 100. " +
    "Check your prefix. First 3 file keys: ${fileKeys.slice(0..2)}"
}
client/README.md

Contains the general, public client definitions for Backfila so clients and the service can
communicate. Never use this directly; instead use the specialized clients.

Java/Kotlin modules:

* `client` - the public API for communicating with Backfila. Customers depend on this for general Backfila features (parameters, dry run, logging).
* `client-<service framework>` - provides the base interaction with Backfila for your service framework (logging setup, client service, registration on startup). You install one of these in your real implementation; `backfila-embedded` is your test implementation.
* `client-base` - base functionality that all downstream datasource clients need (implementations of common features, parameters, operator caching). Provides an SPI for those clients to satisfy in order to get this base functionality. This is private: customers cannot depend on it; only datasource clients depend on it.
* `client-<specific datasource>` - the specific datasource Backfila implementation. This is where the ergonomics of working with your particular datasource live.