diff --git a/.prettierignore b/.prettierignore
new file mode 100644
index 0000000..a0c9239
--- /dev/null
+++ b/.prettierignore
@@ -0,0 +1,9 @@
+# We are using https://standardjs.com coding style. It is unfortunately incompatible with Prettier;
+# see https://github.com/standard/standard/issues/996 and the linked issues.
+# Standard does not handle Markdown files though, so we want to use Prettier formatting.
+# The goal of the configuration below is to configure Prettier to lint only Markdown files.
+**/*.*
+!**/*.md
+
+# Let's keep LICENSE.md in the same formatting as we use in other PL repositories
+LICENSE.md
diff --git a/.prettierrc.yaml b/.prettierrc.yaml
new file mode 100644
index 0000000..282cbf2
--- /dev/null
+++ b/.prettierrc.yaml
@@ -0,0 +1,2 @@
+printWidth: 80
+proseWrap: always
diff --git a/README.md b/README.md
index 9f055dd..4f05fc8 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,5 @@
 # piece-indexer
 A lightweight IPNI node mapping Filecoin PieceCID → payload block CID.
+
+- [Design doc](./docs/design.md)
diff --git a/docs/design.md b/docs/design.md
new file mode 100644
index 0000000..b970a6b
--- /dev/null
+++ b/docs/design.md
@@ -0,0 +1,456 @@
# Design Doc: Piece Indexer

## Introduction

### Context

Filecoin defines a `Piece` as the main unit of negotiation for data that users
_store_ on the Filecoin network. This is reflected in the on-chain metadata
field `PieceCID`.

On the other hand, content is _retrieved_ from Filecoin using the CID of the
requested payload.

This dichotomy poses a challenge for retrieval checkers like Spark: for a given
deal storing some PieceCID, which payload CID should the checker request when
testing the retrieval?

Spark v1 relies on the `Label` field of the StorageMarket DealProposal
metadata, which is often (but not always!) set by the client to the root CID of
the stored payload.

In the first half of 2024, the Filecoin network added support for DDO (Direct
Data Onboarding) deals. These deals don't have a DealProposal, so there is no
`Label` field, only a `PieceCID`. We need to find a different solution to
support these deals.

In the longer term, we want IPNI to provide reverse lookup functionality that
will allow clients like Spark to request a sample of payload block CIDs
associated with a given ContextID or PieceCID; see the
[design proposal for InterPlanetary Piece Index](https://docs.google.com/document/d/1jhvP48ccUltmCr4xmquTnbwfTSD7LbO1i1OVil04T2w).

To cover the gap until the InterPlanetary Piece Index is live, we want to
implement a lightweight solution that can serve as a stepping stone between
Label-based CID discovery and full Piece Index sampling.

### High-Level Idea

Let's implement a lightweight IPNI ingester that will process the
advertisements from Filecoin SPs and extract the list of
`(ProviderID, PieceCID, PayloadCID)` entries. Store these entries in a Postgres
database. Provide a REST API endpoint accepting `(ProviderID, PieceCID)` and
returning a single `PayloadCID`.
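To make the shape of the service concrete, here is a minimal sketch of the
lookup behind that REST endpoint. The table name, column names, and the `pg`
connection pool are illustrative assumptions, not a committed schema:

```js
// Hypothetical table: payload_cids (provider_id, piece_cid, payload_cid).
// `pgPool` is assumed to be a `pg.Pool` instance.
async function lookUpPayloadCid (pgPool, providerId, pieceCid) {
  const { rows } = await pgPool.query(
    'SELECT payload_cid FROM payload_cids WHERE provider_id = $1 AND piece_cid = $2 LIMIT 1',
    [providerId, pieceCid]
  )
  return rows[0]?.payload_cid // undefined when the piece is not indexed
}
```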
### Terminology

- **Indexer:** A network node that keeps a mapping of multihashes to provider
  records. Example: https://cid.contact

- **Index Provider (a.k.a. Publisher):** An entity that publishes
  advertisements and index data to an indexer. It is usually, but not always,
  the same as the data provider. Example: a [Boost](https://boost.filecoin.io)
  instance operated by a Storage Provider.

Quoting from
[IPNI spec](https://github.com/ipni/specs/blob/90648bca4749ef912b2d18f221514bc26b5bef0a/IPNI.md#terminology):

- **Advertisement**: A record available from a publisher that contains a link
  to a chain of multihash blocks, the CID of the previous advertisement, and
  provider-specific content metadata that is referenced by all the multihashes
  in the linked multihash blocks. The provider data is identified by a key
  called a context ID.

- **Announce Message**: A message that informs indexers about the availability
  of an advertisement. This is usually sent via gossip pubsub, but can also be
  sent via HTTP. An announce message contains the advertisement CID it is
  announcing, which allows indexers to ignore the announce if they have already
  indexed the advertisement. The publisher's address is included in the
  announce to tell indexers where to retrieve the advertisement from.

- **Context ID**: A key that, for a provider, uniquely identifies content
  metadata. This allows content metadata to be updated or deleted on the
  indexer without having to refer to it using the multihashes that map to it.

- **Metadata**: Provider-specific data that a retrieval client gets from an
  indexer query and passes to the provider when retrieving content. This
  metadata is used by the provider to identify and find specific content and
  deliver that content via the protocol (e.g. graphsync) specified in the
  metadata.

- **Provider**: Also called a Storage Provider, this is the entity from which
  content can be retrieved by a retrieval client. When multihashes are looked
  up on an indexer, the responses contain the providers that provide the
  content referenced by the multihashes. A provider is identified by a libp2p
  peer ID.

- **Publisher**: This is an entity that publishes advertisements and index data
  to an indexer. It is usually, but not always, the same as the data provider.
  A publisher is identified by a libp2p peer ID.

### Notes

**System Components**

There are two components in this design:

- A deal tracker component observing StorageMarket & DDO deals to build a list
  of deals eligible for Spark retrieval testing. We define a deal as a tuple
  `(PieceCID, MinerID, ClientID)`.

- A piece indexer observing IPNI announcements to build an index mapping
  PieceCIDs to PayloadCIDs.

When Spark builds a list of tasks for the current round, it will ask the deal
tracker for 1000 active deals. This ensures we test retrievals for active deals
only.

When a Spark checker tests a retrieval, it will first consult the piece indexer
to convert the deal's PieceCID to a payload CID to retrieve.

**Expired Deals**

Pieces are immutable. If we receive an advertisement saying that a payload
block CID was found in a piece CID, then this information remains valid
forever, even after the SP advertises that they are no longer storing that
block. This means our indexer can ignore `IsRm` advertisements.

It's okay if the piece indexer stores data for expired deals, because Spark is
not going to ask for that data. (Of course, there is the cost of storing data
we don't need, but we don't have to deal with that yet.)

**Payload CIDs Are Scoped to Providers**

The indexer protocol does not provide any guarantees about the list of CIDs
advertised for the same Piece CID. Different SPs can advertise different lists
(e.g. the entries can be ordered differently) or can even cheat and submit CIDs
that are not part of the Piece. Our indexer must therefore scope the
information to each index provider (each Filecoin SP), as sketched below.
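The two notes above translate into a small amount of ingestion logic. A minimal
sketch, assuming a hypothetical in-memory `index` map (the real implementation
stores entries in Postgres) and the `IsRm` flag carried by IPNI advertisements:

```js
// Illustrative only: entries are keyed by (provider, piece), never by piece
// alone, and removal advertisements are skipped because pieces are immutable,
// so previously advertised mappings stay valid forever.
const index = new Map() // key: `${providerId}::${pieceCid}` -> payloadCid

function ingestEntry (providerId, advertisement, pieceCid, payloadCid) {
  if (advertisement.IsRm) return // removals never invalidate old mappings
  index.set(`${providerId}::${pieceCid}`, payloadCid)
}
```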
### Anatomy of IPNI Advertisements

Quoting from
[Ingestion](https://github.com/ipni/specs/blob/90648bca4749ef912b2d18f221514bc26b5bef0a/IPNI.md#ingestion):

> The indexer reads the advertisement chain starting from the head, reading
> previous advertisements until a previously seen advertisement, or the end of
> the chain, is reached. The advertisements and their entries are then
> processed in order from earliest to head.

An Advertisement has several properties (see
[the full spec](https://github.com/ipni/specs/blob/90648bca4749ef912b2d18f221514bc26b5bef0a/IPNI.md#advertisements));
we need the following ones:

- **`PreviousID`** is the CID of the previous advertisement, and is empty for
  the 'genesis'.

- **`Metadata`** represents additional opaque data. The metadata for Graphsync
  retrievals includes the PieceCID that we are looking for.

- **`Entries`** is a link to a data structure that contains the advertised
  multihashes. For our purposes, it's enough to take the first entry and ignore
  the rest.

Advertisements are made available for consumption by indexer nodes as a set of
files that can be fetched via HTTP.

Quoting from
[Advertisement Transfer](https://github.com/ipni/specs/blob/90648bca4749ef912b2d18f221514bc26b5bef0a/IPNI.md#advertisement-transfer):

> All IPNI HTTP requests use the IPNI URL path prefix, `/ipni/v1/ad/`. Indexers
> and advertisement publishers implicitly use and expect this prefix to precede
> the requested resource.
>
> The IPLD objects of advertisements and entries are represented as files named
> by their CIDs in an HTTP directory. These files are immutable, so can be
> safely cached or stored on CDNs. To fetch an advertisement or entries file by
> CID, the request made by the indexer to the publisher is
> `GET /ipni/v1/ad/{CID}`.

The IPNI instance running at https://cid.contact provides an API returning the
list of all index providers from which cid.contact has received announcements:

https://cid.contact/providers

The response provides all the metadata we need to download the advertisements.
For each index provider, the response includes the following (a sketch of
reading the response follows the list):

- `Publisher.Addrs` describes where we can contact the SP's index provider to
  retrieve content for CIDs, e.g., advertisements.
- `LastAdvertisement` contains the CID of the head advertisement from the SP.
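The response field names (`Publisher.ID`, `Publisher.Addrs`,
`LastAdvertisement`) follow the description above; treat the exact shapes in
this sketch as assumptions to verify against the live endpoint:

```js
// Fetch all index providers known to cid.contact, with the publisher address
// and the CID of the most recently announced advertisement (the head).
async function fetchProviderHeads () {
  const res = await fetch('https://cid.contact/providers')
  if (!res.ok) throw new Error(`cid.contact responded with HTTP ${res.status}`)
  const providers = await res.json()
  return providers.map(p => ({
    providerId: p.Publisher.ID,
    providerAddress: p.Publisher.Addrs[0],
    lastAdvertisement: p.LastAdvertisement['/'] // dag-json link -> CID string
  }))
}
```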
## Proposed Design

### Ingestion

Ingesting announcements from all Storage Providers is the most complex
component. For each storage provider, we need to periodically check for the
latest advertisement head and process the chain from the head until we find an
advertisement we have already processed before. The chain can be very long, so
we need to account for cases where the service restarts or a new head is
published before we finish processing the chain.

#### Proposed algorithm

**Persisted state (per provider)**

Use the following per-provider state persisted in the database:

- `provider_id` - Primary key.

- `provider_address` - Provider's address where we can fetch advertisements
  from.

- `last_head` - The CID of the head where we started the previous walk (the
  last walk that has already finished). All advertisements from `last_head` to
  the end of the chain have already been processed.

- `next_head` - The CID of the most recent head seen by cid.contact. This is
  where we need to start the next walk from.

  > **Note:** The initial walk will take a long time to complete. While we are
  > walking the "old" chain, new advertisements (new heads) will be announced
  > to IPNI.
  >
  > - `next_head` is the latest head announced to IPNI
  > - `head` is the advertisement where the current walk-in-progress started
  >
  > Strictly speaking, we don't need to keep track of `next_head`: when the
  > current walk finishes, we would wait up to one minute until the next
  > request to cid.contact tells us the latest heads for each SP.
  >
  > In the current proposal, tracking `next_head` lets us immediately continue
  > walking from it when the current walk finishes.

- `head` - The CID of the head advertisement we started the current walk from.
  We update this value whenever we start a new walk.

- `tail` - The CID of the next advertisement in the chain that we need to
  process in the current walk.

We must always walk the chain all the way to the genesis or to an entry we have
already seen and processed.

The current walk starts from `head` and walks up to `last_head`. When the
current walk reaches `last_head`, we need to set `last_head ← head` so that the
next walk knows where to stop.

`next_head` is updated every minute when we query cid.contact for the latest
heads. If the walk takes longer than a minute to finish, then `next_head` will
change and we cannot use it for `last_head`.

Here is what the state looks like in the middle of a walk:

```
next_head --> [ ] -\
               ↓   |
              ...  | entries announced after we started the current walk
               ↓   |
              [ ] -/
               ↓
     head --> [ ] -\
               ↓   |
              ...  | entries visited in this walk
               ↓   |
              [ ] -/
               ↓
     tail --> [ ] -\
               ↓   |
              ...  | entries NOT visited yet
               ↓   |
              [ ] -/
               ↓
last_head --> [ ] -\
               ↓   |
              ...  | entries visited in the previous walks
               ↓   |
              [ ] -/
               ↓
             (null)
```

**Track latest advertisements**

Every minute, run the following high-level loop:

1. Fetch the list of providers and their latest advertisements (heads) from
   https://cid.contact/providers. (This is **one** HTTP request.)

2. Fetch the state of all providers from our database. (This is **one** SQL
   query.)

3. Update the state of each provider as described below, using the name
   `LastAdvertisement` for the CID of the latest advertisement provided in the
   response from cid.contact. (This is a small bit of computation plus **one**
   SQL query.)

> **Note:** Instead of running the loop every minute, we can introduce a
> one-minute delay between iterations. It should not matter too much in
> practice, though. I expect each iteration to finish within one minute:
>
> - The three steps outlined above require only three interactions over
>   HTTP/SQL.
> - The long chain walks are executed in the background and don't block this
>   loop.

For each provider listed in the response, apply the first matching rule below
(a code sketch follows the list):

1. If `last_head` is not set, then we need to start the ingestion from scratch.
   Update the state as follows and start the chain walker:

   ```
   last_head := LastAdvertisement
   next_head := LastAdvertisement
   head := LastAdvertisement
   tail := LastAdvertisement
   ```

2. If `LastAdvertisement` is the same as `next_head`, then there was no change
   since we last checked the head and we are done.

3. If `tail` is not null, then there is an ongoing walk of the chain that we
   need to finish before we can ingest new advertisements. Update the state as
   follows and move on to the next provider:

   ```
   next_head := LastAdvertisement
   ```

4. Otherwise, `tail` is null, which means we have finished ingesting all
   advertisements from `head` to the end of the chain. Update the state as
   follows and start the chain walker:

   ```
   next_head := LastAdvertisement
   head := LastAdvertisement
   tail := LastAdvertisement
   ```
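Here is the same decision procedure as a sketch. `state` mirrors the persisted
per-provider columns (in camelCase), and `startWalker` is a hypothetical hook
that kicks off the background chain walk:

```js
// Apply the head-tracking rules for one provider. `lastAdvertisement` is the
// head CID reported by cid.contact for this provider.
function updateProviderState (state, lastAdvertisement, startWalker) {
  if (!state.lastHead) {
    // Rule 1: never ingested before; start from scratch.
    state.lastHead = state.nextHead = state.head = state.tail = lastAdvertisement
    startWalker(state)
  } else if (lastAdvertisement === state.nextHead) {
    // Rule 2: no new advertisements since the last check; nothing to do.
  } else if (state.tail !== null) {
    // Rule 3: a walk is in progress; remember where the next walk must start.
    state.nextHead = lastAdvertisement
  } else {
    // Rule 4: the previous walk has finished; walk the newly announced segment.
    state.nextHead = state.head = state.tail = lastAdvertisement
    startWalker(state)
  }
}
```

Note that rules 2 and 3 leave `head` and `tail` untouched, which is what lets
an in-progress walk finish undisturbed.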
**Walk advertisement chains (in background)**

The chain-walking algorithm runs in the background and loops over the following
steps (a code sketch of one step follows the list):

1. If `tail == last_head || tail == null`, then we have finished the walk.
   Update the state as follows:

   ```
   last_head := head
   head := null
   tail := null
   ```

   If `next_head != last_head`, then start a new walk by updating the state as
   follows:

   ```
   head := next_head
   tail := next_head
   ```

2. Otherwise, take a step to the next item in the chain:

   1. Fetch the advertisement identified by `tail` from the index provider.
   2. Process the metadata and entries to extract one `(PieceCID, PayloadCID)`
      entry.
   3. Update the `tail` field using the `PreviousID` field from the
      advertisement:

      ```
      tail := PreviousID
      ```
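A sketch of one such step, assuming the publisher serves advertisements as
dag-json via `GET /ipni/v1/ad/{CID}` (publishers may also serve dag-cbor, and
the metadata/entries processing is elided):

```js
// One step of the walk: fetch the advertisement `tail` points to, extract an
// entry, then move `tail` one link back along the chain.
async function walkOneStep (state, providerBaseUrl) {
  const res = await fetch(`${providerBaseUrl}/ipni/v1/ad/${state.tail}`)
  if (!res.ok) throw new Error(`cannot fetch advertisement: HTTP ${res.status}`)
  const advertisement = await res.json()
  // TODO: decode advertisement.Metadata to find the PieceCID and follow
  // advertisement.Entries to pick the first payload block CID.
  // dag-json encodes links as { '/': '<cid>' }; a missing PreviousID means we
  // have reached the genesis advertisement.
  state.tail = advertisement.PreviousID?.['/'] ?? null
}
```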
#### Handling the Scale

At the time of writing this document, cid.contact was tracking 322 index
providers. From Spark's measurements, we know there are an additional 843
storage providers that don't advertise to IPNI. The number of storage providers
grows over time, so our system must be prepared to ingest advertisements from
thousands of providers.

Each storage/index provider produces tens of thousands to hundreds of thousands
of advertisements (in total). The initial ingestion run will take a while to
complete. We must also be careful not to overload SPs by sending too many
requests.

The design outlined in the previous section divides the ingestion process into
small steps that can be scheduled and executed independently. This allows us to
avoid the complexity of managing long-running per-provider tasks and instead
repeatedly execute one step of the process.

Loop 1: Every minute, fetch the latest provider information from cid.contact
and update the persisted state as outlined above.

Loop 2: Discover walks in progress and make one step in each walk.

1. Find all provider state records where `tail != null`.

2. For each provider, execute one step as described above. We can execute these
   steps in parallel. Since each parallel job queries a different provider, we
   are not going to overload any single provider.

3. Optionally, we can introduce a small delay before the next iteration. I
   think we won't need it, because the time to execute the SQL queries should
   create enough delay.

### REST API

Implement the following endpoint that will be called by Spark checker nodes.
The endpoint will sign the response using the server's private key to allow
spark-evaluate to verify the authenticity of results reported by checker nodes:

```
GET /sample/{providerId}/{pieceCid}?seed={seed}
```

Response in JSON format, when the piece was found:

```json
{
  "samples": ["exactly one CID of a payload block advertised for PieceCID"],
  "pubkey": "server's public key",
  "signature": "signature over dag-json{providerId,pieceCid,seed,samples}"
}
```

Response in JSON format, when the piece or the provider was not found:

```json
{
  "error": "code - e.g. PROVIDER_NOT_FOUND or PIECE_NOT_FOUND",
  "pubkey": "server's public key",
  "signature": "signature over dag-json{providerId,pieceCid,seed,error}"
}
```

In the initial version, the server will ignore the `seed` value and use it only
as a nonce preventing replay attacks. Spark checker nodes will set the seed
using the DRAND randomness string for the current Spark round.

In the future, when IPNI implements the proposed reverse-index sampling
endpoint, the seed will be used to pick the samples at random. See the
[IPNI Multihash Sampling API proposal](https://github.com/ipni/xedni/blob/526f90f5a6001cb50b52e6376f8877163f8018af/openapi.yaml).

### Observability

We need visibility into the status of ingestion for any given provider. Some
providers don't advertise at all, and some may have a misconfigured IPNI
integration; we need to understand why our index does not include any data for
a given provider.

Let's enhance the state table with another column describing the ingestion
status as a free-form string and implement a new REST API endpoint to query the
ingestion status:

```
GET /ingestion-status/{providerId}
```

Response in JSON format:

```json
{
  "providerId": "state.provider_id",
  "providerAddress": "state.provider_address",
  "ingestionStatus": "state.ingestion_status",
  "lastHeadWalkedFrom": "state.last_head",
  "piecesIndexed": 123
  // ^^ number of (PieceCID, PayloadCID) records found for this provider
}
```
diff --git a/package-lock.json b/package-lock.json
index 4d76240..23f22ed 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,13 +1,31 @@
 {
-  "name": "piece-indexer",
+  "name": "@filecoin-station/piece-indexer",
   "version": "0.1.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
-      "name": "piece-indexer",
+      "name": "@filecoin-station/piece-indexer",
       "version": "0.1.0",
-      "license": "(Apache-2.0 AND MIT)"
+      "license": "(Apache-2.0 AND MIT)",
+      "devDependencies": {
+        "prettier": "^3.3.2"
+      }
+    },
+    "node_modules/prettier": {
+      "version": "3.3.2",
+      "resolved": "https://registry.npmjs.org/prettier/-/prettier-3.3.2.tgz",
+      "integrity": "sha512-rAVeHYMcv8ATV5d508CFdn+8/pHPpXeIid1DdrPwXnaAdH7cqjVbpJaT5eq4yRAFU/lsbwYwSF/n5iNrdJHPQA==",
+      "dev": true,
+      "bin": {
+        "prettier": "bin/prettier.cjs"
+      },
+      "engines": {
+        "node": ">=14"
+      },
+      "funding": {
+        "url": "https://github.com/prettier/prettier?sponsor=1"
+      }
     }
   }
 }
diff --git a/package.json b/package.json
index 0e9b370..7d8e0e0 100644
--- a/package.json
+++ b/package.json
@@ -5,6 +5,8 @@
   "type": "module",
   "main": "index.js",
   "scripts": {
+    "lint": "prettier --check .",
+    "lint:fix": "prettier --write .",
     "test": "echo \"Error: no test specified\" && exit 1"
   },
   "repository": {
@@ -16,5 +18,8 @@
   "bugs": {
     "url": "https://github.com/filecoin-station/piece-indexer/issues"
   },
-  "homepage": "https://github.com/filecoin-station/piece-indexer#readme"
+  "homepage": "https://github.com/filecoin-station/piece-indexer#readme",
+  "devDependencies": {
+    "prettier": "^3.3.2"
+  }
 }