Commit 0e69c4f

add proposal for gc enhancement
Signed-off-by: wang yan <wangyan@vmware.com>
1 parent a4208b1 commit 0e69c4f

7 files changed, +138 -0 lines changed

@@ -0,0 +1,138 @@
Proposal: Garbage Collection Performance Enhancement

Author: Yan Wang

Discussion: goharbor/harbor#12948 goharbor/harbor#19986

## Abstract

During manifest deletion, Harbor also cleans up tag links in the backend registry storage through the Distribution v2 API. On cloud object storage (e.g., S3), deleting these tag links is slow and degrades GC performance.
This proposal introduces an optimization that avoids persisting tags in the backend, enabling faster garbage collection cycles.

## Motivation

Harbor currently proxies manifest and tag pushes to the backend registry (Distribution). As a result, both manifest blobs and tag link files are written to the underlying storage.
During garbage collection, each tag link must be removed explicitly via a Distribution API call, which is slow on object storage. For example, a single tag deletion on S3 may take multiple seconds.

## Solution

This proposal presents three options for improving GC performance, with Option 3 as the preferred and proposed solution.

### Option 1: Skip Tag Deletion

Introduce a user-configurable option to skip tag deletion in the backend when removing manifests. This eliminates the API call that slows down GC but leaves orphaned tag files in the backend.

#### Pros: Simple to implement, immediate performance gain.
#### Cons: Leaves tag files behind, which may cause confusion or inconsistencies for users browsing storage directly.

### Option 2: Batch Tag Deletion via Upstream Patch

Backport the upstream Distribution change that supports batch deletion of tags (introduced in later versions of the registry). This allows multiple tag links to be deleted in one API call.

#### Pros: Compatible with current Distribution behavior, improves performance.
#### Cons: Limited gain, still depends on Distribution API and backend performance.

### Option 3: Do Not Land Tag Files in Backend (Proposed)

Modify Harbor's proxy logic to avoid writing tag link files to the backend. When a manifest with a tag is pushed:

- Harbor Core extracts the tag and persists it in its own database.
- The proxy request to the backend registry is rewritten to use the manifest's digest instead of the tag (`PUT /v2/<repo>/manifests/<digest>`).

#### Pros: Eliminates backend tag file overhead completely, simplifies GC logic, significant performance gain.
#### Cons: Existing tag files will not be deleted unless explicitly handled.

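The request rewrite described in Option 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not Harbor's actual middleware: the function names `digestOf` and `rewriteToDigest` are hypothetical, and the digest is assumed to be the SHA-256 of the raw manifest bytes. A reference containing `:` is treated as a digest, since tag names cannot contain a colon.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// manifestPath matches /v2/<repo>/manifests/<reference>; <repo> may contain slashes.
var manifestPath = regexp.MustCompile(`^/v2/(.+)/manifests/([^/]+)$`)

// digestOf returns the sha256 digest string of the raw manifest bytes.
func digestOf(manifest []byte) string {
	sum := sha256.Sum256(manifest)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// rewriteToDigest (hypothetical helper) returns the path to proxy to the
// backend and the tag that should be stored in the database instead.
// The tag is empty when the request already addresses the manifest by digest.
func rewriteToDigest(path string, manifest []byte) (newPath, tag string, err error) {
	m := manifestPath.FindStringSubmatch(path)
	if m == nil {
		return "", "", fmt.Errorf("not a manifest path: %q", path)
	}
	repo, ref := m[1], m[2]
	if strings.Contains(ref, ":") {
		// Already a digest reference; nothing to rewrite.
		return path, "", nil
	}
	return "/v2/" + repo + "/manifests/" + digestOf(manifest), ref, nil
}

func main() {
	body := []byte(`{"schemaVersion":2}`)
	p, tag, err := rewriteToDigest("/v2/library/hello-world/manifests/latest", body)
	if err != nil {
		panic(err)
	}
	fmt.Println("proxy to backend:", p)
	fmt.Println("persist in DB:   ", tag)
}
```

In this sketch the tag never reaches the backend: the caller stores the returned tag in the database and forwards only the digest-addressed path.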
44+
45+
## Data Flow Diagrams
46+
47+
Push with Tag (current behavior):
48+
![Data Flow Diagram with Tag](../images/gc-perf/push_with_tag_flow.png)
49+
50+
Push with Digest (Proposed Behavior):
51+
52+
![Data Flow Diagram with Digest](../images/gc-perf/push_with_digest_flow.png)
53+
54+
We would see the Tag is not landed in the background storage.
55+
56+
## Main Points

- Tags are persisted only in Harbor’s database, not in backend storage.
- On artifact pull/push, Harbor proxies the request with the digest instead of the tag.
- GC no longer needs to call the backend's tag deletion API; tag deletion becomes a fast, DB-level operation.
- GC becomes faster, especially on object storage.

## Non Goals

This proposal does not attempt to clean up previously written tag link files.

## Compatibility and Consistency

No breaking changes; tag permissions are enforced at the API level.

Harbor will ensure consistency between tags and digests at the DB level.

Harbor CLI, APIs, and UI will continue to function as expected.

## OCI Object Background

I will use hello-world:latest as an example to demonstrate the issue and the solution.

After I push the image into Harbor, blobs, manifests, and tag links are generated on the storage side (current behavior).

![Proxy Request with Tag](../images/gc-perf/push_with_tag.png)

After I remove this artifact from Harbor (via either the UI or the API) and run a GC, Harbor removes those layers and tag links.
The performance bottleneck occurs during the tag deletion phase. Harbor relies on Distribution’s native tag deletion logic, which invokes the underlying storage driver to traverse all tags. This traversal is slow, especially on object storage systems where directory traversal is costly.

![Proxy Request with Tag](../images/gc-perf/gc_after.png)

GC logs (with the Distribution API call to delete tags):

2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:419]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][1/3] delete blob from storage: sha256:6d3e4188a38af91b0c1577b9e88c53368926b2fe0e1fb985d6e8a70040520c4d
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:448]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][1/3] delete blob record from database: 2, sha256:6d3e4188a38af91b0c1577b9e88c53368926b2fe0e1fb985d6e8a70040520c4d
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:419]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][2/3] delete blob from storage: sha256:14d59e6670a4d8e5c7219244632954350f4ab9d11cab29f3f52429097260a9e3
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:448]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][2/3] delete blob record from database: 1, sha256:14d59e6670a4d8e5c7219244632954350f4ab9d11cab29f3f52429097260a9e3
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:336]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][3/3] delete the manifest with registry v2 API: library/hello-world, application/vnd.docker.distribution.manifest.v2+json, sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:365]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][3/3] delete manifest from storage: sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:393]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][3/3] delete artifact blob record from database: 1, library/hello-world, sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:401]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][3/3] delete artifact trash record from database: 1, library/hello-world, sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:419]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][3/3] delete blob from storage: sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:448]: [ad65aa8b-fdc7-4d84-a6ee-f2113fe85cc4][3/3] delete blob record from database: 3, sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T10:00:40Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:477]: 2 blobs and 1 manifests are actually deleted

After enabling the new behavior and pushing the image again, only blobs and manifests are written; no tag link is created.

![Proxy Request with Digest](../images/gc-perf/push_with_digest.png)

After I remove this artifact from Harbor (via either the UI or the API) and run a GC, Harbor removes the layers; there are no tag links to delete.

![Proxy Request with Tag](../images/gc-perf/gc_after.png)

GC logs (without the Distribution API call to delete tags):

2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:393]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][1/3] delete blob from storage: sha256:6d3e4188a38af91b0c1577b9e88c53368926b2fe0e1fb985d6e8a70040520c4d
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:422]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][1/3] delete blob record from database: 2, sha256:6d3e4188a38af91b0c1577b9e88c53368926b2fe0e1fb985d6e8a70040520c4d
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:393]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][2/3] delete blob from storage: sha256:14d59e6670a4d8e5c7219244632954350f4ab9d11cab29f3f52429097260a9e3
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:422]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][2/3] delete blob record from database: 1, sha256:14d59e6670a4d8e5c7219244632954350f4ab9d11cab29f3f52429097260a9e3
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:339]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][3/3] delete manifest from storage: sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:367]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][3/3] delete artifact blob record from database: 1, library/hello-world, sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:375]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][3/3] delete artifact trash record from database: 1, library/hello-world, sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:393]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][3/3] delete blob from storage: sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:422]: [e5dad442-c1e9-44c9-bea9-ab0c660e69a7][3/3] delete blob record from database: 3, sha256:ec06ff94ef8731492058cbe21bc15fb87ec0b98afc20961955200e7e70203c67
2025-07-21T07:56:08Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:451]: 2 blobs and 1 manifests are actually deleted

## Side Effects

Tag links persisted in backend storage before this feature is enabled will remain. These can be considered orphaned tag links.

Such orphaned tag files in the backend are harmless but may add minor storage clutter.

## Future Work

Implement the code based on the proposal described above.

Add a cleanup tool to remove orphaned tag link files (optional).

Benchmark GC performance in a real S3 environment before and after the change.

