From dd14907a70c47c5d329d7062c3d945a80bcf8dc0 Mon Sep 17 00:00:00 2001 From: Erik Seliger Date: Tue, 13 May 2025 13:00:43 +0200 Subject: [PATCH] Cleanup documentation around symbols and repo-updater Those two services are removed from our architecture, and workloads are moved to worker and searcher. This PR removes (hopefully) all remaining references to those services in our docs. Test plan: Review. --- docs/admin/architecture.mdx | 6 +- docs/admin/code_hosts/rate_limits.mdx | 2 +- docs/admin/config/postgres-conf.mdx | 1 - docs/admin/config/private-network.mdx | 9 +- .../deploy/docker-compose/configuration.mdx | 4 +- .../deploy/docker-compose/operations.mdx | 6 -- docs/admin/deploy/kubernetes/configure.mdx | 20 +---- docs/admin/deploy/kubernetes/index.mdx | 2 - .../deploy/kubernetes/kustomize/migrate.mdx | 4 +- docs/admin/deploy/kubernetes/operations.mdx | 2 - docs/admin/deploy/kubernetes/scale.mdx | 4 +- docs/admin/deploy/kubernetes/troubleshoot.mdx | 20 ++--- docs/admin/deploy/scale.mdx | 85 ++----------------- docs/admin/how-to/monorepo-issues.mdx | 7 +- docs/admin/how-to/update_repo_failure.mdx | 2 +- docs/admin/observability/logs.mdx | 2 +- docs/admin/observability/troubleshooting.mdx | 10 +-- docs/admin/pprof.mdx | 2 - docs/admin/repo/add.mdx | 2 +- docs/admin/repo/update_frequency.mdx | 2 +- docs/admin/troubleshooting.mdx | 2 +- docs/code-search/code-navigation/rockskip.mdx | 28 +++--- .../search_based_code_navigation.mdx | 6 +- docs/code-search/types/symbol.mdx | 12 +-- public/llms.txt | 2 - 25 files changed, 65 insertions(+), 177 deletions(-) diff --git a/docs/admin/architecture.mdx b/docs/admin/architecture.mdx index af5eb56c8..2a02bad86 100644 --- a/docs/admin/architecture.mdx +++ b/docs/admin/architecture.mdx @@ -11,11 +11,11 @@ At its core, Sourcegraph maintains a persistent cache of all repositories that are connected to it. It is persistent because this data is critical for Sourcegraph to function. 
Still, it is ultimately a cache because the code host is the source of truth, and our cache is eventually consistent. - `gitserver` is the sharded service that stores repositories and makes them accessible to other Sourcegraph services -- `repo-updater` is the singleton service responsible for ensuring all repositories in gitserver are as up-to-date as possible while respecting code host rate limits. It is also responsible for syncing repository metadata from the code host that is stored in the repo table of our Postgres database +- `worker` is responsible for ensuring all repositories in gitserver are as up-to-date as possible while respecting code host rate limits. It is also responsible for syncing repository metadata from the code host that is stored in the repo table of our Postgres database ## Permission syncing -Repository permissions are mirrored from code hosts to Sourcegraph by default. This builds the foundation of Sourcegraph authorization for repositories to ensure users see consistent content on code hosts. Currently, the background permissions syncer resides in the repo-updater. +Repository permissions are mirrored from code hosts to Sourcegraph by default. This builds the foundation of Sourcegraph authorization for repositories to ensure users see consistent content on code hosts. Currently, the background permissions syncer resides in the `worker`. Learn more in the [Permission Syncing docs](/admin/permissions/syncing) @@ -94,7 +94,7 @@ You can learn more in the [Code Insights](/code_insights) docs. 
- Exhaustive search (with `count:all/count:999999` operator) - Historical search (= unindexed search, currently) - Commit search to find historical commits to search over -- Repository Syncing: The code insights backend has direct dependencies on `gitserver` and `repo-updater` +- Repository Syncing: The code insights backend has a direct dependency on `gitserver` - Permission syncing: The code insights backend depends on synced repository permissions for access control - Settings cascade: - Insights and dashboard configuration are stored in user, organization, and global settings. This will change in the future and is planned to be moved to the database diff --git a/docs/admin/code_hosts/rate_limits.mdx b/docs/admin/code_hosts/rate_limits.mdx index c93f14ba4..e7bd888a5 100644 --- a/docs/admin/code_hosts/rate_limits.mdx +++ b/docs/admin/code_hosts/rate_limits.mdx @@ -52,7 +52,7 @@ Requests to the configured code host will be staggered as to not exceed `"reques - For Sourcegraph `<=3.38`, if rate limiting is configured more than once for the same code host instance, the most restrictive limit will be used. - For Sourcegraph >=3.39, rate limiting should be enabled and configured for each individual code host connection. -To see the status of configured internal rate limits, visit **Site admin > Instrumentation > repo-updater > Rate Limiter State**. This page lists internal rate limits by code host, for example: +To see the status of configured internal rate limits, visit **Site admin > Instrumentation > worker > Rate Limiter State**. 
This page lists internal rate limits by code host, for example: ```json { diff --git a/docs/admin/config/postgres-conf.mdx b/docs/admin/config/postgres-conf.mdx index eb6b500c5..53de8efd2 100644 --- a/docs/admin/config/postgres-conf.mdx +++ b/docs/admin/config/postgres-conf.mdx @@ -53,7 +53,6 @@ The setting `max_connections` determines the number of active connections that c | --------------------------- | ------------------------------------------ | | `frontend` | `pgsql`, `codeintel-db`, `codeinsights-db` | | `gitserver` | `pgsql` | -| `repo-updater` | `pgsql` | | `precise-code-intel-worker` | `codeintel-db`, `pgsql` | | `worker` | `codeintel-db`, `pgsql`, `codeinsights-db` | diff --git a/docs/admin/config/private-network.mdx b/docs/admin/config/private-network.mdx index c2517daee..b0539bf63 100644 --- a/docs/admin/config/private-network.mdx +++ b/docs/admin/config/private-network.mdx @@ -19,7 +19,6 @@ services hosted within an organization's private network * Connecting to external [LLM providers](../../cody/capabilities/supported-models) with Cody - **gitserver**: Executes git commands against externally hosted [code hosts](../external_service) - **migrator**: Connects to Postgres instances (which may be [externally hosted](../external_services/postgres)) to process database migrations -- **repo-updater**: Communicates with [code hosts](../external_service) APIs to coordinate repository synchronization - **worker**: Sourcegraph [Worker](../workers) run various background jobs that may require establishing connections to services hosted within an organization's private network @@ -34,14 +33,14 @@ variables will depend on your Sourcegraph deployment method. 
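Sourcegraph's services are written in Go. Assuming they resolve proxies the same way Go's standard `http.ProxyFromEnvironment` does (an assumption this page does not state), the sketch below shows how `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` interact: a bare service name or a domain suffix listed in `NO_PROXY` is reached directly, and everything else goes through the proxy. The proxy address and the trimmed `NO_PROXY` list are illustrative, and `configureProxyEnv` and `proxyFor` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// configureProxyEnv sets the proxy variables for this sketch. The NO_PROXY value is
// an illustrative, trimmed subset of the list in the Helm example, not a complete list.
func configureProxyEnv() {
	os.Setenv("HTTP_PROXY", "http://proxy.example.com:8080")
	os.Setenv("HTTPS_PROXY", "http://proxy.example.com:8080")
	os.Setenv("NO_PROXY", "gitserver,searcher,worker,pgsql,.svc,.svc.cluster.local,localhost,127.0.0.1")
}

// proxyFor reports which proxy Go's net/http would pick for a request to rawURL,
// or "direct" when NO_PROXY (or a localhost/loopback target) bypasses the proxy.
func proxyFor(rawURL string) string {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "error: " + err.Error()
	}
	proxyURL, err := http.ProxyFromEnvironment(req)
	if err != nil {
		return "error: " + err.Error()
	}
	if proxyURL == nil {
		return "direct"
	}
	return proxyURL.String()
}

func main() {
	// Go reads these variables once, on the first ProxyFromEnvironment call,
	// so they must be set before any request is resolved.
	configureProxyEnv()
	for _, target := range []string{
		"http://searcher:3181", // bare service name listed in NO_PROXY
		"http://worker.sourcegraph.svc.cluster.local:3189", // matched by the .svc.cluster.local suffix
		"https://github.com", // external code host: goes through HTTPS_PROXY
	} {
		fmt.Printf("%-50s -> %s\n", target, proxyFor(target))
	}
}
```

The two internal targets resolve to `direct` and `github.com` resolves to the proxy URL, which illustrates why an internal service name missing from `NO_PROXY` would have its traffic routed through the proxy.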
Add the proxy environment variables to your Sourcegraph Helm chart [override file](https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/main/charts/sourcegraph/values.yaml): ```yaml -executor|frontend|gitserver|migrator|repo-updater|worker: +executor|frontend|gitserver|migrator|worker: env: - name: HTTP_PROXY value: http://proxy.example.com:8080 - name: HTTPS_PROXY value: http://proxy.example.com:8080 - name: NO_PROXY - value: "blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,repo-updater,searcher,symbols,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc" + value: "blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,searcher,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc" ``` @@ -49,7 +48,7 @@ If the updated Sourcegraph pods fail to pass their readiness or health checks af ```yaml - name: NO_PROXY - value: "blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,repo-updater,searcher,symbols,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc,10.10.0.0/16,10.20.0.0/16" + value: 
"blobstore,codeinsights-db,codeintel-db,sourcegraph-frontend-internal,sourcegraph-frontend,github-proxy,gitserver,grafana,indexed-search-indexer,indexed-search,jaeger-query,pgsql,precise-code-intel-worker,prometheus,redis-cache,redis-store,searcher,syntect-server,worker-executors,worker,cloud-sql-proxy,localhost,127.0.0.1,.svc,.svc.cluster.local,kubernetes.default.svc,10.10.0.0/16,10.20.0.0/16" ``` @@ -62,7 +61,7 @@ services: environment: - HTTP_PROXY=http://proxy.example.com:8080 - HTTPS_PROXY=http://proxy.example.com:8080 - - NO_PROXY='blobstore,caddy,cadvisor,codeintel-db,codeintel-db-exporter,codeinsights-db,codeinsights-db-exporter,sourcegraph-frontend-0,sourcegraph-frontend-internal,gitserver-0,grafana,migrator,node-exporter,otel-collector,pgsql,pgsql-exporter,precise-code-intel-worker,prometheus,redis-cache,redis-store,repo-updater,searcher-0,symbols-0,syntect-server,worker,zoekt-indexserver-0,zoekt-webserver-0,localhost,127.0.0.1' + - NO_PROXY='blobstore,caddy,cadvisor,codeintel-db,codeintel-db-exporter,codeinsights-db,codeinsights-db-exporter,sourcegraph-frontend-0,sourcegraph-frontend-internal,gitserver-0,grafana,migrator,node-exporter,otel-collector,pgsql,pgsql-exporter,precise-code-intel-worker,prometheus,redis-cache,redis-store,searcher-0,syntect-server,worker,zoekt-indexserver-0,zoekt-webserver-0,localhost,127.0.0.1' ``` Failure to configure `NO_PROXY` correctly can cause the proxy configuration to interfere with diff --git a/docs/admin/deploy/docker-compose/configuration.mdx b/docs/admin/deploy/docker-compose/configuration.mdx index 2ff227f05..5c793b580 100644 --- a/docs/admin/deploy/docker-compose/configuration.mdx +++ b/docs/admin/deploy/docker-compose/configuration.mdx @@ -125,7 +125,7 @@ If you must use a `.netrc` file to store these credentials instead, follow the p ## Add replicas -When adding replicas for `gitserver`, `indexed-search`, `searcher`, or `symbols`, you must update the corresponding environment variable on each of the frontend 
services in your docker-compose.override.yaml file, `SRC_GIT_SERVERS`, `INDEXED_SEARCH_SERVERS`, `SEARCHER_URL`, and `SYMBOLS_URL` to the number of replicas for each respective service. Sourcegraph will then automatically infer the endpoints for each replica. +When adding replicas for `gitserver`, `indexed-search`, or `searcher`, you must set the corresponding environment variable (`SRC_GIT_SERVERS`, `INDEXED_SEARCH_SERVERS`, or `SEARCHER_URL`) on each of the frontend services in your docker-compose.override.yaml file to the number of replicas of that service. Sourcegraph will then automatically infer the endpoints for each replica. ```yaml # docker-compose.override.yaml @@ -136,14 +136,12 @@ services: - 'SRC_GIT_SERVERS=2' - 'INDEXED_SEARCH_SERVERS=2' - 'SEARCHER_URL=1' - - 'SYMBOLS_URL=1' sourcegraph-frontend-internal: environment: - 'SRC_GIT_SERVERS=2' - 'INDEXED_SEARCH_SERVERS=2' - 'SEARCHER_URL=1' - - 'SYMBOLS_URL=1' ``` ## Shard gitserver diff --git a/docs/admin/deploy/docker-compose/operations.mdx b/docs/admin/deploy/docker-compose/operations.mdx index 54ba1aa00..87958c9bc 100644 --- a/docs/admin/deploy/docker-compose/operations.mdx +++ b/docs/admin/deploy/docker-compose/operations.mdx @@ -71,9 +71,7 @@ prometheus /bin/prom-wrapper Up query-runner /sbin/tini -- /usr/local/b ... Up redis-cache /sbin/tini -- redis-server ... Up 6379/tcp redis-store /sbin/tini -- redis-server ... Up 6379/tcp -repo-updater /sbin/tini -- /usr/local/b ... Up searcher-0 /sbin/tini -- /usr/local/b ... Up (healthy) -symbols-0 /sbin/tini -- /usr/local/b ... Up (healthy) 3184/tcp syntect-server sh -c /http-server-stabili ... Up (healthy) 9238/tcp worker /sbin/tini -- /usr/local/b ... Up 3189/tcp zoekt-indexserver-0 /sbin/tini -- zoekt-source ... Up @@ -151,9 +149,7 @@ prometheus /bin/prom-wrapper Up query-runner /sbin/tini -- /usr/local/b ... Up redis-cache /sbin/tini -- redis-server ... Up 6379/tcp redis-store /sbin/tini -- redis-server ...
Up 6379/tcp -repo-updater /sbin/tini -- /usr/local/b ... Up searcher-0 /sbin/tini -- /usr/local/b ... Up (healthy) -symbols-0 /sbin/tini -- /usr/local/b ... Up (healthy) 3184/tcp syntect-server sh -c /http-server-stabili ... Up (healthy) 9238/tcp worker /sbin/tini -- /usr/local/b ... Up 3189/tcp zoekt-indexserver-0 /sbin/tini -- zoekt-source ... Up @@ -221,9 +217,7 @@ prometheus /bin/prom-wrapper Up query-runner /sbin/tini -- /usr/local/b ... Up redis-cache /sbin/tini -- redis-server ... Up 6379/tcp redis-store /sbin/tini -- redis-server ... Up 6379/tcp -repo-updater /sbin/tini -- /usr/local/b ... Up searcher-0 /sbin/tini -- /usr/local/b ... Up (healthy) -symbols-0 /sbin/tini -- /usr/local/b ... Up (healthy) 3184/tcp syntect-server sh -c /http-server-stabili ... Up (healthy) 9238/tcp worker /sbin/tini -- /usr/local/b ... Up 3189/tcp zoekt-indexserver-0 /sbin/tini -- zoekt-source ... Up diff --git a/docs/admin/deploy/kubernetes/configure.mdx b/docs/admin/deploy/kubernetes/configure.mdx index c66ec24ec..9e7bc80af 100644 --- a/docs/admin/deploy/kubernetes/configure.mdx +++ b/docs/admin/deploy/kubernetes/configure.mdx @@ -994,7 +994,7 @@ patches: You can update environment variables for **searcher** with `patches`. -For example, to update the value for `SEARCHER_CACHE_SIZE_MB`: +For example, to update the values for `SEARCHER_CACHE_SIZE_MB` and `SYMBOLS_CACHE_SIZE_MB`: ```yaml # instances/$INSTANCE_NAME/kustomization.yaml @@ -1008,21 +1008,6 @@ For example, to update the value for `SEARCHER_CACHE_SIZE_MB`: value: name: SEARCHER_CACHE_SIZE_MB value: "50000" -``` - - -### Symbols - -You can update environment variables for **searcher** with `patches`.
- -For example, to update the value for `SYMBOLS_CACHE_SIZE_MB`: - -```yaml -# instances/$INSTANCE_NAME/kustomization.yaml - patches: - - target: - name: symbols - kind: StatefulSet|Deployment - patch: |- - op: replace path: /spec/template/spec/containers/0/env/0 value: @@ -1098,12 +1083,9 @@ Sourcegraph supports specifying an external Redis server with these environment When using an external Redis server, the corresponding environment variable must also be added to the following services: - - `sourcegraph-frontend` -- `repo-updater` - `gitserver` - `searcher` -- `symbols` - `worker` **Step 1**: Include the `services/redis` component in your components: diff --git a/docs/admin/deploy/kubernetes/index.mdx b/docs/admin/deploy/kubernetes/index.mdx index b1b1b1dd9..ef40ab5a8 100644 --- a/docs/admin/deploy/kubernetes/index.mdx +++ b/docs/admin/deploy/kubernetes/index.mdx @@ -944,11 +944,9 @@ Scale down `deployments` and `statefulSets` that access the database, _this step The following services must have their replicas scaled to 0: - Deployments (e.g., `kubectl scale deployment --replicas=0`) - precise-code-intel-worker -- repo-updater - searcher - sourcegraph-frontend - sourcegraph-frontend-internal -- symbols - worker - Stateful sets (e.g., `kubectl scale sts --replicas=0`): - gitserver diff --git a/docs/admin/deploy/kubernetes/kustomize/migrate.mdx b/docs/admin/deploy/kubernetes/kustomize/migrate.mdx index fdccce108..597a23ef6 100644 --- a/docs/admin/deploy/kubernetes/kustomize/migrate.mdx +++ b/docs/admin/deploy/kubernetes/kustomize/migrate.mdx @@ -15,7 +15,7 @@ Here are the benefits of the new base cluster with the new Kustomize setup compa - Streamlined resource allocation process: * Allocates resources based on the size of the instance * Optimized through load testing - * The searcher and symbols use StatefulSets and do not require ephemeral storage + * The searcher StatefulSet does not require ephemeral storage - Utilizes the Kubernetes-native tool Kustomize: * 
Built into kubectl * No additional scripting required @@ -192,6 +192,8 @@ If your instance was deployed using the non-privileged overlay, you can follow t ## Step 9: Build and review new manifests +> NOTE: The `symbols` service was removed in Sourcegraph 6.4; its workload now runs in `searcher`. + `pgsql`, `codeinsights-db`, `searcher`, `symbols`, and `codeintel-db` have been changed from `Deployments` to `StatefulSets`. However, redeploying these services as StatefulSets should not affect your existing deployment as they are all configured to use the same PVCs. ### From Deployment to StatefulSet diff --git a/docs/admin/deploy/kubernetes/operations.mdx b/docs/admin/deploy/kubernetes/operations.mdx index ef345578a..cdbe52a2b 100644 --- a/docs/admin/deploy/kubernetes/operations.mdx +++ b/docs/admin/deploy/kubernetes/operations.mdx @@ -429,11 +429,9 @@ precise-code-intel-worker ClusterIP 10.72.11.102 3188/TC prometheus ClusterIP 10.72.12.201 30090/TCP 25h redis-cache ClusterIP 10.72.15.138 6379/TCP,9121/TCP 25h redis-store ClusterIP 10.72.4.162 6379/TCP,9121/TCP 25h -repo-updater ClusterIP 10.72.11.176 3182/TCP,6060/TCP 25h searcher ClusterIP None 3181/TCP,6060/TCP 23h sourcegraph-frontend ClusterIP 10.72.12.103 30080/TCP,6060/TCP 25h sourcegraph-frontend-internal ClusterIP 10.72.9.155 80/TCP 25h -symbols ClusterIP None 3184/TCP,6060/TCP 23h syntect-server ClusterIP 10.72.14.49 9238/TCP,6060/TCP 25h worker ClusterIP 10.72.7.72 3189/TCP,6060/TCP 25h ``` diff --git a/docs/admin/deploy/kubernetes/scale.mdx b/docs/admin/deploy/kubernetes/scale.mdx index cd7d16580..35093cb62 100644 --- a/docs/admin/deploy/kubernetes/scale.mdx +++ b/docs/admin/deploy/kubernetes/scale.mdx @@ -15,10 +15,9 @@ For production environments, we recommend allocate resources based on your [inst Here is a simplified list of the key parameters to tune when scaling Sourcegraph to many repositories: - `sourcegraph-frontend` CPU/memory resource allocations -- `searcher` replica count +- `searcher` replica count and CPU/memory resource allocations -
`indexedSearch` replica count and CPU/memory resource allocations - `gitserver` replica count -- `symbols` replica count and CPU/memory resource allocations - `gitMaxConcurrentClones`, because `git clone` and `git fetch` operations are IO and CPU-intensive - `repoListUpdateInterval` (in minutes), because each interval triggers `git fetch` operations for all repositories @@ -38,7 +37,6 @@ Here is a simplified list of key parameters to tune when scaling Sourcegraph to - `sourcegraph-frontend` CPU/memory resource allocations - `searcher` CPU/memory resource allocations (allocate enough memory to hold all non-binary files in your repositories) - `indexedSearch` CPU/memory resource allocations (for the `zoekt-indexserver` pod, allocate enough memory to hold all non-binary files in your largest repository; for the `zoekt-webserver` pod, allocate enough memory to hold ~2.7x the size of all non-binary files in your repositories) -- `symbols` CPU/memory resource allocations - `gitserver` CPU/memory resource allocations (allocate enough memory to hold your Git packed bare repositories) --- diff --git a/docs/admin/deploy/kubernetes/troubleshoot.mdx b/docs/admin/deploy/kubernetes/troubleshoot.mdx index e7cd61940..c1f721ffa 100644 --- a/docs/admin/deploy/kubernetes/troubleshoot.mdx +++ b/docs/admin/deploy/kubernetes/troubleshoot.mdx @@ -126,34 +126,34 @@ This error occurs because Envoy, the proxy used by Istio, [drops proxied trailer In a service mesh like Istio, communication between services is secured using a feature called mutual Transport Layer Security (mTLS). mTLS relies on services communicating with each other using DNS names, rather than IP addresses, to identify the specific services or pods that the communication is intended for. 
To illustrate this, consider the following examples of communication flows between the "frontend" component and the "symbols" component: +To illustrate this, consider the following examples of communication flows between the "frontend" component and the "searcher" component: Example 1: Approved Communication Flow -1. Frontend sends a request to `http://symbol_pod_ip:3184` +1. Frontend sends a request to `http://searcher_pod_ip:3181` 2. The Envoy sidecar intercepts the request -3. Envoy looks up the upstream service using the DNS name "symbols" +3. Envoy looks up the upstream service using the DNS name "searcher" -4. Envoy forwards the request to the symbols component +4. Envoy forwards the request to the searcher component Example 2: Disapproved Communication Flow -1. Frontend sends a request to `http://symbol_pod_ip:3184` +1. Frontend sends a request to `http://searcher_pod_ip:3181` 2. The Envoy sidecar intercepts the request -3. Envoy tries to look up the upstream service using the IP address `symbol_pod_ip` +3. Envoy tries to look up the upstream service using the IP address `searcher_pod_ip` 4. Envoy is unable to find the upstream service because it's an IP address not a DNS name -5. Envoy will not forward the request to the symbols component +5. Envoy will not forward the request to the searcher component > NOTE: When using mTLS, communication between services must be made using the DNS names of the services, rather than their IP addresses. This is to ensure that the service mesh can properly identify and secure the communication. -To resolve this issue, the solution is to redeploy the frontend after specifying the service address for symbols by setting the SYMBOLS_URL environment variable in frontend. +To resolve this issue, redeploy the frontend after setting its `SEARCHER_URL` environment variable to the searcher service address. Please make sure the old frontend pods are removed.
```yaml -SYMBOLS_URL=http:symbols:3184 +SEARCHER_URL=http://searcher:3181 ``` -> WARNING: **This option is recommended only for symbols with a single replica**. Enabling this option will negatively impact the performance of the symbols service when it has multiple replicas, as it will no longer be able to distribute requests by repository/commit. +> WARNING: **This option is recommended only when searcher has a single replica**. Enabling this option will negatively impact the performance of the searcher service when it has multiple replicas, as it will no longer be able to distribute requests by repository/commit. #### Squirrel.LocalCodeIntel http status 502 diff --git a/docs/admin/deploy/scale.mdx b/docs/admin/deploy/scale.mdx index 0d674b73d..47a0a93d1 100644 --- a/docs/admin/deploy/scale.mdx +++ b/docs/admin/deploy/scale.mdx @@ -26,9 +26,7 @@ Here is a list of components you can find in a typical Sourcegraph deployment: | [`frontend`](/admin/deploy/scale#frontend) | Serves the web application, extensions, and graphQL services. Almost every service has a link back to the frontend, from which it gathers configuration updates. | | [`gitserver`](/admin/deploy/scale#gitserver) | Mirrors repositories from their code host. All other Sourcegraph services talk to gitserver when they need data from git. | | [`precise-code-intel`](/admin/deploy/scale#precise-code-intel) | Converts LSIF upload file into Postgres data. The entire index must be read into memory to be correlated. | -| [`repo-updater`](/admin/deploy/scale#repo-updater) | Tracks the state of repositories. It is responsible for automatically scheduling updates using gitserver and for synchronizing metadata between code hosts and external services. | -| [`searcher`](/admin/deploy/scale#searcher) | Provides on-demand un-indexed search for repositories. It fetches archives from gitserver and searches them with regexp. | -| [`symbols`](/admin/deploy/scale#symbols) | Indexes symbols in repositories using Ctags.
| +| [`searcher`](/admin/deploy/scale#searcher) | Provides on-demand un-indexed search for repositories. It fetches archives from gitserver and searches them with regexp. Indexes symbols in repositories using Ctags. | | [`syntect-server`](/admin/deploy/scale#syntect-server) | An HTTP server that exposes the Rust Syntect syntax highlighting library for use by other services. | | [`worker`](/admin/deploy/scale#worker) | Runs a collection of background jobs periodically in response to internal requests and external events. It is currently janitorial and commit based. | | [`zoekt-indexserver`](/admin/deploy/scale#zoekt-indexserver) | Indexes all enabled repositories on Sourcegraph and keeps the indexes up to date. Lives inside the indexed-search pod in a Kubernetes deployment. | @@ -480,48 +478,12 @@ A Redis instance for storing short-term information such as user sessions. --- -### repo-updater - -``` -Repo-updater tracks the state of repositories. -It is responsible for automatically scheduling updates using gitserver. -It is also responsible for synchronizing metadata between code hosts and external services. -Services that desire updates or fetch must communicate with repo-updater instead of gitserver. 
-``` - -| Replica | | -|:------------|:--------------------------------------------------------| -| `Overview` | Singleton | -| `Factors` | - | -| `Guideline` | A Singleton service should not have more than 1 replica | - -| CPU | | -|:------------|:-----------------------------------------------------------------------------------------| -| `Overview` | Most operations are not CPU bound | -| `Factors` | Most of the syncing jobs are related more to internal and code host-specific rate limits | -| `Guideline` | - | - -| Memory | | -|:------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Overview` | The queue of repositories that need to be updated is stored in memory. It uses an in-memory queue and is mostly network intensive as it makes API calls and processes and writes those newly available data to the pgsql database | -| `Factors` | Number of repositories | -| `Guideline` | This service is safe to restart at any time. The existing in-memory update queue is reset upon restart | -| | Not memory intensive | - -| Storage | | -|:------------|:---------------------------------------------------------------| -| `Overview` | A stateless service that directly writes to the pgsql database | -| `Factors` | - | -| `Guideline` | - | -| `Type` | None | - ---- - ### searcher ``` Provides on-demand unindexed search for repositories. -It relies on the OS file page cache to speed up future searches +It relies on the OS file page cache to speed up future searches. +Also the backend for symbols operations, indexing symbols in repositories using Ctags. 
``` | Replica | | @@ -535,7 +497,7 @@ It relies on the OS file page cache to speed up future searches |:------------|:-----------------------------------------------------------------------------------------------| | `Overview` | Searcher is IO and CPU bound. It fetches archives from gitserver and searches them with regexp | | `Factors` | Number of active users | -| `Guideline` | More engaged users = more CPU | +| `Guideline` | More engaged users = more CPU. Scale with the size of repositories as well | | Memory | | |:------------|:----------------------------------------------------------------------| @@ -553,50 +515,13 @@ It relies on the OS file page cache to speed up future searches | | The most important thing is to ensure fast IO for storage | | | Add more disks or replicas if you have lots of unindexed searches | | | More disk space will help speed up future caches | +| | At least 20% more than the size of your largest repository | | `Type` | Ephemeral storage for Kubernetes deployments | | | The request size of the ephemeral storage is used as a limit for the zip cache | | | Non-persistent SSD for Docker Compose | For example, if you search all branches on all repositories, that translates into lots of concurrent unindexed requests. ---- - -### symbols - -``` -The backend for symbols operations. -Indexes symbols in repositories using Ctags. 
-``` - -| Replica | | -|:------------|:----------------------------------------------------------------------------| -| `Overview` | Process unindexed search | -| `Factors` | Number of active users | -| `Guideline` | More requests for distinct commits in different repositories = more replica | - - -| CPU | | -|:------------|:---------------------------------------------------------------------------------------| -| `Overview` | Runs Ctags over code, stores symbol data in SQLite (or codeintel-db if using Rockskip) | -| `Factors` | Size of all repositories | -| `Guideline` | Scale with the size of repositories | - - -| Memory | | -|:------------|:--------------------------------------------------------------------| -| `Overview` | Stores symbol data in SQLite and/or Postgres if Rockskip is enabled | -| `Factors` | Size of all repositories | -| `Guideline` | Scale with the size of repositories | - -| Storage | | -|:------------|:---------------------------------------------------------------------------------------------------------------------| -| `Overview` | Saves SQLite DBs as files on disk in LRU fashion and copies an old one to a new file when a user visits a new commit | -| `Factors` | Size of the largest repository | -| `Guideline` | At least 20% more than the size of your largest repository | -| | Using SSD is highly preferred | -| `Type` | Ephemeral storage for Kubernetes deployments | -| | Non-persistent SSD for Docker Compose | - If Rockskip is enabled, the symbols for repositories indexed by [Rockskip](/code-search/code-navigation/rockskip) are stored in codeintel-db instead. 
--- diff --git a/docs/admin/how-to/monorepo-issues.mdx b/docs/admin/how-to/monorepo-issues.mdx index d142b199f..76cfca301 100644 --- a/docs/admin/how-to/monorepo-issues.mdx +++ b/docs/admin/how-to/monorepo-issues.mdx @@ -9,18 +9,17 @@ The following bullets provide a general guidline to which service may require mo * `sourcegraph-frontend` CPU/memory resource allocations * `searcher` CPU/memory resource allocations (allocate enough memory to hold all non-binary files in your repositories) * `indexedSearch` CPU/memory resource allocations (for the `zoekt-indexserver` pod, allocate enough memory to hold all non-binary files in your largest repository; for the `zoekt-webserver` pod, allocate enough memory to hold ~2.7x the size of all non-binary files in your repositories) -* `symbols` CPU/memory resource allocations * `gitserver` CPU/memory resource allocations (allocate enough memory to hold your Git packed bare repositories) ## Symbols sidebar - Processing symbols ![Screen Shot 2021-11-15 at 12 35 07 AM](https://user-images.githubusercontent.com/13024338/141749036-95759cbe-abd5-4d78-91eb-618423d2f66c.png) -If you are regularly seeing the `Processing symbols is taking longer than expected. Try again in a while` warning in your sidebar, its likely that your symbols and/or gitserver services are underprovisioned and need more CPU/mem resources. +If you are regularly seeing the `Processing symbols is taking longer than expected. Try again in a while` warning in your sidebar, its likely that your searcher and/or gitserver services are underprovisioned and need more CPU/mem resources. -The [symbols sidebar](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/repo/RepoRevisionSidebarSymbols.tsx?L42) is dependent on the symbols and gitserver services. Upon opening the symbols sidebar, a search query is made to the GraphQL API to retrieve the symbols associated with the current git commit. 
You can read more about the [symbol search behavior and performance](/code-search/types/symbol#symbol-search-behavior-and-performance). +The [symbols sidebar](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/repo/RepoRevisionSidebarSymbols.tsx?L42) is dependent on the searcher and gitserver services. Upon opening the symbols sidebar, a search query is made to the GraphQL API to retrieve the symbols associated with the current git commit. You can read more about the [symbol search behavior and performance](/code-search/types/symbol#symbol-search-behavior-and-performance). -To address this concern, allocate more resources to the symbols service (to provide more processing power for indexing operations) and allocate more resources to the gitserver (to provide for the extra load associated with responding to fetch requests from symbols, and speed up sending the large repo). +To address this concern, allocate more resources to the searcher service (to provide more processing power for indexing operations) and allocate more resources to the gitserver (to provide for the extra load associated with responding to fetch requests from searcher, and speed up sending the large repo). 
Here's an example of a diff to improve symbols performance in a k8s deployment: diff --git a/docs/admin/how-to/update_repo_failure.mdx b/docs/admin/how-to/update_repo_failure.mdx index 73ac717e7..2a70615ca 100644 --- a/docs/admin/how-to/update_repo_failure.mdx +++ b/docs/admin/how-to/update_repo_failure.mdx @@ -12,7 +12,7 @@ External service updated, but we encountered a problem while validating the exte syncExternalService for service "GITHUB" with ID 15:context deadline exceeded ``` ## Troubleshooting Steps -You check logs from the Repo-Updater container and you should find the following below: +Check the logs from the `worker` container; you should find the following error: ```status 401: Bad credentials``` ## Resolution diff --git a/docs/admin/observability/logs.mdx b/docs/admin/observability/logs.mdx index e2b031c5c..b69e86a49 100644 --- a/docs/admin/observability/logs.mdx +++ b/docs/admin/observability/logs.mdx @@ -62,7 +62,7 @@ We also include the following non-OpenTelemetry fields: A Sourcegraph service's log level can be configured for a specific `InstrumentationScope` and it's children. For example you can keep your log level at error, but turn on debug logs for a specific component. This is only used to increase verbosity. IE it can't be used to mute a scope. -This is configured by the environment variable `SRC_LOG_SCOPE_LEVEL`. It has the format `SCOPE_0=LEVEL_0,SCOPE_1=LEVEL_1,...`. For example to turn on debug logs for `service.UpdateScheduler` and `repoPurgeWorker` you would set the following on the `repo-updater` service: +This is configured by the environment variable `SRC_LOG_SCOPE_LEVEL`. It has the format `SCOPE_0=LEVEL_0,SCOPE_1=LEVEL_1,...`.
For example, to turn on debug logs for `service.UpdateScheduler` and `repoPurgeWorker`, you would set the following on the `worker` service: ``` SRC_LOG_SCOPE_LEVEL=service.UpdateScheduler=debug,repoPurgeWorker=debug diff --git a/docs/admin/observability/troubleshooting.mdx b/docs/admin/observability/troubleshooting.mdx index d21bc7f4d..3a3aa15cc 100644 --- a/docs/admin/observability/troubleshooting.mdx +++ b/docs/admin/observability/troubleshooting.mdx @@ -40,8 +40,8 @@ environment. #### Scenario: no cloning, syncing, updating or deleting is happening Observed state: Sourcegraph instance does not react to any updates to code hosts and no cloning is happening. -The cause of this state could be repo-updater queries that are too large for the limits of the running Postgres DB. -One symptom is seeing a line like the one below in the repo-updater logs: +The cause of this state could be queries from `worker` that are too large for the limits of the running Postgres DB. +One symptom is seeing a line like the one below in the `worker` logs: ```text t=2020-05-28T18:41:02+0000 lvl=eror msg=Syncer error="syncer.sync.store.upsert-repos: delete: driver: bad connection @@ -50,10 +50,10 @@ t=2020-05-28T18:41:02+0000 lvl=eror msg=Syncer error="syncer.sync.store.upsert-r or seeing the same error in the "Code host status panel" (Clicking the cloud icon). The fix is to increase the memory on Postgres DB which will increase certain Postgres-internal limits and will allow -the queries from repo-updater to go through. +the queries from `worker` to go through. -Another cause could be that the `repo-updater` is in a crash loop for some reason. If there are large numbers of repos -to be updated it could be from `Out of memory` errors. A fix here is to increase the memory for `repo-updater` instead. +Another cause could be that the `worker` service is in a crash loop for some reason. If there are large numbers of repos +to be updated, this could be caused by `Out of memory` errors.
A fix here is to increase the memory for `worker` instead. ## General scenarios diff --git a/docs/admin/pprof.mdx b/docs/admin/pprof.mdx index 390c53d80..1b8245419 100644 --- a/docs/admin/pprof.mdx +++ b/docs/admin/pprof.mdx @@ -62,8 +62,6 @@ This is a table of Sourcegraph backend debug ports in the two deployment context | frontend | 6060 | 6063 | | gitserver | 6060 | 6068 | | searcher | 6060 | 6069 | -| symbols | 6060 | 6071 | -| repo-updater | 6060 | 6074 | | zoekt-indexserver | 6060 | 6072 | | zoekt-webserver | 6060 | 3070 | diff --git a/docs/admin/repo/add.mdx b/docs/admin/repo/add.mdx index c8fa860e8..e795c441d 100644 --- a/docs/admin/repo/add.mdx +++ b/docs/admin/repo/add.mdx @@ -22,7 +22,7 @@ If your repositories are not showing up, check the site admin **Repositories** page on Sourcegraph (and ensure you're logged in as an admin). If nothing informative is visible there, check for error messages related to communication with your code host's API in the logs from: -- [Docker Compose](/admin/deploy/docker-compose/) and [Kubernetes](/admin/deploy/kubernetes/): the logs from the `repo-updater` container/pod +- [Docker Compose](/admin/deploy/docker-compose/) and [Kubernetes](/admin/deploy/kubernetes/): the logs from the `worker` container/pod - [Single-container](/admin/deploy/docker-single-container/): the `sourcegraph/server` Docker container ### Repository not cloning or updating diff --git a/docs/admin/repo/update_frequency.mdx b/docs/admin/repo/update_frequency.mdx index 74bbca485..bef579963 100644 --- a/docs/admin/repo/update_frequency.mdx +++ b/docs/admin/repo/update_frequency.mdx @@ -36,4 +36,4 @@ You may also choose to disable automatic Git updates entirely and instead [confi - **Update Queue**: A priority queue of repositories to update. A worker continuously dequeues them and sends updates to gitserver. 
- **Sync jobs**: The current list of external service sync jobs, ordered by start date descending -Site admin: Go to **Site admin > Instrumentation (under Maintenance) > repo-updater > Repo Updater State** +Site admin: Go to **Site admin > Instrumentation (under Maintenance) > worker > Repo Updater State** diff --git a/docs/admin/troubleshooting.mdx b/docs/admin/troubleshooting.mdx index a481c827a..acf0b0f16 100644 --- a/docs/admin/troubleshooting.mdx +++ b/docs/admin/troubleshooting.mdx @@ -131,7 +131,7 @@ If you can get repository results when you explicitly include `repo:{your reposi - The repository is a fork repository (excluded from search results by default) and `fork:yes` is not specified in the search query. - The repository is an archived repository (excluded from search results by default) and `archived:yes` is not specified in the search query. -- There is an issue indexing the repository: check the logs of repo-updater and/or search-indexer. +- There is an issue indexing the repository: check the logs of `worker` and/or `search-indexer`. - The search index is unavailable for some reason: try the search query `repo: index:only`. If it returns no results, the repository has not been indexed. ### Sourcegraph is making unauthorized requests to the git server diff --git a/docs/code-search/code-navigation/rockskip.mdx b/docs/code-search/code-navigation/rockskip.mdx index d20564f32..e35289a73 100644 --- a/docs/code-search/code-navigation/rockskip.mdx +++ b/docs/code-search/code-navigation/rockskip.mdx @@ -18,13 +18,13 @@ You can always try Rockskip for a while and if it doesn't help then you can disa ## How do I enable Rockskip?
-**Step 1:** Set environment variables on the `symbols` container: +**Step 1:** Set environment variables on the `searcher` container: For Docker Compose: ```yaml services: - symbols-0: + searcher-0: environment: # Enables Rockskip - USE_ROCKSKIP=true @@ -36,7 +36,7 @@ For Helm: ```yaml # overrides.yaml -symbols: +searcher: env: # Enables Rockskip USE_ROCKSKIP: @@ -49,12 +49,12 @@ symbols: For Kubernetes: ```yaml -# base/symbols/symbols.Deployment.yaml +# base/searcher/searcher.Deployment.yaml spec: template: spec: containers: - - name: symbols + - name: searcher env: # Enables Rockskip - name: USE_ROCKSKIP @@ -66,8 +66,8 @@ spec: For all deployments, make sure that: -- The `symbols` service has access to the codeintel DB -- The `symbols` service has the environment variables set +- The `searcher` service has access to the codeintel DB +- The `searcher` service has the environment variables set - The `codeintel-db` has a few extra GB of RAM **Step 2:** Kick off indexing @@ -98,21 +98,21 @@ Rockskip heavily relies on gitserver for data. Rockskip issues very long-running The easiest way to check the status of a single repository is to open the symbols sidebar and wait 5s for an error message to appear with the estimated time remaining. 
-For more info, the symbols container responds to GET requests on the `localhost:3184/status` endpoint with the following info: +For more info, the searcher container responds to GET requests on the `localhost:3184/status` endpoint with the following info: - Repository count - Size of the symbols table in Postgres - Most recently searched repositories - List of in-flight indexing and search requests -For Kubernetes, find the symbols pod and `exec` a `curl` command in it: +For Kubernetes, find the searcher pod and `exec` a `curl` command in it: ``` yaml -$ kubectl get pods | grep symbols -symbols-5ff7c67b57-mr5h4 +$ kubectl get pods | grep searcher +searcher-5ff7c67b57-mr5h4 -$ kubectl exec -ti symbols-5ff7c67b57-mr5h4 -- curl localhost:3184/status -This is the symbols service status page. +$ kubectl exec -ti searcher-5ff7c67b57-mr5h4 -- curl localhost:3184/status +This is the searcher service status page. Number of repositories: 1 Size of symbols table: 3253 MB @@ -127,7 +127,7 @@ progress 9.53% (indexed 49151 of 515574 commits), 36h55m18.227079912s remaining Tasks (14006.77s total, current AppendHop+): AppendHop+ 44.76% 49152x, InsertSymbol 18.67% 1997101x, AppendHop- 12.94% 49151x, UpdateSymbolHops 7.78% 825380x, parse 4.01% 369401x, GetCommitByHash 2.73% 515574x, get hops 2.39% 49152x, ArchiveEach 2.26% 98302x, GetSymbol 1.83% 325351x, CommitTx 1.26% 49151x, DeleteRedundant 0.79% 49151x, InsertCommit 0.30% 49152x, Log 0.28% 1x, RevList 0.00% 1x, iLock 0.00% 1x, idle 0.00% 1x, holding iLock ``` -In this example you can see there's 1 repository and the symbols service has indexed 9% of all commits with an ETA of 36H from now. +In this example, there is 1 repository, and the searcher service has indexed about 9.5% of all commits, with roughly 37 hours remaining.
The output also breaks down the tasks that make up Rockskip's internal workings; these are mostly of interest to Sourcegraph engineers, so you can ignore them. ## When is indexing triggered? diff --git a/docs/code-search/code-navigation/search_based_code_navigation.mdx b/docs/code-search/code-navigation/search_based_code_navigation.mdx index 1916a46c4..cd4d30b1b 100644 --- a/docs/code-search/code-navigation/search_based_code_navigation.mdx +++ b/docs/code-search/code-navigation/search_based_code_navigation.mdx @@ -23,15 +23,15 @@ Search-based Code Navigation also filters results by file extension and by impor ## What configuration settings can I apply? -The symbols container recognizes these environment variables: +The searcher container recognizes these environment variables: -| **Env Vars** | **Deafult** | **Details** | +| **Env Vars** | **Default** | **Details** | | ---------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------- | | `CTAGS_COMMAND` | `universal-ctags` | Ctags command (should point to universal-ctags executable compiled with JSON and seccomp support) | | `CTAGS_PATTERN_LENGTH_LIMIT` | `250` | The maximum length of the patterns output by ctags | | `LOG_CTAGS_ERRORS` | `false` | Log ctags errors | | `SANITY_CHECK` | `false` | Check that go-sqlite3 works then exit 0 if it's ok or 1 if not | -| `SYMBOLS_CACHE_DIR` | `/tmp/symbols-cache` | Directory in which to store cached symbols | +| `CACHE_DIR` | `/tmp/symbols-cache` | Directory in which to store cached symbols | | `SYMBOLS_CACHE_SIZE_MB` | `100000` | Maximum size of the disk cache (in megabytes) | | `CTAGS_PROCESSES` | `strconv.Itoa(runtime.GOMAXPROCS(0))` | Number of concurrent parser processes to run | | `REQUEST_BUFFER_SIZE` | `8192` | Maximum size of buffered parser request channel | diff --git a/docs/code-search/types/symbol.mdx b/docs/code-search/types/symbol.mdx index 8a3e07388..ad7890dc2 100644 ---
a/docs/code-search/types/symbol.mdx +++ b/docs/code-search/types/symbol.mdx @@ -21,13 +21,13 @@ The extracted `ctags` symbols are also used for the symbol sidebar, which catego Here is the query path for symbol searches: -- **Zoekt**: if [indexed search](/admin/search#indexed-search) is enabled and the search is for the tip commit of an indexed branch, then Zoekt will service the query and it should respond quickly. Zoekt indexes the default branch (usually `master` or `main`) and can be configured for [multi-branch indexing](/code-search/features#multi-branch-indexing-experimental). The high commit frequency of monorepos reduces the likelihood that Zoekt will be able to respond to symbol searches. Zoekt **eagerly** indexes by listening to repository updates, whereas the symbols service **lazily** indexes the commit being searched. -- **Symbols service with Rockskip enabled**: if [Rockskip](/code-search/code-navigation/rockskip) is enabled, it'll search for symbols stored in Postgres. After initial indexing, queries should be resolved quickly. -- **Symbols service with an index for the commit**: if the symbols service has already indexed this commit (i.e. someone has visited the commit before) then the query should be resolved quickly. Indexes are deleted in LRU fashion to remain under the configured maximum disk usage which [defaults to 100GB](/code-search/code-navigation/search_based_code_navigation#what-configuration-settings-can-i-apply). -- **Symbols service with an index for a different commit**: if the symbols service has already indexed a **different** commit in the same repository, then it will make a copy of the previous index on disk then run [ctags](https://github.com/universal-ctags/ctags#readme) on the files that changed between the two commits and update the symbols in the new index. This process takes roughly 20 seconds on a monorepo with 40M LOC and 400K files. 
-- **Symbols service without any indexes (cold start)**: if the symbols service has never seen this repository before, then it needs to run ctags on all symbols and construct the index from scratch. This process takes roughly 20 minutes on a monorepo with 40M LOC and 400K files. +- **Zoekt**: if [indexed search](/admin/search#indexed-search) is enabled and the search is for the tip commit of an indexed branch, then Zoekt will service the query and it should respond quickly. Zoekt indexes the default branch (usually `master` or `main`) and can be configured for [multi-branch indexing](/code-search/features#multi-branch-indexing-experimental). The high commit frequency of monorepos reduces the likelihood that Zoekt will be able to respond to symbol searches. Zoekt **eagerly** indexes by listening to repository updates, whereas the searcher service **lazily** indexes the commit being searched. +- **Searcher service with Rockskip enabled**: if [Rockskip](/code-search/code-navigation/rockskip) is enabled, the searcher service searches the symbols stored in Postgres. After initial indexing, queries should be resolved quickly. +- **Searcher service with an index for the commit**: if the searcher service has already indexed this commit (i.e., someone has visited the commit before), then the query should be resolved quickly. Indexes are deleted in LRU fashion to remain under the configured maximum disk usage, which [defaults to 100GB](/code-search/code-navigation/search_based_code_navigation#what-configuration-settings-can-i-apply). +- **Searcher service with an index for a different commit**: if the searcher service has already indexed a **different** commit in the same repository, then it makes a copy of the previous index on disk, runs [ctags](https://github.com/universal-ctags/ctags#readme) on the files that changed between the two commits, and updates the symbols in the new index. This process takes roughly 20 seconds on a monorepo with 40M LOC and 400K files.
+- **Searcher service without any indexes (cold start)**: if the searcher service has never seen this repository before, then it needs to run ctags on all files and construct the index from scratch. This process takes roughly 20 minutes on a monorepo with 40M LOC and 400K files. -Once the symbols service has built an index for a commit, here's the query performance: +Once the searcher service has built an index for a commit, here's the query performance: - Exact matches `^foo$` are optimized to use an index - Prefix matches `^foo` are optimized to use an index diff --git a/public/llms.txt b/public/llms.txt index 23ba28f45..6ce773fb2 100644 --- a/public/llms.txt +++ b/public/llms.txt @@ -102813,9 +102813,7 @@ prometheus /bin/prom-wrapper Up query-runner /sbin/tini -- /usr/local/b ... Up redis-cache /sbin/tini -- redis-server ... Up 6379/tcp redis-store /sbin/tini -- redis-server ... Up 6379/tcp -repo-updater /sbin/tini -- /usr/local/b ... Up searcher-0 /sbin/tini -- /usr/local/b ... Up (healthy) -symbols-0 /sbin/tini -- /usr/local/b ... Up (healthy) 3184/tcp syntect-server sh -c /http-server-stabili ... Up (healthy) 9238/tcp worker /sbin/tini -- /usr/local/b ... Up 3189/tcp zoekt-indexserver-0 /sbin/tini -- zoekt-source ... Up