This repo was created for Sourcegraph Implementation Engineering deployments, and is not intended, designed, built, or supported for use in any other scenario. Feel free to open issues or PRs, but responses are best effort.
- Sourcegraph was built with Git-native support, but customers have a variety of version control systems
- Sourcegraph has integrated the p4-fusion FOSS project into the product to support Perforce more directly
- Other version control systems are left up to the customer to convert to Git
- This project builds a framework to convert repos from other VCSes to Git
- This project uses semantic versioning, where x.y.z is major.minor.patch
- The minor version is incremented if a breaking change has been made, usually to either:
- Configuration files
- Environment variables
- Storage locations on the host
For Sourcegraph Cloud customers, they'll need to run the repo-converter, src serve-git, and the Sourcegraph Cloud Private Access Agent on a container platform with connectivity to both their Sourcegraph Cloud instance, and their code hosts. This can be done quite securely, as the src serve-git API endpoint does not need any ports exposed outside of the container network Running src serve-git and the agent together on the same container network allows the agent to use the container platform's local DNS service to reach src serve-git, and prevents src serve-git's unauthenticated HTTP endpoint from needing to be opened outside of the container network.
For Self-hosted Sourcegraph customers, they'll need to run the repo-converter and src serve-git together in a location that can reach their code hosts, and their Sourcegraph instance can reach the src serve-git API.
The repo-converter and src serve-git containers need to share a storage volume, as repo-converter stores the cloned Git repos locally, and src serve-git serves them to the Sourcegraph instance from the same storage volume.
Deploying via containers allows for easier upgrades, troubleshooting, monitoring, logging, flexibility of hosting, etc. than running the binaries directly on the OS.
- Container host
- Docker Compose or Kubernetes
- Networking
- Has outbound connectivity to the code hosts
- Has outbound connectivity to the Sourcegraph Cloud instance (for Cloud customers)
- Has inbound connectivity from the Sourcegraph instance (for self-hosted customers)
- Storage
- Docker volume / Kubernetes persistent volume
- SSD with low latency random writes
- 2x the original repos' sum total size
- CPU
- The container runs a separate repo conversion process for each repo, in parallel, so maximum performance during the initial conversion process can be achieved with at least 1 thread or core for each repo in scope for conversion
- Repo conversion speed is more network-bound than CPU or memory
- Memory
- ~1 GB / repo to be converted in parallel
- Depends on the size of the largest commit, as
git svn fetch
seems to hold entire commits' contents in memory, and the number of parallel jobs
- Code host
- Subversion
- HTTP(S)
- Username and password for a user account that has read access to the needed repos
- TFVC (Microsoft Team Foundation Version Control)
- Future, depending on availability of TFVC API clients
- Subversion
- Follow the SG Cloud team's documentation to add the needed entries to the sourcegraphConnect targetGroups list in the Cloud instance's config.yaml, and get your PR approved and merged
- Clone this repo to a VM on the customer's network, and either install Docker and Docker's Compose plugin, or connect to a container platform
- Copy the
config.yaml
andservice-account-key.json
files using the instructions on the instance's Cloud Ops dashboard- Save them in the
./config/cloud-agent/
directory
- Save them in the
- Modify the contents of the
./config/cloud-agent/config.yaml
file:serviceAccountKeyFile: /sg/config/service-account-key.json
so the Go binary inside the agent container finds this file in the path as it's mapped via the docker-compose.yaml files- Only include the
- dialAddress
entries that this cloud agent instance can reach, remove the others, so the Cloud instance doesn't try using this agent instance for code hosts it can't reach - Use extra caution when pasting the
config.yaml
file in Windows, as it may use Windows' line endings or extra spaces, which breaks YAML, as a whitespace-dependent format
- Run
docker compose up -d
- Add a Code Host config to the customer's Cloud instance
- Type: src serve-git
- The url is the name of the container, ex.
"url": "http://src-serve-git-ubuntu.local:443",
"url": "http://src-serve-git-wsl.local:443",
- Note the port 443, even when used with http://
- Use the repo-converter to convert SVN,
TFVC, or Git repos,to Git format, which will store them in the../src-serve-root
directory, or use any other means to get the repos into the directory
- Env vars are used for configs which need the container to restart to get new values
- See
./src/config/load_env.py
for the list of environment variables, their data types, and default values - See
./src/config/validate_env.py
for any validation rules
- The contents of this file can be changed while the container is running, and the current version will be read at the start of each main loop in main.py
- Note, the syntax in the below examples is quite out of date, but the explanations of each may still be useful
- See
./config/repo-converter/repos-to-convert-example.yaml
for an example of the config layout - See
./src/config/load_repos.py
for the list of config keys - TODO: Move the config schema to a separate file, and read it into the code
- The default interval and batch size are set for sane polling for new repo commits during regular operations, but would be quite slow for initial cloning
- For initial cloning, adjust:
- The
REPO_CONVERTER_INTERVAL_SECONDS
environment variable- This is the outer loop interval, how often the service will start a repo conversion job for each repo, if one is not already running
- Thus, the longest break between two batches would be the length of this interval
- Try 60 seconds, and adjust based on your source code host's performance
- The
MAX_CONCURRENT_CONVERSIONS_PER_SERVER
andMAX_CONCURRENT_CONVERSIONS_GLOBAL
environment variables- These are the maximum number of concurrent / parallel repo conversion jobs which can be run, per source code host, and total for this service
- The defaults are 10 each, so if you're converting repos from two Subversion servers at the same time, a maximum of 10 jobs can run in parallel, from either server
- The
- Pull requests are always welcome
- See
./dev/TODO.md
for the list of tasks to be done