⚠️ Experimental Project – Not for Production Use
This repository demonstrates a research prototype that implements a simple n‑gram language model for code completion in VS Code. It is not guaranteed to be stable, fast, or secure enough for production workloads. Use at your own risk.
VSCode N‑Gram Code Suggester is a proof‑of‑concept that combines:
| Component | Description |
|---|---|
| Python training utility | CLI utility for train an n‑gram model on a codebase. Supports a handful of languages (C#, JavaScript, TypeScript, Python). |
| VS Code extension | Hooks into the editor’s Inline Suggest API and uses the trained n‑gram model to surface context‑aware completions. |
| Model file | A lightweight model that stores trigram/tag frequencies. |
The idea is to show that even a single‑sentence context can yield useful suggestions, without the heavy machinery of large neural models.
⚠️ The following steps assume you have Python 3.9+ installed and a recent VS Code release.
-
Clone the repo
git clone https://github.com/amest/vscode-ngram-code-suggester.git cd vscode-ngram-code-suggester -
Train a model on your codebase
python3 code_model_trainer.py \ --model extensions/models/model.json \ --language py
Replace the arguments above to fit your project. See the full CLI options table above.
Or you can skip this step and use default pre-trained model.
-
Build and Install the extension
# Build VSIX package cd extension npm install vsce package # Install VSIX in VS Code code --install-extension vscode-ngram-suggester-1.1.0.vsix
-
Enjoy autocompletion
Open a C# file, type a few tokens (words), and wait for auto‑suggestions.
⚠️ Important about configuration
If you train model on big dataset (after train, model have more 2 million patterns), you need to disable "Fuzzy search", "Use Smoothing" because suggest generation will become very slow. If suggest generation still slowly, enable "Use Trigger Characters".
If you dataset medium or small, don't disable this configuration. It's help to find suggestion if model don't contains equal pattern. Change "Min Confidence", "Max Fuzzy Checks" for control suggestion quality
Below is a quick reference to all user‑configurable options for the extension.
Add any of these to your workspace or user settings.json to tweak the behaviour.
| Setting | Type | Default | Constraints | Description |
|---|---|---|---|---|
| codeSuggester.modelPath | string | ./models/model.json.gz |
– | Path to the trained model file. Supports plain .json or gzipped .json.gz. |
| codeSuggester.maxSuggestions | number | 5 |
1 – 10 |
Maximum number of suggestions displayed in the IntelliSense list. |
| codeSuggester.maxFuzzyChecks | number | 2000 |
≥ 1000 |
Maximum number of fuzzy‑search checks performed. Higher values give better matches but can be slow on large models. |
| codeSuggester.minConfidence | number | 0.2 |
0.0 – 1.0 |
Minimum confidence threshold for a suggestion to be shown. |
| codeSuggester.enableFuzzyMatching | boolean | false |
– | Turns on fuzzy matching for similar code patterns. |
| codeSuggester.useSmoothing | boolean | false |
– | Enables smoothing algorithms to better handle rare n‑grams. |
| codeSuggester.useTriggerCharacters | boolean | false |
– | When enabled, suggestions are only triggered when the cursor is placed on a trigger character (. , ( ) [ { : ; =). Useful if auto‑suggestions feel sluggish. |
| codeSuggester.useProjectContext | boolean | true |
- | Use project context from open files for suggestions |
| codeSuggester.updateOnFileChange | boolean | false |
- | Update project model when files are modified (may impact performance) |
Experiment with the numeric limits and booleans to find the sweet spot for your project’s size and performance requirements.
For train model, need using python script code_model_trainer.py. Script scan code files in Glob pattern and index it.
| Argument | Short flag | Required / Optional | Type / Data type | Default | Allowed values | Description |
|---|---|---|---|---|---|---|
--model |
-m |
Required | string (file path) | – | – | Path to the model file |
--pattern |
-p |
Optional | string (glob pattern) | – | – | Glob pattern to match code files |
--language |
-l |
Optional | string | – | cs, js, ts, py, all |
Predefined glob pattern for programming language |
--n-gram |
-n |
Optional | integer | 4 |
– | Size of the n‑gram (default: 4) |
--smoothing |
-s |
Optional | string | laplace |
none, laplace |
Smoothing method (default: laplace) |
--alpha |
-a |
Optional | float | 1.0 |
– | Alpha parameter for Laplace smoothing (default: 1.0) |
--no-compress |
– | Optional | flag (boolean) | False (unset) |
– | Save the output without compression (set to True when present) |
# Only C#
python3 code_model_trainer.py --model ./model.json --language cs
# Python & JavaScript
python3 code_model_trainer.py --model ./model.json --language py
python3 code_model_trainer.py --model ./model.json --language js
# Mixed use without compression
python3 code_model_trainer.py --model ./model.json --pattern "src/**/*.ts" --language cs --no-compress
Clone repose in list and train you model:
- CSharp:
- Python
- TypeScript & JavaScript
Distributed under the MIT License. See the LICENSE file for details.