From 5c1162c6e5e6dd122a6e31dcbc013ed3e9b16f97 Mon Sep 17 00:00:00 2001
From: sai
Date: Thu, 9 Oct 2025 23:35:06 +0530
Subject: [PATCH 1/5] docs: add context window (token limit) guide

---
 README.md              |   2 +
 docs/context_window.md | 150 +++++++++++++++++++++++++++++++++++++++++
 docs/index.rst         |   1 +
 3 files changed, 153 insertions(+)
 create mode 100644 docs/context_window.md

diff --git a/README.md b/README.md
index 172210af7f..6ad0d6055d 100644
--- a/README.md
+++ b/README.md
@@ -46,6 +46,8 @@ Choose **one** of:
 - Custom model:
   - See [docs](https://gpt-engineer.readthedocs.io/en/latest/open_models.html), supports local model, azure, etc.
 
+Limiting context window: see `docs/context_window.md` for strategies to control token usage and avoid truncation.
+
 Check the [Windows README](./WINDOWS_README.md) for Windows usage.
 
 **Other ways to run:**

diff --git a/docs/context_window.md b/docs/context_window.md
new file mode 100644
index 0000000000..37ea571499
--- /dev/null
+++ b/docs/context_window.md
@@ -0,0 +1,150 @@
# Context window (token limit)

This note explains what a context window (token limit) is, why it matters when using LLMs, and practical strategies to work within it.

## What is the context window?

A model's context window (also called the token limit) is the maximum number of tokens the model can accept as input (and, for many APIs, input and generated output combined). Tokens roughly correspond to pieces of words; common English text averages roughly 1.3 tokens per word (about 0.75 words per token), varying with vocabulary and punctuation.

If your prompt, conversation history, and documents together exceed the context window, older content is truncated (dropped) or the request fails with an error, depending on the client.

## Why it matters

- Cost: many API providers bill per token, so sending more tokens costs more.
- Performance: larger inputs increase latency and can require more memory on the client and server side.
- Truncation / information loss: when the context exceeds the limit, parts of the history or documents are omitted, which can break coherence and reasoning, or cause the model to lose earlier instructions and facts.

## Practical strategies

Below are three pragmatic strategies for managing content so it fits the context window while preserving useful information.

### 1) Truncation (simple, predictable)

When the total token count is too large, drop old or less important content. This is easy, predictable, and safe for streaming and long chats. Use heuristics to drop older messages or large blobs (images, raw code) first.

Pros: simple, low compute overhead.
Cons: may drop crucial earlier context.

A minimal Python sketch (`system_prompt` is an assumed variable; `token_count` uses tiktoken as an approximation — your model's tokenizer may differ):

```
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # approximation; use your model's encoding

def token_count(messages):
    return sum(len(enc.encode(m)) for m in messages)

def build_payload(history, new_message, max_tokens):
    payload = [system_prompt, new_message]
    for msg in reversed(history):  # start from the most recent message
        if token_count(payload) + token_count([msg]) > max_tokens:
            break
        payload.insert(1, msg)  # keep chronological order after the system prompt
    return payload
```

Tips:
- Keep a sliding window of the most recent N messages.
- Prefer to keep the system instructions and the most recent user/assistant turn.

### 2) Summarization / compaction (preserve meaning)

Compress older content into a shorter summary that preserves important facts. Periodically summarize the conversation or documents and store the summary in place of the raw items. This preserves context at a lower token cost.

Pros: maintains semantic information; better for long-running sessions.
Cons: requires extra API calls or compute for summarization, plus careful prompt engineering to avoid losing critical specifics.
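As a concrete illustration, the summarization step can be a single model call. The sketch below assumes the OpenAI Python client (`openai>=1.0`); the model name and summarization prompt are placeholders to adapt to your setup:

```
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_model_summarize(chunk_text):
    # One extra model call that compacts old history into a short summary.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any inexpensive model works for compaction
        messages=[
            {"role": "system",
             "content": "Summarize this conversation chunk. Preserve facts, "
                        "entities, decisions, and open tasks."},
            {"role": "user", "content": chunk_text},
        ],
        max_tokens=300,  # cap the summary so it stays cheap to re-send
    )
    return response.choices[0].message.content
```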
The orchestration around such a call, as a Python sketch (`total_tokens`, `select_oldest_chunk`, and `summary_marker` are assumed helpers):

```
if total_tokens(history) > summary_threshold:
    chunk = select_oldest_chunk(history)
    summary = call_model_summarize(chunk)        # one extra model call, as sketched above
    history = [m for m in history if m not in chunk]
    history.insert(0, summary_marker(summary))   # keep the summary where the old turns were

# Then build the payload as in truncation, prioritizing summaries + recent messages.
```

Implementation notes:
- Use structured summaries when possible: facts, entities, decisions, open tasks.
- Keep both a human-readable summary and a small machine-friendly key-value store for retrieval.
- Re-summarize incrementally: each time you summarize, append to the existing summary rather than re-summarizing everything from scratch.

### 3) Configuration option (developer-facing control)

Expose a configuration option to tune how the system behaves when approaching the token limit. Example knobs:

- `max_context_tokens`: hard limit used when composing payloads.
- `strategy`: one of `["truncate", "summarize", "hybrid"]`.
- `preserve_system_prompts`: boolean; always keep system prompts.
- `preserve_recent_turns`: the N most recent user/assistant turns to always keep.

This lets users choose tradeoffs appropriate to their use case (cost vs. fidelity).

Example configuration object (a Python dict):

```
config = {
    "max_context_tokens": 32000,
    "strategy": "hybrid",              # one of "truncate", "summarize", "hybrid"
    "preserve_system_prompts": True,
    "preserve_recent_turns": 6,
    "summary_chunk_size": 4000,        # tokens per summarization chunk
}
```

Hybrid strategy: try to include as much recent raw context as possible, then include summaries of older content, and finally truncate if still necessary.

## Sketch: hybrid end-to-end

A Python sketch tying the pieces together (`history.recent`, `history.older_than_recent`, `chunked`, `get_or_create_summary`, and `truncate_least_important` are assumed helpers):

```
def prepare_context(history, new_message, config):
    budget = config["max_context_tokens"]
    payload = [system_prompt, new_message]

    # Step 1: keep as many recent raw turns as fit
    for msg in reversed(history.recent(config["preserve_recent_turns"])):
        if token_count(payload) + token_count([msg]) > budget:
            break
        payload.insert(1, msg)  # chronological order, after the system prompt

    # Step 2: add summaries of older content, oldest first, before the raw turns
    insert_at = 1
    for chunk in chunked(history.older_than_recent(), config["summary_chunk_size"]):
        summary = get_or_create_summary(chunk)  # cached: each chunk is summarized once
        if token_count(payload) + token_count([summary]) > budget:
            break
        payload.insert(insert_at, summary)
        insert_at += 1

    # Step 3: if still too large, drop the least important remaining items
    if token_count(payload) > budget:
        payload = truncate_least_important(payload, budget)

    return payload
```

## Troubleshooting notes & edge cases

- "Off-by-one" token errors: different tokenizers and APIs may count tokens differently. Always leave a safety buffer (e.g., 32–256 tokens) when computing the tokens allowed for model input plus expected output.

- Unexpected truncation of system messages: ensure system prompts are treated as highest priority and pinned into the payload.

- Cost spikes when summarizing: summarization itself consumes tokens (both input and output), so amortize it by summarizing infrequently or offline when possible.

- Losing exact data (e.g., code or long tables): summaries can lose exact formatting or specifics. Where exactness matters, keep the original as a downloadable artifact and include a short index or pointer in the summary.

- Very long single documents: chunk documents into logical sections and summarize each section, or use retrieval (a vector DB) to inject only short, relevant context instead of sending the whole document; a rough chunker is sketched below.
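  A rough sketch of section-level chunking, reusing the `token_count` helper from the truncation example (splitting on blank lines is an assumed heuristic; real documents may need smarter boundaries):

  ```
  def chunk_document(text, max_tokens):
      # Pack paragraphs into chunks that each stay under max_tokens.
      chunks, current = [], ""
      for para in text.split("\n\n"):
          candidate = (current + "\n\n" + para).strip()
          if current and token_count([candidate]) > max_tokens:
              chunks.append(current)   # close the current chunk
              current = para           # start a new one with this paragraph
          else:
              current = candidate
      if current:
          chunks.append(current)
      return chunks
  ```

  Note that a single paragraph larger than `max_tokens` still becomes its own oversized chunk; a production splitter needs a fallback (e.g., sentence-level splitting).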
- Multi-user/parallel sessions: keep per-session histories and shared summaries carefully namespaced to avoid mixing users' contexts.

## Additional suggestions

- Instrument token usage and surface metrics to users (tokens per request, cost per request, average history length). This helps tune thresholds.
- Provide a debugging mode that prints the token counts and what was dropped or summarized before each request.
- When integrating with retrieval (vector DBs), index long documents and retrieve only the most relevant chunks to inject into prompts rather than pushing entire documents.

## References and further reading

- How tokens map to words depends on the model's tokenizer (BPE, byte-level BPE, etc.).
- For long-running agents, consider combining summarization with retrieval-augmented generation (RAG) patterns.


---

Notes: this page is intentionally concise.
\ No newline at end of file
diff --git a/docs/index.rst b/docs/index.rst
index 11a428de13..b845558688 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -16,6 +16,7 @@ Welcome to GPT-ENGINEER's Documentation
    windows_readme_link
    open_models.md
    tracing_debugging.md
+   context_window.md
 
 .. toctree::
    :maxdepth: 2

From 8ca6d13af8c64c6c54ce9d46888be6d3fee3fcc4 Mon Sep 17 00:00:00 2001
From: sai
Date: Fri, 10 Oct 2025 00:18:02 +0530
Subject: [PATCH 2/5] docs: fix quickstart typos and links

---
 docs/quickstart.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/quickstart.rst b/docs/quickstart.rst
index ee26403192..afbe5cc08a 100644
--- a/docs/quickstart.rst
+++ b/docs/quickstart.rst
@@ -5,13 +5,13 @@ Quickstart
 Installation
 ============
 
-To install LangChain run:
+To install gpt-engineer run:
 
 .. code-block:: console
 
     $ python -m pip install gpt-engineer
 
-For more details, see our [Installation guide](/instllation.html).
+For more details, see our `Installation guide `_.
 
 Setup API Key
 =============
@@ -29,9 +29,9 @@ Choose one of the following:
 
 - Create a copy of ``.env.template`` named ``.env``
 - Add your ``OPENAI_API_KEY`` in .env
 
-- If you want to use a custom model, visit our docs on `using open models and azure models <./open_models.html>`_.
+- If you want to use a custom model, visit our docs on `using open models and azure models `_.
 
-- To set API key on windows check the `Windows README <./windows_readme_link.html>`_.
+- To set the API key on Windows, check the `Windows README `_.
 
 Building with ``gpt-engineer``
 ==============================
@@ -60,7 +60,7 @@ Improve Existing Code
 
     $ gpte projects/my-old-project -i
 
-By running ``gpt-engineer`` you agree to our `terms <./terms_link.html>`_.
+By running ``gpt-engineer`` you agree to our `terms `_.
To **run in the browser** you can simply: From 2c5bc8d1c5c14887e093517c4332aba649710c47 Mon Sep 17 00:00:00 2001 From: sai Date: Fri, 10 Oct 2025 00:22:45 +0530 Subject: [PATCH 3/5] docs: fix example link in open_llms README --- docs/examples/open_llms/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/examples/open_llms/README.md b/docs/examples/open_llms/README.md index 93e8f3300f..b17e713092 100644 --- a/docs/examples/open_llms/README.md +++ b/docs/examples/open_llms/README.md @@ -53,4 +53,4 @@ export MODEL_NAME="CodeLlama" python examples/open_llms/langchain_interface.py ``` -That's it 🤓 time to go back [to](/docs/open_models.md#running-the-example) and give `gpte` a try. +That's it 🤓 time to go back [to](../../open_models.md) and give `gpte` a try. From 115ba28785785073d97b9dc7db7559e06873fae7 Mon Sep 17 00:00:00 2001 From: sai Date: Fri, 10 Oct 2025 05:45:51 +0530 Subject: [PATCH 4/5] docs: update README to simplified project overview --- README.md | 131 ++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 88 insertions(+), 43 deletions(-) diff --git a/README.md b/README.md index 6ad0d6055d..2599f060fe 100644 --- a/README.md +++ b/README.md @@ -12,80 +12,125 @@ The OG code genereation experimentation platform! If you are looking for the evolution that is an opinionated, managed service – check out gptengineer.app. If you are looking for a well maintained hackable CLI for – check out aider. + # gpt-engineer + +GitHub Repo stars · Discord Follow · License · GitHub Issues or Pull Requests · GitHub Release · Twitter Follow + +The OG code generation experimentation platform! + +If you are looking for the evolution that is an opinionated, managed service – check out gptengineer.app. + +If you are looking for a well maintained hackable CLI – check out aider. gpt-engineer lets you: + - Specify software in natural language - Sit back and watch as an AI writes and executes the code - Ask the AI to implement improvements + ## Getting Started ### Install gpt-engineer -For **stable** release: +For stable release: + +```bash +python -m pip install gpt-engineer +``` -- `python -m pip install gpt-engineer` +For development: -For **development**: -- `git clone https://github.com/gpt-engineer-org/gpt-engineer.git` -- `cd gpt-engineer` -- `poetry install` -- `poetry shell` to activate the virtual environment +```bash +git clone https://github.com/gpt-engineer-org/gpt-engineer.git +cd gpt-engineer +poetry install +poetry shell # activate the virtual environment +``` -We actively support Python 3.10 - 3.12. The last version to support Python 3.8 - 3.9 was [0.2.6](https://pypi.org/project/gpt-engineer/0.2.6/). +We actively support Python 3.10 - 3.12. The last version to support Python 3.8 - 3.9 was 0.2.6. ### Setup API key -Choose **one** of: -- Export env variable (you can add this to .bashrc so that you don't have to do it each time you start the terminal) - - `export OPENAI_API_KEY=[your api key]` -- .env file: - - Create a copy of `.env.template` named `.env` - - Add your OPENAI_API_KEY in .env -- Custom model: - - See [docs](https://gpt-engineer.readthedocs.io/en/latest/open_models.html), supports local model, azure, etc. +Choose one of: -Limiting context window: see `docs/context_window.md` for strategies to control token usage and avoid truncation. +- Export an environment variable (add it to your shell profile so you don't need to set it every time): -Check the [Windows README](./WINDOWS_README.md) for Windows usage. 
+```bash +export OPENAI_API_KEY=[your api key] +``` -**Other ways to run:** -- Use Docker ([instructions](docker/README.md)) -- Do everything in your browser: -[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/gpt-engineer-org/gpt-engineer/codespaces) +- Use a `.env` file: + - Create a copy of `.env.template` named `.env` + - Add your `OPENAI_API_KEY` in `.env` + +- Custom model: See the docs for instructions (supports local models, Azure, etc.). + +Check the `WINDOWS_README.md` file for Windows-specific instructions. + +Other ways to run: + +- Use Docker (see `docker/README.md`) +- Open in GitHub Codespaces + + +## Usage ### Create new code (default usage) -- Create an empty folder for your project anywhere on your computer -- Create a file called `prompt` (no extension) inside your new folder and fill it with instructions -- Run `gpte ` with a relative path to your folder - - For example: `gpte projects/my-new-project` from the gpt-engineer directory root with your new folder in `projects/` + +1. Create an empty folder for your project. +2. Inside that folder create a file named `prompt` (no extension) and fill it with your instructions. +3. From the gpt-engineer repo root run: + +```bash +gpte +# example: gpte projects/my-new-project +``` + ### Improve existing code -- Locate a folder with code which you want to improve anywhere on your computer -- Create a file called `prompt` (no extension) inside your new folder and fill it with instructions for how you want to improve the code -- Run `gpte -i` with a relative path to your folder - - For example: `gpte projects/my-old-project -i` from the gpt-engineer directory root with your folder in `projects/` -### Benchmark custom agents -- gpt-engineer installs the binary 'bench', which gives you a simple interface for benchmarking your own agent implementations against popular public datasets. -- The easiest way to get started with benchmarking is by checking out the [template](https://github.com/gpt-engineer-org/gpte-bench-template) repo, which contains detailed instructions and an agent template. -- Currently supported benchmark: - - [APPS](https://github.com/hendrycks/apps) - - [MBPP](https://github.com/google-research/google-research/tree/master/mbpp) +1. Locate the folder containing the code you want to improve. +2. Create a `prompt` file inside it with instructions for the improvement. +3. Run: + +```bash +gpte -i +# example: gpte projects/my-old-project -i +``` + + +### Benchmarking + +The `gpt-engineer` package installs a `bench` binary for benchmarking agent implementations. See the `gpte-bench-template` repo for a starter template. + +Supported datasets include APPS and MBPP. -The community has started work with different benchmarking initiatives, as described in [this Loom](https://www.loom.com/share/206805143fbb4302b5455a5329eaab17?sid=f689608f-8e49-44f7-b55f-4c81e9dc93e6) video. ### Research -Some of our community members have worked on different research briefs that could be taken further. See [this document](https://docs.google.com/document/d/1qmOj2DvdPc6syIAm8iISZFpfik26BYw7ZziD5c-9G0E/edit?usp=sharing) if you are interested. -## Terms -By running gpt-engineer, you agree to our [terms](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/TERMS_OF_USE.md). +See the `docs` and community resources for research notes and briefs. + + +## Notes + +- Limiting context window: see `docs/context_window.md` for strategies to control token usage and avoid truncation. 
- By running gpt-engineer you agree to the terms in `TERMS_OF_USE.md`.


## Links & Community

- Roadmap: `ROADMAP.md`
- Governance: `GOVERNANCE.md`
- Contributing: `.github/CONTRIBUTING.md`
- Discord: https://discord.gg/8tcDQ89Ej2

## Relation to gptengineer.app (GPT Engineer)
[gptengineer.app](https://gptengineer.app/) is a commercial project for the automatic generation of web apps.
It features a UI for non-technical users connected to a git-controlled codebase.
The gptengineer.app team is actively supporting the open source community.

From 218b041d6122589e608f8ad32dac254c379567c1 Mon Sep 17 00:00:00 2001
From: sai
Date: Fri, 10 Oct 2025 05:47:06 +0530
Subject: [PATCH 5/5] docs: update docs and README

---
 .pre-commit-config.yaml | 2 +-
 docs/context_window.md  | 2 +-
 docs/introduction.md    | 4 ++--
 docs/open_models.md     | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 3fe0daa264..ab3b2685aa 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,7 +1,7 @@
 # See https://pre-commit.com for more information
 # See https://pre-commit.com/hooks.html for more hooks
 fail_fast: true
-default_stages: [commit]
+default_stages: [pre-commit]
 
 repos:
   - repo: https://github.com/psf/black

diff --git a/docs/context_window.md b/docs/context_window.md
index 37ea571499..b44c8764ab 100644
--- a/docs/context_window.md
+++ b/docs/context_window.md
@@ -147,4 +147,4 @@ def prepare_context(history, new_message, config):
 
 ---
 
-Notes: this page is intentionally concise.
\ No newline at end of file
+Notes: this page is intentionally concise.

diff --git a/docs/introduction.md b/docs/introduction.md
index 4cfdcf86a2..0b28807eb7 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -4,9 +4,9 @@
## Get started -[Here’s](/en/latest/installation.html) how to install ``gpt-engineer``, set up your environment, and start building. +[Here’s](installation.rst) how to install ``gpt-engineer``, set up your environment, and start building. -We recommend following our [Quickstart](/en/latest/quickstart.html) guide to familiarize yourself with the framework by building your first application with ``gpt-engineer``. +We recommend following our [Quickstart](quickstart.rst) guide to familiarize yourself with the framework by building your first application with ``gpt-engineer``.
diff --git a/docs/open_models.md b/docs/open_models.md
index e1cc5a5294..fd688b19af 100644
--- a/docs/open_models.md
+++ b/docs/open_models.md
@@ -72,7 +72,7 @@ Feel free to try out larger models on your hardware and see what happens.
 Running the Example
 ==================
 
-To see that your setup works check [test open LLM setup](examples/test_open_llm/README.md).
+To see that your setup works, check [test open LLM setup](examples/open_llms/README.md).
 
 If above tests work proceed 😉