- Params handling for reasoning (CoT-capable) models other than the OpenAI o-series. Enabled automatic retry of LLM calls, dropping unsupported params if such params were set for the model. Improved handling and validation of LLM call params.
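For illustration, the retry behavior could be sketched as follows (a minimal sketch, not ContextGem's actual internals; LiteLLM's `completion` and `BadRequestError` are its documented API, while the function name and the dropped-param list are assumptions):

```python
import litellm


def call_with_param_fallback(model: str, messages: list[dict], **params):
    """Call an LLM; if it rejects unsupported params, drop them and retry.

    Reasoning (CoT-capable) models often reject sampling params such as
    `temperature` or `top_p` that non-reasoning models accept.
    """
    try:
        return litellm.completion(model=model, messages=messages, **params)
    except litellm.BadRequestError:
        # Drop params commonly unsupported by reasoning models, then retry once.
        retry_params = {
            k: v
            for k, v in params.items()
            if k not in ("temperature", "top_p", "presence_penalty")
        }
        return litellm.completion(model=model, messages=messages, **retry_params)
```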
CONTRIBUTING.md (2 additions, 2 deletions)
@@ -104,7 +104,7 @@ To sign the agreement:
   pytest
   ```

- Please note that we use pytest-vcr to record and replay LLM API interactions. Your changes may require re-recording VCR cassettes for the tests. See [VCR Cassette Management](#vcr-cassette-management) section below for details.
+ Please note that we use [pytest-recording](https://github.com/kiwicom/pytest-recording) to record and replay LLM API interactions. Your changes may require re-recording VCR cassettes for the tests. See [VCR Cassette Management](#vcr-cassette-management) section below for details.
4. **Commit your changes** using Conventional Commits format:
@@ -171,7 +171,7 @@ By submitting issues or feature requests to this project, you acknowledge that t
### VCR Cassette Management
- We use pytest-vcr to record and replay HTTP interactions with LLM APIs. This allows tests to run without making actual API calls after the initial recording.
+ We use [pytest-recording](https://github.com/kiwicom/pytest-recording) to record and replay HTTP interactions with LLM APIs. This allows tests to run without making actual API calls after the initial recording.
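For context, a typical pytest-recording test looks like the sketch below (the `pytest.mark.vcr` marker and the `--record-mode` option are pytest-recording's documented interface; the URL and test body are hypothetical):

```python
import pytest
import requests


@pytest.mark.vcr  # record/replay this test's HTTP traffic via a VCR cassette
def test_llm_api_call():
    # First run with `pytest --record-mode=once` hits the real API and writes
    # a YAML cassette; subsequent runs replay the cassette with no network.
    response = requests.get("https://api.example.com/v1/models")  # placeholder URL
    assert response.status_code == 200
```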
README.md

<img src="https://contextgem.dev/_static/tab_solid.png" alt="ContextGem: 2nd Product of the week" width="250">
<br/><br/>
ContextGem is a free, open-source LLM framework that makes it radically easier to extract structured data and insights from documents — with minimal code.
+ ---
+
## 💎 Why ContextGem?
Most popular LLM frameworks for extracting structured data from documents require extensive boilerplate code to extract even basic information. This significantly increases development time and complexity.
ContextGem addresses this challenge by providing a flexible, intuitive framework that extracts structured data and insights from documents with minimal effort. The most complex, time-consuming parts are handled with **powerful abstractions**, eliminating boilerplate code and reducing development overhead.
- Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
+ 📖 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
## ⭐ Key features
@@ -158,8 +161,9 @@ Read more on the project [motivation](https://contextgem.dev/motivation.html) in
\* See [descriptions](https://contextgem.dev/motivation.html#the-contextgem-solution) of ContextGem abstractions and [comparisons](https://contextgem.dev/vs_other_frameworks.html) of specific implementation examples using ContextGem and other popular open-source LLM frameworks.
+ ## 💡 What you can build
+
- ## 💡 With **minimal code**, you can:
+ With **minimal code**, you can:
- **Extract structured data** from documents (text, images)
165
169
- **Identify and analyze key aspects** (topics, themes, categories) within documents ([learn more](https://contextgem.dev/aspects/aspects.html))
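To make the list above concrete, here is a minimal extraction sketch in the spirit of the quickstart examples linked further below (the sample text, model name, and API-key handling are illustrative assumptions, not the README's canonical example):

```python
import os

from contextgem import Aspect, Document, DocumentLLM, StringConcept

# Define a document and what to extract from it
doc = Document(raw_text="Consultant shall deliver the final report by 30 June 2025.")
doc.aspects = [Aspect(name="Deadlines", description="Delivery deadlines and dates")]
doc.concepts = [
    StringConcept(name="Delivery date", description="Date the final report is due")
]

# Attach an LLM and extract aspects and concepts in one call
llm = DocumentLLM(model="openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
doc = llm.extract_all(doc)

for concept in doc.concepts:
    for item in concept.extracted_items:
        print(f"{concept.name}: {item.value}")
```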
@@ -253,17 +257,17 @@ for item in anomalies_concept.extracted_items:
---
- See more examples in the documentation:
+ ### 📚 More Examples
- ### Basic usage examples
+ **Basic usage:**
- [Aspect Extraction from Document](https://contextgem.dev/quickstart.html#aspect-extraction-from-document)
- [Extracting Aspect with Sub-Aspects](https://contextgem.dev/quickstart.html#extracting-aspect-with-sub-aspects)
- [Concept Extraction from Aspect](https://contextgem.dev/quickstart.html#concept-extraction-from-aspect)
- [Concept Extraction from Document (text)](https://contextgem.dev/quickstart.html#concept-extraction-from-document-text)
- [Concept Extraction from Document (vision)](https://contextgem.dev/quickstart.html#concept-extraction-from-document-vision)
- [Extracting Aspects and Concepts from a Document](https://contextgem.dev/advanced_usage.html#extracting-aspects-and-concepts-from-a-document)
- [Using a Multi-LLM Pipeline to Extract Data from Several Documents](https://contextgem.dev/advanced_usage.html#using-a-multi-llm-pipeline-to-extract-data-from-several-documents)
- Learn more about [DOCX converter features](https://contextgem.dev/converters/docx.html) in the documentation.
+ 📖 Learn more about [DOCX converter features](https://contextgem.dev/converters/docx.html) in the documentation.
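For reference, converting a DOCX file is a short sketch along these lines (the file path is a placeholder; see the linked converter docs for the full options):

```python
from contextgem import DocxConverter

converter = DocxConverter()

# Convert a DOCX file into a ContextGem Document ready for extraction
document = converter.convert("path/to/document.docx")
```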
## 🎯 Focused document analysis
ContextGem leverages LLMs' long context windows to deliver superior extraction accuracy from individual documents. Unlike RAG approaches that often [struggle with complex concepts and nuanced insights](https://www.linkedin.com/pulse/raging-contracts-pitfalls-rag-contract-review-shcherbak-ai-ptg3f), ContextGem capitalizes on continuously expanding context capacity, evolving LLM capabilities, and decreasing costs. This focused approach enables direct information extraction from complete documents, eliminating retrieval inconsistencies while optimizing for in-depth single-document analysis. While this delivers higher accuracy for individual documents, ContextGem does not currently support cross-document querying or corpus-wide retrieval - for these use cases, modern RAG systems (e.g., LlamaIndex, Haystack) remain more appropriate.
- Read more on [how ContextGem works](https://contextgem.dev/how_it_works.html) in the documentation.
+ 📖 Read more on [how ContextGem works](https://contextgem.dev/how_it_works.html) in the documentation.
## 🤖 Supported LLMs
@@ -320,8 +322,7 @@ ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://gi
- **Model Architectures**: Works with both reasoning/CoT-capable (e.g. o4-mini) and non-reasoning models (e.g. gpt-4.1)
- **Simple API**: Unified interface for all LLMs with easy provider switching
- Learn more about [supported LLM providers and models](https://contextgem.dev/llms/supported_llms.html), how to [configure LLMs](https://contextgem.dev/llms/llm_config.html), and [LLM extraction methods](https://contextgem.dev/llms/llm_extraction_methods.html) in the documentation.
+ 📖 Learn more about [supported LLM providers and models](https://contextgem.dev/llms/supported_llms.html), how to [configure LLMs](https://contextgem.dev/llms/llm_config.html), and [LLM extraction methods](https://contextgem.dev/llms/llm_extraction_methods.html) in the documentation.
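As a sketch of that provider switching (the model identifiers and local `api_base` are assumptions; see the linked docs for exact options):

```python
from contextgem import DocumentLLM

# Cloud reasoning model - unsupported params are dropped and retried
# automatically, as described in the params-handling change above
cloud_llm = DocumentLLM(model="openai/o4-mini", api_key="<your-api-key>")

# Local model served via Ollama - same unified interface
local_llm = DocumentLLM(
    model="ollama_chat/llama3.3",
    api_base="http://localhost:11434",
)
```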
## ⚡ Optimizations
@@ -342,36 +343,35 @@ ContextGem allows you to save and load Document objects, pipelines, and LLM conf
- Transfer extraction results between systems
- Persist pipeline and LLM configurations for later reuse
- Learn more about [serialization options](https://contextgem.dev/serialization.html) in the documentation.
+ 📖 Learn more about [serialization options](https://contextgem.dev/serialization.html) in the documentation.
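As a sketch of the round-trip (the `to_json`/`from_json` method names follow the serialization docs linked above; treat the details as illustrative):

```python
from contextgem import Document

doc = Document(raw_text="Sample contract text.")  # or a fully processed Document

doc_json = doc.to_json()                     # serialize, including any extracted items
restored_doc = Document.from_json(doc_json)  # restore on another system
```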
## 📚 Documentation
- Full documentation is available at [contextgem.dev](https://contextgem.dev).
-
- A raw text version of the full documentation is available at [`docs/docs-raw-for-llm.txt`](https://github.com/shcherbak-ai/contextgem/blob/main/docs/docs-raw-for-llm.txt). This file is automatically generated and contains all documentation in a format optimized for LLM ingestion (e.g. for Q&A).
- You can also explore the repository through [DeepWiki](https://deepwiki.com/shcherbak-ai/contextgem), an AI-powered conversational interface that provides visual architecture maps and natural language Q&A for the codebase.
- For a history of changes, improvements, and bug fixes, see the [CHANGELOG](https://github.com/shcherbak-ai/contextgem/blob/main/CHANGELOG.md).
+ 📄 **Raw documentation for LLMs:** Available at [`docs/docs-raw-for-llm.txt`](https://github.com/shcherbak-ai/contextgem/blob/main/docs/docs-raw-for-llm.txt) - automatically generated, optimized for LLM ingestion.
+
+ 🤖 **AI-powered code exploration:** [DeepWiki](https://deepwiki.com/shcherbak-ai/contextgem) provides visual architecture maps and natural language Q&A for the codebase.
+
+ 📈 **Change history:** See the [CHANGELOG](https://github.com/shcherbak-ai/contextgem/blob/main/CHANGELOG.md) for version history, improvements, and bug fixes.
## 💬 Community
- If you have a feature request or a bug report, feel free to [open an issue](https://github.com/shcherbak-ai/contextgem/issues/new) on GitHub. If you'd like to discuss a topic or get general advice on using ContextGem for your project, start a thread in [GitHub Discussions](https://github.com/shcherbak-ai/contextgem/discussions/new/).
+ 🐛 **Found a bug or have a feature request?** [Open an issue](https://github.com/shcherbak-ai/contextgem/issues/new) on GitHub.
+
+ 💭 **Need help or want to discuss?** Start a thread in [GitHub Discussions](https://github.com/shcherbak-ai/contextgem/discussions/new/).
## 🤝 Contributing
- We welcome contributions from the community - whether it's fixing a typo or developing a completely new feature! To get started, please check out our [Contributor Guidelines](https://github.com/shcherbak-ai/contextgem/blob/main/CONTRIBUTING.md).
+ We welcome contributions from the community - whether it's fixing a typo or developing a completely new feature!
+
+ 📋 **Get started:** Check out our [Contributor Guidelines](https://github.com/shcherbak-ai/contextgem/blob/main/CONTRIBUTING.md).
## 🔐 Security
This project is automatically scanned for security vulnerabilities using [CodeQL](https://codeql.github.com/). We also use [Snyk](https://snyk.io) as needed for supplementary dependency checks.
- See the [SECURITY](https://github.com/shcherbak-ai/contextgem/blob/main/SECURITY.md) file for details.
+ 🛡️ **Security policy:** See the [SECURITY](https://github.com/shcherbak-ai/contextgem/blob/main/SECURITY.md) file for details.
## 💖 Acknowledgements
@@ -388,17 +388,20 @@ ContextGem relies on these excellent open-source packages:
## 🌱 Support the project
- ContextGem is just getting started, and your support means the world to us! If you find ContextGem useful, the best way to help is by sharing it with others and giving the project a ⭐. Your feedback and contributions are what make this project grow!
+ ContextGem is just getting started, and your support means the world to us!
+
+ ⭐ **Star the project** if you find ContextGem useful
+
+ 📢 **Share it** with others who might benefit
+
+ 🔧 **Contribute** with feedback, issues, or code improvements
- ## 📄 License & Contact
+ Your engagement is what makes this project grow!
- This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
+ **License:** Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
- Shcherbak AI is now part of Microsoft for Startups.