All notable changes to LocalLab will be documented in this file.

## [0.7.0] - 2025-05-16

### Improved
- Significantly enhanced stream generation quality:

  - Improved token generation parameters for higher-quality responses
  - Enhanced stop sequence detection with better handling of conversation markers
  - Implemented smarter repetition detection to prevent generation loops (see the first sketch after this list)
  - Optimized token buffering and yielding logic for smoother streaming
  - Added better error handling and recovery in streaming responses
- Improved non-streaming generation quality across all endpoints:

  - Enhanced the generate, chat, and batch generation methods with optimized parameters
  - Implemented repetition detection to prevent the model from getting stuck in loops
  - Added comprehensive special token handling and cleanup
  - Improved conversation marker detection for better response termination
  - Balanced parameters between quality and speed
- Optimized memory management (see the second sketch after this list):
  - Reduced the frequency of memory checks to avoid interrupting generation
  - Implemented a smarter memory threshold for cache clearing
  - Added better error recovery for out-of-memory situations
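As a rough illustration of the repetition detection used in both the streaming and non-streaming paths, here is a minimal sketch; the function name, window sizes, and threshold are assumptions, not LocalLab's actual code:

```python
def looks_repetitive(text: str, chunk_len: int = 32, lookback: int = 512) -> bool:
    """Heuristic loop detector (illustrative values, not LocalLab's own).

    Flags a loop when the most recent chunk of output has already appeared
    at least twice in the recent window.
    """
    if len(text) < chunk_len * 3:
        return False
    chunk = text[-chunk_len:]
    return text[-lookback:-chunk_len].count(chunk) >= 2
```

A generation loop can call this every few tokens and stop early once it returns True.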
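Likewise, a minimal sketch of threshold-based cache clearing, assuming PyTorch and a CUDA device; the interval and threshold values are illustrative only:

```python
import torch

MEMORY_CHECK_INTERVAL = 64  # tokens between checks (illustrative)
CLEAR_THRESHOLD = 0.90      # fraction of device memory in use (illustrative)

def maybe_clear_cache(step: int) -> None:
    # Check only every N tokens so the check itself does not stall generation.
    if step % MEMORY_CHECK_INTERVAL != 0 or not torch.cuda.is_available():
        return
    used = torch.cuda.memory_allocated()
    total = torch.cuda.get_device_properties(0).total_memory
    if used / total > CLEAR_THRESHOLD:
        torch.cuda.empty_cache()
```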
### Changed

- Increased the default max_length from 2048 to 4096 for non-streaming generation
- Increased the token generation batch size from 4 to 8 for better efficiency
- Adjusted top_k (80), top_p (0.92), and repetition_penalty (1.15) for better quality
- Increased the max_time parameter to 180 seconds to allow more complete responses
- Applied these parameters consistently across all generation endpoints (gathered in the sketch below)
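For reference, the new defaults gathered in one place; the parameter names follow the Hugging Face `generate()` convention used above, and the dict is an illustrative grouping rather than a snippet of LocalLab's code:

```python
# New generation defaults introduced in 0.7.0 (illustrative grouping)
GENERATION_DEFAULTS = {
    "max_length": 4096,          # was 2048
    "top_k": 80,
    "top_p": 0.92,
    "repetition_penalty": 1.15,
    "max_time": 180.0,           # seconds; allows longer, more complete responses
}
```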
## [0.6.6] - 2025-05-16

### Fixed