This document describes the performance characteristics of CryptVault, including benchmarks, optimization strategies, and performance targets.
- Target: Process 1,000 data points in < 5 seconds
- Acceptable: < 10 seconds
- Current: ~3-5 seconds (varies by pattern complexity)
- Target: < 500 MB peak memory for typical analysis
- Acceptable: < 1 GB
- Current: ~200-400 MB (varies by data size)
| Component | Target Time | Acceptable Time | Notes |
|---|---|---|---|
| Data Fetching | < 1s | < 3s | Network dependent |
| Pattern Detection | < 2s | < 5s | Depends on pattern count |
| Technical Indicators | < 500ms | < 1s | Vectorized with NumPy |
| ML Predictions | < 1s | < 3s | Model dependent |
| Chart Generation | < 500ms | < 1s | Terminal rendering |
```bash
# Full workflow benchmark
python scripts/benchmark_performance.py --symbol BTC --iterations 10

# Indicator-only benchmark
python scripts/benchmark_performance.py --indicators-only

# Save report to file
python scripts/benchmark_performance.py --output benchmark_report.txt
```

- Average Time: 3.2s
- Min Time: 2.8s
- Max Time: 4.1s
- Memory Peak: 320 MB
- SMA (20): 0.8ms
- EMA (12): 1.2ms
- RSI (14): 2.1ms
- MACD: 3.5ms
- Bollinger Bands: 2.8ms
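
Timings like those listed above can be collected with a small timing loop. The following is a minimal sketch using only the standard library; `time_operation` is a hypothetical helper written for illustration and is not the implementation of `scripts/benchmark_performance.py`.

```python
import statistics
import time


def time_operation(func, iterations=10):
    """Run func repeatedly and summarize wall-clock time in milliseconds.

    Hypothetical helper for illustration only.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution timer
        func()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "avg_ms": statistics.mean(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
        "iterations": iterations,
    }


if __name__ == "__main__":
    stats = time_operation(lambda: sum(range(100_000)), iterations=5)
    print(f"avg={stats['avg_ms']:.2f}ms min={stats['min_ms']:.2f}ms max={stats['max_ms']:.2f}ms")
```

Reporting the minimum alongside the average helps separate the cost of the code itself from noise introduced by other processes on the machine.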
- API Response Caching: 5-minute TTL for market data
- Computation Caching: Cache expensive calculations
- Pattern Caching: Cache detected patterns
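
The time-to-live behavior behind the API response cache can be sketched as follows. `TTLCache` is a hypothetical stand-in written for illustration, not the implementation of `cryptvault.data.cache.DataCache`. Two assumptions are made here: the fetch callable receives the key as its only argument, and the clock is injectable so expiry can be tested without waiting.

```python
import time


class TTLCache:
    """Tiny time-to-live cache (hypothetical; for illustration only)."""

    def __init__(self, ttl=300, clock=time.monotonic):
        self.ttl = ttl                # seconds an entry stays valid
        self._clock = clock           # injectable clock (assumption, eases testing)
        self._entries = {}            # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch(key) if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = fetch(key)
        self._entries[key] = (now + self.ttl, value)
        return value
```

The `hits` and `misses` counters give a cache hit rate as `hits / (hits + misses)`, which is one of the metrics this document suggests tracking.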
```python
from cryptvault.data.cache import DataCache

cache = DataCache(ttl=300)  # 5-minute cache
data = cache.get_or_fetch(symbol, fetch_function)
```

- Reuse HTTP connections for API calls
- Implement connection pooling for database access
- Use persistent sessions for external APIs
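
One way to picture connection reuse is a small fixed-size pool that lends out connections and takes them back. The sketch below uses only the standard library; `ConnectionPool` and its `factory` argument are hypothetical names, and nothing here reflects how CryptVault actually manages its connections.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that hands out reusable connections (hypothetical sketch)."""

    def __init__(self, factory, size=4):
        # Create all connections up front so later requests pay no setup cost.
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection and always return it to the pool afterwards."""
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

For HTTP specifically, a long-lived session object from the HTTP client library (for example `requests.Session`, if that library is in use) serves the same purpose by keeping connections open between calls.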
All indicator calculations use NumPy vectorization for optimal performance:
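```python
import numpy as np

# Vectorized SMA calculation
def calculate_sma(prices, period):
    # Equal weights that sum to 1, so the convolution yields the window mean
    weights = np.ones(period) / period
    # 'valid' keeps only positions where the window fully overlaps the data
    return np.convolve(prices, weights, mode='valid')
```

Time complexity: O(n) for all indicators, treating the period as a constant. Space complexity: O(n).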
- Use sliding window for moving averages
- Implement incremental calculations where possible
- Avoid redundant computations
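
The sliding-window and incremental ideas above can be sketched with a running total that is adjusted as each new price arrives, so every update costs O(1) regardless of the period. `RollingMean` is a hypothetical class for illustration and is not part of CryptVault.

```python
from collections import deque


class RollingMean:
    """Simple moving average updated in O(1) per new price (hypothetical sketch)."""

    def __init__(self, period):
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period
        self._window = deque()
        self._total = 0.0

    def update(self, price):
        """Add a price; return the current average, or None until the window fills."""
        self._window.append(price)
        self._total += price
        if len(self._window) > self.period:
            # Drop the oldest price instead of re-summing the whole window.
            self._total -= self._window.popleft()
        if len(self._window) < self.period:
            return None
        return self._total / self.period
```

A running total can accumulate floating-point drift over very long streams, so recomputing the sum from the window occasionally may be worthwhile.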
- Limit pattern search to recent data (configurable window)
- Use peak/trough detection to reduce candidate points
- Filter patterns by minimum confidence threshold
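
A minimal sketch of the search-space reduction described above: only local peaks and troughs within a recent window are kept as candidate points, and detected patterns below a confidence threshold are discarded. The function names, the dictionary shape with a `confidence` key, and the 0.6 default threshold are all assumptions made for illustration.

```python
def find_turning_points(prices, window=None):
    """Return (index, kind) for local peaks and troughs in the most recent prices.

    window limits the search to the last `window` prices; None searches everything.
    """
    start = 0 if window is None else max(0, len(prices) - window)
    points = []
    # The first and last points in range have no neighbor on one side, so skip them.
    for i in range(start + 1, len(prices) - 1):
        prev_p, cur, next_p = prices[i - 1], prices[i], prices[i + 1]
        if cur > prev_p and cur > next_p:
            points.append((i, "peak"))
        elif cur < prev_p and cur < next_p:
            points.append((i, "trough"))
    return points


def filter_by_confidence(patterns, min_confidence=0.6):
    """Keep only patterns whose 'confidence' meets the threshold (assumed dict shape)."""
    return [p for p in patterns if p["confidence"] >= min_confidence]
```

Pattern matching can then work on the handful of turning points rather than on every data point, which is where the reduction in candidates comes from.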
- Pattern detectors can run independently
- Use concurrent execution for multiple pattern types
- Implement async operations for I/O-bound tasks
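
Because the detectors are independent, they can be submitted to a worker pool and collected once all have finished. The sketch below assumes each detector is a plain callable that takes the data and returns a result; `run_detectors` is a hypothetical helper, not a CryptVault API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_detectors(detectors, data, max_workers=4):
    """Run independent detector callables concurrently; return {name: result}.

    detectors maps a name to a callable taking the data (assumed shape).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in detectors.items()}
        # result() re-raises any exception raised inside a detector.
        return {name: future.result() for name, future in futures.items()}
```

Threads help most when detectors wait on I/O or spend their time in NumPy routines that release the GIL. For CPU-bound pure-Python detectors, `ProcessPoolExecutor` from the same module is the usual alternative.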
```python
# Use context managers for resources
with open_connection() as conn:
    data = fetch_data(conn)
# Connection automatically closed
```

```python
from cryptvault.utils.profiling import profile_memory

with profile_memory("pattern_detection") as mem_stats:
    patterns = detect_patterns(data)

print(f"Peak memory: {mem_stats['peak_mb']:.2f} MB")
```

- Limit maximum data points (default: 10,000)
- Truncate old data when exceeding limits
- Use generators for large datasets
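
The data limits above can be sketched with two small helpers: one that keeps only the most recent points, and a generator that yields a large dataset in chunks so it never has to sit in memory all at once. Both function names are hypothetical; the 10,000 default mirrors the limit stated above.

```python
from itertools import islice


def truncate_to_limit(points, max_points=10_000):
    """Keep only the most recent max_points items of a sequence."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    return points[-max_points:]


def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items, consuming the input lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```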
- Cache trained models to avoid retraining
- Cache predictions with timestamp
- Implement prediction invalidation logic
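
One possible invalidation rule is to store each prediction together with the timestamp of the newest data point it was computed from, and to treat the entry as stale as soon as that timestamp changes. `PredictionCache` below is a hypothetical sketch of that rule, not CryptVault's actual caching logic.

```python
class PredictionCache:
    """Cache one prediction per symbol, tied to the data it was computed from."""

    def __init__(self):
        self._entries = {}  # symbol -> (data_timestamp, prediction)

    def put(self, symbol, data_timestamp, prediction):
        """Store a prediction along with the timestamp of its newest input."""
        self._entries[symbol] = (data_timestamp, prediction)

    def get(self, symbol, data_timestamp):
        """Return the cached prediction, or None if absent or computed on other data."""
        entry = self._entries.get(symbol)
        if entry is None or entry[0] != data_timestamp:
            # Different data means the cached prediction is stale; drop it.
            self._entries.pop(symbol, None)
            return None
        return entry[1]
```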
- Extract features once and reuse
- Use efficient feature computation
- Cache feature matrices
```python
from cryptvault.utils.profiling import profile_function

@profile_function
def my_function():
    # Function code
    pass
```

```python
from cryptvault.utils.profiling import benchmark_operation

with benchmark_operation("data_fetch", {"symbol": "BTC"}):
    data = fetch_data("BTC")
```

```python
from cryptvault.utils.profiling import profile_memory

with profile_memory("analysis") as mem_stats:
    result = analyze_data(data)
```

```python
from cryptvault.utils.profiling import generate_performance_report

report = generate_performance_report()
print(report)
```

- Issue: Network latency and rate limits
- Impact: 1-3 seconds per request
- Mitigation: Caching, connection pooling, batch requests
- Issue: Combinatorial complexity for some patterns
- Impact: 2-5 seconds for complex patterns
- Mitigation: Search space reduction, parallel processing
- Issue: Training on large datasets is slow
- Impact: 5-10 seconds for initial training
- Mitigation: Model caching, incremental training
- Analysis workflow execution time
- Component-level execution times
- Memory usage (peak and average)
- Cache hit rates
- API call latency
```python
import logging

logger = logging.getLogger(__name__)

logger.info(f"Analysis completed in {execution_time:.2f}s")
# Named `duration` rather than `time` to avoid shadowing the time module
logger.warning(f"Slow operation detected: {operation_name} took {duration:.2f}s")
```

- Log warnings for operations > 1 second
- Log errors for operations > 5 seconds
- Track performance degradation over time
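
The thresholds above can be applied automatically by timing an operation and choosing the log level from its duration. This is a minimal sketch: `level_for_duration` and `timed` are hypothetical names, and logging fast operations at `DEBUG` is an assumption rather than something this document specifies.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)

WARN_AFTER_S = 1.0   # warn for operations slower than 1 second
ERROR_AFTER_S = 5.0  # log an error for operations slower than 5 seconds


def level_for_duration(seconds):
    """Map an elapsed time in seconds to a logging level."""
    if seconds > ERROR_AFTER_S:
        return logging.ERROR
    if seconds > WARN_AFTER_S:
        return logging.WARNING
    return logging.DEBUG  # assumption: fast operations are only logged at DEBUG


@contextmanager
def timed(operation_name):
    """Time the enclosed block and log it at a level matching its duration."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        logger.log(level_for_duration(duration), "%s took %.2fs", operation_name, duration)
```

Wrapping a call as `with timed("pattern_detection"): ...` then produces a warning or error only when the operation crosses a threshold.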
- All indicators use NumPy vectorization
- API responses are cached appropriately
- Expensive computations are cached
- Resources are properly released (connections, files)
- Memory usage is within acceptable limits
- No unnecessary data copies
- Efficient algorithms with documented complexity
- Parallel processing where applicable
- Performance profiling enabled in development
- Benchmark tests run regularly
- Implement async data fetching
- Add more aggressive caching
- Optimize pattern detection algorithms
- Reduce memory allocations
- Implement distributed processing
- Add GPU acceleration for ML models
- Optimize database queries
- Implement streaming data processing
```python
import time

# calculate_sma, generate_test_data, and PatternAnalyzer are provided by the project.


def test_indicator_performance():
    """Test that indicators meet performance targets."""
    prices = generate_test_data(1000)
    start = time.perf_counter()  # monotonic timer, better suited to short intervals
    result = calculate_sma(prices, 20)
    duration = time.perf_counter() - start
    assert duration < 0.01, f"SMA too slow: {duration:.4f}s"


def test_analysis_performance():
    """Test that full analysis meets performance targets."""
    analyzer = PatternAnalyzer()
    start = time.perf_counter()
    result = analyzer.analyze_ticker('BTC', days=60)
    duration = time.perf_counter() - start
    assert duration < 5.0, f"Analysis too slow: {duration:.2f}s"
    assert result.success
```