This document walks through the actual code. We'll build key features step by step and explain the decisions along the way.
caesar-cipher/
├── src/caesar_cipher/
│ ├── __init__.py # Package exports for CaesarCipher and FrequencyAnalyzer
│ ├── cipher.py # Core encryption/decryption logic (62 lines)
│ ├── analyzer.py # Chi-squared frequency analysis (60 lines)
│ ├── constants.py # English letter frequencies, alphabet (43 lines)
│ ├── main.py # Typer CLI with 3 commands (171 lines)
│ └── utils.py # File I/O and validation (40 lines)
├── tests/
│ ├── test_cipher.py # Cipher tests with edge cases
│ ├── test_analyzer.py # Frequency analysis tests
│ └── test_cli.py # End-to-end CLI tests
└── pyproject.toml # Dependencies and tool config
What we're building: The fundamental operation that takes one letter and shifts it by N positions.
In cipher.py:30-40:
def _shift_char(self, char: str, shift: int) -> str:
"""
Shift a single character by the specified amount while preserving case
"""
if char in UPPERCASE_LETTERS:
idx = UPPERCASE_LETTERS.index(char)
return UPPERCASE_LETTERS[(idx + shift) % ALPHABET_SIZE]
if char in LOWERCASE_LETTERS:
idx = LOWERCASE_LETTERS.index(char)
return LOWERCASE_LETTERS[(idx + shift) % ALPHABET_SIZE]
return charWhy this code works:
- Line 35-36: Uppercase letters shift within uppercase. 'A' + 3 = 'D'. The modulo handles wrapping ('Z' + 1 = 'A').
- Line 37-39: Lowercase letters shift within lowercase independently. Preserves case.
- Line 40: Non-letters return unchanged. Spaces, punctuation, numbers pass through.
Common mistakes here:
# Wrong approach - converts everything to uppercase
def shift_char(char, shift):
if char.isalpha():
idx = ord(char.upper()) - ord('A')
return chr((idx + shift) % 26 + ord('A'))
return char
# Why this fails: "Hello" becomes "KHOOR" instead of "Khoor"You need separate handling for upper and lower case. The code uses two different alphabet strings (UPPERCASE_LETTERS and LOWERCASE_LETTERS from constants.py) to preserve case naturally.
Now we need to apply the shift to every character in the message.
In cipher.py:43-46:
def encrypt(self, plaintext: str) -> str:
"""
Encrypt plaintext using the configured shift key
"""
return "".join(self._shift_char(char, self.key) for char in plaintext)What's happening:
- Generator expression iterates each character:
for char in plaintext - Calls
_shift_char()with the instance's key - Joins all results into a string
Why we do it this way: This is more memory efficient than building a list and joining. For huge texts, the generator yields characters one at a time instead of storing all shifted characters in memory.
Alternative approaches:
- Loop with
result += shifted_char: Works but creates intermediate strings (O(n²) time in the worst case) - List comprehension then join:
"".join([...]): Uses more memory but same speed
In cipher.py:48-52:
def decrypt(self, ciphertext: str) -> str:
"""
Decrypt ciphertext using the configured shift key
"""
return "".join(self._shift_char(char, -self.key) for char in ciphertext)Key parts explained:
Decryption is just encryption with a negative key. If we encrypted with +3, we decrypt with -3. The _shift_char() method handles negative shifts automatically via modulo arithmetic:
'K' is position 10
Shift by -3: (10 + (-3)) % 26 = 7 % 26 = 7
Position 7 is 'H'
Python's modulo operator handles negatives correctly: (-1) % 26 = 25, which wraps 'A' back to 'Z'.
You have ciphertext but don't know the key. With only 26 possibilities, try them all.
Generate all 26 decryptions and let frequency analysis pick the best one.
In cipher.py:53-60:
@staticmethod
def crack(ciphertext: str) -> list[tuple[int, str]]:
"""
Attempt all possible shifts to decrypt ciphertext without knowing the key
"""
results = []
for shift in range(ALPHABET_SIZE):
cipher = CaesarCipher(key=shift)
decrypted = cipher.decrypt(ciphertext)
results.append((shift, decrypted))
return resultsWhy static method:
This doesn't operate on a specific instance. It generates all possible instances. Making it static makes the intent clear: CaesarCipher.crack(text) is like asking "what are all Caesar decryptions of this text?"
Optimization we didn't do:
We could avoid creating 26 cipher objects and just call _shift_char() directly. But clarity beats micro-optimization here. Creating 26 objects is negligible.
When you crack a message, you get 26 candidates. Which one is real English?
Score each candidate by how closely its letter frequencies match known English frequencies. Lower chi-squared scores mean better matches.
In analyzer.py:27-42:
def calculate_chi_squared(self, text: str) -> float:
"""
Calculate chi-squared statistic comparing text to expected English frequencies
"""
text_upper = text.upper()
letter_counts = Counter(char for char in text_upper if char.isalpha())
if not letter_counts:
return float("inf")
total_letters = sum(letter_counts.values())
chi_squared = 0.0
for letter, expected_freq in self.reference_frequencies.items():
observed_count = letter_counts.get(letter, 0)
expected_count = (expected_freq / 100) * total_letters
if expected_count > 0:
chi_squared += ((observed_count - expected_count)**2) / expected_count
return chi_squaredBreaking it down:
Lines 29-30: Convert to uppercase and count only letters. Counter gives us {'H': 1, 'E': 1, 'L': 2, 'O': 1} for "HELLO".
Lines 32-33: Empty text has infinite badness. Can't score nothing.
Line 35: Total count needed to convert percentages to expected counts.
Lines 38-41: For each letter A-Z, compare observed vs expected:
- Expected: If the text has 100 letters and E should be 12.7%, we expect 12.7 letters of E
- Observed: How many E's are actually there
- Chi-squared adds up squared differences normalized by expected values
Lower scores mean closer to English. Gibberish has wild frequency distributions and scores high.
In analyzer.py:53-60:
def rank_candidates(self, candidates: list[tuple[int, str]]) -> list[tuple[int, str, float]]:
"""
Rank decryption candidates by their English frequency score
"""
scored = [
(shift, text, self.score_text(text))
for shift, text in candidates
]
return sorted(scored, key=lambda x: x[2])Takes list of (shift, decrypted_text) tuples, adds scores, sorts by score (ascending, so best is first).
The tool has three commands: encrypt, decrypt, crack. All use similar patterns.
Encrypt command (main.py:24-60):
@app.command()
def encrypt(
text: Annotated[str | None, typer.Argument(...)] = None,
key: Annotated[int, typer.Option("--key", "-k", ...)] = 3,
input_file: Annotated[Path | None, typer.Option(...)] = None,
output_file: Annotated[Path | None, typer.Option(...)] = None,
quiet: Annotated[bool, typer.Option(...)] = False,
) -> None:The Annotated syntax:
Typer uses type hints to generate the CLI. Annotated[str | None, typer.Argument(...)] means "this is an optional string argument". The help text is in the Argument() call.
Validation flow:
try:
validate_key(key) # utils.py:36
plaintext = read_input(text, input_file) # utils.py:11
cipher = CaesarCipher(key=key)
encrypted = cipher.encrypt(plaintext)
# ... output ...
except (ValueError, OSError) as e:
console.print(f"[red]Error:[/red] {e}")
raise typer.Exit(code=1) from NoneError handling:
Catch both ValueError (from validation) and OSError (from file operations). Print error in red using Rich, then exit with code 1. The from None suppresses the exception chain in the output.
Three ways to provide input: command line argument, file, or stdin.
In utils.py:11-24:
def read_input(text: str | None, input_file: Path | None) -> str:
"""
Read input from text argument, file, or stdin
"""
if text:
return text
if input_file:
return input_file.read_text(encoding="utf-8")
if not sys.stdin.isatty():
return sys.stdin.read()
raise ValueError("No input provided. Use TEXT argument, --input-file, or pipe to stdin")Priority order:
- Explicit text argument
- Input file
- Stdin (but only if it's piped, not interactive terminal)
The sys.stdin.isatty() check:
Returns False when input is piped: echo "test" | caesar-cipher encrypt. Returns True when running interactively. We only read stdin if it's piped to avoid hanging waiting for user input that won't come.
The crack command displays results in a formatted table.
In main.py:135-148:
table = Table(title="Caesar Cipher Brute Force Results")
table.add_column("Rank", style="cyan", justify="right")
table.add_column("Shift", style="magenta", justify="right")
table.add_column("Score", style="yellow", justify="right")
table.add_column("Decrypted Text", style="green")
display_count = len(ranked) if show_all else min(top, len(ranked))
for rank, (shift, text_result, score) in enumerate(ranked[:display_count], 1):
table.add_row(
str(rank),
str(shift),
f"{score:.2f}",
text_result[:80] # Truncate long text
)
console.print(table)Why Rich: Colored output makes it obvious which columns are which. The table auto-formats and aligns columns. Much nicer than printing tab-separated values.
Truncation:
text_result[:80] limits displayed text to 80 characters. Otherwise really long decryptions make the table unreadable.
Example test for encryption (test_cipher.py:14-16):
def test_encrypt_basic(self) -> None:
cipher = CaesarCipher(key=3)
assert cipher.encrypt("HELLO") == "KHOOR"What this tests: Basic shift functionality. If this fails, everything is broken.
Edge cases tested:
- Wraparound:
test_alphabet_wraparound_uppercase()checks 'XYZ' → 'ABC' - Case preservation:
test_encrypt_mixed_case()checks "Hello World" → "Khoor Zruog" - Non-letters:
test_encrypt_preserves_punctuation()checks "Hello!" → "Khoor!" - Empty string:
test_empty_string()checks "" → ""
Roundtrip test (test_cipher.py:42-46):
def test_encrypt_decrypt_roundtrip(self) -> None:
cipher = CaesarCipher(key=13)
original = "The Quick Brown Fox Jumps Over The Lazy Dog!"
encrypted = cipher.encrypt(original)
decrypted = cipher.decrypt(encrypted)
assert decrypted == originalThis catches asymmetry bugs. If encrypt and decrypt aren't true inverses, this fails.
Testing that frequency analysis actually picks the right answer (test_analyzer.py:44-54):
def test_rank_candidates_with_actual_cipher(self) -> None:
cipher = CaesarCipher(key=3)
plaintext = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
ciphertext = cipher.encrypt(plaintext)
candidates = CaesarCipher.crack(ciphertext)
analyzer = FrequencyAnalyzer()
ranked = analyzer.rank_candidates(candidates)
best_shift, best_text, _best_score = ranked[0]
assert best_shift == 3
assert best_text == plaintextThis is an integration test because it uses both cipher and analyzer together. It verifies the whole crack workflow works end to end.
Testing the actual commands (test_cli.py:14-18):
def test_encrypt_basic(self) -> None:
result = runner.invoke(app, ["encrypt", "HELLO", "--key", "3"])
assert result.exit_code == 0
assert "KHOOR" in result.stdoutUses Typer's test runner to simulate command line invocation. Checks exit code and output.
File I/O test (test_cli.py:63-73):
def test_encrypt_from_file(self, tmp_path: Path) -> None:
input_file = tmp_path / "input.txt"
input_file.write_text("HELLO WORLD")
result = runner.invoke(
app,
["encrypt", "--input-file", str(input_file), "--key", "3"]
)
assert result.exit_code == 0
assert "KHOOR ZRUOG" in result.stdoutCreates a temp file, reads from it, verifies output. This tests the file handling path without leaving artifacts.
Symptom:
Crash or wrong output when using --key -3.
Cause:
# Problematic code - doesn't normalize negative keys
def __init__(self, key: int):
self.key = key # If key=-3, modulo in _shift_char breaksFix:
# Correct approach (cipher.py:23)
self.key = key % ALPHABET_SIZENormalizing during construction means all other code can assume key is 0-25.
Why this matters: Caesar cipher with key=-3 is the same as key=23. Both shift left by 3. Normalizing makes them equivalent.
Symptom: Division by zero in chi-squared calculation.
Cause:
# Bad - crashes if text is empty
def calculate_chi_squared(text):
total = sum(counts.values())
# ... expected_count / total will divide by zeroFix:
# Good - early return (analyzer.py:32-33)
if not letter_counts:
return float("inf")Empty text has infinitely bad frequency match. Can't be English if there are no letters.
Symptom:
decrypt("KHOOR") with key=3 gives gibberish instead of "HELLO".
Cause:
# Wrong - decrypt shifts forward instead of backward
def decrypt(self, ciphertext):
return self.encrypt(ciphertext) # This is just double encryption!Fix:
# Right - decrypt shifts backward (cipher.py:51)
return "".join(self._shift_char(char, -self.key) for char in ciphertext)Decryption must reverse the shift. Negative key does this.
Problem: The best ranked candidate isn't the correct decryption.
How to debug:
- Check
constants.py:17-43- Are the English frequencies correct? - Print scores for all 26 candidates - Is the correct one close to the top?
- Try with longer ciphertext - Short text doesn't have enough letters for reliable frequency analysis
Common causes:
- Text is too short (under 50 letters)
- Text isn't English (frequency analysis assumes English)
- Text has unusual word distribution (technical jargon, names)
Problem: caesar-cipher encrypt --key 3 hangs forever.
How to debug:
- Check if you forgot to provide input text
- Look at
utils.py:20-21- If stdin isn't a TTY, it waits for piped input - Either provide text:
caesar-cipher encrypt "HELLO" --key 3or pipe it:echo "HELLO" | caesar-cipher encrypt --key 3
Common cause: Running the command without text and without piping stdin. The code waits for input that never comes.
cipher.py
├── Only imports from constants
├── No CLI dependencies
└── No analyzer dependencies
We separate the cipher from its applications because:
- You can unit test encryption without mocking CLI
- The cipher could be used in a web API without changing anything
- Testing focuses on algorithm correctness, not I/O
This makes the codebase easier to reason about. If tests in test_cipher.py pass, the algorithm is correct regardless of how it's invoked.
_shift_char(leading underscore) = Internal helper method, not part of public APIUPPERCASE_LETTERS(all caps) = Constant that shouldn't changecrack(verb) = Methods are actions
Following these patterns makes it easier to distinguish between public API, internal implementation, and configuration.
Want to support numbers in encryption? Here's the process:
- Modify constants in
constants.py:10-11
DIGITS = string.digits
ALL_CHARS = UPPERCASE_LETTERS + LOWERCASE_LETTERS + DIGITS- Update _shift_char in
cipher.py:30-40
def _shift_char(self, char: str, shift: int) -> str:
if char in UPPERCASE_LETTERS:
idx = UPPERCASE_LETTERS.index(char)
return UPPERCASE_LETTERS[(idx + shift) % 26]
if char in LOWERCASE_LETTERS:
idx = LOWERCASE_LETTERS.index(char)
return LOWERCASE_LETTERS[(idx + shift) % 26]
if char in DIGITS:
idx = DIGITS.index(char)
return DIGITS[(idx + shift) % 10] # 10 digits
return char- Add tests in
test_cipher.py
def test_encrypt_digits(self) -> None:
cipher = CaesarCipher(key=3)
assert cipher.encrypt("Test123") == "Whvw456"The architecture makes this easy because cipher logic is isolated from everything else.
- typer (0.20.0): CLI framework with automatic help and type hints. Better than argparse for modern Python.
- rich (14.2.0): Colored terminal output and table formatting. Makes the crack command output readable.
- pytest (9.0.2): Test framework. Standard for Python testing.
- mypy (1.19.0): Static type checker. Catches type errors before runtime.
- ruff (0.14.8): Fast linter combining many tools. Replaces flake8, isort, black.
Check for vulnerabilities:
pip install pip-audit
pip-auditIf you see a vulnerability in typer or rich, check if there's a newer version that fixes it. Update the version constraints in pyproject.toml:7-9.
# Install in development mode
pip install -e .
# Or build a wheel
pip install build
python -m buildThis produces a wheel in dist/ that can be installed anywhere.
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linter
ruff check src/ tests/
# Run type checker
mypy src/The [dev] extra includes testing and linting tools from pyproject.toml:10-17.
For a CLI tool, "production" means publishing to PyPI:
# Build
python -m build
# Upload to PyPI
python -m twine upload dist/*Then users install with pip install caesar-salad-cipher and get the caesar-cipher command in their PATH.
You've seen how the code works. Now:
- Try the challenges - 04-CHALLENGES.md has extension ideas like Vigenère cipher
- Modify the code - Change the frequency scoring to use bigrams (two-letter pairs) instead of single letters
- Read related projects - Look at real-world frequency analysis tools for inspiration