Commit 464ca6b

Copilot and bact committed
Rename corpus_integrity to corpus_data to reflect expanded test scope
Co-authored-by: bact <128572+bact@users.noreply.github.com>
1 parent 6a4d46a commit 464ca6b

8 files changed: +25 -24 lines changed

.github/workflows/corpus-data.yml

Lines changed: 5 additions & 5 deletions
@@ -8,14 +8,14 @@ on:
  paths:
  - ".github/workflows/corpus-data.yml"
  - "pythainlp/corpus/**"
- - "tests/corpus_integrity/**"
+ - "tests/corpus_data/**"
  pull_request:
  branches:
  - dev
  paths:
  - ".github/workflows/corpus-data.yml"
  - "pythainlp/corpus/**"
- - "tests/corpus_integrity/**"
+ - "tests/corpus_data/**"

  # Avoid duplicate runs for the same source branch and repository
  concurrency:
@@ -50,16 +50,16 @@ jobs:
  env:
  PYTHONIOENCODING: utf-8
  run: |
- python -m unittest discover -s tests/corpus_integrity -p "test_catalog*.py" -v
+ python -m unittest discover -s tests/corpus_data -p "test_catalog*.py" -v

  - name: Test built-in corpus files
  env:
  PYTHONIOENCODING: utf-8
  run: |
- python -m unittest discover -s tests/corpus_integrity -p "test_builtin_*.py" -v
+ python -m unittest discover -s tests/corpus_data -p "test_builtin_*.py" -v

  - name: Test downloadable corpus files
  env:
  PYTHONIOENCODING: utf-8
  run: |
- python -m unittest discover -s tests/corpus_integrity -p "test_downloadable_*.py" -v
+ python -m unittest discover -s tests/corpus_data -p "test_downloadable_*.py" -v
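
The three workflow steps above differ only in the `-p` pattern passed to `unittest discover`. As a rough illustration of how that pattern selects test files, the sketch below builds a throwaway directory and runs the same discovery through `unittest.TestLoader.discover`. The file names come from the commands in this commit; the placeholder test body and the helper names `count_discovered` and `demo` are assumptions made for this example and are not part of the repository.

```python
import sys
import tempfile
import unittest
from pathlib import Path

# File names taken from the commands in this commit; the test body is a
# placeholder invented for this sketch.
TEST_FILES = (
    "test_catalog.py",
    "test_builtin_corpus.py",
    "test_downloadable_corpus.py",
)
PLACEHOLDER_TEST = (
    "import unittest\n"
    "\n"
    "class PlaceholderTest(unittest.TestCase):\n"
    "    def test_placeholder(self):\n"
    "        self.assertTrue(True)\n"
)


def count_discovered(start_dir: str, pattern: str) -> int:
    """Return how many test cases discovery finds in start_dir for pattern."""
    suite = unittest.TestLoader().discover(start_dir=start_dir, pattern=pattern)
    return suite.countTestCases()


def demo() -> dict:
    """Create throwaway test files and count matches for each pattern."""
    patterns = (
        "test_catalog*.py",
        "test_builtin_*.py",
        "test_downloadable_*.py",
        "test*.py",  # unittest's default pattern, for comparison
    )
    with tempfile.TemporaryDirectory() as tmp:
        for name in TEST_FILES:
            Path(tmp, name).write_text(PLACEHOLDER_TEST, encoding="utf-8")
        try:
            return {p: count_discovered(tmp, p) for p in patterns}
        finally:
            # Discovery imports the files as top-level modules and adds tmp
            # to sys.path; undo both so repeated calls stay independent.
            for name in TEST_FILES:
                sys.modules.pop(Path(name).stem, None)
            if tmp in sys.path:
                sys.path.remove(tmp)
```

Calling `demo()` returns a mapping from each pattern to the number of discovered test cases. Each specific pattern picks up one placeholder file, while the default `test*.py` picks up all three, which is why the workflow can run the catalog, built-in, and downloadable suites as separate steps from the same directory.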

tests/README.md

Lines changed: 4 additions & 4 deletions
@@ -169,20 +169,20 @@ for real-world usage:
  - Multi-engine robustness testing across all core tokenization engines
  - Very long strings that can cause performance issues (issue #893)

- ## Corpus data tests (corpus_integrity/)
+ ## Corpus data tests (corpus_data/)

  A separate test suite that verifies the integrity, format, parseability, and catalog
  functionality of corpus data in PyThaiNLP. These tests are separate from regular unit tests
  because they test actual file loading and parsing (not mocked), downloadable corpus tests
  require network access, and they verify corpus catalog operations.

  For detailed information about corpus data tests, see:
- [tests/corpus_integrity/README.md](corpus_integrity/README.md)
+ [tests/corpus_data/README.md](corpus_data/README.md)

  The corpus data tests are triggered automatically via GitHub Actions
- when changes are made to `pythainlp/corpus/**` or `tests/corpus_integrity/**`.
+ when changes are made to `pythainlp/corpus/**` or `tests/corpus_data/**`.

  **Run corpus data tests:**
  ```shell
- python -m unittest tests.corpus_integrity
+ python -m unittest tests.corpus_data
  ```
Lines changed: 5 additions & 5 deletions
@@ -49,32 +49,32 @@ size of the downloads.
  Run all corpus data tests:

  ```bash
- python -m unittest discover -s tests/corpus_integrity -v
+ python -m unittest discover -s tests/corpus_data -v
  ```

  Run only catalog tests:

  ```bash
- python -m unittest tests.corpus_integrity.test_catalog -v
+ python -m unittest tests.corpus_data.test_catalog -v
  ```

  Run only built-in corpus tests:

  ```bash
- python -m unittest tests.corpus_integrity.test_builtin_corpus -v
+ python -m unittest tests.corpus_data.test_builtin_corpus -v
  ```

  Run only downloadable corpus tests:

  ```bash
- python -m unittest tests.corpus_integrity.test_downloadable_corpus -v
+ python -m unittest tests.corpus_data.test_downloadable_corpus -v
  ```

  ## CI Integration

  The corpus data tests run automatically via GitHub Actions workflow (`.github/workflows/corpus-data.yml`) when:
  - Changes are made to `pythainlp/corpus/**`
- - Changes are made to `tests/corpus_integrity/**`
+ - Changes are made to `tests/corpus_data/**`
  - The workflow file itself is modified

  ## What is Tested
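
The CI Integration list above maps onto the three `paths` entries in `.github/workflows/corpus-data.yml`. The sketch below is a loose approximation of that trigger check using Python's `fnmatch`, mainly to show why the rename had to be applied to the filter as well. It is not how GitHub Actions evaluates path filters (for instance, `fnmatch` lets `*` cross `/`, and the workflow's branch filter is ignored here); `would_trigger` is a hypothetical helper, and the changed-file paths in the usage note are illustrative.

```python
from fnmatch import fnmatchcase

# The three path filters from the workflow after this commit.
PATH_FILTERS = (
    ".github/workflows/corpus-data.yml",
    "pythainlp/corpus/**",
    "tests/corpus_data/**",
)


def would_trigger(changed_files) -> bool:
    """Approximate check: does any changed file match any path filter?

    Uses fnmatch semantics, which only roughly resemble the glob rules
    GitHub Actions applies to `paths`.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in PATH_FILTERS
    )
```

Under this approximation, `would_trigger(["tests/corpus_data/test_catalog.py"])` is true, while `would_trigger(["tests/corpus_integrity/test_catalog.py"])` is false: once the directory is renamed, a filter still pointing at the old name would no longer match changes to the tests.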

tests/corpus_data/__init__.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+ # SPDX-FileCopyrightText: 2016-2026 PyThaiNLP Project
+ # SPDX-FileType: SOURCE
+ # SPDX-License-Identifier: Apache-2.0
+ """
+ Corpus data tests.
+
+ These tests verify the integrity, format, parseability, and catalog
+ functionality of corpus data in PyThaiNLP. They are separate from
+ regular unit tests to avoid slowing down development test cycles
+ with large file downloads and network access.
+ """
File renamed without changes.
File renamed without changes.

tests/corpus_integrity/__init__.py

Lines changed: 0 additions & 10 deletions
This file was deleted.
