Commit 1bcfbe3

Copilot and bact committed
Simplify to "Corpus test" with tests.corpus package name
Co-authored-by: bact <128572+bact@users.noreply.github.com>
1 parent 464ca6b commit 1bcfbe3

7 files changed: +30 -30 lines changed

Lines changed: 9 additions & 9 deletions
@@ -1,21 +1,21 @@
 # SPDX-FileCopyrightText: 2016-2026 PyThaiNLP Project
 # SPDX-License-Identifier: Apache-2.0
 
-name: Corpus Data Test
+name: Corpus Test
 
 on:
   push:
     paths:
-      - ".github/workflows/corpus-data.yml"
+      - ".github/workflows/corpus.yml"
       - "pythainlp/corpus/**"
-      - "tests/corpus_data/**"
+      - "tests/corpus/**"
   pull_request:
     branches:
       - dev
     paths:
-      - ".github/workflows/corpus-data.yml"
+      - ".github/workflows/corpus.yml"
       - "pythainlp/corpus/**"
-      - "tests/corpus_data/**"
+      - "tests/corpus/**"
 
 # Avoid duplicate runs for the same source branch and repository
 concurrency:
@@ -26,7 +26,7 @@ concurrency:
   cancel-in-progress: true
 
 jobs:
-  corpus-data:
+  corpus:
     runs-on: ubuntu-latest
     permissions:
       contents: read
@@ -50,16 +50,16 @@ jobs:
         env:
           PYTHONIOENCODING: utf-8
         run: |
-          python -m unittest discover -s tests/corpus_data -p "test_catalog*.py" -v
+          python -m unittest discover -s tests/corpus -p "test_catalog*.py" -v
 
       - name: Test built-in corpus files
         env:
           PYTHONIOENCODING: utf-8
         run: |
-          python -m unittest discover -s tests/corpus_data -p "test_builtin_*.py" -v
+          python -m unittest discover -s tests/corpus -p "test_builtin_*.py" -v
 
       - name: Test downloadable corpus files
         env:
           PYTHONIOENCODING: utf-8
         run: |
-          python -m unittest discover -s tests/corpus_data -p "test_downloadable_*.py" -v
+          python -m unittest discover -s tests/corpus -p "test_downloadable_*.py" -v
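
Reviewer note: each workflow step above selects a subset of test files with `unittest discover -s <dir> -p <pattern>`. The sketch below shows roughly what that selection does, using the standard library's `unittest.TestLoader.discover`. It is an illustration only: the temporary directory, the generated file names, and the helpers `discover_suite` and `demo` are assumptions made for the example and are not part of this commit or of PyThaiNLP.

```python
import tempfile
import textwrap
import unittest
import uuid
from pathlib import Path

# Source for a throwaway test module containing exactly one test method.
TEST_BODY = textwrap.dedent(
    """
    import unittest


    class DemoTest(unittest.TestCase):
        def test_passes(self):
            self.assertTrue(True)
    """
)


def discover_suite(start_dir: str, pattern: str) -> unittest.TestSuite:
    """Collect tests roughly as `python -m unittest discover -s DIR -p PATTERN` would."""
    return unittest.TestLoader().discover(start_dir=start_dir, pattern=pattern)


def demo() -> int:
    """Create two hypothetical test files and count the tests one pattern selects."""
    # Unique suffix so repeated calls never reuse a cached module name.
    suffix = uuid.uuid4().hex
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / f"test_catalog_{suffix}.py").write_text(TEST_BODY, encoding="utf-8")
        (root / f"test_builtin_{suffix}.py").write_text(TEST_BODY, encoding="utf-8")
        # Only files whose names match the pattern are imported and collected.
        suite = discover_suite(str(root), "test_catalog*.py")
        return suite.countTestCases()


if __name__ == "__main__":
    print(demo())
```

Because the pattern is matched against file names inside the start directory, renaming the directory from `tests/corpus_data` to `tests/corpus` only requires changing the `-s` argument; the `-p` patterns stay the same, as the diff shows.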

tests/README.md

Lines changed: 8 additions & 8 deletions
@@ -169,20 +169,20 @@ for real-world usage:
 - Multi-engine robustness testing across all core tokenization engines
 - Very long strings that can cause performance issues (issue #893)
 
-## Corpus data tests (corpus_data/)
+## Corpus test (corpus/)
 
 A separate test suite that verifies the integrity, format, parseability, and catalog
-functionality of corpus data in PyThaiNLP. These tests are separate from regular unit tests
+functionality of corpus in PyThaiNLP. These tests are separate from regular unit tests
 because they test actual file loading and parsing (not mocked), downloadable corpus tests
 require network access, and they verify corpus catalog operations.
 
-For detailed information about corpus data tests, see:
-[tests/corpus_data/README.md](corpus_data/README.md)
+For detailed information about corpus test, see:
+[tests/corpus/README.md](corpus/README.md)
 
-The corpus data tests are triggered automatically via GitHub Actions
-when changes are made to `pythainlp/corpus/**` or `tests/corpus_data/**`.
+The corpus test is triggered automatically via GitHub Actions
+when changes are made to `pythainlp/corpus/**` or `tests/corpus/**`.
 
-**Run corpus data tests:**
+**Run corpus test:**
 ```shell
-python -m unittest tests.corpus_data
+python -m unittest tests.corpus
 ```
Lines changed: 11 additions & 11 deletions
@@ -1,15 +1,15 @@
-# Corpus data tests
+# Corpus test
 
 This directory contains tests that verify the integrity, format,
-parseability, and catalog functionality of corpus data in PyThaiNLP.
+parseability, and catalog functionality of corpus in PyThaiNLP.
 
 ## Purpose
 
 These tests are separate from regular unit tests because:
 
 1. They test actual file loading and parsing (not mocked)
 2. Downloadable corpus tests require network access and can be slow
-3. They verify corpus data format and structure
+3. They verify corpus format and structure
 4. They test corpus catalog download and query functionality
 5. They should only run when corpus files or corpus code changes
 
@@ -46,35 +46,35 @@ size of the downloads.
 
 ## Running Tests
 
-Run all corpus data tests:
+Run all corpus tests:
 
 ```bash
-python -m unittest discover -s tests/corpus_data -v
+python -m unittest discover -s tests/corpus -v
 ```
 
 Run only catalog tests:
 
 ```bash
-python -m unittest tests.corpus_data.test_catalog -v
+python -m unittest tests.corpus.test_catalog -v
 ```
 
 Run only built-in corpus tests:
 
 ```bash
-python -m unittest tests.corpus_data.test_builtin_corpus -v
+python -m unittest tests.corpus.test_builtin_corpus -v
 ```
 
 Run only downloadable corpus tests:
 
 ```bash
-python -m unittest tests.corpus_data.test_downloadable_corpus -v
+python -m unittest tests.corpus.test_downloadable_corpus -v
 ```
 
 ## CI Integration
 
-The corpus data tests run automatically via GitHub Actions workflow (`.github/workflows/corpus-data.yml`) when:
+The corpus test runs automatically via GitHub Actions workflow (`.github/workflows/corpus.yml`) when:
 - Changes are made to `pythainlp/corpus/**`
-- Changes are made to `tests/corpus_data/**`
+- Changes are made to `tests/corpus/**`
 - The workflow file itself is modified
 
 ## What is Tested
@@ -100,6 +100,6 @@ When adding a new corpus file or function to `pythainlp.corpus`:
 ## Relationship to Unit Tests
 
 - **Unit tests** (`tests/core/test_corpus.py`): Use mocks for speed, test code logic
-- **Corpus data tests** (this directory): Use real data, test file integrity and catalog
+- **Corpus test** (this directory): Use real data, test file integrity and catalog
 
 Both test suites are important and complementary.
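
Reviewer note: the README hunk above distinguishes mocked unit tests from tests that read real files. The sketch below illustrates that distinction in miniature. Everything in it is hypothetical: `load_words`, the two test classes, and the sample word list are stand-ins invented for the example, not code from `tests/core/test_corpus.py` or `tests/corpus/`.

```python
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def load_words(path: str) -> list[str]:
    """Hypothetical loader: one word per line, blank lines skipped."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


class MockedLoaderTest(unittest.TestCase):
    """Unit-test style: no file is touched, only the parsing logic is checked."""

    def test_logic_with_mocked_file(self) -> None:
        fake_open = mock.mock_open(read_data="แมว\n\nหมา\n")
        with mock.patch("builtins.open", fake_open):
            self.assertEqual(load_words("ignored.txt"), ["แมว", "หมา"])


class RealFileTest(unittest.TestCase):
    """Corpus-test style: a real file is written, read back, and checked."""

    def test_integrity_with_real_file(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "words.txt"
            path.write_text("แมว\nหมา\n", encoding="utf-8")
            words = load_words(str(path))
        # Integrity checks: the file is non-empty and has no duplicate entries.
        self.assertTrue(words)
        self.assertEqual(len(words), len(set(words)))


def run_demo() -> unittest.TestResult:
    """Run both test classes and return the collected result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        [
            loader.loadTestsFromTestCase(MockedLoaderTest),
            loader.loadTestsFromTestCase(RealFileTest),
        ]
    )
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The mocked test stays fast and needs no disk or network access, while the real-file test catches encoding or format problems that a mock would hide, which is presumably why the project keeps the two suites in separate directories.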
Lines changed: 2 additions & 2 deletions
@@ -2,10 +2,10 @@
 # SPDX-FileType: SOURCE
 # SPDX-License-Identifier: Apache-2.0
 """
-Corpus data tests.
+Corpus test.
 
 These tests verify the integrity, format, parseability, and catalog
-functionality of corpus data in PyThaiNLP. They are separate from
+functionality of corpus in PyThaiNLP. They are separate from
 regular unit tests to avoid slowing down development test cycles
 with large file downloads and network access.
 """
File renamed without changes.
