Merged
3 changes: 2 additions & 1 deletion .markdownlint.json
@@ -2,7 +2,8 @@
"default": true,
"MD013": {
"line_length": 150,
"tables": false
"tables": false,
"code_blocks": false
},
"MD033": false,
"MD041": false,
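To illustrate what the two MD013 exemptions in this hunk mean in practice, here is a minimal sketch of a line-length check that skips fenced code blocks and table rows. It is a simplified stand-in, not markdownlint's implementation: the function name `long_lines`, the fence detection, and the "row starts with `|`" table heuristic are all assumptions made for illustration.

```python
import json

# Stand-in for the MD013 settings shown in the hunk above.
CONFIG = json.loads(
    '{"MD013": {"line_length": 150, "tables": false, "code_blocks": false}}'
)

FENCE = "`" * 3  # built at runtime so this example stays fence-safe


def long_lines(markdown: str, rule: dict) -> list:
    """Return 1-based numbers of lines longer than rule["line_length"].

    Simplified illustration (hypothetical helper): code blocks are
    recognised only by backtick or tilde fences, tables only by a leading "|".
    """
    limit = rule.get("line_length", 80)
    check_code = rule.get("code_blocks", True)
    check_tables = rule.get("tables", True)
    in_fence = False
    flagged = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith((FENCE, "~~~")):
            in_fence = not in_fence
            continue  # fence markers themselves are not checked here
        if in_fence and not check_code:
            continue  # "code_blocks": false exempts code lines
        if not in_fence and stripped.startswith("|") and not check_tables:
            continue  # "tables": false exempts table rows
        if len(line) > limit:
            flagged.append(number)
    return flagged


sample = "\n".join([
    "short prose line",
    "x" * 200,                   # long prose line
    FENCE + "python",
    "y = '" + "z" * 200 + "'",   # long line inside a code block
    FENCE,
    "| " + "c" * 200 + " |",     # long table row
])
print(long_lines(sample, CONFIG["MD013"]))  # prints [2]
```

With both exemptions set to `false`, only the over-long prose line is reported; dropping them from the rule would also flag the code line and the table row.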
2 changes: 1 addition & 1 deletion README.md
@@ -104,7 +104,7 @@ please cite the software as follows:

> Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas,
> Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai.
> “Pythainlp: Thai Natural Language Processing in Python”.
> “PyThaiNLP: Thai Natural Language Processing in Python”.
> Zenodo, 2 June 2024. <http://doi.org/10.5281/zenodo.3519354>.

with this BibTeX entry:
15 changes: 13 additions & 2 deletions README_TH.md
@@ -131,7 +131,10 @@ PyThaiNLP ดาวน์โหลดข้อมูล (ดูแค็ตต

If you use the `PyThaiNLP` software in your project or research, you can cite it as follows:

> Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai. “Pythainlp: Thai Natural Language Processing in Python”. Zenodo, 2 June 2024. <http://doi.org/10.5281/zenodo.3519354>.
> Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas,
> Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai.
> “PyThaiNLP: Thai Natural Language Processing in Python”.
> Zenodo, 2 June 2024. <http://doi.org/10.5281/zenodo.3519354>.

with this BibTeX entry:

@@ -157,7 +160,15 @@ PyThaiNLP ดาวน์โหลดข้อมูล (ดูแค็ตต
[NLP-OSS 2023](https://nlposs.github.io/2023/)
you can cite it as follows:

> Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit. 2023. [PyThaiNLP: Thai Natural Language Processing in Python.](https://aclanthology.org/2023.nlposs-1.4) In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25–36, Singapore, Singapore. Empirical Methods in Natural Language Processing.
> Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas,
> Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai,
> Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit.
> 2023.
> [PyThaiNLP: Thai Natural Language Processing in Python.](https://aclanthology.org/2023.nlposs-1.4)
> In Proceedings of the 3rd Workshop for Natural Language Processing
> Open Source Software (NLP-OSS 2023),
> pages 25–36, Singapore, Singapore.
> Empirical Methods in Natural Language Processing.

with this BibTeX entry:

14 changes: 11 additions & 3 deletions pythainlp/corpus/corpus_license.md
@@ -1,8 +1,16 @@
# Corpus License

- Corpora, datasets, and documentation created by the PyThaiNLP project are released under [Creative Commons Zero 1.0 Universal Public Domain Dedication License](https://creativecommons.org/publicdomain/zero/1.0/) (CC0).
- Language models created by the PyThaiNLP project are released under [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/) (CC-by).
- For more information about corpora that PyThaiNLP uses, see [https://github.com/PyThaiNLP/pythainlp-corpus/](https://github.com/PyThaiNLP/pythainlp-corpus/).
- Corpora, datasets, and documentation created by the PyThaiNLP project are
released under
[Creative Commons Zero 1.0 Universal Public Domain Dedication License][cc0]
(CC0-1.0).
- Language models created by the PyThaiNLP project are released under
[Creative Commons Attribution 4.0 International Public License][cc-by] (CC-BY-4.0).
- For more information about corpora that PyThaiNLP uses, see
<https://github.com/PyThaiNLP/pythainlp-corpus/>.

[cc0]: https://creativecommons.org/publicdomain/zero/1.0/
[cc-by]: https://creativecommons.org/licenses/by/4.0/

## Dictionaries and Word Lists

17 changes: 10 additions & 7 deletions tests/README.md
@@ -1,3 +1,4 @@
---
SPDX-FileCopyrightText: 2026 PyThaiNLP Project
SPDX-FileType: DOCUMENTATION
SPDX-License-Identifier: Apache-2.0
@@ -88,11 +89,13 @@ on their dependency requirements.

Different ML/AI frameworks often have conflicting version requirements for
their dependencies. For example:

- PyTorch and TensorFlow may require different versions of numpy or protobuf
- Large frameworks take significant time to install (~1-3 GB each)
- Some packages require Cython compilation or system libraries

By separating tests by dependency group, we can:

- Test each framework independently without conflicts
- Optimize CI/CD resources by running only relevant test groups
- Make it easier for developers to test specific functionality
@@ -106,9 +109,9 @@ By separating tests by dependency group, we can:
- Use this for comprehensive testing when all dependencies are available
- Test case class suffix: `TestCaseN`

#### Modular suites by dependency:
#### Modular suites by dependency

**PyTorch-based: tests.noauto_torch**
##### PyTorch-based: tests.noauto_torch

- Run `unittest tests.noauto_torch`
- Need dependencies from `pip install "pythainlp[noauto-torch]"`
@@ -121,7 +124,7 @@ By separating tests by dependency group, we can:
- Dependencies: ~2-3 GB
- Test case class suffix: `TestCaseN`

**TensorFlow-based: tests.noauto_tensorflow**
##### TensorFlow-based: tests.noauto_tensorflow

- Run `unittest tests.noauto_tensorflow`
- Need dependencies from `pip install "pythainlp[noauto-tensorflow]"`
@@ -131,7 +134,7 @@ By separating tests by dependency group, we can:
- Note: May conflict with PyTorch dependencies
- Test case class suffix: `TestCaseN`

**ONNX Runtime-based: tests.noauto_onnx**
##### ONNX Runtime-based: tests.noauto_onnx

- Run `unittest tests.noauto_onnx`
- Need dependencies from `pip install "pythainlp[noauto-onnx]"`
@@ -140,7 +143,7 @@ By separating tests by dependency group, we can:
- Dependencies: ~200-500 MB
- Test case class suffix: `TestCaseN`

**Cython-compiled: tests.noauto_cython**
##### Cython-compiled: tests.noauto_cython

- Run `unittest tests.noauto_cython`
- Need dependencies from `pip install "pythainlp[noauto-cython]"`
@@ -150,12 +153,12 @@ By separating tests by dependency group, we can:
- Platform-specific build requirements
- Test case class suffix: `TestCaseN`

**Network-dependent: tests.noauto_network**
##### Network-dependent: tests.noauto_network

- Run `unittest tests.noauto_network`
- Need dependencies from `pip install "pythainlp[noauto-network]"`
- Tests requiring network access:
- HuggingFace Hub model downloads
- Hugging Face Hub model downloads
- External API calls
- Requires: Internet connection, may involve large downloads
- Test case class suffix: `TestCaseN`
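As a rough sketch of how a dependency-specific suite like the ones listed above can stay safe to collect when its optional packages are absent, the example below skips a whole test case unless `torch` is importable. The class name `ExampleTorchTestCaseN`, the helper `run_suite`, and the tensor check are hypothetical and only follow the `TestCaseN` suffix convention; they are not taken from the PyThaiNLP test tree.

```python
import importlib.util
import unittest

# Assumption: this suite needs the "torch" package; any module name works here.
HAS_TORCH = importlib.util.find_spec("torch") is not None


@unittest.skipUnless(HAS_TORCH, "requires pythainlp[noauto-torch] dependencies")
class ExampleTorchTestCaseN(unittest.TestCase):
    """Hypothetical test case; the name follows the TestCaseN suffix."""

    def test_tensor_sum(self):
        import torch  # imported lazily so test collection works without torch

        self.assertEqual(int(torch.tensor([1, 2, 3]).sum()), 6)


def run_suite() -> unittest.TestResult:
    """Load and run the example case, returning the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ExampleTorchTestCaseN)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print("successful:", result.wasSuccessful())
```

A skipped case still counts as a successful run, so a machine without the framework installed reports success rather than an import error. From the command line, the standard library runner is invoked as `python -m unittest tests.noauto_torch`.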
9 changes: 5 additions & 4 deletions tests/corpus/README.md
@@ -1,3 +1,4 @@
---
SPDX-FileCopyrightText: 2026 PyThaiNLP Project
SPDX-FileType: DOCUMENTATION
SPDX-License-Identifier: Apache-2.0
@@ -91,11 +92,11 @@ Each test verifies:
1. **Loadability**: File can be loaded without errors
2. **Type correctness**: Returns expected data type
(frozenset, list, dict)
4. **Non-empty**: Contains actual data
5. **Format validity**: Data structure matches expected format
6. **Content validity**: Contains expected content
3. **Non-empty**: Contains actual data
4. **Format validity**: Data structure matches expected format
5. **Content validity**: Contains expected content
(e.g., Thai characters)
8. **Catalog functionality**: Catalog can be downloaded
6. **Catalog functionality**: Catalog can be downloaded
and queried correctly
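
The first five checks in the list above could look roughly like the following. This is a sketch under stated assumptions: `load_sample_words` is a hypothetical stand-in that returns a tiny in-memory word set instead of reading a real corpus file, and `has_thai` is an illustrative helper. The catalog check (item 6) is left out because it needs network access.

```python
import unittest


def load_sample_words() -> frozenset:
    """Hypothetical stand-in for a corpus loader; returns a tiny word set."""
    return frozenset({"ภาษา", "ไทย", "คำ"})


def has_thai(text: str) -> bool:
    """True if any character is in the Thai Unicode block (U+0E00 to U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


class SampleCorpusTestCase(unittest.TestCase):
    def test_word_list(self):
        words = load_sample_words()              # 1. loadability
        self.assertIsInstance(words, frozenset)  # 2. type correctness
        self.assertGreater(len(words), 0)        # 3. non-empty
        for word in words:
            self.assertIsInstance(word, str)     # 4. format validity
            self.assertTrue(has_thai(word))      # 5. content validity


def run_corpus_checks() -> unittest.TestResult:
    """Run the sample case and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(SampleCorpusTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)


print("successful:", run_corpus_checks().wasSuccessful())
```

Keeping each numbered check as its own assertion makes a failure message point directly at the property that broke.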

## Adding new tests