diff --git a/.markdownlint.json b/.markdownlint.json
index 24adacd16..d2a2fc0c6 100644
--- a/.markdownlint.json
+++ b/.markdownlint.json
@@ -2,7 +2,8 @@
   "default": true,
   "MD013": {
     "line_length": 150,
-    "tables": false
+    "tables": false,
+    "code_blocks": false
   },
   "MD033": false,
   "MD041": false,
diff --git a/README.md b/README.md
index 261d13d2f..54ee858c7 100644
--- a/README.md
+++ b/README.md
@@ -104,7 +104,7 @@ please cite the software as follows:
 
 > Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas,
 > Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai.
-> “Pythainlp: Thai Natural Language Processing in Python”.
+> “PyThaiNLP: Thai Natural Language Processing in Python”.
 > Zenodo, 2 June 2024. .
 
 with this BibTeX entry:
diff --git a/README_TH.md b/README_TH.md
index 97d647ff8..ee1696616 100644
--- a/README_TH.md
+++ b/README_TH.md
@@ -131,7 +131,10 @@ PyThaiNLP ดาวน์โหลดข้อมูล (ดูแค็ตต
 
 หากคุณใช้ซอฟต์แวร์ `PyThaiNLP` ในโครงงานหรืองานวิจัยของคุณ คุณสามารถอ้างอิงได้ตามนี้:
 
-> Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai. “Pythainlp: Thai Natural Language Processing in Python”. Zenodo, 2 June 2024. .
+> Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas,
+> Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai.
+> “PyThaiNLP: Thai Natural Language Processing in Python”.
+> Zenodo, 2 June 2024. .
 
 โดยใช้รายการ BibTeX นี้:
 
@@ -157,7 +160,15 @@
 
 [NLP-OSS 2023](https://nlposs.github.io/2023/) คุณสามารถอ้างอิงได้ตามนี้:
 
-> Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit. 2023. [PyThaiNLP: Thai Natural Language Processing in Python.](https://aclanthology.org/2023.nlposs-1.4) In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25–36, Singapore, Singapore. Empirical Methods in Natural Language Processing.
+> Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas,
+> Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai,
+> Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit.
+> 2023.
+> [PyThaiNLP: Thai Natural Language Processing in Python.](https://aclanthology.org/2023.nlposs-1.4)
+> In Proceedings of the 3rd Workshop for Natural Language Processing
+> Open Source Software (NLP-OSS 2023),
+> pages 25–36, Singapore, Singapore.
+> Empirical Methods in Natural Language Processing.
 
 โดยใช้รายการ BibTeX นี้:
diff --git a/pythainlp/corpus/corpus_license.md b/pythainlp/corpus/corpus_license.md
index f4e0efba0..6094288a4 100644
--- a/pythainlp/corpus/corpus_license.md
+++ b/pythainlp/corpus/corpus_license.md
@@ -1,8 +1,16 @@
 # Corpus License
 
-- Corpora, datasets, and documentation created by the PyThaiNLP project are released under [Creative Commons Zero 1.0 Universal Public Domain Dedication License](https://creativecommons.org/publicdomain/zero/1.0/) (CC0).
-- Language models created by the PyThaiNLP project are released under [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/) (CC-by).
-- For more information about corpora that PyThaiNLP uses, see [https://github.com/PyThaiNLP/pythainlp-corpus/](https://github.com/PyThaiNLP/pythainlp-corpus/).
+- Corpora, datasets, and documentation created by the PyThaiNLP project are
+  released under
+  [Creative Commons Zero 1.0 Universal Public Domain Dedication License][cc0]
+  (CC0-1.0).
+- Language models created by the PyThaiNLP project are released under
+  [Creative Commons Attribution 4.0 International Public License][cc-by] (CC-BY-4.0).
+- For more information about corpora that PyThaiNLP uses, see
+  <https://github.com/PyThaiNLP/pythainlp-corpus/>.
+
+[cc0]: https://creativecommons.org/publicdomain/zero/1.0/
+[cc-by]: https://creativecommons.org/licenses/by/4.0/
 
 ## Dictionaries and Word Lists
 
diff --git a/tests/README.md b/tests/README.md
index f17bc06eb..3a4a6a607 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -1,3 +1,4 @@
+---
 SPDX-FileCopyrightText: 2026 PyThaiNLP Project
 SPDX-FileType: DOCUMENTATION
 SPDX-License-Identifier: Apache-2.0
@@ -88,11 +89,13 @@
 on their dependency requirements.
 
 Different ML/AI frameworks often have conflicting version requirements for their dependencies. For example:
+
 - PyTorch and TensorFlow may require different versions of numpy or protobuf
 - Large frameworks take significant time to install (~1-3 GB each)
 - Some packages require Cython compilation or system libraries
 
 By separating tests by dependency group, we can:
+
 - Test each framework independently without conflicts
 - Optimize CI/CD resources by running only relevant test groups
 - Make it easier for developers to test specific functionality
@@ -106,9 +109,9 @@ By separating tests by dependency group, we can:
 - Use this for comprehensive testing when all dependencies are available
 - Test case class suffix: `TestCaseN`
 
-#### Modular suites by dependency:
+#### Modular suites by dependency
 
-**PyTorch-based: tests.noauto_torch**
+##### PyTorch-based: tests.noauto_torch
 
 - Run `unittest tests.noauto_torch`
 - Need dependencies from `pip install "pythainlp[noauto-torch]"`
@@ -121,7 +124,7 @@ By separating tests by dependency group, we can:
 - Dependencies: ~2-3 GB
 - Test case class suffix: `TestCaseN`
 
-**TensorFlow-based: tests.noauto_tensorflow**
+##### TensorFlow-based: tests.noauto_tensorflow
 
 - Run `unittest tests.noauto_tensorflow`
 - Need dependencies from `pip install "pythainlp[noauto-tensorflow]"`
@@ -131,7 +134,7 @@ By separating tests by dependency group, we can:
 - Note: May conflict with PyTorch dependencies
 - Test case class suffix: `TestCaseN`
 
-**ONNX Runtime-based: tests.noauto_onnx**
+##### ONNX Runtime-based: tests.noauto_onnx
 
 - Run `unittest tests.noauto_onnx`
 - Need dependencies from `pip install "pythainlp[noauto-onnx]"`
@@ -140,7 +143,7 @@ By separating tests by dependency group, we can:
 - Dependencies: ~200-500 MB
 - Test case class suffix: `TestCaseN`
 
-**Cython-compiled: tests.noauto_cython**
+##### Cython-compiled: tests.noauto_cython
 
 - Run `unittest tests.noauto_cython`
 - Need dependencies from `pip install "pythainlp[noauto-cython]"`
@@ -150,12 +153,12 @@ By separating tests by dependency group, we can:
 - Platform-specific build requirements
 - Test case class suffix: `TestCaseN`
 
-**Network-dependent: tests.noauto_network**
+##### Network-dependent: tests.noauto_network
 
 - Run `unittest tests.noauto_network`
 - Need dependencies from `pip install "pythainlp[noauto-network]"`
 - Tests requiring network access:
-  - HuggingFace Hub model downloads
+  - Hugging Face Hub model downloads
   - External API calls
 - Requires: Internet connection, may involve large downloads
 - Test case class suffix: `TestCaseN`
diff --git a/tests/corpus/README.md b/tests/corpus/README.md
index a1a5f434c..e2543a011 100644
--- a/tests/corpus/README.md
+++ b/tests/corpus/README.md
@@ -1,3 +1,4 @@
+---
 SPDX-FileCopyrightText: 2026 PyThaiNLP Project
 SPDX-FileType: DOCUMENTATION
 SPDX-License-Identifier: Apache-2.0
@@ -91,11 +92,11 @@ Each test verifies:
 
 1. **Loadability**: File can be loaded without errors
 2. **Type correctness**: Returns expected data type (frozenset, list, dict)
-4. **Non-empty**: Contains actual data
-5. **Format validity**: Data structure matches expected format
-6. **Content validity**: Contains expected content
+3. **Non-empty**: Contains actual data
+4. **Format validity**: Data structure matches expected format
+5. **Content validity**: Contains expected content
    (e.g., Thai characters)
-8. **Catalog functionality**: Catalog can be downloaded
+6. **Catalog functionality**: Catalog can be downloaded
    and queried correctly
 
 ## Adding new tests