- Location: Macau, China
- Dates: November 3-5, 2026
- Registration: Registration Form
- Contact: nlp2ct.junchao@gmail.com
The rapid development of large language models (LLMs) has given rise to a series of challenges, including the generation of disinformation, the spread of harmful content, and various forms of misuse. Against this backdrop, efficiently discriminating between LLM-generated text and human-written text has become an urgent and critical research issue in natural language processing (NLP). While remarkable progress has been made, relevant research has largely focused on English; systematic and technical exploration for Chinese remains scarce. This shared task aims to fill this gap, build more robust detectors of Chinese LLM-generated text, and advance research and real-world applications in this field for the Chinese language.
Following the success of the 1st Shared Task on LLM-Generated Text Detection (NLPCC 2025), the 2nd Shared Task on LLM-Generated Text Detection in 2026 features significant upgrades: the task formulation has been expanded from binary to ternary classification. Specifically, in addition to distinguishing between human-written text and LLM-generated text, a new category for identifying LLM-refined text has been introduced, which better aligns with real-world application scenarios of LLMs. Participating teams are required to design and implement text detection algorithms based on the provided training data to achieve accurate classification; submissions will undergo rigorous stress testing.
- [ 2026.04.15 ] 💡 We have released the detailed task guidelines and training data!
- [ 2026.03.20 ] 🔥 Registration for NLPCC-2026 Task 6 is now open. Welcome to join us!
The training set for this task is primarily sampled and adapted from the CUDRT dataset (TIST 2026; Chinese subset, Complete 25 ratio). The test set will be released in the later stages of the competition so that teams can verify the effectiveness of their methods and evaluate final performance. The evaluation dataset is constructed by extending the DetectRL benchmark (NeurIPS 2024) framework and covers multiple generation models and domains to ensure realistic and challenging evaluation scenarios.
Training Set
The training set contains data from 4 LLMs and 2 domains: the data sources cover news and academic writing, and the generation models include GPT-4, Qwen, ChatGLM, and Baichuan. It comprises 20,370 sample pairs in total. Each pair contains four fields: "ID", "HWT", "LGT", and "HLT". "ID" identifies the sample number, while "HWT", "LGT", and "HLT" denote "human-written text", "LLM-generated text", and "LLM-refined text", respectively.
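The paired layout can be flattened into single (text, label) examples for training a three-way classifier. Below is a minimal sketch in Python, assuming the field names above and the label scheme used by the test set (HWT: 0, LGT: 1, HLT: 2); the record shown is a made-up placeholder, not real task data.

```python
# Label scheme from the task description: HWT -> 0, LGT -> 1, HLT -> 2
LABELS = {"HWT": 0, "LGT": 1, "HLT": 2}

def flatten_pairs(pairs):
    """Turn each paired training record into three (text, label) examples."""
    examples = []
    for record in pairs:
        for field, label in LABELS.items():
            examples.append({"ID": record["ID"], "text": record[field], "label": label})
    return examples

# Placeholder record illustrating the training-set layout
pairs = [{"ID": 0, "HWT": "human text", "LGT": "model text", "HLT": "refined text"}]
print(flatten_pairs(pairs))  # one example per field, labels 0, 1, 2
```

Each 20,370-pair training file would thus yield three times as many labeled examples.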
Data Download
The training data can be found at the following GitHub folder link:
Data Restrictions
- To support the development of detection systems, participants may perform data processing and augmentation on the provided training data, but no additional external datasets may be introduced.
Data Format
Data is stored in JSON object format.
- Training data format:
{
    "ID": sample ID,
    "HWT": "text written by a human",
    "LGT": "text generated by a model",
    "HLT": "text written by a human, then refined by a model"
}
- Test set format:
{
    "ID": test sample ID,
    "text": "HWT, LGT, or HLT",
    "label": "label (HWT: 0, LGT: 1, HLT: 2)"
}
Evaluation Metrics
The official evaluation metric for this task is the macro-averaged F1-Score.
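Macro-averaged F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of how often it appears. A small self-contained sketch of the computation (equivalent to scikit-learn's `f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        # Per-class counts of true positives, false positives, false negatives
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

print(macro_f1([0, 1, 2, 2], [0, 1, 1, 2]))  # 7/9 ≈ 0.7778
```

Because the mean is unweighted, a detector that ignores the rarer HLT class is penalized even if its overall accuracy is high.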
The submission platform for this evaluation task (tentatively Alibaba Cloud Tianchi) will be launched one week before the release of the test data. Evaluation proceeds in two phases: Phase 1 (open evaluation, leaderboard verification), followed by Phase 2 (closed evaluation, final ranking). Specific requirements for the submitted test results are as follows:
- Test Result File
- Your test result file must be a JSON file containing all samples.
- Please ensure that the text and ID fields remain unchanged.
- Each sample in the JSON file should contain the following fields:
- "ID": Unique identifier of the sample
- "text": Text content of the sample
- "label": Classification result according to the following rules:
- Human-written text: label is 0
- LLM-generated text: label is 1
- LLM-refined text: label is 2
- Code and Data
- The code folder should contain all the code required for data augmentation, data processing, model training, and model inference, as well as the complete dataset used to train the detector (i.e., including any augmented or processed samples).
- Since the submitted code may be reviewed and reproduced, please include a simple README.md (or equivalent document) and an environment configuration file (e.g., requirements.txt, if applicable). In the document, briefly explain the reproduction process to ensure that your submitted results can be reproduced.
- Technical Report
- The technical report should describe in detail the methods used to solve the task, including data processing, data augmentation, and the specific methods/model architectures and parameters used. If necessary, you can add formulas and pseudo-code to aid understanding. All solutions will be treated confidentially and deleted after the NLPCC 2026 conference (no archiving).
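Before submitting, it may help to sanity-check the result file against the rules above. The sketch below assumes the result file is a top-level JSON array of samples; the platform's exact layout may differ, and the file name is a placeholder.

```python
import json

REQUIRED_FIELDS = {"ID", "text", "label"}
VALID_LABELS = {0, 1, 2}  # HWT: 0, LGT: 1, HLT: 2

def validate_results(path):
    """Return a list of error messages; an empty list means the file looks well-formed."""
    with open(path, encoding="utf-8") as f:
        samples = json.load(f)
    errors = []
    for i, sample in enumerate(samples):
        missing = REQUIRED_FIELDS - sample.keys()
        if missing:
            errors.append(f"sample {i}: missing fields {sorted(missing)}")
        elif sample["label"] not in VALID_LABELS:
            errors.append(f"sample {i}: invalid label {sample['label']!r}")
    return errors

# Example: errors = validate_results("results.json")  # hypothetical file name
```

A check like this catches dropped fields or out-of-range labels locally, before the submission platform rejects the file.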
Important Notes
- If any violation of the competition rules is found, the participation results will be disqualified.
- Submissions after the deadline will not be accepted.
- The organizers reserve all rights, including but not limited to modifying the competition rules and data as needed during the competition. All modifications will be communicated to all participating teams by email as soon as possible. Teams are also asked to watch the [Latest News] section of the official task website.
- If you encounter any problems during the competition, please feel free to contact us via the following email: nlp2ct.junchao@gmail.com.
| Date | Task |
|---|---|
| March 20, 2026 | Shared task announcement and call for participation |
| March 20, 2026 | Registration opens |
| April 15, 2026 | Release of detailed task guidelines and training data |
| May 25, 2026 | Registration deadline |
| June 11, 2026 | Test data release |
| June 20, 2026 | Deadline for participants to submit results |
| June 30, 2026 | Evaluation results released and call for system reports and conference papers |
This shared task is jointly organized by the Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory (NLP2CT) at the University of Macau, Central China Normal University, and Alibaba Cloud.
- Junchao Wu | University of Macau (Contact: nlp2ct.junchao@gmail.com | Homepage)
- Derek Fai Wong | University of Macau (Homepage)
- Runzhe Zhan | University of Macau (Homepage)
- Zeyu Wu | University of Macau
- Zhiwen Xie | University of Macau / Central China Normal University
- Yichao Du | Alibaba Cloud
- Longyue Wang | Alibaba Cloud (Homepage)
FAQ
Q: Where can I register for this shared task? Can team registration be updated?
A: Fill out the following link to register: https://alidocs.dingtalk.com/notable/share/form/v018K4nyeZ1wGK0YnLb_dv19yqvsgs3oebp3pcjys_bUfwzOn. Before the registration deadline, you can fill out this link repeatedly to update team registration information.
References
The following are references related to this shared task. If your research uses relevant datasets, or if this task is helpful to you, please consider citing the following literature:
@article{wu2024detectrl,
title={DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios},
author={Wu, Junchao and Zhan, Runzhe and Wong, Derek F and Yang, Shu and Yang, Xinyi and Yuan, Yulin and Chao, Lidia S},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={100369--100401},
year={2024}
}

@inproceedings{wu2025overview,
title={Overview of the NLPCC 2025 Shared Task 1: LLM-Generated Text Detection},
author={Wu, Junchao and Zhan, Runzhe and Wang, Qianli and Yuan, Yulin and Chao, Lidia S and Wong, Derek F},
booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
pages={263--274},
year={2025},
organization={Springer}
}

@article{tao2026toward,
title={Toward Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT},
author={Tao, Zhen and Chen, Yanfang and Xi, Dinghao and Li, Zhiyu and Xu, Wei},
journal={ACM Transactions on Intelligent Systems and Technology},
volume={17},
number={2},
pages={1--35},
year={2026}
}

Other results of our work in the field of LLM-generated text detection, for reference:
- Wu, J., Yang, S., Zhan, R., Yuan, Y., Chao, L. S., & Wong, D. F. (2025). A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions. Computational Linguistics, 1-66.
- Wu, J., Zhan, R., Wong, D. F., Yang, S., Liu, X., Chao, L. S., & Zhang, M. (2025, January). Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore. In Proceedings of the 31st International Conference on Computational Linguistics (pp. 10275-10292).
- Chen, X., Wu, J., Yang, S., Zhan, R., Wu, Z., Luo, Z., Wang, D., Yang, M., Chao, L. S., & Wong, D. F. (2025). RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns. Transactions of the Association for Computational Linguistics, 13, 1812-1831.
If you are a new researcher in this field, we hope the following papers can help you quickly become familiar with it (continuously updated):
- Bao, G., Zhao, Y., Teng, Z., Yang, L., & Zhang, Y. Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature. In The Twelfth International Conference on Learning Representations.
- Hans, A., Schwarzschild, A., Cherepanova, V., Kazemi, H., Saha, A., Goldblum, M., ... & Goldstein, T. (2024, July). Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text. In International Conference on Machine Learning (pp. 17519-17537). PMLR.
- Koike, R., Kaneko, M., & Okazaki, N. (2024, March). OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, No. 19, pp. 21258-21266).
- Zhu, B., Yuan, L., Cui, G., Chen, Y., Fu, C., He, B., ... & Gu, M. (2023, December). Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection via Querying ChatGPT. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 7470-7483).
- Yu, X., Qi, Y., Chen, K., Chen, G., Yang, X., Zhu, P., ... & Yu, N. (2024). DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection. Advances in Neural Information Processing Systems, 37, 16194-16212.
- Hu, X., Chen, P. Y., & Ho, T. Y. (2023). RADAR: Robust AI-Text Detection via Adversarial Learning. Advances in Neural Information Processing Systems, 36, 15077-15095.
- Yu, X., Chen, K., Yang, Q., Zhang, W., & Yu, N. (2024, November). Text fluoroscopy: Detecting LLM-generated text through intrinsic features. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 15838-15846).
- Wang, P., Li, L., Ren, K., Jiang, B., Zhang, D., & Qiu, X. SeqXGPT: Sentence-Level AI-Generated Text Detection. In The 2023 Conference on Empirical Methods in Natural Language Processing.
- Ma, S., & Wang, Q. (2024, November). Zero-shot detection of LLM-generated text using token cohesiveness. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 17538-17553).
- Bao, G., Zhao, Y., He, J., & Zhang, Y. Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection. In The Thirteenth International Conference on Learning Representations.
- Zhou, H., Zhu, J., Su, P., Ye, K., Yang, Y., Gavioli-Akilagun, S., & Shi, C. AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- Zhu, X., Ren, Y., Fang, F., Tan, Q., Wang, S., & Cao, Y. DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm. In The Thirty-ninth Annual Conference on Neural Information Processing Systems. ...
Copyright © 2026 NLP2CT Lab. All Rights Reserved.