Dataset for the paper: Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement

ZihaoCheng123/LLMDetect

Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement

This repository contains the data for the paper Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement. See the paper for additional details:

Cheng, Z., Zhou, L., Jiang, F., Wang, B., & Li, H. (2024). Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement. Link

You can also view the dataset on Hugging Face. Link

You can download the PLM-based models on Hugging Face. Link

This is a comprehensive benchmark for LLM-generated text detection, organized as follows:

  • LLMDetect
    • HNDC
      • train.json
      • val.json
      • test.json
    • DetectEval
      • Cross-context
        • cross-time
        • cross-prompt
        • cross-source
        • cross-cultural
        • cross-domain
      • Multi-intensity
        • Variable-Length-Extension
        • Multi-Staged-Polish
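
As a starting point for reading the split files listed above, here is a minimal Python sketch. It rests on an assumption this README does not confirm: that each `.json` file holds either a single JSON array of records or one JSON object per line. The helper name `load_split` is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def load_split(path):
    """Load one split file as a list of records.

    Assumption: the file holds either a single JSON array or one JSON
    object per line (JSON Lines); the README does not say which.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: parse each non-empty line on its own.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]
```

With the layout above, a call might look like `load_split("LLMDetect/HNDC/train.json")`, where the path is inferred from the directory tree and may differ in a local checkout.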

The label "PR" is the LLM Involvement Ratio in our task, and the labels "human", "draft", "revise", and "continue" mark the text source.

If "human" equals 1, the text is Human-Author.

If "draft" equals 1, the text is LLM-Creator.

If "revise" equals 1, the text is LLM-Polisher.

If "continue" equals 1, the text is LLM-Extender.
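
The label scheme above can be turned into a single role name per record. The sketch below is illustrative only: the field names come from this README, but it assumes the flags are stored as 0/1 values with exactly one flag set per record, and the names `ROLE_BY_FLAG` and `role_of` are hypothetical.

```python
# Mapping from the four source flags to the role names used in the paper.
ROLE_BY_FLAG = {
    "human": "Human-Author",
    "draft": "LLM-Creator",
    "revise": "LLM-Polisher",
    "continue": "LLM-Extender",
}


def role_of(record):
    """Return the role name for one record.

    Assumption: exactly one of the four flags equals 1 in each record.
    """
    active = [role for flag, role in ROLE_BY_FLAG.items() if record.get(flag) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active flag, found {len(active)}")
    return active[0]


# Made-up record for illustration; the values are not taken from the dataset.
example = {"human": 0, "draft": 0, "revise": 1, "continue": 0, "PR": 0.35}
print(role_of(example), example["PR"])
```

Raising on records with zero or several active flags is a design choice for this sketch; it surfaces schema mismatches early instead of silently picking a role.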

Citation

@inproceedings{cheng2025beyond,
  title={Beyond binary: Towards fine-grained llm-generated text detection via role recognition and involvement measurement},
  author={Cheng, Zihao and Zhou, Li and Jiang, Feng and Wang, Benyou and Li, Haizhou},
  booktitle={Proceedings of the ACM on Web Conference 2025},
  pages={2677--2688},
  year={2025}
}
