A reporting checklist for large language models (LLMs) in behavioral science has been developed to address inconsistencies in how these models are described in scientific research. The checklist, published in a peer-reviewed journal, aims to improve reproducibility and transparency by requiring researchers to specify the exact model version, parameters, and training data used.
Key elements include documenting the model's architecture, fine-tuning details, and any prompt engineering techniques. The checklist also asks researchers to report the LLM's potential biases and limitations, along with the date of access, since models are frequently updated.
This initiative responds to growing concerns about the reliability of studies using LLMs like GPT-4 or Claude, where vague reporting can make results impossible to replicate. The checklist is designed to be adaptable for different research contexts and model types.
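As a rough illustration of how the items named above might be recorded alongside a study, the sketch below stores them in a small Python record and lists whichever ones are still blank. The class name `LLMReport`, the field names, and the helper `missing_items` are hypothetical and chosen only for this example; they are not taken from the checklist itself, which may word or group its items differently.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class LLMReport:
    """Hypothetical record of the reporting items described in the text."""
    model_version: Optional[str] = None           # exact model version used
    parameters: Optional[dict] = None             # settings used when running the model
    training_data: Optional[str] = None           # what is known about the training data
    architecture: Optional[str] = None            # model architecture
    fine_tuning: Optional[str] = None             # fine-tuning details, if any
    prompting: Optional[str] = None               # prompt engineering techniques
    biases_and_limitations: Optional[str] = None  # known biases and limitations
    access_date: Optional[date] = None            # date the model was accessed


def missing_items(report: LLMReport) -> list:
    """Return the names of fields that are still unset or empty."""
    return [
        f.name
        for f in fields(report)
        if getattr(report, f.name) in (None, "", {})
    ]


# Placeholder values, for illustration only.
draft = LLMReport(
    model_version="example-model-v1",
    parameters={"temperature": 0.0},
    access_date=date(2024, 1, 15),
)
print(missing_items(draft))
```

A record like this could be filled in as the study is run and checked before submission, so that gaps in reporting show up early rather than at review.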