A measure of the estimated confidence that outputs generated by Large Language Models are free of hallucinations.
dataset datasets llm llms llm-training llm-evaluation llms-reasoning llm-evaluation-toolkit llms-benchmarking llm-evaluation-framework llm-evaluation-metrics llms-efficency llms-evalution
Updated Aug 6, 2025 - Python