AceMath-7B-Instruct/evaluation

Introduction

This directory contains the evaluation script used to reproduce the math benchmark scores of the AceMath-1.5B/7B/72B-Instruct models from their outputs. The benchmark data can be downloaded from Qwen2.5-Math.

Calculate Scores

python calculate_scores.py
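
For readers unfamiliar with how math benchmark scores are typically derived from model outputs, the following is a minimal sketch, not the logic of calculate_scores.py or grader.py. It assumes each model output ends with a final answer wrapped in \boxed{...} and that correctness is judged by exact string match after light normalization. The function names extract_boxed, normalize, and accuracy are hypothetical and chosen for illustration only.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def normalize(answer):
    """Assumed normalization: drop spaces and a trailing period."""
    return answer.strip().replace(" ", "").rstrip(".")


def accuracy(outputs, references):
    """Percentage of outputs whose boxed answer matches the reference."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0
    correct = 0
    for output, reference in zip(outputs, references):
        predicted = extract_boxed(output)
        if predicted is not None and normalize(predicted) == normalize(reference):
            correct += 1
    return 100.0 * correct / len(outputs)
```

A real grader usually also checks mathematical equivalence (for example, treating 0.5 and \frac{1}{2} as equal), which this exact-match sketch does not attempt.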