## Introduction
This is the evaluation script used to reproduce the math benchmark scores of the AceMath-1.5B/7B/72B-Instruct models from their generated outputs. The benchmark data can be downloaded from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math/tree/main/evaluation/data).
## Calculate Scores
```console
python calculate_scores.py
```
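This README does not describe the scoring logic inside `calculate_scores.py`. As a rough illustration only, the following sketch shows one common way to turn judged model outputs into benchmark scores: per-benchmark accuracy as the percentage of correct answers, plus an unweighted average across benchmarks. The functions `benchmark_accuracies` and `average_score`, the `(benchmark, is_correct)` record format, and the benchmark names are hypothetical. They are not taken from the script.

```python
from collections import defaultdict


def benchmark_accuracies(records):
    """Return {benchmark: accuracy in percent} from (benchmark, is_correct) pairs.

    Hypothetical record format: each pair says whether one model output
    was judged correct for the given benchmark.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for benchmark, is_correct in records:
        totals[benchmark] += 1
        correct[benchmark] += int(bool(is_correct))
    return {b: 100.0 * correct[b] / totals[b] for b in totals}


def average_score(accuracies):
    """Unweighted mean of per-benchmark accuracies (an assumed aggregation)."""
    return sum(accuracies.values()) / len(accuracies)


if __name__ == "__main__":
    # Illustrative judged outputs; not real AceMath results.
    records = [
        ("gsm8k", True), ("gsm8k", True), ("gsm8k", False), ("gsm8k", True),
        ("math", True), ("math", False),
    ]
    accs = benchmark_accuracies(records)
    print(accs)                 # {'gsm8k': 75.0, 'math': 50.0}
    print(average_score(accs))  # 62.5
```

Averaging per-benchmark accuracies gives each benchmark equal weight regardless of its size. Whether the actual script aggregates this way is an assumption.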