49 lines
1.4 KiB
Markdown
49 lines
1.4 KiB
Markdown
---
|
|
library_name: transformers
|
|
tags:
|
|
- trl
|
|
- sft
|
|
base_model:
|
|
- meta-llama/Llama-3.2-1B-Instruct
|
|
datasets:
|
|
- ngxson/MiniThinky-dataset
|
|
---
|
|
|
|
# MiniThinky 1B
|
|
|
|
This is the newer checkpoint of [MiniThinky-1B-Llama-3.2 (version 1)](https://huggingface.co/ngxson/MiniThinky-1B-Llama-3.2), which the loss decreased from 0.7 to 0.5
|
|
|
|
Link to GGUF version: [click here](https://huggingface.co/ngxson/MiniThinky-v2-1B-Llama-3.2-Q8_0-GGUF)
|
|
|
|
Chat template is the same with llama 3, but the response will be as follow:
|
|
|
|
```
|
|
<|thinking|>{thinking_process}
|
|
<|answer|>
|
|
{real_answer}
|
|
```
|
|
|
|
## IMPORTANT: System message
|
|
|
|
The model is **very sensitive** to system message. Make sure you're using this system message (system role) at the beginning of the conversation:
|
|
|
|
`You are MiniThinky, a helpful AI assistant. You always think before giving the answer. Use <|thinking|> before thinking and <|answer|> before giving the answer.`
|
|
|
|
## Q&A
|
|
|
|
**Hardware used to trained it?**
|
|
I used a HF space with 4xL40S, trained for 5 hours (v1) and an additional of 6 hours (v2)
|
|
|
|
**Benchmark?**
|
|
I don't have time to do it alone. If you can help, please open a discussion!
|
|
|
|
**Can it count number of "r" in "raspberry"?**
|
|
Unfortunately no
|
|
|
|
**Other things that I can tune?**
|
|
Maybe lower temperature, or set top_k=1
|
|
|
|
---
|
|
|
|
TODO: include more info here + maybe do some benchmarks? (Plz add a discussion if you're interested)
|