---
language:
- en
- zh
- fr
- es
- de
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
- mlfoundations/dclm-baseline-1.0
- cerebras/SlimPajama-627B
- EleutherAI/pile
- bigcode/starcoderdata
- oscar-corpus/OSCAR-2301
---

# RWKV-7 World

Use rwkv pip package 0.8.28+ for RWKV-7 inference: https://pypi.org/project/rwkv/
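A minimal inference sketch with the rwkv pip package (the checkpoint path is a placeholder; the `RWKV_V7_ON` flag and pipeline calls follow the linked RWKV-7 demo scripts, so treat the exact settings as an assumption):

```python
import os
os.environ["RWKV_V7_ON"] = "1"   # assumed flag to enable RWKV-7 support in rwkv>=0.8.28
os.environ["RWKV_JIT_ON"] = "1"

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# "/path/to/rwkv7-world.pth" is a placeholder for a downloaded checkpoint
model = RWKV(model="/path/to/rwkv7-world.pth", strategy="cuda fp16")  # or "cpu fp32" without a GPU
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # vocab used by the World models

ctx = "User: hi\n\nAssistant:"   # no space after the final ":" (see the warning below)
out = pipeline.generate(
    ctx,
    token_count=200,
    args=PIPELINE_ARGS(temperature=1.0, top_p=0.3),
)
print(out)
```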

Evals and more information: https://www.rwkv.com/

For developers: https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v7

Chat demo: https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_CHAT.py

## Model Description

RWKV-7 is trained on 100+ world languages (80% English, 10% multilingual, 10% code).

World-v3 = 3.1T tokens

World-v2.9 = subsampled 2T tokens

World-v2.8 = subsampled 1T tokens

Recommended fine-tuning format (use \n for newlines; a sketch of how to assemble such samples follows the template):

```
User: xxxxxxxxxxxxxxx

Assistant: xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx

User: xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx

Assistant: xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
```
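As an illustration, a hypothetical helper (`build_sample` is not part of any RWKV tooling) that assembles one training sample in this format, keeping \n inside a turn and \n\n only between turns:

```python
def build_sample(turns):
    """turns: list of (role, text) pairs, where role is 'User' or 'Assistant'."""
    parts = []
    for role, text in turns:
        text = text.strip().replace("\n\n", "\n")  # no blank lines inside a turn
        parts.append(f"{role}: {text}")
    return "\n\n".join(parts) + "\n\n"  # turns are separated by a blank line

sample = build_sample([
    ("User", "hi"),
    ("Assistant", "Hello! How can I help you today?"),
])
```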

A good chat prompt (replace any \n\n inside xxx with \n, so the response never contains a stray \n\n):

```
User: hi

Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

User: xxx

Assistant:
```

QA prompt (again, replace any \n\n inside xxx with \n, so the response never contains a stray \n\n):

```
Question: xxx

Answer:
```

and

```
Instruction: xxx

Input: xxx

Response:
```

!!! There must be no space after your final ":" (e.g. "Assistant:", not "Assistant: "), or you will upset the tokenizer and see a non-English response !!!
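For illustration, a small hypothetical sketch (these helpers are not part of the rwkv package) that builds prompts while respecting the rules above:

```python
def chat_prompt(user_text: str) -> str:
    user_text = user_text.strip().replace("\n\n", "\n")  # never allow \n\n inside a turn
    return f"User: {user_text}\n\nAssistant:"  # ends with ":" and no trailing space

def qa_prompt(question: str) -> str:
    question = question.strip().replace("\n\n", "\n")
    return f"Question: {question}\n\nAnswer:"

print(chat_prompt("How do I sort a list\n\nin Python?"))
```

Since turns are separated by \n\n, you can stop generation once the model emits \n\n, which marks the end of its turn.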