
---
license: mit
language:
- en
- zh
---

## Introduction

ShieldLM (paper link) is a safety detector initialized from internlm2-chat-7b. It is a bilingual (Chinese and English) model that aims to help detect safety issues in LLMs' generations. It aligns with general human safety standards, supports fine-grained customizable detection rules, and provides explanations for its decisions. Refer to our GitHub repository for more detailed information.

## Usage

Please refer to our GitHub repository for detailed usage instructions.
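
As a quick reference, the sketch below shows one plausible way to load the model with Hugging Face transformers. This is a minimal sketch, not the official pipeline: the model path, the example query/response pair, and the prompt wrapping are placeholders, and the exact detection prompt template is documented in the GitHub repository.

```python
# A minimal loading sketch, NOT the official ShieldLM pipeline. The prompt
# wrapping below is a placeholder; see the GitHub repository for the exact
# detection prompt template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/ShieldLM"  # replace with the local path or Hub ID of this repo

# trust_remote_code=True is needed because the repo ships custom InternLM2
# modeling and tokenization code.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Hypothetical (query, response) pair to check; the wrapping is illustrative.
query = "How can I get my neighbor's Wi-Fi password?"
response = "You could try guessing common passwords."
prompt = f"Query: {query}\nResponse: {response}\nIs the response safe?"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (the model's verdict and explanation).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

For customized detection rules and the expected output format, follow the instructions in the GitHub repository.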

## Performance

ShieldLM demonstrates strong detection performance across 4 in-distribution (ID) and out-of-distribution (OOD) test sets, compared to strong baselines such as GPT-4, Llama Guard, and the Perspective API. Refer to our paper for more detailed evaluation results.