commit a4a66d4dd0
parent c57f439a6d

    first commit
@@ -0,0 +1,162 @@
EXAONE AI Model License Agreement 1.1 - NC

This License Agreement (“Agreement”) is entered into between you (“Licensee”) and LG Management Development
Institute Co., Ltd. (“Licensor”), governing the use of the EXAONE AI Model (“Model”). By downloading,
installing, copying, or using the Model, you agree to comply with and be bound by the terms of this Agreement.
If you do not agree to all the terms, you must not download, install, copy, or use the Model. This Agreement
constitutes a binding legal agreement between the Licensee and Licensor.

1. Definitions
1.1 Model: The artificial intelligence model provided by Licensor, which includes any software,
algorithms, machine learning models, or related components supplied by Licensor. This definition extends
to encompass all updates, enhancements, improvements, bug fixes, patches, or other modifications that may
be provided by Licensor from time to time, whether automatically or manually implemented.
1.2 Derivatives: Any modifications, alterations, enhancements, improvements, adaptations, or derivative
works of the Model created by Licensee or any third party. This includes changes made to the Model's
architecture, parameters, data processing methods, or any other aspect of the Model that results in a
modification of its functionality or output.
1.3 Output: Any data, results, content, predictions, analyses, insights, or other materials generated by
the Model or Derivatives, regardless of whether they are in their original form or have been further
processed or modified by the Licensee. This includes, but is not limited to, textual or numerical output
produced directly or indirectly through the use of the Model.
1.4 Licensor: LG Management Development Institute Co., Ltd., the owner, developer, and provider of the
EXAONE AI Model. The Licensor holds all rights, title, and interest in the Model and is responsible for
granting licenses to use the Model under the terms specified in this Agreement.
1.5 Licensee: The individual, organization, corporation, academic institution, government agency, or other
entity using or intending to use the Model under the terms and conditions of this Agreement. The Licensee
is responsible for ensuring compliance with the Agreement by all authorized users who access or utilize
the Model on behalf of the Licensee.

2. License Grant
2.1 Grant of License: Subject to the terms and conditions outlined in this Agreement, the Licensor hereby
grants the Licensee a limited, non-exclusive, non-transferable, worldwide, and revocable license to:
a. Access, download, install, and use the Model solely for research purposes. This includes
evaluation, testing, academic research, experimentation, and participation in competitions, provided
that such participation is in a non-commercial context. Notwithstanding Section 3.1, the Licensee may
only provide the Model or Derivatives for a competition if no commercial license is granted to the
competition organizer or any third party.
b. Publicly disclose research results and findings derived from the use of the Model or Derivatives,
including publishing papers or presentations.
c. Modify the Model and create Derivatives based on the Model, provided that such modifications and
Derivatives are used exclusively for research purposes. The Licensee may conduct experiments, perform
analyses, and apply custom modifications to the Model to explore its capabilities and performance
under various scenarios. If the Model is modified, the modified Model must include “EXAONE” at the
beginning of its name.
d. Distribute the Model and Derivatives in each case with a copy of this Agreement.
2.2 Scope of License: The license granted herein does not authorize the Licensee to use the Model for any
purpose not explicitly permitted under this Agreement. Any use beyond the scope of this license, including
any commercial application or external distribution, is strictly prohibited unless explicitly agreed upon
in writing by the Licensor.

3. Restrictions
3.1 Commercial Use: The Licensee is expressly prohibited from using the Model, Derivatives, or Output for
any commercial purposes, including but not limited to, developing or deploying products, services, or
applications that generate revenue, whether directly or indirectly. Any commercial exploitation of the
Model or its derivatives requires a separate commercial license agreement with the Licensor. Furthermore,
the Licensee shall not use the Model, Derivatives or Output to develop or improve other models.
3.2 Reverse Engineering: The Licensee shall not decompile, disassemble, reverse engineer, or attempt to
derive the source code, underlying ideas, algorithms, or structure of the Model, except to the extent that
such activities are expressly permitted by applicable law. Any attempt to bypass or circumvent
technological protection measures applied to the Model is strictly prohibited.
3.3 Unlawful Use: The Licensee shall not use the Model and Derivatives for any illegal, fraudulent, or
unauthorized activities, nor for any purpose that violates applicable laws or regulations. This includes
but is not limited to the creation, distribution, or dissemination of malicious, deceptive, or unlawful
content.
3.4 Ethical Use: The Licensee shall ensure that the Model or Derivatives are used in an ethical and
responsible manner, adhering to the following guidelines:
a. The Model and Derivatives shall not be used to generate, propagate, or amplify false, misleading,
or harmful information, including fake news, misinformation, or disinformation.
b. The Model and Derivatives shall not be employed to create, distribute, or promote content that is
discriminatory, harassing, defamatory, abusive, or otherwise offensive to individuals or groups based
on race, gender, sexual orientation, religion, nationality, or other protected characteristics.
c. The Model and Derivatives shall not infringe on the rights of others, including intellectual
property rights, privacy rights, or any other rights recognized by law. The Licensee shall obtain all
necessary permissions and consents before using the Model and Derivatives in a manner that may impact
the rights of third parties.
d. The Model and Derivatives shall not be used in a way that causes harm, whether physical, mental,
emotional, or financial, to individuals, organizations, or communities. The Licensee shall take all
reasonable measures to prevent misuse or abuse of the Model and Derivatives that could result in harm
or injury.

4. Ownership
4.1 Intellectual Property: All rights, title, and interest in and to the Model, including any
modifications, Derivatives, and associated documentation, are and shall remain the exclusive property of
the Licensor. The Licensee acknowledges that this Agreement does not transfer any ownership rights to the
Licensee. All trademarks, service marks, and logos associated with the Model are the property of the
Licensor.
4.2 Output: All rights, title, and interest in and to the Output generated by the Model and Derivatives,
whether in its original form or modified, are and shall remain the exclusive property of the Licensor.
The Licensee may use, modify, and distribute the Output and its derivatives for research purposes. The Licensee
shall not claim ownership of the Output except as expressly provided in this Agreement. The Licensee may
use the Output solely for the purposes permitted under this Agreement and shall not exploit the Output for
unauthorized or commercial purposes.
4.3 Attribution: In any publication or presentation of results obtained using the Model, the Licensee
shall provide appropriate attribution to the Licensor, citing the Model's name and version, along with any
relevant documentation or references specified by the Licensor.

5. No Warranty
5.1 “As-Is” Basis: The Model, Derivatives, and Output are provided on an “as-is” and “as-available” basis,
without any warranties or representations of any kind, whether express, implied, or statutory. The
Licensor disclaims all warranties, including but not limited to, implied warranties of merchantability,
fitness for a particular purpose, accuracy, reliability, non-infringement, or any warranty arising from
the course of dealing or usage of trade.
5.2 Performance and Reliability: The Licensor does not warrant or guarantee that the Model, Derivatives or
Output will meet the Licensee’s requirements, that the operation of the Model, Derivatives or Output will
be uninterrupted or error-free, or that defects in the Model will be corrected. The Licensee acknowledges
that the use of the Model, Derivatives or Output is at its own risk and that the Model, Derivatives or
Output may contain bugs, errors, or other limitations.
5.3 No Endorsement: The Licensor does not endorse, approve, or certify any results, conclusions, or
recommendations derived from the use of the Model. The Licensee is solely responsible for evaluating the
accuracy, reliability, and suitability of the Model for its intended purposes.

6. Limitation of Liability
6.1 No Liability for Damages: To the fullest extent permitted by applicable law, in no event shall the
Licensor be liable for any special, incidental, indirect, consequential, exemplary, or punitive damages,
including but not limited to, damages for loss of business profits, business interruption, loss of
business information, loss of data, or any other pecuniary or non-pecuniary loss arising out of or in
connection with the use or inability to use the Model, Derivatives or any Output, even if the Licensor has
been advised of the possibility of such damages.
6.2 Indemnification: The Licensee agrees to indemnify, defend, and hold harmless the Licensor, its
affiliates, officers, directors, employees, and agents from and against any claims, liabilities, damages,
losses, costs, or expenses (including reasonable attorneys' fees) arising out of or related to the
Licensee's use of the Model, any Derivatives, or any Output, including any violation of this Agreement or
applicable laws.

7. Termination
7.1 Termination by Licensor: The Licensor reserves the right to terminate this Agreement and revoke the
Licensee’s rights to use the Model at any time, with or without cause, and without prior notice if the
Licensee breaches any of the terms or conditions of this Agreement. Termination shall be effective
immediately upon notice.
7.2 Effect of Termination: Upon termination of this Agreement, the Licensee must immediately cease all use
of the Model, Derivatives, and Output and destroy all copies of the Model, Derivatives, and Output in its
possession or control, including any backup or archival copies. The Licensee shall certify in writing to
the Licensor that such destruction has been completed.
7.3 Survival: The provisions of this Agreement that by their nature should survive termination, including
but not limited to, Sections 4 (Ownership), 5 (No Warranty), 6 (Limitation of Liability), and this Section
7 (Termination), shall continue to apply after termination.

8. Governing Law
8.1 Governing Law: This Agreement shall be governed by and construed in accordance with the laws of the
Republic of Korea, without regard to its conflict of laws principles.
8.2 Arbitration: Any disputes, controversies, or claims arising out of or relating to this Agreement,
including its existence, validity, interpretation, performance, breach, or termination, shall be referred
to and finally resolved by arbitration administered by the Korean Commercial Arbitration Board (KCAB) in
accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board in force at
the time of the commencement of the arbitration. The seat of arbitration shall be Seoul, Republic of
Korea. The tribunal shall consist of one arbitrator. The language of the arbitration shall be English.

9. Alterations
9.1 Modifications: The Licensor reserves the right to modify or amend this Agreement at any time, in its
sole discretion. Any modifications will be effective upon posting the updated Agreement on the Licensor’s
website or through other means of communication. The Licensee is responsible for reviewing the Agreement
periodically for changes. Continued use of the Model after any modifications have been made constitutes
acceptance of the revised Agreement.
9.2 Entire Agreement: This Agreement constitutes the entire agreement between the Licensee and Licensor
concerning the subject matter hereof and supersedes all prior or contemporaneous oral or written
agreements, representations, or understandings. Any terms or conditions of any purchase order or other
document submitted by the Licensee in connection with the Model that are in addition to, different from,
or inconsistent with the terms and conditions of this Agreement are not binding on the Licensor and are
void.

By downloading, installing, or using the EXAONE AI Model, the Licensee acknowledges that it has read,
understood, and agrees to be bound by the terms and conditions of this Agreement.
Binary file not shown (new image, 243 KiB).
@@ -0,0 +1,39 @@
{
  "activation_function": "silu",
  "architectures": [
    "ExaoneForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_exaone.ExaoneConfig",
    "AutoModelForCausalLM": "modeling_exaone.ExaoneForCausalLM",
    "AutoModelForSequenceClassification": "modeling_exaone.ExaoneForSequenceClassification"
  },
  "bos_token_id": 1,
  "embed_dropout": 0.0,
  "eos_token_id": 361,
  "head_dim": 128,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "layer_norm_epsilon": 1e-05,
  "max_position_embeddings": 32768,
  "model_type": "exaone",
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "num_layers": 32,
  "pad_token_id": 0,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 1000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.43.0",
  "use_cache": true,
  "vocab_size": 102400
}
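For context, a minimal sketch (not part of this commit) of how the config.json above is consumed: the `auto_map` entries route the transformers auto classes to the custom code shipped in this repo, so `trust_remote_code=True` is required. The repo id is an assumption taken from the docstring link in configuration_exaone.py below.

```python
# Hedged sketch: load the custom ExaoneConfig through transformers' auto classes.
# The repo id is assumed; trust_remote_code lets auto_map resolve
# configuration_exaone.ExaoneConfig from the repository.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",  # assumed repo id
    trust_remote_code=True,
)
print(config.model_type)                    # "exaone"
print(config.num_layers, config.head_dim)   # 32, 128
print(config.rope_scaling["rope_type"])     # "llama3"
```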
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
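This one-line configuration.json is ModelScope metadata rather than a transformers file; `allow_remote` permits fetching the repo's custom model code. A hedged sketch of the usage it enables, assuming the model is hosted on ModelScope under this id:

```python
# Hedged sketch; the model id is an assumption, not confirmed by the commit.
from modelscope.pipelines import pipeline

pipe = pipeline(task="text-generation", model="LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
print(pipe("Hello"))
```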
@@ -0,0 +1,183 @@
# coding=utf-8
# Copyright 2021 The LG AI Research EXAONE Lab. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""EXAONE model configuration"""

from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)

EXAONE_PRETRAINED_CONFIG_ARCHIVE_MAP = {}


class ExaoneConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`ExaoneModel`]. It is used to
    instantiate an EXAONE model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the EXAONE-3.0-7.8B-Instruct
    [LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct).

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
    outputs. Read the documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 102400):
            Vocabulary size of the EXAONE model. Defines the number of different tokens that can be represented by
            the `inputs_ids` passed to the forward method of [`ExaoneModel`].
        max_position_embeddings (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        hidden_size (`int`, *optional*, defaults to 2048):
            Dimensionality of the encoder layers and the pooler layer.
        num_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer decoder.
        num_key_value_heads (`int`, *optional*):
            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA); if
            `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
            by meanpooling all the original heads within that group. For more details check out [this
            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
            `num_attention_heads`.
        intermediate_size (`int`, *optional*, defaults to `hidden_size * 4`):
            Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        activation_function (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        rope_theta (`float`, *optional*, defaults to 10000.0):
            The base period of the RoPE embeddings.
        rope_scaling (`Dict`, *optional*):
            Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply a new rope
            type and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this
            value accordingly.
            Expected contents:
                `rope_type` (`str`):
                    The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
                    'llama3'], with 'default' being the original RoPE implementation.
                `factor` (`float`, *optional*):
                    Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
                    most scaling types, a `factor` of x will enable the model to handle sequences of length x *
                    original maximum pre-trained length.
                `original_max_position_embeddings` (`int`, *optional*):
                    Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
                    pretraining.
                `attention_factor` (`float`, *optional*):
                    Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
                    computation. If unspecified, it defaults to the value recommended by the implementation, using the
                    `factor` field to infer the suggested value.
                `beta_fast` (`float`, *optional*):
                    Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
                    ramp function. If unspecified, it defaults to 32.
                `beta_slow` (`float`, *optional*):
                    Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
                    ramp function. If unspecified, it defaults to 1.
                `short_factor` (`List[float]`, *optional*):
                    Only used with 'longrope'. The scaling factor to be applied to short contexts (<
                    `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
                    size divided by the number of attention heads divided by 2.
                `long_factor` (`List[float]`, *optional*):
                    Only used with 'longrope'. The scaling factor to be applied to long contexts (>
                    `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
                    size divided by the number of attention heads divided by 2.
                `low_freq_factor` (`float`, *optional*):
                    Only used with 'llama3'. Scaling factor applied to low frequency components of the RoPE.
                `high_freq_factor` (`float`, *optional*):
                    Only used with 'llama3'. Scaling factor applied to high frequency components of the RoPE.
        embed_dropout (`float`, *optional*, defaults to 0.0):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the layer normalization layers.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        bos_token_id (`int`, *optional*, defaults to 0):
            Beginning of stream token id.
        eos_token_id (`int`, *optional*, defaults to 2):
            End of stream token id.

    Example:

    ```python
    >>> from transformers import ExaoneModel, ExaoneConfig

    >>> # Initializing an EXAONE configuration
    >>> configuration = ExaoneConfig()

    >>> # Initializing a model from the configuration
    >>> model = ExaoneModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "exaone"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {"num_hidden_layers": "num_layers"}

    def __init__(
        self,
        vocab_size=102400,
        max_position_embeddings=2048,
        hidden_size=2048,
        num_layers=32,
        num_attention_heads=32,
        num_key_value_heads=None,
        intermediate_size=None,
        activation_function="silu",
        rope_theta=10000.0,
        rope_scaling=None,
        embed_dropout=0.0,
        attention_dropout=0.0,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        use_cache=True,
        bos_token_id=0,
        eos_token_id=2,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_attention_heads = num_attention_heads
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        if intermediate_size is not None:
            self.intermediate_size = intermediate_size
        else:
            self.intermediate_size = hidden_size * 4
        self.activation_function = activation_function
        self.embed_dropout = embed_dropout
        self.attention_dropout = attention_dropout
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.use_cache = use_cache
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling

        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id

        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
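To connect this class to the config.json earlier in the commit, a minimal sketch (not part of the file) instantiating it with the overridden values; note that `intermediate_size` must be passed explicitly here because the constructor's fallback would be `hidden_size * 4 = 16384` rather than the shipped 14336:

```python
# Hedged sketch: ExaoneConfig built with the values config.json overrides.
config = ExaoneConfig(
    vocab_size=102400,
    max_position_embeddings=32768,
    hidden_size=4096,
    num_layers=32,
    num_attention_heads=32,
    num_key_value_heads=8,      # GQA: 32 query heads share 8 KV heads
    intermediate_size=14336,    # explicit; default would be 4096 * 4 = 16384
    rope_theta=1000000.0,
    bos_token_id=1,
    eos_token_id=361,
    pad_token_id=0,             # forwarded to PretrainedConfig via **kwargs
)
assert config.num_hidden_layers == 32   # aliased through attribute_map
```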
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 361,
  "pad_token_id": 0,
  "transformers_version": "4.43.0"
}
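For reference, a hedged sketch (not in the commit) of how this generation_config.json is applied: transformers loads it automatically alongside the model, so `generate()` stops when token id 361 is produced. The repo id is again an assumption.

```python
# Hedged sketch, assuming the repo id; trust_remote_code loads the custom
# ExaoneForCausalLM class referenced in auto_map.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The EXAONE model is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)  # stops at eos_token_id=361
print(tokenizer.decode(output[0], skip_special_tokens=True))
```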
File diff suppressed because it is too large
Binary file not shown.