first commit
This commit is contained in: parent c57f439a6d, commit a4a66d4dd0

@@ -0,0 +1,162 @@
EXAONE AI Model License Agreement 1.1 - NC

This License Agreement (“Agreement”) is entered into between you (“Licensee”) and LG Management Development Institute Co., Ltd. (“Licensor”), governing the use of the EXAONE AI Model (“Model”). By downloading, installing, copying, or using the Model, you agree to comply with and be bound by the terms of this Agreement. If you do not agree to all the terms, you must not download, install, copy, or use the Model. This Agreement constitutes a binding legal agreement between the Licensee and Licensor.

1. Definitions

1.1 Model: The artificial intelligence model provided by Licensor, which includes any software, algorithms, machine learning models, or related components supplied by Licensor. This definition extends to encompass all updates, enhancements, improvements, bug fixes, patches, or other modifications that may be provided by Licensor from time to time, whether automatically or manually implemented.

1.2 Derivatives: Any modifications, alterations, enhancements, improvements, adaptations, or derivative works of the Model created by Licensee or any third party. This includes changes made to the Model's architecture, parameters, data processing methods, or any other aspect of the Model that results in a modification of its functionality or output.

1.3 Output: Any data, results, content, predictions, analyses, insights, or other materials generated by the Model or Derivatives, regardless of whether they are in their original form or have been further processed or modified by the Licensee. This includes, but is not limited to, textual or numerical content produced directly or indirectly through the use of the Model.

1.4 Licensor: LG Management Development Institute Co., Ltd., the owner, developer, and provider of the EXAONE AI Model. The Licensor holds all rights, title, and interest in the Model and is responsible for granting licenses to use the Model under the terms specified in this Agreement.

1.5 Licensee: The individual, organization, corporation, academic institution, government agency, or other entity using or intending to use the Model under the terms and conditions of this Agreement. The Licensee is responsible for ensuring compliance with the Agreement by all authorized users who access or utilize the Model on behalf of the Licensee.

2. License Grant

2.1 Grant of License: Subject to the terms and conditions outlined in this Agreement, the Licensor hereby grants the Licensee a limited, non-exclusive, non-transferable, worldwide, and revocable license to:

a. Access, download, install, and use the Model solely for research purposes. This includes evaluation, testing, academic research, experimentation, and participation in competitions, provided that such participation is in a non-commercial context. Notwithstanding Section 3.1, the Licensee may only provide the Model or Derivatives for a competition if no commercial license is granted to the competition organizer or any third party.

b. Publicly disclose research results and findings derived from the use of the Model or Derivatives, including publishing papers or presentations.

c. Modify the Model and create Derivatives based on the Model, provided that such modifications and Derivatives are used exclusively for research purposes. The Licensee may conduct experiments, perform analyses, and apply custom modifications to the Model to explore its capabilities and performance under various scenarios. If the Model is modified, the modified Model must include “EXAONE” at the beginning of its name.

d. Distribute the Model and Derivatives in each case with a copy of this Agreement.

2.2 Scope of License: The license granted herein does not authorize the Licensee to use the Model for any purpose not explicitly permitted under this Agreement. Any use beyond the scope of this license, including any commercial application or external distribution, is strictly prohibited unless explicitly agreed upon in writing by the Licensor.

3. Restrictions

3.1 Commercial Use: The Licensee is expressly prohibited from using the Model, Derivatives, or Output for any commercial purposes, including but not limited to, developing or deploying products, services, or applications that generate revenue, whether directly or indirectly. Any commercial exploitation of the Model or its derivatives requires a separate commercial license agreement with the Licensor. Furthermore, the Licensee shall not use the Model, Derivatives, or Output to develop or improve other models.

3.2 Reverse Engineering: The Licensee shall not decompile, disassemble, reverse engineer, or attempt to derive the source code, underlying ideas, algorithms, or structure of the Model, except to the extent that such activities are expressly permitted by applicable law. Any attempt to bypass or circumvent technological protection measures applied to the Model is strictly prohibited.

3.3 Unlawful Use: The Licensee shall not use the Model and Derivatives for any illegal, fraudulent, or unauthorized activities, nor for any purpose that violates applicable laws or regulations. This includes but is not limited to the creation, distribution, or dissemination of malicious, deceptive, or unlawful content.

3.4 Ethical Use: The Licensee shall ensure that the Model and Derivatives are used in an ethical and responsible manner, adhering to the following guidelines:

a. The Model and Derivatives shall not be used to generate, propagate, or amplify false, misleading, or harmful information, including fake news, misinformation, or disinformation.

b. The Model and Derivatives shall not be employed to create, distribute, or promote content that is discriminatory, harassing, defamatory, abusive, or otherwise offensive to individuals or groups based on race, gender, sexual orientation, religion, nationality, or other protected characteristics.

c. The Model and Derivatives shall not infringe on the rights of others, including intellectual property rights, privacy rights, or any other rights recognized by law. The Licensee shall obtain all necessary permissions and consents before using the Model and Derivatives in a manner that may impact the rights of third parties.

d. The Model and Derivatives shall not be used in a way that causes harm, whether physical, mental, emotional, or financial, to individuals, organizations, or communities. The Licensee shall take all reasonable measures to prevent misuse or abuse of the Model and Derivatives that could result in harm or injury.

4. Ownership

4.1 Intellectual Property: All rights, title, and interest in and to the Model, including any modifications, Derivatives, and associated documentation, are and shall remain the exclusive property of the Licensor. The Licensee acknowledges that this Agreement does not transfer any ownership rights to the Licensee. All trademarks, service marks, and logos associated with the Model are the property of the Licensor.

4.2 Output: All rights, title, and interest in and to the Output generated by the Model and Derivatives, whether in its original form or modified, are and shall remain the exclusive property of the Licensor. The Licensee may use, modify, and distribute the Output and its derivatives for research purposes. The Licensee shall not claim ownership of the Output except as expressly provided in this Agreement. The Licensee may use the Output solely for the purposes permitted under this Agreement and shall not exploit the Output for unauthorized or commercial purposes.

4.3 Attribution: In any publication or presentation of results obtained using the Model, the Licensee shall provide appropriate attribution to the Licensor, citing the Model's name and version, along with any relevant documentation or references specified by the Licensor.

5. No Warranty

5.1 “As-Is” Basis: The Model, Derivatives, and Output are provided on an “as-is” and “as-available” basis, without any warranties or representations of any kind, whether express, implied, or statutory. The Licensor disclaims all warranties, including but not limited to, implied warranties of merchantability, fitness for a particular purpose, accuracy, reliability, non-infringement, or any warranty arising from the course of dealing or usage of trade.

5.2 Performance and Reliability: The Licensor does not warrant or guarantee that the Model, Derivatives, or Output will meet the Licensee's requirements, that the operation of the Model, Derivatives, or Output will be uninterrupted or error-free, or that defects in the Model will be corrected. The Licensee acknowledges that the use of the Model, Derivatives, or Output is at its own risk and that the Model, Derivatives, or Output may contain bugs, errors, or other limitations.

5.3 No Endorsement: The Licensor does not endorse, approve, or certify any results, conclusions, or recommendations derived from the use of the Model. The Licensee is solely responsible for evaluating the accuracy, reliability, and suitability of the Model for its intended purposes.

6. Limitation of Liability

6.1 No Liability for Damages: To the fullest extent permitted by applicable law, in no event shall the Licensor be liable for any special, incidental, indirect, consequential, exemplary, or punitive damages, including but not limited to, damages for loss of business profits, business interruption, loss of business information, loss of data, or any other pecuniary or non-pecuniary loss arising out of or in connection with the use or inability to use the Model, Derivatives, or any Output, even if the Licensor has been advised of the possibility of such damages.

6.2 Indemnification: The Licensee agrees to indemnify, defend, and hold harmless the Licensor, its affiliates, officers, directors, employees, and agents from and against any claims, liabilities, damages, losses, costs, or expenses (including reasonable attorneys' fees) arising out of or related to the Licensee's use of the Model, any Derivatives, or any Output, including any violation of this Agreement or applicable laws.

7. Termination

7.1 Termination by Licensor: The Licensor reserves the right to terminate this Agreement and revoke the Licensee's rights to use the Model at any time, with or without cause, and without prior notice, if the Licensee breaches any of the terms or conditions of this Agreement. Termination shall be effective immediately upon notice.

7.2 Effect of Termination: Upon termination of this Agreement, the Licensee must immediately cease all use of the Model, Derivatives, and Output and destroy all copies of the Model, Derivatives, and Output in its possession or control, including any backup or archival copies. The Licensee shall certify in writing to the Licensor that such destruction has been completed.

7.3 Survival: The provisions of this Agreement that by their nature should survive termination, including but not limited to, Sections 4 (Ownership), 5 (No Warranty), 6 (Limitation of Liability), and this Section 7 (Termination), shall continue to apply after termination.

8. Governing Law

8.1 Governing Law: This Agreement shall be governed by and construed in accordance with the laws of the Republic of Korea, without regard to its conflict of laws principles.

8.2 Arbitration: Any disputes, controversies, or claims arising out of or relating to this Agreement, including its existence, validity, interpretation, performance, breach, or termination, shall be referred to and finally resolved by arbitration administered by the Korean Commercial Arbitration Board (KCAB) in accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board in force at the time of the commencement of the arbitration. The seat of arbitration shall be Seoul, Republic of Korea. The tribunal shall consist of one arbitrator. The language of the arbitration shall be English.

9. Alterations

9.1 Modifications: The Licensor reserves the right to modify or amend this Agreement at any time, in its sole discretion. Any modifications will be effective upon posting the updated Agreement on the Licensor's website or through other means of communication. The Licensee is responsible for reviewing the Agreement periodically for changes. Continued use of the Model after any modifications have been made constitutes acceptance of the revised Agreement.

9.2 Entire Agreement: This Agreement constitutes the entire agreement between the Licensee and Licensor concerning the subject matter hereof and supersedes all prior or contemporaneous oral or written agreements, representations, or understandings. Any terms or conditions of any purchase order or other document submitted by the Licensee in connection with the Model that are in addition to, different from, or inconsistent with the terms and conditions of this Agreement are not binding on the Licensor and are void.

By downloading, installing, or using the EXAONE AI Model, the Licensee acknowledges that it has read, understood, and agrees to be bound by the terms and conditions of this Agreement.
Binary file not shown (image, 243 KiB).
@@ -0,0 +1,39 @@
{
  "activation_function": "silu",
  "architectures": [
    "ExaoneForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_exaone.ExaoneConfig",
    "AutoModelForCausalLM": "modeling_exaone.ExaoneForCausalLM",
    "AutoModelForSequenceClassification": "modeling_exaone.ExaoneForSequenceClassification"
  },
  "bos_token_id": 1,
  "embed_dropout": 0.0,
  "eos_token_id": 361,
  "head_dim": 128,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "layer_norm_epsilon": 1e-05,
  "max_position_embeddings": 32768,
  "model_type": "exaone",
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "num_layers": 32,
  "pad_token_id": 0,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 1000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.43.0",
  "use_cache": true,
  "vocab_size": 102400
}
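The model configuration above fixes the attention geometry. As a quick sanity check, the stated `head_dim` and the Grouped Query Attention group size can be derived from the other fields (a standalone sketch; the dict literal simply mirrors fields from config.json, and all variable names are illustrative):

```python
# Mirror of the attention-related fields in the EXAONE config above.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
}

# Each attention head spans hidden_size / num_attention_heads dimensions.
head_dim = config["hidden_size"] // config["num_attention_heads"]
assert head_dim == config["head_dim"]  # 4096 / 32 = 128

# Grouped Query Attention: number of query heads sharing one key/value head.
gqa_group = config["num_attention_heads"] // config["num_key_value_heads"]
print(head_dim, gqa_group)  # prints: 128 4
```

With 32 query heads over 8 key/value heads, every key/value head serves a group of 4 query heads, which shrinks the KV cache by the same factor of 4.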
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
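The `rope_scaling` block in the model config earlier selects the `llama3` rule. A hedged sketch of how those factors rescale the RoPE inverse frequencies, simplified from the logic in the `transformers` RoPE utilities (the function name and sample inputs here are illustrative, not part of this repository):

```python
import math

# 'llama3' RoPE rescaling with the factors from config.json earlier:
# factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0,
# original_max_position_embeddings=8192.
def llama3_scale_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                          high_freq_factor=4.0, original_max=8192):
    low_freq_wavelen = original_max / low_freq_factor    # 8192
    high_freq_wavelen = original_max / high_freq_factor  # 2048
    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            scaled.append(f)           # high-frequency band: unchanged
        elif wavelen > low_freq_wavelen:
            scaled.append(f / factor)  # low-frequency band: slowed by factor
        else:
            # mid band: smooth interpolation between the two regimes
            smooth = (original_max / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled

# A high-frequency component passes through; a very low-frequency one is
# divided by the factor of 8, stretching its wavelength for long contexts.
out = llama3_scale_inv_freq([1.0, 1e-4])
```

Only the low-frequency (long-wavelength) components are stretched, which is why short-range positional precision survives the context extension.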
@@ -0,0 +1,183 @@
# coding=utf-8
# Copyright 2021 The LG AI Research EXAONE Lab. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""EXAONE model configuration"""

from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)

EXAONE_PRETRAINED_CONFIG_ARCHIVE_MAP = {}


class ExaoneConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`ExaoneModel`]. It is used to
    instantiate an EXAONE model according to the specified arguments, defining the model architecture.
    Instantiating a configuration with the defaults will yield a configuration similar to that of
    [LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct).

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
    outputs. Read the documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 102400):
            Vocabulary size of the EXAONE model. Defines the number of different tokens that can be
            represented by the `inputs_ids` passed to the forward method of [`ExaoneModel`].
        max_position_embeddings (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to
            something large just in case (e.g., 512 or 1024 or 2048).
        hidden_size (`int`, *optional*, defaults to 2048):
            Dimensionality of the encoder layers and the pooler layer.
        num_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer decoder.
        num_key_value_heads (`int`, *optional*):
            The number of key/value heads used to implement Grouped Query Attention. If
            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA);
            if `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA
            is used. When converting a multi-head checkpoint to a GQA checkpoint, each group key and
            value head should be constructed by meanpooling all the original heads within that group.
            For more details, check out [this paper](https://arxiv.org/pdf/2305.13245.pdf). If not
            specified, defaults to `num_attention_heads`.
        intermediate_size (`int`, *optional*, defaults to `hidden_size * 4`):
            Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        activation_function (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        rope_theta (`float`, *optional*, defaults to 10000.0):
            The base period of the RoPE embeddings.
        rope_scaling (`Dict`, *optional*):
            Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply
            a new rope type and expect the model to work on a longer `max_position_embeddings`, we
            recommend you update this value accordingly.
            Expected contents:
                `rope_type` (`str`):
                    The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic',
                    'yarn', 'longrope', 'llama3'], with 'default' being the original RoPE
                    implementation.
                `factor` (`float`, *optional*):
                    Used with all rope types except 'default'. The scaling factor to apply to the RoPE
                    embeddings. In most scaling types, a `factor` of x will enable the model to handle
                    sequences of length x * original maximum pre-trained length.
                `original_max_position_embeddings` (`int`, *optional*):
                    Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings
                    used during pretraining.
                `attention_factor` (`float`, *optional*):
                    Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
                    computation. If unspecified, it defaults to the value recommended by the
                    implementation, using the `factor` field to infer the suggested value.
                `beta_fast` (`float`, *optional*):
                    Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in
                    the linear ramp function. If unspecified, it defaults to 32.
                `beta_slow` (`float`, *optional*):
                    Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in
                    the linear ramp function. If unspecified, it defaults to 1.
                `short_factor` (`List[float]`, *optional*):
                    Only used with 'longrope'. The scaling factor to be applied to short contexts
                    (< `original_max_position_embeddings`). Must be a list of numbers with the same
                    length as the hidden size divided by the number of attention heads divided by 2.
                `long_factor` (`List[float]`, *optional*):
                    Only used with 'longrope'. The scaling factor to be applied to long contexts
                    (> `original_max_position_embeddings`). Must be a list of numbers with the same
                    length as the hidden size divided by the number of attention heads divided by 2.
                `low_freq_factor` (`float`, *optional*):
                    Only used with 'llama3'. Scaling factor applied to low-frequency components of the
                    RoPE.
                `high_freq_factor` (`float`, *optional*):
                    Only used with 'llama3'. Scaling factor applied to high-frequency components of the
                    RoPE.
        embed_dropout (`float`, *optional*, defaults to 0.0):
            The dropout probability for all fully connected layers in the embeddings, encoder, and
            pooler.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the layer normalization layers.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight
            matrices.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all
            models). Only relevant if `config.is_decoder=True`.
        bos_token_id (`int`, *optional*, defaults to 0):
            Beginning of stream token id.
        eos_token_id (`int`, *optional*, defaults to 2):
            End of stream token id.

    Example:

    ```python
    >>> from transformers import ExaoneModel, ExaoneConfig

    >>> # Initializing an EXAONE configuration
    >>> configuration = ExaoneConfig()

    >>> # Initializing a model from the configuration
    >>> model = ExaoneModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "exaone"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {"num_hidden_layers": "num_layers"}

    def __init__(
        self,
        vocab_size=102400,
        max_position_embeddings=2048,
        hidden_size=2048,
        num_layers=32,
        num_attention_heads=32,
        num_key_value_heads=None,
        intermediate_size=None,
        activation_function="silu",
        rope_theta=10000.0,
        rope_scaling=None,
        embed_dropout=0.0,
        attention_dropout=0.0,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        use_cache=True,
        bos_token_id=0,
        eos_token_id=2,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_attention_heads = num_attention_heads
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        if intermediate_size:
            self.intermediate_size = intermediate_size
        else:
            self.intermediate_size = hidden_size * 4
        self.activation_function = activation_function
        self.embed_dropout = embed_dropout
        self.attention_dropout = attention_dropout
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.use_cache = use_cache
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling

        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id

        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
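The `__init__` above resolves two fields lazily: `num_key_value_heads` falls back to `num_attention_heads` (plain MHA), and `intermediate_size` falls back to `hidden_size * 4`. A minimal standalone mimic of just that default logic (the helper name is illustrative, not part of the file):

```python
# Stand-in replicating the default resolution in ExaoneConfig.__init__,
# independent of transformers, for illustration only.
def resolve_defaults(hidden_size=2048, num_attention_heads=32,
                     num_key_value_heads=None, intermediate_size=None):
    # No KV-head count given: fall back to one KV head per query head (MHA).
    if num_key_value_heads is None:
        num_key_value_heads = num_attention_heads
    # No FFN width given: use the conventional 4x expansion.
    if not intermediate_size:
        intermediate_size = hidden_size * 4
    return num_key_value_heads, intermediate_size

print(resolve_defaults())                    # (32, 8192)
print(resolve_defaults(4096, 32, 8, 14336))  # (8, 14336)
```

The second call mirrors the values actually shipped in config.json, where `intermediate_size` (14336) is set explicitly rather than left to the 4x fallback.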
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 361,
  "pad_token_id": 0,
  "transformers_version": "4.43.0"
}
File diff suppressed because it is too large.
Binary file not shown.