commit a4a66d4dd0
parent c57f439a6d

    first commit
@@ -0,0 +1,162 @@
EXAONE AI Model License Agreement 1.1 - NC

This License Agreement (“Agreement”) is entered into between you (“Licensee”) and LG Management Development
Institute Co., Ltd. (“Licensor”), governing the use of the EXAONE AI Model (“Model”). By downloading,
installing, copying, or using the Model, you agree to comply with and be bound by the terms of this Agreement.
If you do not agree to all the terms, you must not download, install, copy, or use the Model. This Agreement
constitutes a binding legal agreement between the Licensee and Licensor.

1. Definitions
1.1 Model: The artificial intelligence model provided by Licensor, which includes any software,
algorithms, machine learning models, or related components supplied by Licensor. This definition extends
to encompass all updates, enhancements, improvements, bug fixes, patches, or other modifications that may
be provided by Licensor from time to time, whether automatically or manually implemented.
1.2 Derivatives: Any modifications, alterations, enhancements, improvements, adaptations, or derivative
works of the Model created by Licensee or any third party. This includes changes made to the Model's
architecture, parameters, data processing methods, or any other aspect of the Model that results in a
modification of its functionality or output.
1.3 Output: Any data, results, content, predictions, analyses, insights, or other materials generated by
the Model or Derivatives, regardless of whether they are in their original form or have been further
processed or modified by the Licensee. This includes, but is not limited to, textual or numerical output
produced directly or indirectly through the use of the Model.
1.4 Licensor: LG Management Development Institute Co., Ltd., the owner, developer, and provider of the
EXAONE AI Model. The Licensor holds all rights, title, and interest in the Model and is responsible for
granting licenses to use the Model under the terms specified in this Agreement.
1.5 Licensee: The individual, organization, corporation, academic institution, government agency, or other
entity using or intending to use the Model under the terms and conditions of this Agreement. The Licensee
is responsible for ensuring compliance with the Agreement by all authorized users who access or utilize
the Model on behalf of the Licensee.

2. License Grant
2.1 Grant of License: Subject to the terms and conditions outlined in this Agreement, the Licensor hereby
grants the Licensee a limited, non-exclusive, non-transferable, worldwide, and revocable license to:
a. Access, download, install, and use the Model solely for research purposes. This includes
evaluation, testing, academic research, experimentation, and participation in competitions, provided
that such participation is in a non-commercial context. Notwithstanding Section 3.1, the Licensee may
only provide the Model or Derivatives for a competition if no commercial license is granted to the
competition organizer or any third party.
b. Publicly disclose research results and findings derived from the use of the Model or Derivatives,
including publishing papers or presentations.
c. Modify the Model and create Derivatives based on the Model, provided that such modifications and
Derivatives are used exclusively for research purposes. The Licensee may conduct experiments, perform
analyses, and apply custom modifications to the Model to explore its capabilities and performance
under various scenarios. If the Model is modified, the modified Model must include “EXAONE” at the
beginning of its name.
d. Distribute the Model and Derivatives in each case with a copy of this Agreement.
2.2 Scope of License: The license granted herein does not authorize the Licensee to use the Model for any
purpose not explicitly permitted under this Agreement. Any use beyond the scope of this license, including
any commercial application or external distribution, is strictly prohibited unless explicitly agreed upon
in writing by the Licensor.

3. Restrictions
3.1 Commercial Use: The Licensee is expressly prohibited from using the Model, Derivatives, or Output for
any commercial purposes, including but not limited to, developing or deploying products, services, or
applications that generate revenue, whether directly or indirectly. Any commercial exploitation of the
Model or its derivatives requires a separate commercial license agreement with the Licensor. Furthermore,
the Licensee shall not use the Model, Derivatives or Output to develop or improve other models.
3.2 Reverse Engineering: The Licensee shall not decompile, disassemble, reverse engineer, or attempt to
derive the source code, underlying ideas, algorithms, or structure of the Model, except to the extent that
such activities are expressly permitted by applicable law. Any attempt to bypass or circumvent
technological protection measures applied to the Model is strictly prohibited.
3.3 Unlawful Use: The Licensee shall not use the Model and Derivatives for any illegal, fraudulent, or
unauthorized activities, nor for any purpose that violates applicable laws or regulations. This includes
but is not limited to the creation, distribution, or dissemination of malicious, deceptive, or unlawful
content.
3.4 Ethical Use: The Licensee shall ensure that the Model or Derivatives are used in an ethical and
responsible manner, adhering to the following guidelines:
a. The Model and Derivatives shall not be used to generate, propagate, or amplify false, misleading,
or harmful information, including fake news, misinformation, or disinformation.
b. The Model and Derivatives shall not be employed to create, distribute, or promote content that is
discriminatory, harassing, defamatory, abusive, or otherwise offensive to individuals or groups based
on race, gender, sexual orientation, religion, nationality, or other protected characteristics.
c. The Model and Derivatives shall not infringe on the rights of others, including intellectual
property rights, privacy rights, or any other rights recognized by law. The Licensee shall obtain all
necessary permissions and consents before using the Model and Derivatives in a manner that may impact
the rights of third parties.
d. The Model and Derivatives shall not be used in a way that causes harm, whether physical, mental,
emotional, or financial, to individuals, organizations, or communities. The Licensee shall take all
reasonable measures to prevent misuse or abuse of the Model and Derivatives that could result in harm
or injury.

4. Ownership
4.1 Intellectual Property: All rights, title, and interest in and to the Model, including any
modifications, Derivatives, and associated documentation, are and shall remain the exclusive property of
the Licensor. The Licensee acknowledges that this Agreement does not transfer any ownership rights to the
Licensee. All trademarks, service marks, and logos associated with the Model are the property of the
Licensor.
4.2 Output: All rights, title, and interest in and to the Output generated by the Model and Derivatives,
whether in its original form or modified, are and shall remain the exclusive property of the Licensor.
The Licensee may use, modify, and distribute the Output and its derivatives for research purposes. The Licensee
shall not claim ownership of the Output except as expressly provided in this Agreement. The Licensee may
use the Output solely for the purposes permitted under this Agreement and shall not exploit the Output for
unauthorized or commercial purposes.
4.3 Attribution: In any publication or presentation of results obtained using the Model, the Licensee
shall provide appropriate attribution to the Licensor, citing the Model's name and version, along with any
relevant documentation or references specified by the Licensor.

5. No Warranty
5.1 “As-Is” Basis: The Model, Derivatives, and Output are provided on an “as-is” and “as-available” basis,
without any warranties or representations of any kind, whether express, implied, or statutory. The
Licensor disclaims all warranties, including but not limited to, implied warranties of merchantability,
fitness for a particular purpose, accuracy, reliability, non-infringement, or any warranty arising from
the course of dealing or usage of trade.
5.2 Performance and Reliability: The Licensor does not warrant or guarantee that the Model, Derivatives or
Output will meet the Licensee’s requirements, that the operation of the Model, Derivatives or Output will
be uninterrupted or error-free, or that defects in the Model will be corrected. The Licensee acknowledges
that the use of the Model, Derivatives or Output is at its own risk and that the Model, Derivatives or
Output may contain bugs, errors, or other limitations.
5.3 No Endorsement: The Licensor does not endorse, approve, or certify any results, conclusions, or
recommendations derived from the use of the Model. The Licensee is solely responsible for evaluating the
accuracy, reliability, and suitability of the Model for its intended purposes.

6. Limitation of Liability
6.1 No Liability for Damages: To the fullest extent permitted by applicable law, in no event shall the
Licensor be liable for any special, incidental, indirect, consequential, exemplary, or punitive damages,
including but not limited to, damages for loss of business profits, business interruption, loss of
business information, loss of data, or any other pecuniary or non-pecuniary loss arising out of or in
connection with the use or inability to use the Model, Derivatives or any Output, even if the Licensor has
been advised of the possibility of such damages.
6.2 Indemnification: The Licensee agrees to indemnify, defend, and hold harmless the Licensor, its
affiliates, officers, directors, employees, and agents from and against any claims, liabilities, damages,
losses, costs, or expenses (including reasonable attorneys' fees) arising out of or related to the
Licensee's use of the Model, any Derivatives, or any Output, including any violation of this Agreement or
applicable laws.

7. Termination
7.1 Termination by Licensor: The Licensor reserves the right to terminate this Agreement and revoke the
Licensee’s rights to use the Model at any time, with or without cause, and without prior notice if the
Licensee breaches any of the terms or conditions of this Agreement. Termination shall be effective
immediately upon notice.
7.2 Effect of Termination: Upon termination of this Agreement, the Licensee must immediately cease all use
of the Model, Derivatives, and Output and destroy all copies of the Model, Derivatives, and Output in its
possession or control, including any backup or archival copies. The Licensee shall certify in writing to
the Licensor that such destruction has been completed.
7.3 Survival: The provisions of this Agreement that by their nature should survive termination, including
but not limited to, Sections 4 (Ownership), 5 (No Warranty), 6 (Limitation of Liability), and this Section
7 (Termination), shall continue to apply after termination.

8. Governing Law
8.1 Governing Law: This Agreement shall be governed by and construed in accordance with the laws of the
Republic of Korea, without regard to its conflict of laws principles.
8.2 Arbitration: Any disputes, controversies, or claims arising out of or relating to this Agreement,
including its existence, validity, interpretation, performance, breach, or termination, shall be referred
to and finally resolved by arbitration administered by the Korean Commercial Arbitration Board (KCAB) in
accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board in force at
the time of the commencement of the arbitration. The seat of arbitration shall be Seoul, Republic of
Korea. The tribunal shall consist of one arbitrator. The language of the arbitration shall be English.

9. Alterations
9.1 Modifications: The Licensor reserves the right to modify or amend this Agreement at any time, in its
sole discretion. Any modifications will be effective upon posting the updated Agreement on the Licensor’s
website or through other means of communication. The Licensee is responsible for reviewing the Agreement
periodically for changes. Continued use of the Model after any modifications have been made constitutes
acceptance of the revised Agreement.
9.2 Entire Agreement: This Agreement constitutes the entire agreement between the Licensee and Licensor
concerning the subject matter hereof and supersedes all prior or contemporaneous oral or written
agreements, representations, or understandings. Any terms or conditions of any purchase order or other
document submitted by the Licensee in connection with the Model that are in addition to, different from,
or inconsistent with the terms and conditions of this Agreement are not binding on the Licensor and are
void.

By downloading, installing, or using the EXAONE AI Model, the Licensee acknowledges that it has read,
understood, and agrees to be bound by the terms and conditions of this Agreement.
Binary file not shown (new image, 243 KiB).
@@ -0,0 +1,39 @@
{
  "activation_function": "silu",
  "architectures": [
    "ExaoneForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_exaone.ExaoneConfig",
    "AutoModelForCausalLM": "modeling_exaone.ExaoneForCausalLM",
    "AutoModelForSequenceClassification": "modeling_exaone.ExaoneForSequenceClassification"
  },
  "bos_token_id": 1,
  "embed_dropout": 0.0,
  "eos_token_id": 361,
  "head_dim": 128,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "layer_norm_epsilon": 1e-05,
  "max_position_embeddings": 32768,
  "model_type": "exaone",
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "num_layers": 32,
  "pad_token_id": 0,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 1000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.43.0",
  "use_cache": true,
  "vocab_size": 102400
}
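For context, a minimal sketch (not part of this commit) of how the config.json above is consumed: the `auto_map` entries route the transformers auto classes to the custom code shipped in this repo, so `trust_remote_code=True` is required. The repo id is an assumption taken from the docstring link in configuration_exaone.py below.

```python
# Hedged sketch: load the custom ExaoneConfig through transformers' auto classes.
# The repo id is assumed; trust_remote_code lets auto_map resolve
# configuration_exaone.ExaoneConfig from the repository.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",  # assumed repo id
    trust_remote_code=True,
)
print(config.model_type)                    # "exaone"
print(config.num_layers, config.head_dim)   # 32, 128
print(config.rope_scaling["rope_type"])     # "llama3"
```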
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
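This one-line configuration.json is ModelScope metadata rather than a transformers file; `allow_remote` permits fetching the repo's custom model code. A hedged sketch of the usage it enables, assuming the model is hosted on ModelScope under this id:

```python
# Hedged sketch; the model id is an assumption, not confirmed by the commit.
from modelscope.pipelines import pipeline

pipe = pipeline(task="text-generation", model="LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
print(pipe("Hello"))
```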
@@ -0,0 +1,183 @@
# coding=utf-8
# Copyright 2021 The LG AI Research EXAONE Lab. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""EXAONE model configuration"""

from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)

EXAONE_PRETRAINED_CONFIG_ARCHIVE_MAP = {}


class ExaoneConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`ExaoneModel`]. It is used to
    instantiate an EXAONE model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the EXAONE-3.0-7.8B-Instruct
    [LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct).

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
    outputs. Read the documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 102400):
            Vocabulary size of the EXAONE model. Defines the number of different tokens that can be represented by
            the `inputs_ids` passed to the forward method of [`ExaoneModel`].
        max_position_embeddings (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        hidden_size (`int`, *optional*, defaults to 2048):
            Dimensionality of the encoder layers and the pooler layer.
        num_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer decoder.
        num_key_value_heads (`int`, *optional*):
            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA); if
            `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
            by meanpooling all the original heads within that group. For more details check out [this
            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
            `num_attention_heads`.
        intermediate_size (`int`, *optional*, defaults to `hidden_size * 4`):
            Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        activation_function (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        rope_theta (`float`, *optional*, defaults to 10000.0):
            The base period of the RoPE embeddings.
        rope_scaling (`Dict`, *optional*):
            Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply a new rope
            type and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this
            value accordingly.
            Expected contents:
                `rope_type` (`str`):
                    The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
                    'llama3'], with 'default' being the original RoPE implementation.
                `factor` (`float`, *optional*):
                    Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
                    most scaling types, a `factor` of x will enable the model to handle sequences of length x *
                    original maximum pre-trained length.
                `original_max_position_embeddings` (`int`, *optional*):
                    Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
                    pretraining.
                `attention_factor` (`float`, *optional*):
                    Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
                    computation. If unspecified, it defaults to the value recommended by the implementation, using the
                    `factor` field to infer the suggested value.
                `beta_fast` (`float`, *optional*):
                    Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
                    ramp function. If unspecified, it defaults to 32.
                `beta_slow` (`float`, *optional*):
                    Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
                    ramp function. If unspecified, it defaults to 1.
                `short_factor` (`List[float]`, *optional*):
                    Only used with 'longrope'. The scaling factor to be applied to short contexts (<
                    `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
                    size divided by the number of attention heads divided by 2.
                `long_factor` (`List[float]`, *optional*):
                    Only used with 'longrope'. The scaling factor to be applied to long contexts (>
                    `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
                    size divided by the number of attention heads divided by 2.
                `low_freq_factor` (`float`, *optional*):
                    Only used with 'llama3'. Scaling factor applied to low frequency components of the RoPE.
                `high_freq_factor` (`float`, *optional*):
                    Only used with 'llama3'. Scaling factor applied to high frequency components of the RoPE.
        embed_dropout (`float`, *optional*, defaults to 0.0):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the layer normalization layers.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        bos_token_id (`int`, *optional*, defaults to 0):
            Beginning of stream token id.
        eos_token_id (`int`, *optional*, defaults to 2):
            End of stream token id.

    Example:

    ```python
    >>> from transformers import ExaoneModel, ExaoneConfig

    >>> # Initializing an EXAONE configuration
    >>> configuration = ExaoneConfig()

    >>> # Initializing a model from the configuration
    >>> model = ExaoneModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "exaone"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {"num_hidden_layers": "num_layers"}

    def __init__(
        self,
        vocab_size=102400,
        max_position_embeddings=2048,
        hidden_size=2048,
        num_layers=32,
        num_attention_heads=32,
        num_key_value_heads=None,
        intermediate_size=None,
        activation_function="silu",
        rope_theta=10000.0,
        rope_scaling=None,
        embed_dropout=0.0,
        attention_dropout=0.0,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        use_cache=True,
        bos_token_id=0,
        eos_token_id=2,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_attention_heads = num_attention_heads
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        if intermediate_size is not None:
            self.intermediate_size = intermediate_size
        else:
            self.intermediate_size = hidden_size * 4
        self.activation_function = activation_function
        self.embed_dropout = embed_dropout
        self.attention_dropout = attention_dropout
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.use_cache = use_cache
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling

        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id

        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
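To connect this class to the config.json earlier in the commit, a minimal sketch (not part of the file) instantiating it with the overridden values; note that `intermediate_size` must be passed explicitly here because the constructor's fallback would be `hidden_size * 4 = 16384` rather than the shipped 14336:

```python
# Hedged sketch: ExaoneConfig built with the values config.json overrides.
config = ExaoneConfig(
    vocab_size=102400,
    max_position_embeddings=32768,
    hidden_size=4096,
    num_layers=32,
    num_attention_heads=32,
    num_key_value_heads=8,      # GQA: 32 query heads share 8 KV heads
    intermediate_size=14336,    # explicit; default would be 4096 * 4 = 16384
    rope_theta=1000000.0,
    bos_token_id=1,
    eos_token_id=361,
    pad_token_id=0,             # forwarded to PretrainedConfig via **kwargs
)
assert config.num_hidden_layers == 32   # aliased through attribute_map
```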
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 361,
  "pad_token_id": 0,
  "transformers_version": "4.43.0"
}
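For reference, a hedged sketch (not in the commit) of how this generation_config.json is applied: transformers loads it automatically alongside the model, so `generate()` stops when token id 361 is produced. The repo id is again an assumption.

```python
# Hedged sketch, assuming the repo id; trust_remote_code loads the custom
# ExaoneForCausalLM class referenced in auto_map.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The EXAONE model is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)  # stops at eos_token_id=361
print(tokenizer.decode(output[0], skip_special_tokens=True))
```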
File diff suppressed because it is too large
Binary file not shown.