first commit

This commit is contained in:
xxl 2025-03-12 17:45:08 +08:00
parent 9d912d0875
commit 7d7785283f
17 changed files with 255767 additions and 2 deletions

LICENSE (new file, 316 lines)

@@ -0,0 +1,316 @@
Instella-VL-1B Model [RESEARCH-ONLY RAIL-MS]
Licensed Artifact(s):
- Model
- Source Code
Section I: PREAMBLE
BY ACCESSING, DOWNLOADING, INSTALLING, OR USING THE ARTIFACT, YOU AGREE
TO BE BOUND BY THIS LICENSE. IF YOU DO NOT AGREE TO ALL OF THE TERMS AND
CONDITIONS OF THIS LICENSE, DO NOT ACCESS, DOWNLOAD, INSTALL, OR USE THE
ARTIFACT.
1. Definitions
(a) “Application” refers to a sequence of instructions or statements
written in machine code language, including object code (that is the
product of a compiler), binary code (data using a two-symbol system)
or an intermediate language (such as register transfer language).
(b) “Artifact” refers to a software application (in either binary or
source code format), Model, and/or Source Code, in accordance with
what is specified above as the “Licensed Artifact”.
(c) “Contribution” means any work, including any modifications or
additions to an Artifact, that is intentionally submitted to
Licensor for inclusion or incorporation in the Artifact directly or
indirectly by the rights owner. For the purposes of this definition,
“submitted” means any form of electronic, verbal, or written
communication sent to the Licensor or its representatives, including
but not limited to communication on electronic mailing lists, source
code control systems, and issue tracking systems that are managed
by, or on behalf of, the Licensor for the purpose of discussing,
sharing and improving the Artifact, but excluding communication that
is conspicuously marked or otherwise designated in writing by the
contributor as “Not a Contribution.”
(d) “Contributor” means Licensor or any other individual or legal entity
that creates or owns a Contribution that is added to or incorporated
into an Artifact or its Derivative.
(e) “Data” means a collection of information and/or content extracted
from the dataset used with a given Model, including to train,
pretrain, or otherwise evaluate the Model. The Data is not licensed
under this License.
(f) “Derivative” means a work derived from or based upon an Artifact,
and includes all modified versions of such Artifact.
(g) “Distribution” means any transmission, reproduction, publication or
other sharing of an Artifact or Derivative to a Third Party,
including providing a hosted service incorporating the Artifact,
which is made available by electronic or other remote means -
e.g. API-based or web access.
(h) “Harm” includes but is not limited to physical, mental,
psychological, financial and reputational damage, pain, or loss.
(i) “License” means the terms and conditions for use, reproduction, and
Distribution as defined in this document.
(j) “Licensor” means the rights owner (by virtue of creation or
documented transfer of ownership) or entity authorized by the rights
owner (e.g., exclusive licensee) that is granting the rights in this
License.
(k) “Model” means any machine-learning based assembly or assemblies
(including checkpoints), consisting of learnt weights, parameters
(including optimizer states), corresponding to the model
architecture as embodied in the Source Code.
(l) “Output” means the results of operating a Model as embodied in
informational content resulting therefrom.
(m) “Permitted Purpose” means for academic or research purposes only.
(n) “Source Code” means any collection of text written using
human-readable programming language, including the code and scripts
used to define, run, load, benchmark or evaluate a Model or any
component thereof, and/or used to prepare data for training or
evaluation, if any. Source Code includes any accompanying
documentation, tutorials, examples, etc., if any. For clarity, the
term “Source Code” as used in this License includes any and all
Derivatives of such Source Code.
(o) “Third Parties” means individuals or legal entities that are not
under common control with Licensor or You.
(p) “Use” includes accessing, using, copying, modifying, and/or
distributing an Artifact; in connection with a Model as Artifact,
Use also includes creating content, fine-tuning, updating, running,
training, evaluating and/or re-parametrizing such Model.
(q) “You” (or “Your”) means an individual or legal entity receiving and
exercising permissions granted by this License and/or making use of
the Artifact for permitted purposes and in any permitted field of
use, including usage of the Artifact in an end-use application -
e.g. chatbot, translator, image generator, etc.
Section II: INTELLECTUAL PROPERTY RIGHTS
Both copyright and patent grants may apply to the Artifact. The Artifact
is subject to additional terms and conditions as described in Section III
below.
2. Grant of Copyright License. Conditioned upon compliance with Section
III below and subject to the terms and conditions of this License, each
Contributor hereby grants to You, only in connection with the Permitted
Purpose, a worldwide, non-exclusive, royalty-free copyright license to
reproduce, use, publicly display, publicly perform, sublicense, and
distribute the Artifact and Derivatives thereof.
3. Grant of Patent License. Conditioned upon compliance with Section III
below and subject to the terms and conditions of this License, and only
where and as applicable, each Contributor hereby grants to You, only in
connection with the Permitted Purpose, a worldwide, non-exclusive,
royalty-free, irrevocable (except as stated in this paragraph) patent
license to make, have made, use, sell, offer to sell, import, and
otherwise transfer the Artifact where such license applies only to those
patent claims licensable by such Contributor that are necessarily
infringed by their Contribution(s) alone or by combination of their
Contribution(s) with the Artifact to which such Contribution(s) was
submitted. If You institute patent litigation against any entity
(including a cross-claim or counterclaim in a lawsuit) alleging that the
Artifact and/or a Contribution incorporated within the Artifact
constitutes direct or contributory patent infringement, then any patent
licenses granted to You under this License in connection with the
Artifact shall terminate as of the date such litigation is asserted or
filed.
Licensor and Contributor each have the right to grant the licenses
above.
Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
4. Use-based Restrictions. The restrictions contained in the AMD
Responsible AI Use Policy set forth in Attachment A are mandatory Use-
based restrictions. Therefore You may not Use the Artifact in violation
of such restrictions. You may Use the Artifact only subject to this
License; if Section II is held unenforceable or inapplicable, this
Section III will continue to govern any use of the Artifact. You shall
require all of Your users who Use the Artifact or its Derivative
to comply with the terms and conditions of this License, including
those contained in this paragraph, and only for the Permitted Purpose.
5. The Output You Generate with a Model (as Artifact). Except as set
forth herein, Licensor claims no rights in the Output You generate. You
are accountable for the Output You generate and its subsequent uses. No
use of the Output may contravene any provision as stated in this
License.
6. Distribution and Redistribution. You may host for Third Party remote
access purposes (e.g. software-as-a-service), reproduce and distribute
copies of the Artifact or its Derivatives in any medium, with or without
modifications, provided that You meet the following conditions:
6.1. Use-based restrictions in paragraph 4 MUST be included as a
condition precedent to effect any type of legal agreement (e.g. a
license) governing the use and/or distribution of the Artifact or
its Derivatives, and You shall give such notice to any subsequent
Third Party recipients;
6.2. You shall give any Third Party recipients of the Artifact or its
Derivatives a copy of this License;
6.3. You shall cause any modified files to carry prominent notices
stating that You changed the files;
6.4. You shall retain all copyright, patent, trademark, and attribution
notices excluding those notices that do not pertain to any part of
the Artifact or its Derivatives.
6.5. You and any Third Party recipients of the Artifact or its
Derivative shall adhere to the Permitted Purpose.
You may add Your own copyright statement to Your modifications and may
provide additional or different license terms and conditions with
respect to paragraph 6.1., to govern the use, reproduction, or
Distribution of Your modifications, or for any Derivative, provided that
Your use, reproduction, and Distribution of the Artifact or its
Derivative otherwise complies with the conditions stated in this
License. In other words, the Use-based restrictions in Attachment A form
the minimum set of terms for You to license to Third Parties any
Artifact or its Derivative, but You may add more restrictive terms if
You deem it necessary.
Section IV: OTHER PROVISIONS
7. Updates and Runtime Restrictions. To the maximum extent permitted by
law, Licensor reserves the right to restrict (remotely or otherwise)
usage of the Artifact in violation of this License or update the
Artifact through electronic means.
8. Trademarks and Related. Nothing in this License permits You to make
use of Licensor's trademarks, trade names, logos or to otherwise suggest
endorsement or misrepresent the relationship between the parties; and
any rights not expressly granted herein are reserved by the Licensors.
9. Disclaimer of Warranty. Unless required by applicable law or agreed
to in writing, Licensor provides the Artifact (and each Contributor
provides its Contributions) on an “AS IS” BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied, including, without
limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT,
MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely
responsible for determining the appropriateness of using the Artifact,
and assume any risks associated with Your exercise of permissions under
this License.
10. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise, unless
required by applicable law (such as deliberate and grossly negligent
acts) or agreed to in writing, shall any Contributor be liable to You
for damages, including any direct, indirect, special, incidental, or
consequential damages of any character arising as a result of this
License or out of the use or inability to use the Artifact (including
but not limited to damages for loss of goodwill, work stoppage, computer
failure or malfunction, or any and all other commercial damages or
losses), even if such Contributor has been advised of the possibility of
such damages.
11. If any provision of this License is held to be invalid, illegal or
unenforceable, the remaining provisions shall be unaffected thereby and
remain valid as if such provision had not been set forth herein.
12. Term and Termination. The term of this License will commence upon
the earlier of Your (a) acceptance of this License or (b) accessing the
Artifact; and will continue in full force and effect until terminated in
accordance with the terms and conditions herein. Licensor may terminate
this License if You are in breach of any term or condition of this
License. Upon termination of this License, all licenses granted to You
will terminate and You must promptly delete and cease use of the
Artifact. Sections 1, 7, 8, 9, 10, 11, and 12 survive termination of
this License.
END OF TERMS AND CONDITIONS
Attachment A
AMD Responsible AI Use Policy
AMD is committed to the responsible use of its Artificial Intelligence
(AI) products and technologies (“AMD AI”). AMD AI may include
artificial intelligence or machine learning technologies that use
algorithms to analyze data and generate output using predictions based
on patterns in data. This policy explains the uses that AMD
specifically prohibits.
If you use any AMD AI, you are agreeing to use the AMD AI in compliance
with applicable laws and not for any of the following prohibited uses.
Prohibited Uses:
1) No Illegal Acts. Do not use AMD AI in violation of any applicable
national, state, local, or other jurisdictional law, rule, regulation,
or sanction.
2) No Explicit Content. Do not use AMD AI to submit (as input),
generate, or disseminate content depicting violent or sexually explicit
content or to create sexual chatbots.
3) No Harm. Do not use AMD AI for any potentially harmful uses,
including fraud, deception, discrimination, abuse, or harassment,
including the following:
a) Harm or abuse of a minor, including grooming and child sexual
exploitation.
b) Impersonation of human beings for purposes of deception.
c) Generation or dissemination of information you know to be false
for the purpose of harming others.
d) Intentionally defame, disparage, or otherwise harass others.
e) Intentionally attempting to materially distort the behavior of a
person in a manner that causes or is likely to cause that person
or another person physical or psychological harm.
f) Providing medical advice or interpretation of medical results that
is intended to be a substitute for professional medical advice,
diagnosis, or treatment.
g) Engaging in the unlawful or unauthorized practice of any
profession, including financial, legal, medical, health, or
related professional practices.
h) Judgment of, discrimination against, or harm to individuals or
groups based on legally protected characteristics or categories,
online or offline social behavior, or known or predicted personal
or personality characteristics, including any of the foregoing
uses in social credit systems.
4) No High-Risk Activity. Do not use AMD AI in any high-risk activities
or applications that create a risk of personal injury, death, or
severe property or environmental damage, including in weapons or
military applications.
5) No Personal Information. Do not use AMD AI to collect, process, or
disclose personal data, including health or sensitive personal
information, without the necessary rights or consents.
6) No Infringement. Do not use AMD AI to generate or disseminate any
information that infringes upon or misappropriates the intellectual
property rights of others, including copyright, trademark, patent, and
trade secret rights, rights to privacy, and publicity rights.
7) No Malware. Do not use AMD AI to generate or disseminate malware or
any other content to be used for the purpose of facilitating unpermitted
access to, or use of, computer systems or data.
8) No Obfuscation. Do not inappropriately obfuscate or fail to disclose
to end users the presence of AI in any application in which AMD AI is
deployed, along with any known risks or dangers of using AI without
appropriate safeguards, oversight and human control.
9) No Reliance. Do not rely on any information generated using AMD AI
without assessing it for accuracy, potential for harm, or other specific
risks applicable to the use case.

NOTICE (new file, 444 lines)

@@ -0,0 +1,444 @@
NOTICES Instella_VL_1B
Copyright Statements
Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
License Text https://spdx.org/licenses/Apache-2.0.html
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
amd-AMD-OLMo-1B-SFT v-u (Apache-2.0)
Copyright Statements
Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
License Text https://spdx.org/licenses/Apache-2.0.html
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
Dependencies on FastChat v-u (Apache-2.0)
Copyright Statements
"Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
License Text https://spdx.org/licenses/Apache-2.0.html
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
Dependencies on LLaVA-NeXT v-u (Apache-2.0)
Copyright Statements
"Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
Copyright 2023 Haotian Liu
License Text https://spdx.org/licenses/Apache-2.0.html
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
Dependencies on OpenGVLab-InternVL v-u (MIT)
Copyright Statements
"Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
Copyright (c) 2023 OpenGVLab
License Text https://spdx.org/licenses/MIT.html
# "Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
# --------------------------------------------------------
# InternVL
# Copyright (c) 2023 OpenGVLab
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
LLaVA-NeXT v-u (Apache-2.0)
Copyright Statements
Copyright 2022 The HuggingFace Team. All rights reserved.
Copyright 2023 Haotian Liu
Copyright 2024 Duc Q. Nguyen, Haotian Liu and Bo Li
Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
Copyright 2023 DDPO-pytorch authors (Kevin Black), The HuggingFace Team, metric-space. All rights reserved.
License Text https://spdx.org/licenses/Apache-2.0.html
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
microsoft-unilm v-u (MIT)
Copyright Statements
Copyright (c) Microsoft Corporation
License Text https://spdx.org/licenses/MIT.html
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
openai-CLIP v-u (MIT)
Copyright Statements
Copyright (c) 2021 OpenAI.
License Text https://spdx.org/licenses/MIT.html
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
salesforce-LAVIS v-u (BSD-3-Clause)
Copyright Statements
Copyright (c) 2023, salesforce.com, inc.
License Text https://spdx.org/licenses/BSD-3-Clause.html
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Copyright Statements
Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
Tongyi Qianwen LICENSE AGREEMENT
Tongyi Qianwen Release Date: August 23, 2023
By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, you will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
1. Definitions
a. This Tongyi Qianwen LICENSE AGREEMENT (this "Agreement") shall mean the terms and conditions for use, reproduction, distribution and modification of the Materials as defined by this Agreement.
b. "We"(or "Us") shall mean Alibaba Cloud.
c. "You" (or "Your") shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Materials for any purpose and in any field of use.
d. "Third Parties" shall mean individuals or legal entities that are not under common control with Us or You.
e. "Tongyi Qianwen" shall mean the large language models (including Qwen-VL model and Qwen-VL-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
f. "Materials" shall mean, collectively, Alibaba Cloud's proprietary Tongyi Qianwen and Documentation (and any portion thereof) made available under this Agreement.
g. "Source" form shall mean the preferred form for making modifications, including but not limited to model source code, documentation source, and configuration files.
h. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation,
and conversions to other media types.
2. Grant of Rights
You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Alibaba Cloud's intellectual property or other rights owned by Us embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials.
3. Redistribution
You may reproduce and distribute copies of the Materials or derivative works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
a. You shall give any other recipients of the Materials or derivative works a copy of this Agreement;
b. You shall cause any modified files to carry prominent notices stating that You changed the files;
c. You shall retain in all copies of the Materials that You distribute the following attribution notices within a "Notice" text file distributed as a part of such copies: "Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved."; and
d. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such derivative works as a whole, provided Your use, reproduction, and distribution of the work otherwise complies with the terms and conditions of this Agreement.
4. Restrictions
If you are commercially using the Materials, and your product or service has more than 100 million monthly active users, You shall request a license from Us. You cannot exercise your rights under this Agreement without our express authorization.
5. Rules of use
a. The Materials may be subject to export controls or restrictions in China, the United States or other countries or regions. You shall comply with applicable laws and regulations in your use of the Materials.
b. You can not use the Materials or any output therefrom to improve any other large language model (excluding Tongyi Qianwen or derivative works thereof).
6. Intellectual Property
a. We retain ownership of all intellectual property rights in and to the Materials and derivatives made by or for Us. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
c. If you commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any entity alleging that the Materials or any output therefrom, or any part of the foregoing, infringe any intellectual property or other right owned or licensable by you, then all licences granted to you under this Agreement shall terminate as of the date such lawsuit or other proceeding is commenced or brought.
7. Disclaimer of Warranty and Limitation of Liability
a. We are not obligated to support, update, provide training for, or develop any further version of the Tongyi Qianwen Materials or to grant any license thereto.
b. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
c. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF IT, NO MATTER HOW IT'S CAUSED.
d. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.
8. Survival and Termination.
a. The term of this Agreement shall commence upon your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
b. We may terminate this Agreement if you breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, you must delete and cease use of the Materials. Sections 7 and 9 shall survive the termination of this Agreement.
9. Governing Law and Jurisdiction.
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
b. The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.
------------- LICENSE FOR NVIDIA Megatron-LM code --------------
Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
------------- LICENSE FOR OpenAI tiktoken code --------------
MIT License
Copyright (c) 2022 OpenAI, Shantanu Jain
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md
# Instella-VL-1B ✨
Welcome to the official repository for **Instella-VL-1B**, AMD's first ever Vision-Language Model (VLM). This repository provides a detailed guide for training and inference with **Instella-VL-1B**. Developed from AMD's **Instella-1B** (previously known as [AMD OLMo 1B SFT](https://www.amd.com/en/developer/resources/technical-articles/introducing-the-first-amd-1b-language-model.html) LLM), this model is fully open-source, with both model weights and training code available for AMD GPUs (MI300). Its compact size aims to make it accessible to a broad spectrum of researchers, developers, and enthusiasts, enabling them to build upon, modify, and integrate it into their own projects.
[[GitHub](https://github.com/AMD-AIG-AIMA/InstellaVL)][[Blog](https://rocm.blogs.amd.com/artificial-intelligence/Instella-BL-1B-VLM/README.html)]
## Main Results
We compare our model with models that release only their model weights (marked with * in the table below) as well as models that release their weights, data curation, and all training details.
<table class="tg"><thead>
<tr>
<td class="tg-0pky"></td>
<td class="tg-c3ow">DeepSeek-VL-1.3B *</td>
<td class="tg-c3ow">InternVL2-1B *</td>
<td class="tg-c3ow">InternVL2.5-1B *</td>
<td class="tg-c3ow">TinyLLaVA-2.4B</td>
<td class="tg-c3ow">TinyLLaVA-1.5B</td>
<td class="tg-c3ow">LLaVA-OneVision-1B</td>
<td class="tg-c3ow">MiniCPM-V-2</td>
<td class="tg-c3ow">Instella-VL-1B</td>
</tr></thead>
<tbody>
<tr>
<td class="tg-c3ow">GQA</td>
<td class="tg-c3ow">--</td>
<td class="tg-c3ow">55.06</td>
<td class="tg-c3ow">56.66</td>
<td class="tg-c3ow">61.58</td>
<td class="tg-c3ow">60.28</td>
<td class="tg-c3ow">57.95</td>
<td class="tg-c3ow">--</td>
<td class="tg-c3ow">61.52</td>
</tr>
<tr>
<td class="tg-c3ow">SQA</td>
<td class="tg-c3ow">64.52</td>
<td class="tg-c3ow">89.54</td>
<td class="tg-c3ow">93.90</td>
<td class="tg-c3ow">64.30</td>
<td class="tg-c3ow">59.69</td>
<td class="tg-c3ow">59.25</td>
<td class="tg-c3ow">76.10</td>
<td class="tg-c3ow">83.74</td>
</tr>
<tr>
<td class="tg-c3ow">POPE</td>
<td class="tg-c3ow">85.80</td>
<td class="tg-c3ow">87.40</td>
<td class="tg-c3ow">89.95</td>
<td class="tg-c3ow">85.66</td>
<td class="tg-c3ow">84.77</td>
<td class="tg-c3ow">87.17</td>
<td class="tg-c3ow">86.56</td>
<td class="tg-c3ow">86.73</td>
</tr>
<tr>
<td class="tg-c3ow">MM-Bench</td>
<td class="tg-c3ow">64.34</td>
<td class="tg-c3ow">61.70</td>
<td class="tg-c3ow">68.40</td>
<td class="tg-c3ow">58.16</td>
<td class="tg-c3ow">51.28</td>
<td class="tg-c3ow">44.60</td>
<td class="tg-c3ow">70.44</td>
<td class="tg-c3ow">69.17</td>
</tr>
<tr>
<td class="tg-c3ow">SeedBench</td>
<td class="tg-c3ow">65.94</td>
<td class="tg-c3ow">65.90</td>
<td class="tg-c3ow">71.30</td>
<td class="tg-c3ow">63.30</td>
<td class="tg-c3ow">60.04</td>
<td class="tg-c3ow">65.43</td>
<td class="tg-c3ow">66.90</td>
<td class="tg-c3ow">68.47</td>
</tr>
<tr>
<td class="tg-c3ow">MMMU</td>
<td class="tg-c3ow">28.67</td>
<td class="tg-c3ow">32.40</td>
<td class="tg-c3ow">35.60</td>
<td class="tg-c3ow">32.11</td>
<td class="tg-c3ow">29.89</td>
<td class="tg-c3ow">30.90</td>
<td class="tg-c3ow">38.55</td>
<td class="tg-c3ow">29.30</td>
</tr>
<tr>
<td class="tg-c3ow">RealWorldQA</td>
<td class="tg-c3ow">50.20</td>
<td class="tg-c3ow">51.90</td>
<td class="tg-c3ow">58.30</td>
<td class="tg-c3ow">52.42</td>
<td class="tg-c3ow">46.67</td>
<td class="tg-c3ow">51.63</td>
<td class="tg-c3ow">55.03</td>
<td class="tg-c3ow">58.82</td>
</tr>
<tr>
<td class="tg-c3ow">MMStar</td>
<td class="tg-c3ow">38.30</td>
<td class="tg-c3ow">46.18</td>
<td class="tg-c3ow">47.93</td>
<td class="tg-c3ow">37.17</td>
<td class="tg-c3ow">31.87</td>
<td class="tg-c3ow">37.38</td>
<td class="tg-c3ow">40.93</td>
<td class="tg-c3ow">43.21</td>
</tr>
<tr>
<td class="tg-c3ow"><span style="font-weight:bold">Average</span></td>
<td class="tg-c3ow">-</td>
<td class="tg-c3ow">61.26</td>
<td class="tg-c3ow">65.26</td>
<td class="tg-c3ow">56.84</td>
<td class="tg-c3ow">53.06</td>
<td class="tg-c3ow">54.29</td>
<td class="tg-c3ow">-</td>
<td class="tg-c3ow">62.62</td>
</tr>
<tr>
<td class="tg-c3ow">OCRBench</td>
<td class="tg-c3ow">41.40</td>
<td class="tg-c3ow">74.40</td>
<td class="tg-c3ow">74.20</td>
<td class="tg-c3ow">28.90</td>
<td class="tg-c3ow">34.40</td>
<td class="tg-c3ow">43.00</td>
<td class="tg-c3ow">60.00</td>
<td class="tg-c3ow">67.90</td>
</tr>
<tr>
<td class="tg-c3ow">TextVQA</td>
<td class="tg-c3ow">57.54</td>
<td class="tg-c3ow">69.60</td>
<td class="tg-c3ow">72.96</td>
<td class="tg-c3ow">47.05</td>
<td class="tg-c3ow">49.54</td>
<td class="tg-c3ow">49.54</td>
<td class="tg-c3ow">74.23</td>
<td class="tg-c3ow">71.23</td>
</tr>
<tr>
<td class="tg-c3ow">AI2D</td>
<td class="tg-c3ow">51.13</td>
<td class="tg-c3ow">62.40</td>
<td class="tg-c3ow">67.58</td>
<td class="tg-c3ow">49.58</td>
<td class="tg-c3ow">43.10</td>
<td class="tg-c3ow">57.35</td>
<td class="tg-c3ow">64.40</td>
<td class="tg-c3ow">66.65</td>
</tr>
<tr>
<td class="tg-c3ow">ChartQA</td>
<td class="tg-c3ow">47.40</td>
<td class="tg-c3ow">71.52</td>
<td class="tg-c3ow">75.76</td>
<td class="tg-c3ow">12.96</td>
<td class="tg-c3ow">15.24</td>
<td class="tg-c3ow">61.24</td>
<td class="tg-c3ow">59.80</td>
<td class="tg-c3ow">72.52</td>
</tr>
<tr>
<td class="tg-c3ow">DocVQA</td>
<td class="tg-c3ow">35.70</td>
<td class="tg-c3ow">80.94</td>
<td class="tg-c3ow">82.76</td>
<td class="tg-c3ow">25.82</td>
<td class="tg-c3ow">30.38</td>
<td class="tg-c3ow">71.22</td>
<td class="tg-c3ow">69.54</td>
<td class="tg-c3ow">80.30</td>
</tr>
<tr>
<td class="tg-c3ow">InfoVQA</td>
<td class="tg-c3ow">20.52</td>
<td class="tg-c3ow">46.30</td>
<td class="tg-c3ow">53.62</td>
<td class="tg-c3ow">21.35</td>
<td class="tg-c3ow">24.46</td>
<td class="tg-c3ow">41.18</td>
<td class="tg-c3ow">38.24</td>
<td class="tg-c3ow">46.40</td>
</tr>
<tr>
<td class="tg-c3ow">OCR Average</td>
<td class="tg-c3ow">42.28</td>
<td class="tg-c3ow">67.53</td>
<td class="tg-c3ow">71.15</td>
<td class="tg-c3ow">30.94</td>
<td class="tg-c3ow">32.85</td>
<td class="tg-c3ow">53.92</td>
<td class="tg-c3ow">61.04</td>
<td class="tg-c3ow">67.50</td>
</tr>
</tbody></table>
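As a sanity check, the two Average rows can be recomputed from the per-benchmark scores reported for Instella-VL-1B in the table above:

```python
# Instella-VL-1B scores copied from the table above
general = [61.52, 83.74, 86.73, 69.17, 68.47, 29.30, 58.82, 43.21]  # GQA, SQA, POPE, MM-Bench, SeedBench, MMMU, RealWorldQA, MMStar
ocr = [67.90, 71.23, 66.65, 72.52, 80.30, 46.40]  # OCRBench, TextVQA, AI2D, ChartQA, DocVQA, InfoVQA

avg = round(sum(general) / len(general), 2)
ocr_avg = round(sum(ocr) / len(ocr), 2)
print(avg, ocr_avg)  # 62.62 67.5
```

Both values match the Average and OCR Average entries in the Instella-VL-1B column.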
### Quick Start
> [!NOTE]
> Install the following package versions to set up the inference environment.
> ```bash
> pip==25.0
> wheel==0.45.1
> setuptools==75.8.0
> torch==2.6.0
> torchvision==0.21.0
> transformers==4.49.0
> einops==0.8.0
> ```
```python
import torch
from transformers import AutoTokenizer, AutoProcessor, AutoConfig, AutoModelForCausalLM
from PIL import Image
import requests
from io import BytesIO
def load_image(image_file):
    if image_file.startswith("http") or image_file.startswith("https"):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    return image
config = AutoConfig.from_pretrained("amd/Instella-VL-1B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("amd/Instella-VL-1B", config=config, trust_remote_code=True)
processor = AutoProcessor.from_pretrained("amd/Instella-VL-1B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("amd/Instella-VL-1B", trust_remote_code=True).to('cuda') # or 'cpu'
model.eval()
# For single image and text
query="Describe the image."
image=load_image("path/to/your_image") # can be a https:// url
out = processor.encode(query, image, model.get_vision_tower().image_processor, tokenizer, config)
inputs = {k: v.to(model.device) for k, v in out.items() if isinstance(v, torch.Tensor)}
with torch.inference_mode():
    output_ids = model.generate(inputs["input_ids"], images=inputs['image_tensor'], image_sizes=out['image_sizes'], do_sample=True, num_beams=1, temperature=0.2, max_new_tokens=1024, use_cache=True, stopping_criteria=out['stopping_criteria'], eos_token_id=out['eos_token_id'])
outputs = processor.decode(output_ids)
print("InstellaVL: ", outputs)
# For batch of images and text.
query=["Describe the image.", "What is the color of the dog?"]
image=[load_image("../assets/images/instellavl.png"), load_image("../assets/images/example2_dog.jpg")]
outs = processor.batch_encode(query, image, model.get_vision_tower().image_processor, tokenizer, config)
for idx, o in enumerate(outs):
    ins = {k: v.to(model.device) for k, v in o.items() if isinstance(v, torch.Tensor)}
    with torch.inference_mode():
        output_ids = model.generate(ins["input_ids"],
                                    images=ins['image_tensor'],
                                    image_sizes=o['image_sizes'],
                                    do_sample=True,
                                    num_beams=1,
                                    temperature=0.2,
                                    max_new_tokens=1024,
                                    use_cache=True,
                                    stopping_criteria=o['stopping_criteria'],
                                    eos_token_id=o['eos_token_id'])
    outputs = processor.decode(output_ids)
    print("Query: ", query[idx])
    print("InstellaVL: ", outputs)
```
<details>
<summary><b>TL;DR</b>: Loading from locally saved checkpoint</summary>
<p><strong>Note:</strong> Run <code>pip install -e . --no-deps</code> from the root of the InstellaVL repo to register it as the <code>instellavl</code> package.</p>
```python
import torch
# Import essential modules
from instellavl.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from instellavl.conversation import conv_templates, SeparatorStyle
from instellavl.model.builder import load_pretrained_model
from instellavl.utils import disable_torch_init
from instellavl.mm_utils import process_images, tokenizer_image_token, get_model_name_from_path, KeywordsStoppingCriteria
from PIL import Image
import requests
from io import BytesIO
# Login into HF Hub
from huggingface_hub import login
login(token = "<Your HFtoken id>") # Enter your token
def load_image(image_file):
    if image_file.startswith("http") or image_file.startswith("https"):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    return image
#
# ========= CHANGE IMAGE and Query only HERE ============
image_file = '/path/to/Instella-VL-repo/assets/images/example2_dog.jpg' # Enter the test image path
query = 'Describe this image.'
# =======================================================
disable_torch_init()
conv_mode = 'instella'
# Model loading
model_path = '<path/to/model-checkpoint-saved-locally>' # Enter your model path, should contain instellavl substring in the name.
model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, False, False)
model.eval()
model = model.to('cuda') # change to 'cpu' if not 'cuda'
# Image pre-processing
image = load_image(image_file)
image_tensor = process_images([image], image_processor, model.config)
image_tensor = image_tensor.to(model.dtype)
# Text pre-processing - follow the below logic too when there is no Image:
# if images is not None and len(image_tensor) != 0 and DEFAULT_IMAGE_TOKEN not in text:
# question = DEFAULT_IMAGE_TOKEN + "\n" + text
# else:
# question = text
query = query.replace(DEFAULT_IMAGE_TOKEN, "").strip()
question = DEFAULT_IMAGE_TOKEN + "\n" + query
conv = conv_templates[conv_mode].copy()
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()
# Final arrangements required
input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0)
keywords = [conv.sep]
image_sizes = [image.size]
stopping_criteria = [KeywordsStoppingCriteria(keywords, tokenizer, input_ids)]
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("|||IP_ADDRESS|||")]
with torch.inference_mode():
    output_ids = model.generate(input_ids.to(model.device), images=image_tensor.to(model.device), image_sizes=image_sizes, do_sample=True, num_beams=1, temperature=0.2, max_new_tokens=1024, use_cache=True, stopping_criteria=stopping_criteria, eos_token_id=terminators)
outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
print("InstellaVL: ", outputs)
```
</details>
## Model Architecture
| Parts | Parameter size | Number of layers | Number of heads | Hidden size | Patch Size |
| ------------- |:-------------:|:-----:|:-----:|:-----:|:-----:|
| Vision Encoder | 300M | 24| 16 | 1024 | 14 |
| MLP | 6.3M | 2 | - | 2048 | - |
| LM | 1.2B | 16 | 16 | 2048 | - |
We initialize the vision encoder from [CLIP-ViT-L/14@336](https://huggingface.co/openai/clip-vit-large-patch14-336) and the LM from [AMD OLMo 1B SFT](https://huggingface.co/amd/AMD-OLMo-1B-SFT).
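Given the 14×14 patch size and CLIP-ViT-L/14@336's 336×336 input resolution, the number of visual tokens per image tile follows directly; a quick check (the helper name is illustrative, not from the repo):

```python
def num_vision_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for a square input (CLS token excluded)."""
    assert image_size % patch_size == 0
    return (image_size // patch_size) ** 2

print(num_vision_tokens(336, 14))  # 576 visual tokens per 336x336 tile
```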
## Training Stages
| Stages | MLP Warmup | Pretraining | Instruction Tuning |
| ------------- |:-------------:|:-----:|:-----:|
| Tunable Parts | Adapter | Entire Model | Entire Model |
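In practice, the staged schedule above amounts to toggling `requires_grad` on the relevant submodules: adapter only during MLP warmup, everything afterwards. A minimal PyTorch sketch under assumed module names (`vision_encoder`, `adapter`, and `lm` are illustrative stand-ins, not the repo's actual classes):

```python
import torch.nn as nn

class TinyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)  # stand-in for the CLIP ViT
        self.adapter = nn.Linear(8, 8)         # stand-in for the 2-layer MLP projector
        self.lm = nn.Linear(8, 8)              # stand-in for the language model

def set_stage(model: nn.Module, stage: str) -> None:
    """MLP warmup: train the adapter only; later stages: train the entire model."""
    full = stage in ("pretraining", "instruction_tuning")
    for name, p in model.named_parameters():
        p.requires_grad = full or name.startswith("adapter")

m = TinyVLM()
set_stage(m, "mlp_warmup")
trainable = [n for n, p in m.named_parameters() if p.requires_grad]
print(trainable)  # only adapter parameters remain trainable
```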
## Hardware
Training was conducted on up to 4 nodes, totaling 32 GPUs. Each node comprises [8 AMD Instinct™ MI300X GPUs](https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html).
**MLP warmup**: 1 node
**Pretraining**: 2 nodes
**Finetune**: 4 nodes
## Datasets
### MLP Warmup
[BLIP558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
<h3 align="center">Pretraining Stage</h3>
| **Domain** | **Datasets** | **Num of Examples** | **Licenses** |
|---|:---:|---:|:---|
| Image Captions | [BLIP150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain), [COCO118K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain), [CC3M-Recap](https://huggingface.co/datasets/lmms-lab/LLaVA-ReCap-CC3M), [Pixmo_Cap](https://huggingface.co/datasets/allenai/pixmo-cap) | 3.52M | BSD 3-Clause for BLIP150K, COCO118K; Apache 2 for CC3M-Recap; ODC-BY-1.0 for Pixmo_Cap; see source materials for CC3M-Recap |
| OCR | [SynthDog_EN](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data), [SynthDog_ZH](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data), [UReader](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data), [ART](https://rrc.cvc.uab.es/?ch=14&com=downloads), [COCO-Text](https://bgshih.github.io/cocotext/), [HierText](https://github.com/google-research-datasets/hiertext), [Uber-Text](https://s3-us-west-2.amazonaws.com/uber-common-public/ubertext/index.html), [TextOCR](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [OpenVINO](https://github.com/openvinotoolkit/cvat), [MLT-17](https://rrc.cvc.uab.es/?ch=8&com=downloads) | 913K | Apache 2 for SynthDog_EN, SynthDog_ZH, UReader, TextOCR, OpenVINO; CC By 4.0 for COCO-Text; CC BY-SA 4.0 for HierText, Uber-Text; See source materials for ART, MLT-17 |
| Doc | [DocVQA](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [DocStruct4M](https://huggingface.co/datasets/mPLUG/DocStruct4M) | 410K | Apache 2 |
| Table & Chart & Plot | [Chart2Text](https://github.com/vis-nlp/Chart-to-text/tree/main/pew_dataset/dataset/imgs), [UniChart](https://huggingface.co/datasets/ahmed-masry/unichart-pretrain-data), [PlotQA](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [WidgetCaption](https://huggingface.co/datasets/rootsautomation/RICO-WidgetCaptioning?row=0), [Screen2Words](https://huggingface.co/datasets/rootsautomation/RICO-Screen2Words), [SciGraphQA-295K](https://huggingface.co/datasets/alexshengzhili/SciGraphQA-295K-train), [Paper2Fig100K](https://zenodo.org/records/7299423#.Y2lzonbMKUl), [MMC Instruction](https://huggingface.co/datasets/xywang1/MMC/viewer/MMC-Instruction), [M-Paper](https://huggingface.co/datasets/mPLUG/M-Paper) | 1.97M | GPL-3.0 for Chart2Text; MIT for UniChart, SciGraphQA-295K; Apache 2 for PlotQA, M-Paper; CC By 4.0 for WidgetCaption, Screen2Words, Paper2Fig100K; CC BY-SA 4.0 for MMC Instruction |
| Text Only | [Evol-Instruct-GPT-4](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data/tree/main/evol_instruct) | 70K | Apache 2 |
<h3 align="center">Instruction-tuning Stage</h3>
| **Domain** | **Datasets** | **Num of Examples** | **Licenses** |
|---|:---:|---:|:---|
| General | [AOKVQA, CLEVR, Hateful Memes, Image Textualization, OKVQA, ScienceQA, ShareGPT-4V, TallyQA, Visual7W, VizWiz, VQAv2, WebSight, ALLaVA Instruct, Cambrian, COCO Caption, IconQA, LLaVA-158K, LLaVAR, RefCOCO, ShareGPT-4O, Vision FLAN, VisText, VQARAD, VSR, InterGPS](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [Image-Paragraph-Captioning, ImageNet, COCO-GOI, COCO-ITM, Visual Dialog, SNLI-VE](https://huggingface.co/datasets/MMInstruction/M3IT), [Web-Landmark, Web-Celebrity, SAM, LAION-GPT-4V-Dataset, OODVQA]( https://huggingface.co/datasets/nyu-visionx/Cambrian-10M/tree/main), [Pixmo_Cap](https://huggingface.co/datasets/allenai/pixmo-cap), [Pixmo_Count](https://huggingface.co/datasets/allenai/pixmo-count), [Pixmo_Points](https://huggingface.co/datasets/allenai/pixmo-points), [Pixmo_Ask_Model_Anything](https://huggingface.co/datasets/allenai/pixmo-ask-model-anything), [SVIT_Core_150K](https://huggingface.co/datasets/BAAI/SVIT), [Localized Narratives](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | 2.66M | see source materials for Image-Paragraph-Captioning, ImageNet, COCO-GOI, COCO-ITM, Visual Dialog, SNLI-VE; ODC-BY-1.0 for Pixmo_Cap, Pixmo_Count, Pixmo_Points, Pixmo_Ask_Model_Anything; CC By 4.0 for SVIT_Core_150K, Localized Narratives; Apache 2 for rest of the datasets; |
| Table & Chart & Screen | [AI2D, ChartQA, DocVQA, FigureQA, InfographicVQA, RoBUT-SQA, RoBUT-WTQ, TQA, UReader IE, UReader QA, Chart2Text, Diagram Image2Text, DVQA, HiTab, LRV Chart, RoBUT WikiSQL, Screen2Words, UReader Caption, UReader KG, VisualMRC](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [TinyChartData](https://huggingface.co/datasets/mPLUG/TinyChartData) | 866K | Apache 2 |
| Doc | [ArxivQA](https://huggingface.co/datasets/MMInstruction/ArxivQA), [DocDownstream-1.0](https://huggingface.co/datasets/mPLUG/DocDownstream-1.0), [DocReason25K](https://huggingface.co/datasets/mPLUG/DocReason25K), [DocStruct4M](https://huggingface.co/datasets/mPLUG/DocStruct4M), [Pixmo_Docs](https://huggingface.co/datasets/allenai/pixmo-docs) | 522K | CC BY-SA 4.0 for ArxivQA; Apache 2 for DocDownstream-1.0, DocReason25K, DocStruct4M; ODC-BY-1.0 for Pixmo_Docs |
| General OCR | [ChromeWriting, IIIT5K, K12 Printing, Rendered Text, TextCaps, HME100K, IAM, TextOCR-GPT-4V](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [SynthDog-EN](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data) | 84K | Apache 2 |
| Math & Reasoning | [MAVIS Manual Collection, CLEVR-Math, Geo170K QA, GEOS, GeoMVerse, MapQA, Super-CLEVR, UniGeo, LRV Normal, Visual Genome, MAVIS Data Engine, Geo170K Align, Geometry3K, GeoQA+, TabMWP, GQA, RAVEN, MathVision, KVQA, VCR](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [FinQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron), [Design2Code, IDK](https://huggingface.co/datasets/nyu-visionx/Cambrian-10M/) | 460K | CC By 4.0 for FinQA; Apache 2 for rest of the datasets |
| Others | [IQA, MOCHEG, Shapes](https://huggingface.co/datasets/MMInstruction/M3IT), [ALFWorld, Q-Instruct-DB](https://huggingface.co/datasets/nyu-visionx/Cambrian-10M/) | 479K | see source materials for IQA, MOCHEG, Shapes; Apache 2 for ALFWorld, Q-Instruct-DB |
| Text Only | [MathQA, Magpie Pro (L3 MT), Magpie Pro (Qwen2 ST), Magpie Pro (L3 ST)](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data) | 480K | Apache 2 |
> [!NOTE]
> Further, to strengthen the model's understanding of science-based and general reasoning questions, as identified through error analysis, we oversampled (almost doubling the volume of) specific datasets from the SFT dataset pool, as detailed below.
>
> Oversampled (~2x sampling rate): ScienceQA, AI2D, PMC-VQA, Cambrian, and TQA
>
> Further information concerning the training datasets, including applicable licensing terms and use restrictions, can be found at the linked source locations.
For details of the training hyperparameters, please check [our GitHub repo](https://github.com/AMD-AIG-AIMA/Instella-VL).
## Contributors
**Core contributors:** [Ximeng Sun](https://sunxm2357.github.io/), [Aditya Kumar Singh](https://rodosingh.github.io), [Gowtham Ramesh](https://www.linkedin.com/in/gowtham1/), [Zicheng Liu](https://zicliu.wixsite.com/mysite)
**Contributors:** [Pratik Prabhanjan Brahma](https://www.linkedin.com/in/pratik-p-brahma/), [Ze Wang](https://www.linkedin.com/in/ze-wang-1379601a5/), [Jiang Liu](https://joellliu.github.io/), [Jialian Wu](https://jialianwu.com/), [Prakamya Mishra](https://prakamya-mishra.github.io/), [Xiaodong Yu](https://www.xiaodongyu.me/), [Yusheng Su](https://yushengsu-thu.github.io/), [Sudhanshu Ranjan](https://www.linkedin.com/in/sudhanshu-ranjan-33a216124), [Emad Barsoum](https://www.linkedin.com/in/ebarsoum/)
## Bias, Risks, and Limitations
This model is made accessible without any safety guarantees. Users should be aware that the model may generate outputs that are sensitive, inaccurate, harmful, biased, or otherwise objectionable based on user prompts. It is crucial for users to conduct comprehensive safety evaluations, implement safety filtering, and verify the model's outputs to mitigate these risks.
## License
See the Files section of this repository for the license and any notices.
## Citing
```bibtex
@misc{Instella-VL-1B,
title = {Instella-VL-1B: First AMD Vision Language Model},
url = {https://huggingface.co/amd/Instella-VL-1B},
author = {Ximeng Sun and Aditya Singh and Gowtham Ramesh and Jiang Liu and Ze Wang and Sudhanshu Ranjan and Pratik Prabhanjan Brahma and Prakamya Mishra and Jialian Wu and Xiaodong Yu and Yusheng Su and Emad Barsoum and Zicheng Liu},
month = {March},
year = {2025}
}
```

chat_template.json Normal file
@@ -0,0 +1,2 @@
{"chat_template": "|||IP_ADDRESS|||\n{% for message in messages -%}{{ message['role'] + message['content']}}{%- if not loop.last -%}{{ '\\n' if loop.index % 2 == 1 else '|||IP_ADDRESS|||\\n'}}{%- endif %}{%- endfor -%}"
}
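For readers who want to sanity-check the template, here is a plain-Python rendition of its alternation logic (a sketch, not the template engine itself; the `|||IP_ADDRESS|||` strings are copied verbatim from the scrubbed template above, and the role strings are illustrative):

```python
def render(messages, boundary="|||IP_ADDRESS|||"):
    # Mirrors the Jinja template: roles and contents are concatenated,
    # and turn separators alternate between "\n" and the boundary token.
    out = boundary + "\n"
    for i, m in enumerate(messages, start=1):  # Jinja's loop.index is 1-based
        out += m["role"] + m["content"]
        if i != len(messages):
            out += "\n" if i % 2 == 1 else boundary + "\n"
    return out

chat = [
    {"role": "<|user|>\n", "content": "Hi"},
    {"role": "<|assistant|>\n", "content": "Hello"},
]
```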

config.json Normal file
@@ -0,0 +1,100 @@
{
"_name_or_path": "/home/goramesh/local/gramesh/Instella-VL-1B/",
"architectures": [
"InstellaVLForCausalLM"
],
"auto_map": {
"AutoConfig": "modeling_instellavl.InstellaVLConfig",
"AutoModelForCausalLM": "modeling_instellavl.InstellaVLForCausalLM"
},
"attention_bias": false,
"attention_dropout": 0.0,
"clip_qkv": null,
"eos_token_id": 50279,
"hidden_act": "silu",
"hidden_size": 2048,
"image_aspect_ratio": "anyres",
"image_crop_resolution": null,
"image_grid_pinpoints": [
[
336,
336
],
[
336,
672
],
[
336,
1008
],
[
336,
1344
],
[
336,
1680
],
[
672,
336
],
[
672,
672
],
[
1008,
336
],
[
1344,
336
],
[
1680,
336
]
],
"image_split_resolution": null,
"initializer_range": 0.02,
"intermediate_size": 8192,
"max_position_embeddings": 2048,
"mm_anyres_choose_method": "best_fit",
"mm_compact_visual_tokens": false,
"mm_downsample_ratio": 1,
"mm_hidden_size": 1024,
"mm_newline_position": "one_token",
"mm_patch_merge_type": "spatial_unpad",
"mm_projector_lr": null,
"mm_projector_type": "mlp2x_gelu",
"mm_resampler_type": null,
"mm_spatial_pool_mode": "bilinear",
"mm_tunable_parts": "mm_vision_tower,mm_mlp_adapter,mm_language_model",
"mm_use_im_patch_token": false,
"mm_use_im_start_end": false,
"mm_vision_select_feature": "patch",
"mm_vision_select_layer": -2,
"mm_vision_tower": "openai/clip-vit-large-patch14-336",
"mm_vision_tower_lr": null,
"model_type": "instellavl",
"num_attention_heads": 16,
"num_hidden_layers": 16,
"num_key_value_heads": 16,
"online_training": true,
"pad_token_id": 1,
"pos_skipping_range": 4096,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": true,
"tokenizer_model_max_length": 32768,
"tokenizer_padding_side": "right",
"torch_dtype": "float16",
"transformers_version": "4.45.1",
"use_cache": true,
"use_mm_proj": true,
"use_pos_skipping": false,
"vision_tower_pretrained": null,
"vocab_size": 50282
}

conversation.py Normal file
@@ -0,0 +1,334 @@
# Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved.
import re
import base64
import dataclasses
from PIL import Image
from io import BytesIO
from enum import auto, Enum
from typing import List, Any, Dict, Union, Tuple
from transformers import AutoTokenizer
class SeparatorStyle(Enum):
"""Different separator style."""
SINGLE = auto()
MPT = auto()
INSTELLA = auto()
@dataclasses.dataclass
class Conversation:
r"""A class that keeps all conversation history."""
system: str
roles: List[str]
messages: List[List[str]]
offset: int
sep_style: SeparatorStyle = SeparatorStyle.SINGLE
sep: str = "###"
sep2: str = None
version: str = "Unknown"
tokenizer_id: str = ""
tokenizer: Any = None
# Stop criteria (the default one is EOS token)
stop_str: Union[str, List[str]] = None
# Stops generation if meeting any token in this list
stop_token_ids: List[int] = None
skip_next: bool = False
def get_prompt(self):
"""
Generates a formatted prompt string based on the messages and separator style.
The function processes the messages stored in the instance, applies specific formatting rules
based on the separator style, and returns the resulting prompt string.
Returns:
`str`: The formatted prompt string.
Raises:
`ValueError`: If an invalid separator style is specified.
"""
messages = self.messages
if len(messages) > 0 and type(messages[0][1]) is tuple:
messages = self.messages.copy()
init_role, init_msg = messages[0].copy()
init_msg = init_msg[0]
if "mmtag" in self.version:
init_msg = init_msg.replace("<image>", "").strip()
messages[0] = (init_role, init_msg)
messages.insert(0, (self.roles[0], "<Image><image></Image>"))
messages.insert(1, (self.roles[1], "Received."))
elif not init_msg.startswith("<image>"):
init_msg = init_msg.replace("<image>", "").strip()
messages[0] = (init_role, "<image>\n" + init_msg)
else:
messages[0] = (init_role, init_msg)
if self.sep_style == SeparatorStyle.SINGLE:
ret = self.system + self.sep
for role, message in messages:
if message:
if type(message) is tuple:
message, _, _ = message
ret += role + ": " + message + self.sep
else:
ret += role + ":"
elif self.sep_style == SeparatorStyle.MPT:
ret = self.system + self.sep
for role, message in messages:
if message:
if type(message) is tuple:
message, _, _ = message
ret += role + message + self.sep
else:
ret += role
elif self.sep_style == SeparatorStyle.INSTELLA:
seps = [self.sep, self.sep2]
ret = "|||IP_ADDRESS|||"
for i, (role, message) in enumerate(messages):
if message:
if type(message) is tuple:
message, _, _ = message
if i % 2 == 1:
message = message.strip()
ret += role + message + seps[i % 2]
else:
ret += role
else:
raise ValueError(f"Invalid style: {self.sep_style}")
return ret
def append_message(self, role, message):
self.messages.append([role, message])
def process_image(self, image: Union[str, Image.Image], image_process_mode: str, return_pil: bool=False, image_format: str="PNG")->Union[str, Image.Image]:
r"""
Processes an image according to the specified mode and returns either a PIL image or a base64 encoded string.
Args:
- image (Union[str, Image.Image]): The image to be processed. Can be a file path or a PIL Image object.
- image_process_mode (str): The mode of image processing. Options are "Pad", "Default", "Crop", or "Resize".
- return_pil (bool, optional): If True, returns a PIL Image object. If False, returns a base64 encoded string. Defaults to False.
- image_format (str, optional): The format to save the image in if returning a base64 encoded string. Defaults to "PNG".
Returns:
Union[str, Image.Image]: The processed image, either as a PIL Image object or a base64 encoded string.
Raises:
ValueError: If an invalid image_process_mode is provided.
"""
if image_process_mode == "Pad":
def expand2square(pil_img, background_color=(122, 116, 104)):
width, height = pil_img.size
if width == height:
return pil_img
elif width > height:
result = Image.new(pil_img.mode, (width, width), background_color)
result.paste(pil_img, (0, (width - height) // 2))
return result
else:
result = Image.new(pil_img.mode, (height, height), background_color)
result.paste(pil_img, ((height - width) // 2, 0))
return result
image = expand2square(image)
elif image_process_mode in ["Default", "Crop"]:
pass
elif image_process_mode == "Resize":
image = image.resize((336, 336))
else:
raise ValueError(f"Invalid image_process_mode: {image_process_mode}")
if type(image) is not Image.Image:
image = Image.open(image).convert("RGB")
max_hw, min_hw = max(image.size), min(image.size)
aspect_ratio = max_hw / min_hw
max_len, min_len = 672, 448
shortest_edge = int(min(max_len / aspect_ratio, min_len, min_hw))
longest_edge = int(shortest_edge * aspect_ratio)
W, H = image.size
if H > W:
H, W = longest_edge, shortest_edge
else:
H, W = shortest_edge, longest_edge
image = image.resize((W, H))
if return_pil:
return image
else:
buffered = BytesIO()
image.save(buffered, format=image_format)
img_b64_str = base64.b64encode(buffered.getvalue()).decode()
return img_b64_str
def get_images(self, return_pil: bool=False, return_path: bool=False) -> List[Union[str, Image.Image]]:
"""
Retrieve images from the conversation messages.
Args:
return_pil (bool): If True, return images as PIL objects. Defaults to False.
return_path (bool): If True, return the image file paths instead of processing them. Defaults to False.
Returns:
list: A list of images or image paths depending on the arguments.
"""
images = []
for i, (role, msg) in enumerate(self.messages[self.offset :]):
if i % 2 == 0:
if type(msg) is tuple:
msg, image, image_process_mode = msg
if type(image) != list:
image = [image]
for img in image:
    if not return_path and self.is_image_file(img):
        img = self.process_image(img, image_process_mode, return_pil=return_pil)
    images.append(img)
return images
def is_image_file(self, filename: str)->bool:
image_extensions = [".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"]
return any(filename.lower().endswith(ext) for ext in image_extensions)
def is_video_file(self, filename: str)->bool:
video_extensions = [".mp4", ".mov", ".avi", ".mkv", ".wmv", ".flv", ".mpeg", ".mpg"]
return any(filename.lower().endswith(ext) for ext in video_extensions)
def to_gradio_chatbot(self)->list:
ret = []
for i, (role, msg) in enumerate(self.messages[self.offset :]):
if i % 2 == 0:
if type(msg) is tuple:
msg, image, image_process_mode = msg
if type(image) != list:
image = [image]
if len(image) == 1:
msg = "<image>\n" + msg.replace("<image>", "").strip()
else:
msg = re.sub(r"(<image>)\n(?=<image>)", r"\1 ", msg)
img_str_list = []
for img in image:
if self.is_image_file(img):
img_b64_str = self.process_image(img, "Default", return_pil=False, image_format="JPEG")
img_str = f'<img src="data:image/jpeg;base64,{img_b64_str}" style="max-width: 256px; max-height: 256px; width: auto; height: auto; object-fit: contain;"/>'
img_str_list.append(img_str)
elif self.is_video_file(img):
ret.append(((img,), None))
msg = msg.strip()
img_place_holder = ""
for img_str in img_str_list:
img_place_holder += f"{img_str}\n\n"
if len(img_str_list) > 0:
msg = f"{img_place_holder}\n\n{msg}"
if len(msg) > 0:
ret.append([msg, None])
else:
ret.append([msg, None])
else:
ret[-1][-1] = msg
return ret
def copy(self)->"Conversation":
return Conversation(system=self.system, roles=self.roles, messages=[[x, y] for x, y in self.messages], offset=self.offset, sep_style=self.sep_style, sep=self.sep, sep2=self.sep2, version=self.version)
def dict(self)->Dict[str, Any]:
if len(self.get_images()) > 0:
return {
"system": self.system,
"roles": self.roles,
"messages": [[x, y[0] if type(y) is tuple else y] for x, y in self.messages],
"offset": self.offset,
"sep": self.sep,
"sep2": self.sep2,
}
return {
"system": self.system,
"roles": self.roles,
"messages": self.messages,
"offset": self.offset,
"sep": self.sep,
"sep2": self.sep2,
}
conv_vicuna_v0 = Conversation(
system="A chat between a curious human and an artificial intelligence assistant. " "The assistant gives helpful, detailed, and polite answers to the human's questions.",
roles=("Human", "Assistant"),
messages=[
["Human", "What are the key differences between renewable and non-renewable energy sources?"],
[
"Assistant",
"Renewable energy sources are those that can be replenished naturally in a relatively "
"short amount of time, such as solar, wind, hydro, geothermal, and biomass. "
"Non-renewable energy sources, on the other hand, are finite and will eventually be "
"depleted, such as coal, oil, and natural gas. Here are some key differences between "
"renewable and non-renewable energy sources:\n"
"1. Availability: Renewable energy sources are virtually inexhaustible, while non-renewable "
"energy sources are finite and will eventually run out.\n"
"2. Environmental impact: Renewable energy sources have a much lower environmental impact "
"than non-renewable sources, which can lead to air and water pollution, greenhouse gas emissions, "
"and other negative effects.\n"
"3. Cost: Renewable energy sources can be more expensive to initially set up, but they typically "
"have lower operational costs than non-renewable sources.\n"
"4. Reliability: Renewable energy sources are often more reliable and can be used in more remote "
"locations than non-renewable sources.\n"
"5. Flexibility: Renewable energy sources are often more flexible and can be adapted to different "
"situations and needs, while non-renewable sources are more rigid and inflexible.\n"
"6. Sustainability: Renewable energy sources are more sustainable over the long term, while "
"non-renewable sources are not, and their depletion can lead to economic and social instability.\n",
],
],
offset=2,
sep_style=SeparatorStyle.SINGLE,
sep="###",
)
conv_mpt = Conversation(
system="""<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.""",
roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),
version="mpt",
messages=[],
offset=0,
sep_style=SeparatorStyle.MPT,
sep="<|im_end|>",
)
conv_instella = Conversation(
system="",
roles=("<|user|>\n", "<|assistant|>\n"),
version="instella",
messages=[],
offset=0,
sep_style=SeparatorStyle.INSTELLA,
sep="\n",
sep2='|||IP_ADDRESS|||\n'
)
default_conversation = conv_instella
conv_templates = {
"default": conv_instella,
"mpt": conv_mpt,
"instella": conv_instella,
}
if __name__ == "__main__":
print(default_conversation.get_prompt())
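As a quick check of the `INSTELLA` separator style, the relevant branch of `get_prompt` can be restated standalone (a sketch that assumes generation starts at an empty assistant turn; the class above is not imported):

```python
def instella_prompt(turns, sep="\n", sep2="|||IP_ADDRESS|||\n"):
    # Mirrors SeparatorStyle.INSTELLA: the prompt opens with the boundary
    # token, user turns end with "\n", assistant turns end with the
    # boundary token; an empty message leaves the role open for generation.
    ret = "|||IP_ADDRESS|||"
    seps = [sep, sep2]
    for i, (role, message) in enumerate(turns):
        if message:
            if i % 2 == 1:
                message = message.strip()
            ret += role + message + seps[i % 2]
        else:
            ret += role
    return ret

turns = [("<|user|>\n", "What is in this image?"),
         ("<|assistant|>\n", None)]
```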

generation_config.json Normal file
@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"eos_token_id": 50279,
"pad_token_id": 1,
"transformers_version": "4.45.1"
}

@@ -0,0 +1,30 @@
from typing import List
from PIL.Image import Image
from transformers import CLIPImageProcessor
from transformers.image_processing_utils import BaseImageProcessor
from .mm_utils import process_images
# TODO can inherit from CLIPImageProcessor instead and use the process function directly.
class InstellaVLImageProcessor(BaseImageProcessor):
r"""
Pre-process images
"""
def __init__(self, **kwargs):
super().__init__(**kwargs)
def process(self,
            images: List[Image],
            processor: CLIPImageProcessor,
            model_cfg: dict
            ):
    # Check for empty input before handing it to process_images.
    if images is None:
        return {"pixel_values": None}
    image_tensors = process_images(images, processor, model_cfg)
    return {"pixel_values": image_tensors}
InstellaVLImageProcessor.register_for_auto_class()

mm_utils.py Normal file
@@ -0,0 +1,519 @@
# Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved.
r"""This module provides various utility functions for processing images, including resizing, cropping, padding,
and extracting patches. It also includes functions for processing images with different resolutions and
tokenizing image prompts."""
import re
import ast
import math
import torch
import base64
import torch.distributed as dist
from PIL import Image
from io import BytesIO
from typing import List, Tuple, Union, Any
from transformers import StoppingCriteria, PreTrainedTokenizer
IGNORE_INDEX = -100
IMAGE_TOKEN_INDEX = -200
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"
def resize_and_center_crop(image: Image.Image, shortest_edge_length: int) -> Image.Image:
r"""
Resize the given image such that its shortest edge matches the specified length,
and then center crop it to a square of the same size.
Args:
- image (`Image.Image`): The input image to be resized and cropped.
- shortest_edge_length (`int`): The length of the shortest edge after resizing.
Returns:
`Image.Image`: The resized and center-cropped image.
"""
# Calculate new dimensions and resize
aspect_ratio = float(image.width) / float(image.height)
if (aspect_ratio > 1):
new_width = int(shortest_edge_length * aspect_ratio)
new_height = shortest_edge_length
else:
new_width = shortest_edge_length
new_height = int(shortest_edge_length / aspect_ratio)
resized_image = image.resize((new_width, new_height), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
# Calculate the position and perform the center crop
left = (new_width - shortest_edge_length) / 2
top = (new_height - shortest_edge_length) / 2
right = (new_width + shortest_edge_length) / 2
bottom = (new_height + shortest_edge_length) / 2
cropped_image = resized_image.crop((left, top, right, bottom))
return cropped_image
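The geometry above reduces to a small calculation; a standalone arithmetic sketch (no PIL; the helper name `center_crop_box` is ours):

```python
def center_crop_box(width, height, edge):
    # Resize so the shortest side equals `edge`, then take the centered
    # edge-by-edge square -- the same box resize_and_center_crop crops.
    aspect = width / height
    if aspect > 1:
        new_w, new_h = int(edge * aspect), edge
    else:
        new_w, new_h = edge, int(edge / aspect)
    left, top = (new_w - edge) / 2, (new_h - edge) / 2
    return (new_w, new_h), (left, top, left + edge, top + edge)
```

For a 1024x512 input cropped at 256, the image is first resized to 512x256 and the crop box is the centered 256x256 square.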
def auto_pad_images(image: Image.Image, grid_params: list) -> Image.Image:
r"""
Automatically pads an input image to match the closest aspect ratio from a list of grid parameters.
Args:
- image (`Image.Image`): The input image to be padded. Must be a Pillow Image object.
- grid_params (`list`): A list of integers representing the grid parameters to determine the target aspect ratio.
Returns:
`Image.Image`: The padded image with the closest aspect ratio from the grid parameters.
Raises:
`AssertionError`: If the input is not a Pillow Image object or if the grid parameters list is empty.
"""
assert isinstance(image, Image.Image), "Input should be a Pillow Image"
assert len(grid_params) > 0, "Grid parameters should not be empty"
# Step 1: Calculate and find the closest aspect ratio
input_width, input_height = image.size
input_aspect_ratio = input_width / input_height
candidate_resolutions = [(w / h, w, h) for w in grid_params for h in grid_params]
closest_aspect_ratio = min(candidate_resolutions, key=lambda x: abs(input_aspect_ratio - x[0]))
candidate_resolutions = [(x[1], x[2]) for x in candidate_resolutions if abs(x[0] - closest_aspect_ratio[0]) < 1e-3]
target_resolution = min(candidate_resolutions, key=lambda res: abs(max(input_width, input_height) / max(res) - 1))
resize_width, resize_height = target_resolution
if input_width > input_height:
resize_height = int(resize_width / input_aspect_ratio)
else:
resize_width = int(resize_height * input_aspect_ratio)
resized_image = image.resize((resize_width, resize_height), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
# Step 5: Pad the resized image if necessary to match the target resolution
pad_width = target_resolution[0] - resize_width
pad_height = target_resolution[1] - resize_height
padded_image = Image.new("RGB", target_resolution, color=(0, 0, 0))
padded_image.paste(resized_image, (pad_width // 2, pad_height // 2))
return padded_image
def extract_patches(image: Image.Image, patch_size: int, overlap_ratio: float) -> List[Image.Image]:
r"""
Extracts patches from a given image with specified patch size and overlap ratio.
Args:
- image (`Image.Image`): The input image from which patches are to be extracted. Must be a Pillow Image.
- patch_size (`int`): The size of each patch (both width and height). Must be greater than 0.
- overlap_ratio (`float`): The ratio of overlap between adjacent patches. Must be between 0 and 1 (exclusive).
Returns:
`List[Image.Image]`: A list of extracted patches as Pillow Images.
Raises:
`AssertionError`: If the input image is not a Pillow Image.
`AssertionError`: If the patch size is not greater than 0.
`AssertionError`: If the overlap ratio is not between 0 and 1.
"""
assert isinstance(image, Image.Image), "Input should be a Pillow Image"
assert patch_size > 0, "Patch size should be greater than 0"
assert 0 <= overlap_ratio < 1, "Overlap ratio should be between 0 and 1"
W, H = image.size
patches = []
stride = int(patch_size * (1 - overlap_ratio))
num_patches_y = (H - patch_size) // stride + 1
num_patches_x = (W - patch_size) // stride + 1
y_start = (H - (num_patches_y - 1) * stride - patch_size) // 2
x_start = (W - (num_patches_x - 1) * stride - patch_size) // 2
for y in range(y_start, y_start + num_patches_y * stride, stride):
for x in range(x_start, x_start + num_patches_x * stride, stride):
patch = image.crop((x, y, x + patch_size, y + patch_size))
patches.append(patch)
return patches
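The stride arithmetic above determines where each patch starts; restated standalone for illustration (pure arithmetic, our helper name):

```python
def patch_origins(width, height, patch_size, overlap_ratio):
    # Same placement rule as extract_patches: fixed stride, with the
    # grid of patches centered inside the image.
    stride = int(patch_size * (1 - overlap_ratio))
    ny = (height - patch_size) // stride + 1
    nx = (width - patch_size) // stride + 1
    y0 = (height - (ny - 1) * stride - patch_size) // 2
    x0 = (width - (nx - 1) * stride - patch_size) // 2
    return [(x, y)
            for y in range(y0, y0 + ny * stride, stride)
            for x in range(x0, x0 + nx * stride, stride)]
```

With no overlap, a 672x672 image and 336-pixel patches yield a 2x2 grid of origins.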
def process_highres_image_crop_split(image: Image.Image, data_args, processor=None) -> torch.Tensor:
"""
Process a high-resolution image by cropping and splitting it into patches.
Args:
- image (`PIL.Image.Image`): The input image to be processed.
- data_args: The data arguments containing crop and split resolutions.
- processor: The image processor object. If None, it will be taken from data_args.
Returns:
`torch.Tensor`: A tensor containing the processed image patches.
"""
crop_resolution = data_args.image_crop_resolution
split_resolution = data_args.image_split_resolution
if processor is None:
processor = data_args.image_processor
image_crop = resize_and_center_crop(image, crop_resolution)
image_patches = extract_patches(image_crop, patch_size=split_resolution, overlap_ratio=0)
image_patches = [processor.preprocess(image_patch, return_tensors="pt")["pixel_values"][0] for image_patch in image_patches]
return torch.stack(image_patches, dim=0)
def process_highres_image(image: Image.Image, processor, grid_pinpoints: str) -> torch.Tensor:
r"""
Processes a high-resolution image by resizing, padding, and extracting patches.
Args:
- image (`Image.Image`): The input image to be processed.
- processor: An object that contains image processing parameters and methods.
- grid_pinpoints (`str`): A comma-separated string of grid sizes to consider for resizing.
Returns:
torch.Tensor: A tensor containing the processed image patches.
"""
grid_params = [int(x) for x in grid_pinpoints.split(",")]
width_height = max(image.size)
fit_grid_params = [x for x in grid_params if x >= width_height]
if len(fit_grid_params) == 0:
select_size = max(grid_params)
else:
select_size = min(fit_grid_params)
# FIXME: always select the 448
select_size = max(grid_params)
image_padded = expand2square(image, tuple(int(x * 255) for x in processor.image_mean))
# FIXME: this seems to be a bug that it always resizes instead of padding
image_original_resize = image.resize((processor.size["shortest_edge"], processor.size["shortest_edge"]))
image_padded = image_padded.resize((select_size, select_size))
image_patches = extract_patches(image_padded, patch_size=processor.size["shortest_edge"], overlap_ratio=0)
image_patches = [image_original_resize] + image_patches
image_patches = [processor.preprocess(image_patch, return_tensors="pt")["pixel_values"][0] for image_patch in image_patches]
return torch.stack(image_patches, dim=0)
def select_best_resolution(original_size: tuple, possible_resolutions: List[Tuple[int, int]]) -> tuple:
"""
Selects the best resolution from a list of possible resolutions based on the original size.
Args:
- original_size (`tuple`): The original size of the image in the format (width, height).
- possible_resolutions (`List[Tuple[int, int]]`): A list of possible resolutions in the format [(width1, height1), (width2, height2), ...].
Returns:
`tuple`: The best fit resolution in the format (width, height).
"""
original_width, original_height = original_size
best_fit = None
max_effective_resolution = 0
min_wasted_resolution = float("inf")
for width, height in possible_resolutions:
# Calculate the downscaled size to keep the aspect ratio
scale = min(width / original_width, height / original_height)
downscaled_width, downscaled_height = int(original_width * scale), int(original_height * scale)
# Calculate effective and wasted resolutions
effective_resolution = min(downscaled_width * downscaled_height, original_width * original_height)
wasted_resolution = (width * height) - effective_resolution
if effective_resolution > max_effective_resolution or (effective_resolution == max_effective_resolution and wasted_resolution < min_wasted_resolution):
max_effective_resolution = effective_resolution
min_wasted_resolution = wasted_resolution
best_fit = (width, height)
return best_fit
def resize_and_pad_image(image: Image.Image, target_resolution: tuple) -> Image.Image:
r"""
Resize and pad an image to a target resolution while maintaining aspect ratio.
Args:
- image (`Image.Image`): The input image.
- target_resolution (`tuple`): The target resolution (width, height) of the image.
Returns:
`Image.Image`: The resized and padded image.
"""
original_width, original_height = image.size
target_width, target_height = target_resolution
# Determine which dimension (width or height) to fill
scale_w = target_width / original_width
scale_h = target_height / original_height
if scale_w < scale_h:
# Width will be filled completely
new_width = target_width
new_height = min(math.ceil(original_height * scale_w), target_height)
else:
# Height will be filled completely
new_height = target_height
new_width = min(math.ceil(original_width * scale_h), target_width)
# Resize the image
resized_image = image.resize((new_width, new_height))
# Create a new image with the target size and paste the resized image onto it
new_image = Image.new("RGB", (target_width, target_height), (0, 0, 0))
paste_x = (target_width - new_width) // 2
paste_y = (target_height - new_height) // 2
new_image.paste(resized_image, (paste_x, paste_y))
return new_image
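The scale-then-center step can be checked numerically; a sketch with an exact-arithmetic example (helper name is ours):

```python
import math

def fit_and_offset(original, target):
    # Mirrors resize_and_pad_image: scale to fit inside `target` while
    # keeping aspect ratio, then center; returns the resized size and
    # the top-left paste position.
    ow, oh = original
    tw, th = target
    scale_w, scale_h = tw / ow, th / oh
    if scale_w < scale_h:
        new_w, new_h = tw, min(math.ceil(oh * scale_w), th)
    else:
        new_w, new_h = min(math.ceil(ow * scale_h), tw), th
    return (new_w, new_h), ((tw - new_w) // 2, (th - new_h) // 2)
```

A 336x168 image fitted into 672x672 is resized to 672x336 and pasted at (0, 168), i.e. vertically centered.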
def divide_to_patches(image: Image.Image, patch_size: int) -> list:
"""
Divides an image into patches of a specified size.
Args:
- image (`Image.Image`): The input image.
- patch_size (`int`): The size of each patch.
Returns:
`list`: A list of Image.Image objects representing the patches.
"""
patches = []
width, height = image.size
for i in range(0, height, patch_size):
for j in range(0, width, patch_size):
box = (j, i, j + patch_size, i + patch_size)
patch = image.crop(box)
patches.append(patch)
return patches
def get_anyres_image_grid_shape(image_size: Tuple[int, int], grid_pinpoints: Union[str, list], patch_size: int) -> Tuple[int, int]:
r"""
Calculate the shape of the image patch grid after the preprocessing for images of any resolution.
Args:
- image_size (`tuple`): The size of the input image in the format (width, height).
- grid_pinpoints (`str` or `list`): A string representation of a list of possible resolutions.
- patch_size (`int`): The size of each image patch.
Returns:
`tuple`: The shape of the image patch grid in the format (width, height).
"""
if isinstance(grid_pinpoints, str) and "x" in grid_pinpoints:
assert patch_size in [224, 336, 384, 448, 512], "patch_size should be in [224, 336, 384, 448, 512]"
# Use regex to extract the range from the input string
matches = re.findall(r"\((\d+)x(\d+)\)", grid_pinpoints)
range_start = tuple(map(int, matches[0]))
range_end = tuple(map(int, matches[-1]))
# Generate a matrix of tuples from (range_start[0], range_start[1]) to (range_end[0], range_end[1])
grid_pinpoints = [(i, j) for i in range(range_start[0], range_end[0] + 1) for j in range(range_start[1], range_end[1] + 1)]
# Multiply all elements by patch_size
grid_pinpoints = [[dim * patch_size for dim in pair] for pair in grid_pinpoints]
if type(grid_pinpoints) is list:
possible_resolutions = grid_pinpoints
else:
possible_resolutions = ast.literal_eval(grid_pinpoints)
width, height = select_best_resolution(image_size, possible_resolutions)
return width // patch_size, height // patch_size
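Putting the selection rule together with the grid computation, using the `image_grid_pinpoints` from `config.json` above: an 800x600 input maps to the 672x672 candidate, i.e. a 2x2 patch grid at patch size 336. A standalone restatement for illustration (not an import of the code above):

```python
def best_resolution(original, candidates):
    # Same rule as select_best_resolution: maximize effective
    # (non-upscaled) pixels, then minimize wasted pixels.
    ow, oh = original
    best, best_eff, best_waste = None, 0, float("inf")
    for w, h in candidates:
        scale = min(w / ow, h / oh)
        eff = min(int(ow * scale) * int(oh * scale), ow * oh)
        waste = w * h - eff
        if eff > best_eff or (eff == best_eff and waste < best_waste):
            best, best_eff, best_waste = (w, h), eff, waste
    return best

# image_grid_pinpoints from config.json
PINPOINTS = [(336, 336), (336, 672), (336, 1008), (336, 1344), (336, 1680),
             (672, 336), (672, 672), (1008, 336), (1344, 336), (1680, 336)]

res = best_resolution((800, 600), PINPOINTS)
grid = (res[0] // 336, res[1] // 336)  # as in get_anyres_image_grid_shape
```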
def process_anyres_image(image: Image.Image, processor: Any, grid_pinpoints: Union[str, List[Tuple[int, int]]]) -> torch.Tensor:
r"""
Process an image with variable resolutions.
Args:
- image (`Image.Image`): The input image to be processed.
- processor: The image processor object.
- grid_pinpoints (`str`): A string representation of a list of possible resolutions.
Returns:
`torch.Tensor`: A tensor containing the processed image patches.
"""
# Convert grid_pinpoints from string to list
if isinstance(grid_pinpoints, str) and "x" in grid_pinpoints:
try:
patch_size = processor.size[0]
except Exception as e:
patch_size = processor.size["shortest_edge"]
assert patch_size in [224, 336, 384, 448, 512], "patch_size should be in [224, 336, 384, 448, 512]"
# Use regex to extract the range from the input string
matches = re.findall(r"\((\d+)x(\d+)\)", grid_pinpoints)
range_start = tuple(map(int, matches[0]))
range_end = tuple(map(int, matches[-1]))
# Generate a matrix of tuples from (range_start[0], range_start[1]) to (range_end[0], range_end[1])
grid_pinpoints = [(i, j) for i in range(range_start[0], range_end[0] + 1) for j in range(range_start[1], range_end[1] + 1)]
# Multiply all elements by patch_size
grid_pinpoints = [[dim * patch_size for dim in pair] for pair in grid_pinpoints]
if type(grid_pinpoints) is list:
possible_resolutions = grid_pinpoints
else:
possible_resolutions = ast.literal_eval(grid_pinpoints)
best_resolution = select_best_resolution(image.size, possible_resolutions)
image_padded = resize_and_pad_image(image, best_resolution)
patches = divide_to_patches(image_padded, processor.crop_size["height"])
# FIXME: this seems to be a bug that it resizes instead of pads.
# but to keep it consistent with previous, i will keep it as it is
# TODO: uncomment below to ablate with the padding
if isinstance(processor.size, dict):
shortest_edge = processor.size["shortest_edge"]
else:
shortest_edge = min(processor.size)
image_original_resize = image.resize((shortest_edge, shortest_edge))
# image_padded_square = expand2square(image, tuple(int(x*255) for x in processor.image_mean))
# image_original_resize = image_padded_square.resize((processor.size['shortest_edge'], processor.size['shortest_edge']))
image_patches = [image_original_resize] + patches
image_patches = [processor.preprocess(image_patch, return_tensors="pt")["pixel_values"][0] for image_patch in image_patches]
image_patches = torch.stack(image_patches, dim=0)
return image_patches
def load_image_from_base64(image):
return Image.open(BytesIO(base64.b64decode(image)))
def expand2square(pil_img: Image.Image, background_color: tuple) -> Image.Image:
r"""
Expands a given PIL image to a square by adding a background color.
Args:
- pil_img (`Image.Image`): The input PIL image to be expanded.
- background_color (`tuple`): The background color to use for expansion, specified as an RGB tuple.
Returns:
`Image.Image`: The expanded square PIL image.
"""
width, height = pil_img.size
if width == height:
return pil_img
elif width > height:
result = Image.new(pil_img.mode, (width, width), background_color)
result.paste(pil_img, (0, (width - height) // 2))
return result
else:
result = Image.new(pil_img.mode, (height, height), background_color)
result.paste(pil_img, ((height - width) // 2, 0))
return result
def process_images(images: List[Image.Image], image_processor: Any, model_cfg: Any) -> Union[torch.Tensor, List[torch.Tensor]]:
r"""
Processes a list of images based on the specified model configuration.
Args:
- images (`list`): A list of images to be processed.
- image_processor (`ImageProcessor`): An instance of the image processor to be used.
- model_cfg (`ModelConfig`): Configuration object containing model settings.
Returns:
`torch.Tensor` or list: Processed images as a tensor if all images have the same shape,
otherwise a list of processed images.
"""
image_aspect_ratio = getattr(model_cfg, "image_aspect_ratio", "")  # "" rather than None, so the membership test below cannot raise
new_images = []
if image_aspect_ratio == "highres":
for image in images:
image = process_highres_image(image, image_processor, model_cfg.image_grid_pinpoints)
new_images.append(image)
elif image_aspect_ratio == "anyres" or "anyres_max" in image_aspect_ratio:
for image in images:
image = process_anyres_image(image, image_processor, model_cfg.image_grid_pinpoints)
new_images.append(image)
elif image_aspect_ratio == "crop_split":
for image in images:
image = process_highres_image_crop_split(image, model_cfg, image_processor)
new_images.append(image)
elif image_aspect_ratio == "pad":
for image in images:
image = expand2square(image, tuple(int(x * 255) for x in image_processor.image_mean))
image = image_processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
new_images.append(image)
else:
return image_processor.preprocess(images, return_tensors="pt")["pixel_values"]
if all(x.shape == new_images[0].shape for x in new_images):
new_images = torch.stack(new_images, dim=0)
return new_images
def tokenizer_image_token(prompt: str, tokenizer: PreTrainedTokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None)->Union[torch.Tensor, List[torch.Tensor]]:
r"""
Tokenizes a prompt containing image tokens and inserts the specified image token index at the appropriate positions.
Args:
- prompt (str): The input prompt string containing text and "<image>" placeholders.
- tokenizer (PreTrainedTokenizer): The tokenizer to use for tokenizing the text chunks.
- image_token_index (int): The token index to use for the image placeholders. Default is IMAGE_TOKEN_INDEX.
- return_tensors (str, optional): The type of tensor to return. If "pt", returns a PyTorch tensor. Default is None.
Returns:
list or torch.Tensor: The tokenized input IDs as a list or a PyTorch tensor if return_tensors is specified.
"""
prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split("<image>")]
# FIXME: prompt_chunks = [tokenizer(chunk, return_tensors="pt", padding="longest", max_length=tokenizer.model_max_length, truncation=True).input_ids for chunk in prompt.split("<image>")]
def insert_separator(X, sep):
return [ele for sublist in zip(X, [sep] * len(X)) for ele in sublist][:-1]
input_ids = []
offset = 0
if len(prompt_chunks) > 0 and len(prompt_chunks[0]) > 0 and prompt_chunks[0][0] == tokenizer.bos_token_id:
offset = 1
input_ids.append(prompt_chunks[0][0])
for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
input_ids.extend(x[offset:])
if return_tensors is not None:
if return_tensors == "pt":
return torch.tensor(input_ids, dtype=torch.long)
raise ValueError(f"Unsupported tensor type: {return_tensors}")
return input_ids
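The splicing logic above is easiest to see with a toy tokenizer. This sketch reproduces the function's core (the token ids and the tokenizer are hypothetical; only the BOS/offset handling and the `insert_separator` trick come from the code above — the separator has length `offset + 1` so that after dropping the first `offset` ids, exactly one image token remains between chunks):

```python
IMAGE_TOKEN_INDEX = -200  # assumed placeholder id

class _Enc:
    def __init__(self, ids):
        self.input_ids = ids

class ToyTokenizer:
    bos_token_id = 0
    def __call__(self, text):
        # toy scheme: BOS followed by one id (1) per whitespace word
        return _Enc([self.bos_token_id] + [1] * len(text.split()))

def tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX):
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split("<image>")]
    def insert_separator(X, sep):
        return [ele for sublist in zip(X, [sep] * len(X)) for ele in sublist][:-1]
    input_ids, offset = [], 0
    if prompt_chunks and prompt_chunks[0] and prompt_chunks[0][0] == tokenizer.bos_token_id:
        offset = 1
        input_ids.append(prompt_chunks[0][0])
    for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
        input_ids.extend(x[offset:])
    return input_ids

print(tokenizer_image_token("hi <image> there", ToyTokenizer()))
# [0, 1, -200, 1]  -- BOS kept once, one image token spliced in
```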
def get_model_name_from_path(model_path: str)->str:
model_path = model_path.strip("/")
model_paths = model_path.split("/")
if model_paths[-1].startswith("checkpoint-"):
return model_paths[-2] + "_" + model_paths[-1]
else:
return model_paths[-1]
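The path parsing above folds a checkpoint directory into its parent's name; a quick standalone sketch (the example paths are hypothetical):

```python
def get_model_name_from_path(model_path: str) -> str:
    model_path = model_path.strip("/")
    parts = model_path.split("/")
    if parts[-1].startswith("checkpoint-"):
        return parts[-2] + "_" + parts[-1]
    return parts[-1]

print(get_model_name_from_path("ckpts/instella-vl-1b/checkpoint-500"))
# instella-vl-1b_checkpoint-500
print(get_model_name_from_path("amd/Instella-VL-1B/"))
# Instella-VL-1B
```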
class KeywordsStoppingCriteria(StoppingCriteria):
def __init__(self, keywords, tokenizer, input_ids):
self.keywords = keywords
self.keyword_ids = []
for keyword in keywords:
cur_keyword_ids = tokenizer(keyword).input_ids
if len(cur_keyword_ids) > 1 and cur_keyword_ids[0] == tokenizer.bos_token_id:
cur_keyword_ids = cur_keyword_ids[1:]
self.keyword_ids.append(torch.tensor(cur_keyword_ids))
self.tokenizer = tokenizer
self.start_len = input_ids.shape[1]
def __call__(self, output_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
assert output_ids.shape[0] == 1, "Only batch size 1 is supported (for now)"  # TODO
offset = min(output_ids.shape[1] - self.start_len, 3)
self.keyword_ids = [keyword_id.to(output_ids.device) for keyword_id in self.keyword_ids]
for keyword_id in self.keyword_ids:
if (output_ids[0, -keyword_id.shape[0] :] == keyword_id).all():
return True
outputs = self.tokenizer.batch_decode(output_ids[:, -offset:], skip_special_tokens=True)[0]
for keyword in self.keywords:
if keyword in outputs:
return True
return False
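The per-keyword check compares the last `len(keyword)` generated ids against the keyword ids; note the elementwise `==` must be reduced with `.all()` to get a boolean. A small sketch with assumed toy ids:

```python
import torch

output_ids = torch.tensor([[11, 42, 7, 9]])   # generated so far
keyword_id = torch.tensor([7, 9])             # stop keyword's ids
matches = bool((output_ids[0, -keyword_id.shape[0]:] == keyword_id).all())
print(matches)  # True
```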
def rank0_print(*args):
if dist.is_initialized():
if dist.get_rank() == 0:
print(f"Rank {dist.get_rank()}: ", *args)
else:
print(*args)

BIN
model.safetensors (Stored with Git LFS) Normal file

Binary file not shown.

2463
modeling_instellavl.py Normal file

File diff suppressed because it is too large Load Diff

7
preprocessor_config.json Normal file

@@ -0,0 +1,7 @@
{
"auto_map": {
"AutoImageProcessor": "image_processing_instellavl.InstellaVLImageProcessor",
"AutoProcessor": "processing_instellavl.InstellaVLProcessor"
},
"processor_class": "InstellaVLProcessor"
}

212
processing_instellavl.py Normal file

@@ -0,0 +1,212 @@
from PIL import ImageOps
from PIL.Image import Image
import torch
from typing import Union, List
from tqdm import tqdm
from transformers.image_utils import ImageInput
from transformers.tokenization_utils_base import TextInput
from transformers import CLIPImageProcessor
from transformers.processing_utils import (
ProcessorMixin,
)
from transformers import AutoTokenizer, PreTrainedTokenizer
from .image_processing_instellavl import InstellaVLImageProcessor
from .mm_utils import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX, KeywordsStoppingCriteria
from .conversation import conv_templates
def tokenizer_image_token(prompt: str, tokenizer: PreTrainedTokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None)->Union[torch.Tensor, List[torch.Tensor]]:
r"""
Tokenizes a prompt containing image tokens and inserts the specified image token index at the appropriate positions.
Args:
- prompt (str): The input prompt string containing text and DEFAULT_IMAGE_TOKEN="<image>" placeholders.
- tokenizer (PreTrainedTokenizer): The tokenizer to use for tokenizing the text chunks.
- image_token_index (int): The token index to use for the image placeholders. Default is IMAGE_TOKEN_INDEX.
- return_tensors (str, optional): The type of tensor to return. If "pt", returns a PyTorch tensor. Default is None.
Returns:
list or torch.Tensor: The tokenized input IDs as a list or a PyTorch tensor if return_tensors is specified.
"""
prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split(DEFAULT_IMAGE_TOKEN)]
def insert_separator(X, sep):
return [ele for sublist in zip(X, [sep] * len(X)) for ele in sublist][:-1]
input_ids = []
offset = 0
if len(prompt_chunks) > 0 and len(prompt_chunks[0]) > 0 and prompt_chunks[0][0] == tokenizer.bos_token_id:
offset = 1
input_ids.append(prompt_chunks[0][0])
for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
input_ids.extend(x[offset:])
if return_tensors is not None:
if return_tensors == "pt":
return torch.tensor(input_ids, dtype=torch.long)
raise ValueError(f"Unsupported tensor type: {return_tensors}")
return input_ids
class InstellaVLProcessor(ProcessorMixin):
attributes = ["image_processor", "tokenizer"]
image_processor_class = "AutoImageProcessor"
tokenizer_class = "GPTNeoXTokenizerFast"
def __init__(self, image_processor: InstellaVLImageProcessor = None, tokenizer: AutoTokenizer = None, **kwargs):
super().__init__(image_processor, tokenizer, **kwargs)
def pad_sequence(self, input_ids: Union[List[torch.Tensor], List[List[torch.Tensor]]], batch_first: bool, padding_value: int, tokenizer: AutoTokenizer):
if tokenizer.padding_side == "left":
input_ids = [torch.flip(_input_ids, [0]) for _input_ids in input_ids]
input_ids = torch.nn.utils.rnn.pad_sequence(input_ids, batch_first=batch_first, padding_value=padding_value)
if tokenizer.padding_side == "left":
input_ids = torch.flip(input_ids, [1])
return input_ids
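`pad_sequence`'s left-padding path relies on a flip/pad/flip trick, since `torch.nn.utils.rnn.pad_sequence` only right-pads. A sketch with assumed toy sequences:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
# flip each sequence, right-pad the batch, then flip back
flipped = [torch.flip(s, [0]) for s in seqs]
padded = pad_sequence(flipped, batch_first=True, padding_value=0)
left_padded = torch.flip(padded, [1])
print(left_padded.tolist())  # [[1, 2, 3], [0, 4, 5]]
```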
def encode(self,
text: TextInput = None,
images: ImageInput = None,
image_processor: CLIPImageProcessor = None,
tokenizer: AutoTokenizer = None,
model_cfg: dict = None,
) -> dict:
if images is not None:
if isinstance(images, Image):
# Handle images with EXIF orientation tags, which PIL will ignore by default
# https://github.com/python-pillow/Pillow/issues/4703
ImageOps.exif_transpose(images, in_place=True)
image_sizes = [images.size]
images = [images]
elif isinstance(images, list):
image_sizes = []
for i in images:
ImageOps.exif_transpose(i, in_place=True)
image_sizes.append(i.size)
image_tensor = self.image_processor.process(images, image_processor, model_cfg)['pixel_values']
text = text.replace(DEFAULT_IMAGE_TOKEN, "").strip()
if images is not None and len(image_tensor) != 0 and DEFAULT_IMAGE_TOKEN not in text:
question = DEFAULT_IMAGE_TOKEN + "\n" + text
else:
question = text
conv = conv_templates["instella"].copy()
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()
input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0)
keywords = [conv.sep]
stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("|||IP_ADDRESS|||")]
out = {
"input_ids": input_ids,
"stopping_criteria": [stopping_criteria],
"eos_token_id": terminators,
}
if images is not None:
out = {
"image_tensor": image_tensor,
"image_sizes": image_sizes,
**out,
}
self.tokenizer = tokenizer
return out
def batch_encode(self,
texts: List[TextInput] = None,
images: List[ImageInput] = None,
image_processor: CLIPImageProcessor = None,
tokenizer: AutoTokenizer = None,
model_cfg: dict = None,
):
if texts is None:
raise ValueError("Text must be provided for batch encoding.")
if images is None:
images = [None] * len(texts)
assert isinstance(texts, list), "For batch encoding, provide the texts as a list."
assert len(texts) == len(images), "The number of texts and images must be equal."
batch_outs = []
for txt, img in tqdm(zip(texts, images), total=len(texts), desc="Total Samples to encode"):
batch_outs.append(self.encode(txt, img, image_processor, tokenizer, model_cfg))
return batch_outs
# batched_image_tensors = []
# batched_text_tokens = []
# stopping_criterias = []
# image_sizes = []
# for t, img in tqdm(zip(text, images), desc="Total Samples to encode"):
# if img is not None:
# if isinstance(img, Image):
# ImageOps.exif_transpose(img, in_place=True)
# image_sizes.append(img.size)
# img = [img]
# elif isinstance(img, list):
# tmp_img_sizes = []
# for i in img:
# ImageOps.exif_transpose(i, in_place=True)
# tmp_img_sizes.append(i.size)
# image_sizes.append(tmp_img_sizes)
# batched_image_tensors.append(self.image_processor.process(img, image_processor, model_cfg)['pixel_values'].squeeze(0))
# t = t.replace(DEFAULT_IMAGE_TOKEN, "").strip()
# if img is not None and len(batched_image_tensors[-1]) != 0 and DEFAULT_IMAGE_TOKEN not in t:
# question = DEFAULT_IMAGE_TOKEN + "\n" + t
# else:
# question = t
# conv = conv_templates["instella"].copy()
# conv.append_message(conv.roles[0], question)
# conv.append_message(conv.roles[1], None)
# prompt_question = conv.get_prompt()
# input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
# stopping_criterias.append(KeywordsStoppingCriteria([conv.sep], tokenizer, input_ids.unsqueeze(0)))
# batched_text_tokens.append(input_ids)
# terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("|||IP_ADDRESS|||")]
# # Pad the text tokens.
# pad_token_ids = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
# input_ids = self.pad_sequence(batched_text_tokens, batch_first=True, padding_value=pad_token_ids, tokenizer=tokenizer)
# attention_masks = input_ids.ne(pad_token_ids)
# batch_outs = {
# "input_ids": input_ids,
# "attention_mask": attention_masks,
# "pad_token_id": pad_token_ids,
# "stopping_criteria": stopping_criterias,
# "eos_token_id": terminators,
# }
# if images is not None:
# batch_outs = {
# "image_tensor": batched_image_tensors,
# "image_sizes": image_sizes,
# **batch_outs
# }
# self.tokenizer = tokenizer
# return batch_outs
def decode(self, output_ids: torch.Tensor)->str:
return self.tokenizer.decode(output_ids[0, :], skip_special_tokens=True).strip()
def batch_decode(self, output_ids_lst: List[torch.Tensor])->List[str]:
raise NotImplementedError("Batch decode is not implemented for InstellaVLProcessor")
# text_decoded_outs = []
# for out_ids in output_ids_lst:
# text_decoded_outs.append(self.decode(out_ids))
# return text_decoded_outs
InstellaVLProcessor.register_for_auto_class()

6
processor_config.json Normal file

@@ -0,0 +1,6 @@
{
"auto_map": {
"AutoProcessor": "processing_instellavl.InstellaVLProcessor"
},
"processor_class": "InstellaVLProcessor"
}

16
special_tokens_map.json Normal file

@@ -0,0 +1,16 @@
{
"eos_token": {
"content": "|||IP_ADDRESS|||",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|padding|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

250622
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

255
tokenizer_config.json Normal file

@@ -0,0 +1,255 @@
{
"add_bos_token": false,
"add_eos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"0": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<|padding|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"50254": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50255": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50256": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50257": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50258": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50259": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50260": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50261": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50262": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50263": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50264": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50265": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50266": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50267": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50268": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50269": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50270": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50271": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50272": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50273": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50274": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50275": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50276": {
"content": " ",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50277": {
"content": "|||EMAIL_ADDRESS|||",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50278": {
"content": "|||PHONE_NUMBER|||",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"50279": {
"content": "|||IP_ADDRESS|||",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"50280": {
"content": "<point>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"50281": {
"content": "</point>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": null,
"clean_up_tokenization_spaces": true,
"eos_token": "|||IP_ADDRESS|||",
"chat_template": "|||IP_ADDRESS|||\n{% for message in messages -%}{{ message['role'] + message['content']}}{%- if not loop.last -%}{{ '\\n' if loop.index % 2 == 1 else '|||IP_ADDRESS|||\\n'}}{%- endif %}{%- endfor -%}",
"model_max_length": 32768,
"pad_token": "<|padding|>",
"tokenizer_class": "GPTNeoXTokenizer",
"unk_token": null
}