diff --git a/README.md b/README.md index 832f3f2..55f0414 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,135 @@ -# timesfm-1.0-200m +--- +license: apache-2.0 +library_name: timesfm +pipeline_tag: time-series-forecasting +--- -timesfm-1.0-200m \ No newline at end of file +# TimesFM + +TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting. + +**Resources and Technical Documentation**: + +* Paper: [A decoder-only foundation model for time-series forecasting](https://arxiv.org/abs/2310.10688), to appear in ICML 2024. +* [Google Research blog](https://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/) +* [GitHub repo](https://github.com/google-research/timesfm) + +**Authors**: Google Research + +This is not an officially supported Google product. + +## Checkpoint timesfm-1.0-200m + +`timesfm-1.0-200m` is the first open model checkpoint: + +- It performs univariate time series forecasting for context lengths up to 512 time points and any horizon length, with an optional frequency indicator. +- It focuses on point forecasts and does not support probabilistic forecasts. We experimentally offer quantile heads but they have not been calibrated after pretraining. +- It requires the context to be contiguous (i.e. no "holes"), and the context and the horizon to be of the same frequency. + +## Benchmarks + +Please refer to our result tables on the [extended benchmarks](https://github.com/google-research/timesfm/blob/master/experiments/extended_benchmarks/tfm_results.png) and the [long horizon benchmarks](https://github.com/google-research/timesfm/blob/master/experiments/long_horizon_benchmarks/tfm_long_horizon.png). + +See the README files in the benchmark directories under `experiments/` for instructions on running TimesFM on each benchmark. + +## Installation + +This HuggingFace repo hosts TimesFM checkpoints. 
Please visit our [GitHub repo](https://github.com/google-research/timesfm) and follow the instructions there to install the `timesfm` library for model inference. + +## Usage + +### Initialize the model and load a checkpoint +The base class can be loaded as follows: + +```python +import timesfm + +tfm = timesfm.TimesFm( + context_len=, + horizon_len=, + input_patch_len=32, + output_patch_len=128, + num_layers=20, + model_dims=1280, + backend=, +) +tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m") +``` + +Note that these four parameters are fixed when loading the 200m model: + +```python +input_patch_len=32, +output_patch_len=128, +num_layers=20, +model_dims=1280, +``` + +1. `context_len` can be set to the maximum context length **of the model**. You can provide a shorter series to the `tfm.forecast()` function and the model will handle it. Currently, the model handles a maximum context length of 512, which may be increased in later releases. The input time series can have **any context length**. Padding / truncation will be handled by the inference code if needed. + +2. The horizon length can be set to anything. We recommend setting it to the largest horizon length you would need in the forecasting tasks for your application. We generally recommend horizon length <= context length, but this is not a requirement of the function call. + +### Perform inference + +We provide APIs to forecast from either array inputs or a `pandas` dataframe. Both forecast methods expect (1) the input time series contexts and (2) their frequencies. Please see the documentation of the functions `tfm.forecast()` and `tfm.forecast_on_df()` for detailed instructions. + +In particular, regarding the frequency, TimesFM expects a categorical indicator valued in {0, 1, 2}: + +- **0** (default): high frequency, long horizon time series. We recommend using this for time series up to daily granularity. +- **1**: medium frequency time series. 
We recommend using this for weekly and monthly data. +- **2**: low frequency, short horizon time series. We recommend using this for anything beyond monthly, e.g. quarterly or yearly. + +This categorical value should be directly provided with the array inputs. For dataframe inputs, we convert the conventional letter coding of frequencies to our expected categories as follows: + +- **0**: T, MIN, H, D, B, U +- **1**: W, M +- **2**: Q, Y + +Note that you do **NOT** have to strictly follow our recommendation here. Although this is our setup during model training and we expect it to offer the best forecast result, you can also view the frequency input as a free parameter and modify it per your specific use case. + + +Examples: + +Array inputs, with the frequencies set to high, medium, and low respectively. + +```python +import numpy as np +forecast_input = [ + np.sin(np.linspace(0, 20, 100)) + np.sin(np.linspace(0, 20, 200)), + np.sin(np.linspace(0, 20, 400)), +] +frequency_input = [0, 1, 2] + +point_forecast, experimental_quantile_forecast = tfm.forecast( + forecast_input, + freq=frequency_input, +) +``` + +`pandas` dataframe, with the frequency set to "M" monthly. + +```python +import pandas as pd + +# e.g. input_df is +# unique_id ds y +# 0 T1 1975-12-31 697458.0 +# 1 T1 1976-01-31 1187650.0 +# 2 T1 1976-02-29 1069690.0 +# 3 T1 1976-03-31 1078430.0 +# 4 T1 1976-04-30 1059910.0 +# ... ... ... ... 
+# 8175 T99 1986-01-31 602.0 +# 8176 T99 1986-02-28 684.0 +# 8177 T99 1986-03-31 818.0 +# 8178 T99 1986-04-30 836.0 +# 8179 T99 1986-05-31 878.0 + +forecast_df = tfm.forecast_on_df( + inputs=input_df, + freq="M", # monthly + value_name="y", + num_jobs=-1, +) +``` \ No newline at end of file diff --git a/checkpoints/checkpoint_1100000/descriptor/descriptor.pbtxt b/checkpoints/checkpoint_1100000/descriptor/descriptor.pbtxt new file mode 100644 index 0000000..15ec06a --- /dev/null +++ b/checkpoints/checkpoint_1100000/descriptor/descriptor.pbtxt @@ -0,0 +1,16 @@ +uuid: "07e37595-56cb-444a-8f8e-abbc3c2e752f" +build_data { + timestamp { + seconds: 1714379165 + } + user: "pax-dev-releaser-jobs" + hostname: "jgbv26.prod.google.com" + path: "/google/src/cloud/buildrabbit-username/buildrabbit-client/google3" + target: "//learning/multipod/pax/tools:colab_notebook" + invocation_id: "30f9e7fa-1cb7-4443-a49a-dbfb1b690df0" + changelist: 628988949 + baseline_cl: 628988949 + workspace_id: "" + client_status: BUILD_CLIENT_STATUS_UNSPECIFIED + verifiable: UNKNOWN +} diff --git a/checkpoints/checkpoint_1100000/metadata/metadata b/checkpoints/checkpoint_1100000/metadata/metadata new file mode 100644 index 0000000..ff56f80 --- /dev/null +++ b/checkpoints/checkpoint_1100000/metadata/metadata @@ -0,0 +1 @@ +{"version": 1.1, "train_state_metadata": {"mdl_vars": {"params": {"freq_emb": {"emb_var": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [3, 1280]}}, "horizon_ff_layer": {"hidden_layer": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "output_layer": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": 
"float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "residual_layer": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}}, "input_ff_layer": {"hidden_layer": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [64, 1280]}}}, "output_layer": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "residual_layer": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [64, 1280]}}}}, "stacked_transformer_layer": {"x_layers_0": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": 
false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_1": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, 
"unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_10": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, 
"layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_11": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": 
{"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_12": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": 
true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_13": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", 
"is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_14": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": 
false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_15": {"ff_layer": 
{"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": 
"float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_16": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", 
"is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_17": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, 
"unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_18": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, 
"unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_19": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 
80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_2": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": 
{"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_3": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": 
{"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_4": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": 
true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_5": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": 
false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_6": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, 
"unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_7": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, 
"self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_8": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": 
{"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}, "x_layers_9": {"ff_layer": {"ffn_layer1": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "ffn_layer2": {"bias": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "linear": {"w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 1280]}}}, "layer_norm": {"bias": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "scale": {"_array_metadata_tag": true, 
"dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}}, "layer_norm": {"scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}}, "self_attention": {"key": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "per_dim_scale": {"per_dim_scale": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [80]}}, "post": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "query": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}, "value": {"b": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [16, 80]}, "w": {"_array_metadata_tag": true, "dtype": "float32", "is_optax_masked_node": false, "unpadded_shape": [1280, 16, 80]}}}}}}}}} \ No newline at end of file diff --git a/checkpoints/checkpoint_1100000/state/checkpoint b/checkpoints/checkpoint_1100000/state/checkpoint new file mode 100644 index 0000000..8d2dd22 Binary files /dev/null and b/checkpoints/checkpoint_1100000/state/checkpoint differ diff --git a/configuration.json b/configuration.json new file mode 100644 index 0000000..3fc809d --- /dev/null +++ b/configuration.json @@ -0,0 +1 @@ +{"framework":"Pytorch","task":"other"} \ No newline at end of file