nomic-embed-text-v1_a140326.../README.md


| library_name | pipeline_tag | tags |
| --- | --- | --- |
| sentence-transformers | sentence-similarity | feature-extraction, sentence-similarity, mteb, transformers, transformers.js |

Model index for `epoch_0_model` (MTEB results; the `license`, `language` and `new_version` fields are listed after the results):

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/amazon_counterfactual | MTEB AmazonCounterfactualClassification (en) | en | test | e8379541af4e31359cca9fbcf4b00f2671dba205 |

| accuracy | ap | f1 |
| --- | --- | --- |
| 76.8507462686567 | 40.592189159090495 | 71.01634655512476 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/amazon_polarity | MTEB AmazonPolarityClassification | default | test | e2d317d38cd51312af73b3d32a06d1a08b442046 |

| accuracy | ap | f1 |
| --- | --- | --- |
| 91.51892500000001 | 88.50346762975335 | 91.50342077459624 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/amazon_reviews_multi | MTEB AmazonReviewsClassification (en) | en | test | 1399c76144fd37290681b995c656ef9b2e06e26d |

| accuracy | f1 |
| --- | --- |
| 47.364 | 46.72708080922794 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | arguana | MTEB ArguAna | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 25.178 | 35.016999999999996 | 37.99 | 40.244 | 41.321999999999996 | 41.331 |
| mrr | 25.605 | 35.23 | 38.15 | 40.422000000000004 | 41.507 | 41.516 |
| ndcg | 25.178 | 38.429 | 43.803 | 49.258 | 53.776 | 53.995000000000005 |
| precision | 25.178 | 16.121 | 12.29 | 7.831 | 0.979 | 0.1 |
| recall | 25.178 | 48.364000000000004 | 61.451 | 78.307 | 97.866 | 99.57300000000001 |

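
The retrieval results report `map`, `mrr`, `ndcg`, `precision` and `recall` at several cutoffs k (for example `ndcg_at_10`). The card gives only the numbers. The sketch below shows how these metrics are conventionally defined for a single query with binary relevance. It is not the benchmark's own evaluation code. `retrieval_metrics` is a hypothetical helper, and MAP@k is normalised by the total number of relevant documents, which is one common convention. Results are multiplied by 100 on the assumption that the table uses a 0 to 100 scale. A benchmark score is the mean of these per-query values over all queries.

```python
import math


def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Binary-relevance metrics for one query at cutoff k, scaled by 100.

    Assumes relevant_ids is a non-empty set of document ids and ranked_ids
    is the system's ranking, best first.
    """
    if not relevant_ids:
        raise ValueError("need at least one relevant document")
    hits = [1 if doc in relevant_ids else 0 for doc in ranked_ids[:k]]
    n_rel = len(relevant_ids)

    precision = sum(hits) / k
    recall = sum(hits) / n_rel

    # MRR@k: reciprocal rank of the first relevant document inside the top k.
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # Average precision truncated at k: precision at each relevant hit,
    # summed and divided by the number of relevant documents.
    ap, found = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            found += 1
            ap += found / rank
    ap /= n_rel

    # nDCG@k with gain 1 per relevant hit and a log2(rank + 1) discount.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal = sum(1 / math.log2(rank + 1) for rank in range(1, min(n_rel, k) + 1))
    ndcg = dcg / ideal

    return {
        "precision": 100 * precision,
        "recall": 100 * recall,
        "mrr": 100 * mrr,
        "map": 100 * ap,
        "ndcg": 100 * ndcg,
    }


m = retrieval_metrics(["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}, k=5)
print(m)  # precision 40.0, recall 100.0, mrr 50.0, map 50.0, ndcg ≈ 65.09
```
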
| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/arxiv-clustering-p2p | MTEB ArxivClusteringP2P | default | test | a122ad7f3f0291bf49cc6f4d32aa80929df69d5d |

| v_measure |
| --- |
| 45.93034494751465 |

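
The clustering tasks report `v_measure`, which compares predicted cluster assignments with gold labels. It is the harmonic mean of homogeneity (each cluster holds a single class) and completeness (each class falls into a single cluster). The following pure-Python sketch implements that definition. `v_measure` and its helpers are hypothetical names, and the result is scaled by 100 on the assumption that the table uses a 0 to 100 scale. The sketch only scores given assignments; how the benchmark derives clusters from the embeddings is not covered here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) for two parallel label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((n_ab / n) * math.log(n_ab / b_counts[bv])
                for (_, bv), n_ab in joint.items())


def v_measure(classes, clusters):
    """V-measure (beta = 1) of predicted clusters against gold classes, scaled to 0..100."""
    h_c, h_k = entropy(classes), entropy(clusters)
    # Degenerate cases: one class is trivially homogeneous, one cluster trivially complete.
    homogeneity = 1.0 if h_c == 0 else 1 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 100 * 2 * homogeneity * completeness / (homogeneity + completeness)


print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # 100.0: same partition, different label ids
print(v_measure([0, 0, 1, 1], [0, 1, 2, 3]))  # ≈ 66.67: homogeneous but not complete
```

Because only the partition matters, renaming cluster ids leaves the score unchanged, and splitting every item into its own cluster is penalised through completeness.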
| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/arxiv-clustering-s2s | MTEB ArxivClusteringS2S | default | test | f910caf1a6075f7329cdf8c1a6135696f37dbd53 |

| v_measure |
| --- |
| 36.64579480054327 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Reranking | mteb/askubuntudupquestions-reranking | MTEB AskUbuntuDupQuestions | default | test | 2000358ca161889fa9c082cb41daa8dcfb161a54 |

| map | mrr |
| --- | --- |
| 60.601310529222054 | 75.04484896451656 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| STS | mteb/biosses-sts | MTEB BIOSSES | default | test | d3fb88f8f02e40887cd149695127462bbcf29b4a |

| similarity | pearson | spearman |
| --- | --- | --- |
| cos_sim | 88.57797718095814 | 86.47064499110101 |
| euclidean | 87.4559602783142 | 86.47064499110101 |
| manhattan | 87.7232764230245 | 86.91222131777742 |

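
The STS results report Pearson and Spearman correlations between gold similarity scores and a model similarity computed in three ways: cosine, Euclidean and Manhattan. The sketch below covers the cosine variant (`cos_sim_spearman`) in pure Python. Each sentence pair is scored by the cosine of its two embeddings, and the scores are compared with the gold scores using the Spearman rank correlation, with average ranks for ties. The embeddings are toy vectors and the function names are hypothetical. Scaling by 100 is assumed from the table's 0 to 100 scale. The sketch assumes non-zero vectors and non-constant score lists, since otherwise the divisions are undefined.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cos_sim_spearman(pairs, gold_scores):
    """Spearman correlation (x100) between pairwise cosine similarity and gold scores."""
    sims = [cosine(u, v) for u, v in pairs]
    return 100 * pearson(ranks(sims), ranks(gold_scores))


pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
print(cos_sim_spearman(pairs, [5.0, 3.0, 0.5]))  # ≈ 100.0: same ordering as the gold scores
```

Spearman depends only on the ordering of the similarities, so any monotone rescaling of the model's scores leaves `cos_sim_spearman` unchanged. Pearson (`cos_sim_pearson`) would instead be applied to the raw similarities.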
| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/banking77 | MTEB Banking77Classification | default | test | 0fd18e25b25c072e09e0d92ab615fda904d66300 |

| accuracy | f1 |
| --- | --- |
| 84.5422077922078 | 84.47657456950589 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/biorxiv-clustering-p2p | MTEB BiorxivClusteringP2P | default | test | 65b79d1d13f80053f67aca9498d9402c2d9f1f40 |

| v_measure |
| --- |
| 38.48953561974464 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/biorxiv-clustering-s2s | MTEB BiorxivClusteringS2S | default | test | 258694dd0231531bc1fd9de6ceb52a0853c6d908 |

| v_measure |
| --- |
| 32.75995857510105 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackAndroidRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 30.008000000000003 | 36.248999999999995 | 38.096999999999994 | 39.51 | 40.841 | 40.973 |
| mrr | 36.481 | 42.036 | 43.782 | 44.818000000000005 | 45.64 | 45.687 |
| ndcg | 36.481 | 40.161 | 42.577999999999996 | 45.152 | 50.449 | 52.76499999999999 |
| precision | 36.481 | 18.693 | 13.533999999999999 | 8.369 | 1.373 | 0.186 |
| recall | 30.008000000000003 | 41.754999999999995 | 48.296 | 56.108999999999995 | 78.55499999999999 | 93.659 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackEnglishRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 30.262 | 37.155 | 38.785 | 40.139 | 41.394 | 41.526 |
| mrr | 38.153 | 44.268 | 45.389 | 46.369 | 47.072 | 47.111999999999995 |
| ndcg | 38.153 | 41.754000000000005 | 43.574 | 45.925 | 50.394000000000005 | 52.37500000000001 |
| precision | 38.153 | 20.318 | 14.395 | 8.796 | 1.432 | 0.189 |
| recall | 30.262 | 43.129 | 48.336 | 55.72200000000001 | 74.97500000000001 | 87.342 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackGamingRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 39.951 | 48.211 | 49.797000000000004 | 51.248000000000005 | 52.188 | 52.247 |
| mrr | 45.329 | 52.382 | 53.649 | 54.749 | 55.367999999999995 | 55.400000000000006 |
| ndcg | 45.329 | 51.59 | 53.915 | 56.847 | 60.738 | 61.976 |
| precision | 45.329 | 22.612 | 15.273 | 8.959 | 1.187 | 0.134 |
| recall | 39.951 | 56.032000000000004 | 61.629999999999995 | 70.053 | 86.996 | 95.707 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackGisRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 25.566 | 30.94 | 32.01 | 33.207 | 34.166000000000004 | 34.245 |
| mrr | 27.345000000000002 | 32.806000000000004 | 34.021 | 35.193000000000005 | 35.965 | 36.028999999999996 |
| ndcg | 27.345000000000002 | 33.123000000000005 | 35.035 | 37.891999999999996 | 42.664 | 44.757000000000005 |
| precision | 27.345000000000002 | 13.71 | 9.401 | 5.763 | 0.859 | 0.108 |
| recall | 25.566 | 37.43 | 41.894999999999996 | 50.563 | 72.86399999999999 | 88.68599999999999 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackMathematicaRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 16.663 | 21.085 | 22.391 | 23.552 | 24.538 | 24.661 |
| mrr | 20.025000000000002 | 25.083 | 26.544 | 27.643 | 28.499999999999996 | 28.582 |
| ndcg | 20.025000000000002 | 23.579 | 25.685000000000002 | 28.272000000000002 | 33.353 | 36.454 |
| precision | 20.025000000000002 | 10.987 | 8.06 | 5.187 | 0.897 | 0.13 |
| recall | 16.663 | 25.907999999999998 | 31.214 | 38.808 | 61.305 | 83.571 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackPhysicsRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 27.695999999999998 | 34.226 | 35.809999999999995 | 37.018 | 38.263000000000005 | 38.371 |
| mrr | 32.916000000000004 | 39.637 | 41.134 | 42.067 | 42.925000000000004 | 42.978 |
| ndcg | 32.916000000000004 | 37.852999999999994 | 40.201 | 42.539 | 47.873 | 50.08200000000001 |
| precision | 32.916000000000004 | 17.485 | 12.512 | 7.5840000000000005 | 1.199 | 0.155 |
| recall | 27.695999999999998 | 41.13 | 46.872 | 53.638 | 76.116 | 91.069 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackProgrammersRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 24.108 | 30.830999999999996 | 32.204 | 33.372 | 34.656 | 34.768 |
| mrr | 29.110000000000003 | 35.731 | 36.963 | 37.979 | 38.933 | 38.988 |
| ndcg | 29.110000000000003 | 34.37 | 36.228 | 38.635000000000005 | 44.324999999999996 | 46.747 |
| precision | 29.110000000000003 | 16.400000000000002 | 11.552999999999999 | 6.963 | 1.146 | 0.152 |
| recall | 24.108 | 37.662 | 42.565 | 49.597 | 73.88900000000001 | 90.62400000000001 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 25.00791666666667 | 30.734416666666668 | 32.137166666666666 | 33.287749999999996 | 34.41141666666667 | 34.52583333333333 |
| mrr | 29.305666666666664 | 34.92275 | 36.23333333333334 | 37.22966666666666 | 38.066583333333334 | 38.12616666666667 |
| ndcg | 29.305666666666664 | 33.777166666666666 | 35.85 | 38.25533333333333 | 43.25266666666666 | 45.63583333333334 |
| precision | 29.305666666666664 | 15.31075 | 10.830916666666667 | 6.596416666666667 | 1.0784166666666668 | 0.14666666666666664 |
| recall | 25.00791666666667 | 36.660916666666665 | 41.94149999999999 | 49.10933333333333 | 71.09216666666667 | 87.77725000000001 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackStatsRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 23.521 | 27.926000000000002 | 29.076999999999998 | 30.043 | 30.936000000000003 | 31.022 |
| mrr | 26.227 | 30.776999999999997 | 31.866 | 32.822 | 33.61 | 33.672000000000004 |
| ndcg | 26.227 | 30.037999999999997 | 31.845000000000002 | 34.041 | 38.394 | 40.732 |
| precision | 26.227 | 12.679000000000002 | 8.773 | 5.244999999999999 | 0.808 | 0.107 |
| recall | 23.521 | 32.614 | 37.15 | 43.633 | 63.126000000000005 | 80.765 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackTexRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 16.236 | 20.87 | 22.025 | 22.898 | 23.878 | 24.009 |
| mrr | 19.339000000000002 | 24.386 | 25.496000000000002 | 26.382 | 27.245 | 27.33 |
| ndcg | 19.339000000000002 | 23.424 | 25.188 | 27.139999999999997 | 31.944 | 35.077999999999996 |
| precision | 19.339000000000002 | 10.874 | 7.825 | 4.8309999999999995 | 0.845 | 0.128 |
| recall | 16.236 | 26.179999999999996 | 30.712 | 36.513 | 57.999 | 80.512 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackUnixRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 24.11 | 29.24 | 30.564999999999998 | 31.566 | 32.647 | 32.753 |
| mrr | 28.265 | 33.349000000000004 | 34.622 | 35.504000000000005 | 36.436 | 36.503 |
| ndcg | 28.265 | 31.959 | 33.998 | 36.192 | 41.388000000000005 | 43.948 |
| precision | 28.265 | 14.335 | 10.112 | 5.989 | 0.9650000000000001 | 0.13 |
| recall | 24.11 | 34.724 | 39.925 | 46.418 | 69.314 | 87.397 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackWebmastersRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 22.091 | 27.464 | 28.968 | 29.948999999999998 | 31.502000000000002 | 31.713 |
| mrr | 26.482 | 31.785000000000004 | 33.178999999999995 | 34.009 | 35.081 | 35.138000000000005 |
| ndcg | 26.482 | 30.804 | 33.046 | 35.008 | 41.272999999999996 | 43.972 |
| precision | 26.482 | 14.360999999999999 | 10.474 | 6.462 | 1.431 | 0.22899999999999998 |
| recall | 22.091 | 33.158 | 39.086999999999996 | 45.125 | 72.313 | 89.503 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | BeIR/cqadupstack | MTEB CQADupstackWordpressRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 19.883 | 24.616 | 25.917 | 26.951000000000004 | 27.927999999999997 | 28.022000000000002 |
| mrr | 21.996 | 26.833000000000002 | 28.155 | 29.221000000000004 | 30.024 | 30.095 |
| ndcg | 21.996 | 26.671 | 28.907 | 31.421 | 36.237 | 38.744 |
| precision | 21.996 | 11.275 | 8.059 | 5.009 | 0.799 | 0.11199999999999999 |
| recall | 19.883 | 30.209000000000003 | 35.616 | 43.132999999999996 | 65.654 | 84.492 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | climate-fever | MTEB ClimateFEVER | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 17.756 | 25.599 | 28.372999999999998 | 30.378 | 32.537 | 32.717 |
| mrr | 41.303 | 50.315 | 52.396 | 53.483999999999995 | 54.106 | 54.127 |
| ndcg | 41.303 | 34.364 | 36.818 | 40.503 | 47.821000000000005 | 50.788 |
| precision | 41.303 | 25.798 | 19.896 | 12.463000000000001 | 2.037 | 0.26 |
| recall | 17.756 | 30.646 | 38.022 | 46.102 | 70.819 | 87.21799999999999 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | dbpedia-entity | MTEB DBPedia | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 9.033 | 14.468 | 17.177 | 20.584 | 29.518 | 31.186000000000003 |
| mrr | 69.75 | 75.583 | 76.396 | 77.025 | 77.36699999999999 | 77.373 |
| ndcg | 58.5 | 49.936 | 47.471999999999994 | 45.033 | 49.071 | 56.056 |
| precision | 69.75 | 52.5 | 45.300000000000004 | 35.775 | 11.594999999999999 | 2.062 |
| recall | 9.033 | 15.754999999999999 | 20.033 | 26.596999999999998 | 54.607000000000006 | 76.961 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/emotion | MTEB EmotionClassification | default | test | 4f58c6b202a23cf9a4da393831edf4f9183cad37 |

| accuracy | f1 |
| --- | --- |
| 48.345000000000006 | 43.4514918068706 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | fever | MTEB FEVER | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 71.29100000000001 | 79.74799999999999 | 80.612 | 81.059 | 81.341 | 81.355 |
| mrr | 76.40299999999999 | 83.776 | 84.343 | 84.615 | 84.745 | 84.748 |
| ndcg | 76.40299999999999 | 82.97 | 84.152 | 84.981 | 86.00999999999999 | 86.252 |
| precision | 76.40299999999999 | 32.147999999999996 | 20.135 | 10.446 | 1.1199999999999999 | 0.116 |
| recall | 71.29100000000001 | 87.893 | 90.804 | 93.232 | 97.363 | 98.905 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | fiqa | MTEB FiQA2018 | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 18.667 | 26.91 | 29.099000000000004 | 30.853 | 32.494 | 32.677 |
| mrr | 37.191 | 44.059 | 45.147 | 46.171 | 47.056 | 47.099000000000004 |
| ndcg | 37.191 | 35.003 | 36.006 | 38.437 | 44.62 | 47.795 |
| precision | 37.191 | 23.302 | 17.006 | 10.586 | 1.688 | 0.22699999999999998 |
| recall | 18.667 | 32.129000000000005 | 37.719 | 45.367000000000004 | 68.207 | 87.072 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | hotpotqa | MTEB HotpotQA | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 39.494 | 62.867 | 64.994 | 66.223 | 67.062 | 67.11500000000001 |
| mrr | 78.987 | 83.592 | 84.235 | 84.585 | 84.773 | 84.77900000000001 |
| ndcg | 78.987 | 68.893 | 71.585 | 73.64 | 76.519 | 77.51 |
| precision | 78.987 | 44.808 | 29.006999999999998 | 15.529000000000002 | 1.7770000000000001 | 0.191 |
| recall | 39.494 | 67.211 | 72.519 | 77.643 | 88.825 | 95.321 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/imdb | MTEB ImdbClassification | default | test | 3d86128a09e091d6018b6d26cad27f2739fc2db7 |

| accuracy | ap | f1 |
| --- | --- | --- |
| 85.55959999999999 | 80.7246500384617 | 85.52336485065454 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | msmarco | MTEB MSMARCO | default | dev | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 23.631 | 32.537 | 34.746 | 36.264 | 37.428 | 37.472 |
| mrr | 24.312 | 33.188 | 35.367 | 36.858000000000004 | 37.966 | 38.004 |
| ndcg | 24.312 | 35.589 | 39.515 | 43.126999999999995 | 48.642 | 49.741 |
| precision | 24.312 | 15.153 | 11.065999999999999 | 6.699 | 0.9450000000000001 | 0.104 |
| recall | 23.631 | 43.769000000000005 | 53.169 | 64.145 | 89.41 | 97.83500000000001 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/mtop_domain | MTEB MTOPDomainClassification (en) | en | test | d80d48c1eb48d3562165c59d59d0034df9fff0bf |

| accuracy | f1 |
| --- | --- |
| 93.4108527131783 | 93.1415880261038 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/mtop_intent | MTEB MTOPIntentClassification (en) | en | test | ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba |

| accuracy | f1 |
| --- | --- |
| 77.24806201550388 | 60.531916308197175 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (en) | en | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 |

| accuracy | f1 |
| --- | --- |
| 73.71553463349024 | 71.70753174900791 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Classification | mteb/amazon_massive_scenario | MTEB MassiveScenarioClassification (en) | en | test | 7d571f92784cd94a019292a1f45445077d0ef634 |

| accuracy | f1 |
| --- | --- |
| 77.79757901815736 | 77.83719850433258 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/medrxiv-clustering-p2p | MTEB MedrxivClusteringP2P | default | test | e7a26af6f3ae46b30dde8737f02c07b1505bcc73 |

| v_measure |
| --- |
| 33.74193296622113 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/medrxiv-clustering-s2s | MTEB MedrxivClusteringS2S | default | test | 35191c8c0dca72d8ff3efcd72aa802307d469663 |

| v_measure |
| --- |
| 30.64257594108566 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Reranking | mteb/mind_small | MTEB MindSmallReranking | default | test | 3bdac13927fdc888b903db93b2ffdbd90b295a69 |

| map | mrr |
| --- | --- |
| 30.811018518883625 | 31.910376577445003 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | nfcorpus | MTEB NFCorpus | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 5.409 | 9.555 | 11.428 | 13.093 | 16.256999999999998 | 17.617 |
| mrr | 45.201 | 51.909000000000006 | 53.519000000000005 | 54.179 | 54.812000000000005 | 54.840999999999994 |
| ndcg | 43.189 | 40.596 | 38.75 | 35.028 | 31.226 | 39.678000000000004 |
| precision | 44.582 | 38.493 | 33.994 | 25.974999999999998 | 7.793 | 2.036 |
| recall | 5.409 | 10.688 | 13.832 | 16.875999999999998 | 30.316 | 60.891 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | nq | MTEB NQ | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 36.375 | 48.014 | 50.381 | 51.991 | 52.91400000000001 | 52.93600000000001 |
| mrr | 40.759 | 51.516 | 53.435 | 54.617000000000004 | 55.301 | 55.315000000000005 |
| ndcg | 40.759 | 52.114000000000004 | 55.986000000000004 | 59.384 | 63.157 | 63.654999999999994 |
| precision | 40.759 | 23.329 | 16.256999999999998 | 9.411999999999999 | 1.153 | 0.12 |
| recall | 36.375 | 60.475 | 69.327 | 79.053 | 95.167 | 98.82 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | quora | MTEB QuoraRetrieval | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 70.256 | 80.906 | 82.717 | 83.8 | 84.425 | 84.444 |
| mrr | 80.97999999999999 | 86.175 | 86.848 | 87.161 | 87.262 | 87.263 |
| ndcg | 80.97999999999999 | 84.83800000000001 | 86.401 | 87.697 | 88.959 | 89.09899999999999 |
| precision | 80.97999999999999 | 37.01 | 24.298000000000002 | 13.261000000000001 | 1.5150000000000001 | 0.156 |
| recall | 70.256 | 86.602 | 91.133 | 94.935 | 99.274 | 99.928 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/reddit-clustering | MTEB RedditClustering | default | test | 24640382cdbf8abc73003fb0fa6d111a705499eb |

| v_measure |
| --- |
| 56.322692497613104 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Clustering | mteb/reddit-clustering-p2p | MTEB RedditClusteringP2P | default | test | 282350215ef01743dc01b456c7f5241fa8937f16 |

| v_measure |
| --- |
| 61.895813503775074 |

| task | dataset | name | config | split | revision |
| --- | --- | --- | --- | --- | --- |
| Retrieval | scidocs | MTEB SCIDOCS | default | test | None |

| metric | @1 | @3 | @5 | @10 | @100 | @1000 |
| --- | --- | --- | --- | --- | --- | --- |
| map | 4.338 | 7.788 | 9.302000000000001 | 10.767 | 12.537999999999998 | 12.803999999999998 |
| mrr | 21.4 | 28.433000000000003 | 30.178 | 31.637999999999998 | 32.688 | 32.756 |
| ndcg | 21.4 | 17.391000000000002 | 15.146999999999998 | 18.293 | 25.274 | 30.284 |
| precision | 21.4 | 16.167 | 13.22 | 9.48 | 1.949 | 0.316 |
| recall | 4.338 | 9.828000000000001 | 13.383000000000001 | 19.213 | 39.562999999999995 | 64.08 |

task dataset metrics
type
STS
type name config split revision
mteb/sickr-sts MTEB SICK-R default test a6ea5a8cab320b040a23452cc28066d9beae2cee
type value
cos_sim_pearson 82.42568163642142
type value
cos_sim_spearman 78.5797159641342
type value
euclidean_pearson 80.22151260811604
type value
euclidean_spearman 78.5797151953878
type value
manhattan_pearson 80.21224215864788
type value
manhattan_spearman 78.55641478381344
task dataset metrics
type
STS
type name config split revision
mteb/sts12-sts MTEB STS12 default test a0d554a64d88156834ff5ae9920b964011b16384
type value
cos_sim_pearson 85.44020710812569
type value
cos_sim_spearman 78.91631735081286
type value
euclidean_pearson 81.64188964182102
type value
euclidean_spearman 78.91633286881678
type value
manhattan_pearson 81.69294748512496
type value
manhattan_spearman 78.93438558002656
task dataset metrics
type
STS
type name config split revision
mteb/sts13-sts MTEB STS13 default test 7e90230a92c190f1bf69ae9002b8cea547a64cca
type value
cos_sim_pearson 84.27165426412311
type value
cos_sim_spearman 85.40429140249618
type value
euclidean_pearson 84.7509580724893
type value
euclidean_spearman 85.40429140249618
type value
manhattan_pearson 84.76488289321308
type value
manhattan_spearman 85.4256793698708
task dataset metrics
type
STS
type name config split revision
mteb/sts14-sts MTEB STS14 default test 6031580fec1f6af667f0bd2da0a551cf4f0b2375
type value
cos_sim_pearson 83.138851760732
type value
cos_sim_spearman 81.64101363896586
type value
euclidean_pearson 82.55165038934942
type value
euclidean_spearman 81.64105257080502
type value
manhattan_pearson 82.52802949883335
type value
manhattan_spearman 81.61255430718158
task dataset metrics
type
STS
type name config split revision
mteb/sts15-sts MTEB STS15 default test ae752c7c21bf194d8b67fd573edf7ae58183cbe3
type value
cos_sim_pearson 86.0654695484029
type value
cos_sim_spearman 87.20408521902229
type value
euclidean_pearson 86.8110651362115
type value
euclidean_spearman 87.20408521902229
type value
manhattan_pearson 86.77984656478691
type value
manhattan_spearman 87.1719947099227
task dataset metrics
type
STS
type name config split revision
mteb/sts16-sts MTEB STS16 default test 4d8694f8f0e0100860b497b999b3dbed754a0513
type value
cos_sim_pearson 83.77823915496512
type value
cos_sim_spearman 85.43566325729779
type value
euclidean_pearson 84.5396956658821
type value
euclidean_spearman 85.43566325729779
type value
manhattan_pearson 84.5665398848169
type value
manhattan_spearman 85.44375870303232
task dataset metrics
type
STS
type name config split revision
mteb/sts17-crosslingual-sts MTEB STS17 (en-en) en-en test af5e6fb845001ecf41f4c1e033ce921939a2a68d
type value
cos_sim_pearson 87.20030208471798
type value
cos_sim_spearman 87.20485505076539
type value
euclidean_pearson 88.10588324368722
type value
euclidean_spearman 87.20485505076539
type value
manhattan_pearson 87.92324770415183
type value
manhattan_spearman 87.0571314561877
task dataset metrics
type
STS
type name config split revision
mteb/sts22-crosslingual-sts MTEB STS22 (en) en test 6d1ba47164174a496b7fa5d3569dae26a6813b80
type value
cos_sim_pearson 63.06093161604453
type value
cos_sim_spearman 64.2163140357722
type value
euclidean_pearson 65.27589680994006
type value
euclidean_spearman 64.2163140357722
type value
manhattan_pearson 65.45904383711101
type value
manhattan_spearman 64.55404716679305
task dataset metrics
type
STS
type name config split revision
mteb/stsbenchmark-sts MTEB STSBenchmark default test b0fddb56ed78048fa8b90373c8a3cfc37b684831
type value
cos_sim_pearson 84.32976164578706
type value
cos_sim_spearman 85.54302197678368
type value
euclidean_pearson 85.26307149193056
type value
euclidean_spearman 85.54302197678368
type value
manhattan_pearson 85.26647282029371
type value
manhattan_spearman 85.5316135265568
task dataset metrics
type
Reranking
type name config split revision
mteb/scidocs-reranking MTEB SciDocsRR default test d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
type value
map 81.44675968318754
type value
mrr 94.92741826075158
task dataset metrics
type
Retrieval
type name config split revision
scifact MTEB SciFact default test None
type value
map_at_1 56.34400000000001
type value
map_at_10 65.927
type value
map_at_100 66.431
type value
map_at_1000 66.461
type value
map_at_3 63.529
type value
map_at_5 64.818
type value
mrr_at_1 59.333000000000006
type value
mrr_at_10 67.54599999999999
type value
mrr_at_100 67.892
type value
mrr_at_1000 67.917
type value
mrr_at_3 65.778
type value
mrr_at_5 66.794
type value
ndcg_at_1 59.333000000000006
type value
ndcg_at_10 70.5
type value
ndcg_at_100 72.688
type value
ndcg_at_1000 73.483
type value
ndcg_at_3 66.338
type value
ndcg_at_5 68.265
type value
precision_at_1 59.333000000000006
type value
precision_at_10 9.3
type value
precision_at_100 1.053
type value
precision_at_1000 0.11199999999999999
type value
precision_at_3 25.889
type value
precision_at_5 16.866999999999997
type value
recall_at_1 56.34400000000001
type value
recall_at_10 82.789
type value
recall_at_100 92.767
type value
recall_at_1000 99
type value
recall_at_3 71.64399999999999
type value
recall_at_5 76.322
task dataset metrics
type
PairClassification
type name config split revision
mteb/sprintduplicatequestions-pairclassification MTEB SprintDuplicateQuestions default test d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
type value
cos_sim_accuracy 99.75742574257426
type value
cos_sim_ap 93.52081548447406
type value
cos_sim_f1 87.33850129198966
type value
cos_sim_precision 90.37433155080214
type value
cos_sim_recall 84.5
type value
dot_accuracy 99.75742574257426
type value
dot_ap 93.52081548447406
type value
dot_f1 87.33850129198966
type value
dot_precision 90.37433155080214
type value
dot_recall 84.5
type value
euclidean_accuracy 99.75742574257426
type value
euclidean_ap 93.52081548447406
type value
euclidean_f1 87.33850129198966
type value
euclidean_precision 90.37433155080214
type value
euclidean_recall 84.5
type value
manhattan_accuracy 99.75841584158415
type value
manhattan_ap 93.4975678585854
type value
manhattan_f1 87.26708074534162
type value
manhattan_precision 90.45064377682404
type value
manhattan_recall 84.3
type value
max_accuracy 99.75841584158415
type value
max_ap 93.52081548447406
type value
max_f1 87.33850129198966
task dataset metrics
type
Clustering
type name config split revision
mteb/stackexchange-clustering MTEB StackExchangeClustering default test 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
type value
v_measure 64.31437036686651
task dataset metrics
type
Clustering
type name config split revision
mteb/stackexchange-clustering-p2p MTEB StackExchangeClusteringP2P default test 815ca46b2622cec33ccafc3735d572c266efdb44
type value
v_measure 33.25569319007206
task dataset metrics
type
Reranking
type name config split revision
mteb/stackoverflowdupquestions-reranking MTEB StackOverflowDupQuestions default test e185fbe320c72810689fc5848eb6114e1ef5ec69
type value
map 49.90474939720706
type value
mrr 50.568115503777264
task dataset metrics
type
Summarization
type name config split revision
mteb/summeval MTEB SummEval default test cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
type value
cos_sim_pearson 29.866828641244712
type value
cos_sim_spearman 30.077555055873866
type value
dot_pearson 29.866832988572266
type value
dot_spearman 30.077555055873866
task dataset metrics
type
Retrieval
type name config split revision
trec-covid MTEB TRECCOVID default test None
type value
map_at_1 0.232
type value
map_at_10 2.094
type value
map_at_100 11.971
type value
map_at_1000 28.158
type value
map_at_3 0.688
type value
map_at_5 1.114
type value
mrr_at_1 88
type value
mrr_at_10 93.4
type value
mrr_at_100 93.4
type value
mrr_at_1000 93.4
type value
mrr_at_3 93
type value
mrr_at_5 93.4
type value
ndcg_at_1 84
type value
ndcg_at_10 79.923
type value
ndcg_at_100 61.17
type value
ndcg_at_1000 53.03
type value
ndcg_at_3 84.592
type value
ndcg_at_5 82.821
type value
precision_at_1 88
type value
precision_at_10 85
type value
precision_at_100 63.019999999999996
type value
precision_at_1000 23.554
type value
precision_at_3 89.333
type value
precision_at_5 87.2
type value
recall_at_1 0.232
type value
recall_at_10 2.255
type value
recall_at_100 14.823
type value
recall_at_1000 49.456
type value
recall_at_3 0.718
type value
recall_at_5 1.175
task dataset metrics
type
Retrieval
type name config split revision
webis-touche2020 MTEB Touche2020 default test None
type value
map_at_1 2.547
type value
map_at_10 11.375
type value
map_at_100 18.194
type value
map_at_1000 19.749
type value
map_at_3 5.825
type value
map_at_5 8.581
type value
mrr_at_1 32.653
type value
mrr_at_10 51.32
type value
mrr_at_100 51.747
type value
mrr_at_1000 51.747
type value
mrr_at_3 47.278999999999996
type value
mrr_at_5 48.605
type value
ndcg_at_1 29.592000000000002
type value
ndcg_at_10 28.151
type value
ndcg_at_100 39.438
type value
ndcg_at_1000 50.769
type value
ndcg_at_3 30.758999999999997
type value
ndcg_at_5 30.366
type value
precision_at_1 32.653
type value
precision_at_10 25.714
type value
precision_at_100 8.041
type value
precision_at_1000 1.555
type value
precision_at_3 33.333
type value
precision_at_5 31.837
type value
recall_at_1 2.547
type value
recall_at_10 18.19
type value
recall_at_100 49.538
type value
recall_at_1000 83.86
type value
recall_at_3 7.329
type value
recall_at_5 11.532
task dataset metrics
type
Classification
type name config split revision
mteb/toxic_conversations_50k MTEB ToxicConversationsClassification default test d7c0de2777da35d6aae2200a62c6e0e5af397c4c
type value
accuracy 71.4952
type value
ap 14.793362635531409
type value
f1 55.204635551516915
task dataset metrics
type
Classification
type name config split revision
mteb/tweet_sentiment_extraction MTEB TweetSentimentExtractionClassification default test d604517c81ca91fe16a244d1248fc021f9ecee7a
type value
accuracy 61.5365025466893
type value
f1 61.81742556334845
task dataset metrics
type
Clustering
type name config split revision
mteb/twentynewsgroups-clustering MTEB TwentyNewsgroupsClustering default test 6125ec4e24fa026cec8a478383ee943acfbd5449
type value
v_measure 49.05531070301185
task dataset metrics
type
PairClassification
type name config split revision
mteb/twittersemeval2015-pairclassification MTEB TwitterSemEval2015 default test 70970daeab8776df92f5ea462b6173c0b46fd2d1
type value
cos_sim_accuracy 86.51725576682364
type value
cos_sim_ap 75.2292304265163
type value
cos_sim_f1 69.54022988505749
type value
cos_sim_precision 63.65629110039457
type value
cos_sim_recall 76.62269129287598
type value
dot_accuracy 86.51725576682364
type value
dot_ap 75.22922386081054
type value
dot_f1 69.54022988505749
type value
dot_precision 63.65629110039457
type value
dot_recall 76.62269129287598
type value
euclidean_accuracy 86.51725576682364
type value
euclidean_ap 75.22925730473472
type value
euclidean_f1 69.54022988505749
type value
euclidean_precision 63.65629110039457
type value
euclidean_recall 76.62269129287598
type value
manhattan_accuracy 86.52321630804077
type value
manhattan_ap 75.20608115037336
type value
manhattan_f1 69.60000000000001
type value
manhattan_precision 64.37219730941705
type value
manhattan_recall 75.75197889182058
type value
max_accuracy 86.52321630804077
type value
max_ap 75.22925730473472
type value
max_f1 69.60000000000001
task dataset metrics
type
PairClassification
type name config split revision
mteb/twitterurlcorpus-pairclassification MTEB TwitterURLCorpus default test 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
type value
cos_sim_accuracy 89.34877944657896
type value
cos_sim_ap 86.71257569277373
type value
cos_sim_f1 79.10386355986088
type value
cos_sim_precision 76.91468470434214
type value
cos_sim_recall 81.4213119802895
type value
dot_accuracy 89.34877944657896
type value
dot_ap 86.71257133133368
type value
dot_f1 79.10386355986088
type value
dot_precision 76.91468470434214
type value
dot_recall 81.4213119802895
type value
euclidean_accuracy 89.34877944657896
type value
euclidean_ap 86.71257651501476
type value
euclidean_f1 79.10386355986088
type value
euclidean_precision 76.91468470434214
type value
euclidean_recall 81.4213119802895
type value
manhattan_accuracy 89.35848177901967
type value
manhattan_ap 86.69330615469126
type value
manhattan_f1 79.13867741453949
type value
manhattan_precision 76.78881807647741
type value
manhattan_recall 81.63689559593472
type value
max_accuracy 89.35848177901967
type value
max_ap 86.71257651501476
type value
max_f1 79.13867741453949
apache-2.0
en
nomic-ai/nomic-embed-text-v1.5

nomic-embed-text-v1: A Reproducible Long Context (8192) Text Embedder

nomic-embed-text-v1 is an 8192-token context length text encoder that surpasses the performance of OpenAI text-embedding-ada-002 and text-embedding-3-small on short- and long-context tasks.

Performance Benchmarks

| Name | SeqLen | MTEB | LoCo | Jina Long Context | Open Weights | Open Training Code | Open Data |
|---|---|---|---|---|---|---|---|
| nomic-embed-text-v1 | 8192 | 62.39 | 85.53 | 54.16 | | | |
| jina-embeddings-v2-base-en | 8192 | 60.39 | 85.45 | 51.90 | | | |
| text-embedding-3-small | 8191 | 62.26 | 82.40 | 58.20 | | | |
| text-embedding-ada-002 | 8191 | 60.99 | 52.7 | 55.25 | | | |

Exciting Update: nomic-embed-text-v1 is now multimodal! nomic-embed-vision-v1 is aligned to the embedding space of nomic-embed-text-v1, so any text embedding from nomic-embed-text-v1 can be compared directly with image embeddings from nomic-embed-vision-v1.

Usage

Important: the text prompt must include a task instruction prefix that tells the model which task is being performed.

For example, if you are implementing a RAG application, you embed your documents as search_document: <text here> and embed your user queries as search_query: <text here>.
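Because every input needs one of these prefixes, it can help to add them in one place rather than by hand. The sketch below is a hypothetical helper, `add_task_prefix`, which is not part of sentence-transformers or the Nomic API; it only assumes the four prefix names listed in this README and the "prefix: text" format shown in the examples.

```python
# Hypothetical helper (not part of sentence-transformers or the Nomic API):
# prepends one of the four task instruction prefixes described in this README.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def add_task_prefix(texts: list[str], task: str) -> list[str]:
    """Return copies of `texts` with "<task>: " prepended to each one."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {TASK_PREFIXES}")
    return [f"{task}: {text}" for text in texts]


docs = add_task_prefix(["TSNE is a dimensionality reduction algorithm"], "search_document")
queries = add_task_prefix(["Who is Laurens van der Maaten?"], "search_query")
print(docs)     # ['search_document: TSNE is a dimensionality reduction algorithm']
print(queries)  # ['search_query: Who is Laurens van der Maaten?']
```

The prefixed lists can then be passed to model.encode exactly as in the examples that follow.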

Task instruction prefixes

search_document

Purpose: embed texts as documents from a dataset

This prefix is used for embedding texts as documents, for example as documents for a RAG index.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
embeddings = model.encode(sentences)
print(embeddings)

search_query

Purpose: embed texts as questions to answer

This prefix is used for embedding texts as questions that documents from a dataset could resolve, for example as queries to be answered by a RAG application.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
sentences = ['search_query: Who is Laurens van Der Maaten?']
embeddings = model.encode(sentences)
print(embeddings)
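Once documents are embedded with search_document: and queries with search_query:, retrieval comes down to ranking documents by similarity to the query. The sketch below assumes cosine similarity as the scoring function and uses toy 3-dimensional vectors in place of real model outputs; `rank_by_cosine` is a hypothetical name. With the model loaded as above, `query_emb` would be one row of model.encode(['search_query: ...']) and `doc_embs` the output of model.encode on your search_document: texts.

```python
import numpy as np


def rank_by_cosine(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    """Return document indices sorted from most to least similar to the query."""
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)


# Toy 3-dimensional vectors standing in for real embeddings.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [0.5, 0.5, 0.0],
])
print(rank_by_cosine(query, docs))  # [1 2 0]
```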

clustering

Purpose: embed texts to group them into clusters

This prefix is used for embedding texts in order to group them into clusters, discover common topics, or remove semantic duplicates.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
sentences = ['clustering: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
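One of the uses mentioned above is removing semantic duplicates. A minimal sketch, assuming cosine similarity between clustering: embeddings and a hypothetical cutoff of 0.95 (tune it for your data), is a greedy pass that keeps a text only if it is not too close to any text already kept. `dedupe_indices` is a hypothetical helper and the vectors are toy values.

```python
import numpy as np


def dedupe_indices(embs: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedily keep rows whose cosine similarity to every kept row is below threshold."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    kept: list[int] = []
    for i, vec in enumerate(normed):
        # Compare against every embedding already selected to keep.
        if all(float(vec @ normed[j]) < threshold for j in kept):
            kept.append(i)
    return kept


embs = np.array([
    [1.0, 0.0],
    [0.99, 0.01],  # near-duplicate of row 0
    [0.0, 1.0],
])
print(dedupe_indices(embs))  # [0, 2]
```

The greedy pass is quadratic in the number of texts; for large collections a clustering or approximate nearest-neighbor index would be the more scalable choice.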

classification

Purpose: embed texts to classify them

This prefix is used for embedding texts into vectors that will be used as features for a classification model.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
sentences = ['classification: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
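To illustrate how classification: embeddings can serve as features, the sketch below swaps in a deliberately simple nearest-centroid classifier (any standard classifier, such as logistic regression, could be used instead). The helper names `fit_centroids` and `predict` and the 2-dimensional training vectors are hypothetical stand-ins for real model outputs.

```python
import numpy as np


def fit_centroids(embs: np.ndarray, labels: list[str]) -> dict[str, np.ndarray]:
    """Average the embeddings of each class into one centroid vector."""
    labels_arr = np.array(labels)
    return {lab: embs[labels_arr == lab].mean(axis=0) for lab in sorted(set(labels))}


def predict(centroids: dict[str, np.ndarray], emb: np.ndarray) -> str:
    """Assign the class whose centroid has the highest cosine similarity to emb."""
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(centroids, key=lambda lab: cos(centroids[lab], emb))


# Toy embeddings for two classes.
train = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])
centroids = fit_centroids(train, ["animal", "animal", "vehicle", "vehicle"])
print(predict(centroids, np.array([0.8, 0.2])))  # animal
```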

Sentence Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences)
print(embeddings)

Transformers

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
model.eval()

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
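The mean_pooling function above averages token embeddings while ignoring padding. The NumPy mirror below is a sketch for checking that behavior on toy values without loading the model; `mean_pool_np` is a hypothetical name, and the final step reproduces the L2 normalization of F.normalize(..., p=2, dim=1) apart from its small epsilon guard.

```python
import numpy as np


def mean_pool_np(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """NumPy mirror of mean_pooling: average only over positions where mask == 1."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# One sequence of 3 tokens; the last position is padding holding a large dummy value.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool_np(tokens, mask)
print(pooled)  # [[2. 3.]]

# L2-normalize each row, as F.normalize(..., p=2, dim=1) does.
normalized = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
```

The padded position does not affect the result, which is why batching sentences of different lengths with padding=True gives the same embeddings as encoding them one at a time.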

The model natively supports scaling the sequence length past 2048 tokens. To do so, make the following changes to the code above:

- tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)


- model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
+ model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True, rotary_scaling_factor=2)

Transformers.js

import { pipeline } from '@xenova/transformers';

// Create a feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'nomic-ai/nomic-embed-text-v1', {
    quantized: false, // Comment out this line to use the quantized version
});

// Compute sentence embeddings
const texts = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?'];
const embeddings = await extractor(texts, { pooling: 'mean', normalize: true });
console.log(embeddings);

Nomic API

The easiest way to get started with Nomic Embed is through the Nomic Embedding API.

Generating embeddings with the nomic Python client is as easy as:

from nomic import embed

output = embed.text(
    texts=['Nomic Embedding API', '#keepAIOpen'],
    model='nomic-embed-text-v1',
    task_type='search_document'
)

print(output)

For more information, see the API reference.

Training

A Nomic Atlas map visualizes a 5M sample of our contrastive pretraining data.


We train our embedder using a multi-stage training pipeline. Starting from a long-context BERT model, the first stage is unsupervised contrastive training on a dataset of weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and article-summary pairs from news articles.
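Contrastive training on text pairs is commonly implemented with an InfoNCE loss that treats the other documents in a batch as negatives. The NumPy sketch below shows that standard technique; it is an illustration, not a reproduction of Nomic's training code, and the temperature of 0.05 is an assumed value. Row i of `queries` is the anchor and row i of `docs` its paired text.

```python
import numpy as np


def info_nce_loss(queries: np.ndarray, docs: np.ndarray, temperature: float = 0.05) -> float:
    """InfoNCE loss with in-batch negatives (standard technique; temperature is assumed).

    Row i of `queries` is paired with row i of `docs`; every other row of `docs`
    in the batch serves as a negative for query i.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature             # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability before exp
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct partner for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
aligned = info_nce_loss(q, q + 0.01 * rng.normal(size=(4, 8)))  # pairs nearly identical
shuffled = info_nce_loss(q, q[::-1])                            # pairs mismatched
print(aligned < shuffled)  # True
```

Larger batches supply more in-batch negatives per query at no extra encoding cost, which is one reason this formulation is popular for embedding pretraining.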

In the second finetuning stage, higher-quality labeled datasets, such as search queries and answers from web searches, are leveraged. Data curation and hard-example mining are crucial in this stage.
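Hard-example mining usually means choosing negatives that the current model finds similar to the query but that are not the labeled positive. A minimal sketch of that idea, assuming cosine similarity over existing embeddings and a hypothetical k of 2, is shown below; `mine_hard_negatives` is a hypothetical helper, not Nomic's mining procedure.

```python
import numpy as np


def mine_hard_negatives(query_embs: np.ndarray, doc_embs: np.ndarray,
                        positive_idx: list[int], k: int = 2) -> list[list[int]]:
    """For each query, return the k most similar documents that are not its positive."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = q @ d.T                      # (num_queries, num_docs) cosine similarities
    negatives = []
    for i, pos in enumerate(positive_idx):
        ranked = np.argsort(-scores[i])   # most similar first
        negatives.append([int(j) for j in ranked if j != pos][:k])
    return negatives


queries = np.array([[1.0, 0.0, 0.0]])
docs = np.array([
    [1.0, 0.0, 0.0],   # the labeled positive
    [0.8, 0.6, 0.0],   # similar but wrong: a "hard" negative
    [0.0, 0.0, 1.0],   # unrelated: an "easy" negative
    [0.6, 0.8, 0.0],
])
print(mine_hard_negatives(queries, docs, positive_idx=[0], k=2))  # [[1, 3]]
```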

For more details, see the Nomic Embed Technical Report and corresponding blog post.

The data used to train the models is released in its entirety. For more details, see the contrastors repository.


Citation

If you find the model, dataset, or training code useful, please cite our work:

@misc{nussbaum2024nomic,
      title={Nomic Embed: Training a Reproducible Long Context Text Embedder}, 
      author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
      year={2024},
      eprint={2402.01613},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}