Qodo-Embed-1 is a state-of-the-art code embedding model designed for retrieval tasks in the software development domain.
It is offered in two sizes: lite (1.5B) and medium (7B). The model is optimized for natural language-to-code and code-to-code retrieval, making it highly effective for applications such as code search, retrieval-augmented generation (RAG), and contextual understanding of programming languages.
This model outperforms all previous open-source models on the CoIR and MTEB leaderboards, achieving best-in-class performance while being significantly smaller than competing models.
Languages Supported:
Python
C++
C#
Go
Java
JavaScript
PHP
Ruby
TypeScript
Model Information
Model Size: 1.5B
Embedding Dimension: 1536
Max Input Tokens: 32k
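As a quick sanity check, the embedding dimension listed above can be read directly from the loaded model. This is a minimal sketch rather than part of the official examples; the maximum sequence length reported by Sentence Transformers depends on how the model is configured and may not match the 32k figure exactly.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B")

# Dimension of the pooled embedding vector (expected: 1536)
print(model.get_sentence_embedding_dimension())

# Maximum sequence length configured for this SentenceTransformer instance
print(model.max_seq_length)
```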
Requirements
transformers>=4.39.2
flash_attn>=2.5.6
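To confirm that the installed packages satisfy these constraints, a minimal check is sketched below (assuming both packages are already installed; note that the PyPI distribution of flash_attn is named flash-attn).

```python
from importlib.metadata import version

print(version("transformers"))  # should be >= 4.39.2
print(version("flash-attn"))    # should be >= 2.5.6
```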
Usage
Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B")

# Run inference
sentences = [
    'accumulator = sum(item.value for item in collection)',
    'result = reduce(lambda acc, curr: acc + curr.amount, data, 0)',
    'matrix = [[i*j for j in range(n)] for i in range(n)]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1536]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
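The same API covers the natural language-to-code retrieval scenario described above. The sketch below uses an illustrative query and candidate snippets (not taken from the model card) and ranks the snippets by their similarity to the query.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B")

# Illustrative natural-language query and candidate code snippets
query = "sum the values of all items in a collection"
code_snippets = [
    "accumulator = sum(item.value for item in collection)",
    "matrix = [[i*j for j in range(n)] for i in range(n)]",
    "buffer = deque(maxlen=1000)",
]

query_embedding = model.encode([query])
snippet_embeddings = model.encode(code_snippets)

# Rank the snippets by similarity to the query (higher is more relevant)
scores = model.similarity(query_embedding, snippet_embeddings)[0]
for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx].item():.3f}  {code_snippets[idx]}")
```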
Transformers
```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


# Each query must come with a one-sentence instruction that describes the task
queries = [
    'how to handle memory efficient data streaming',
    'implement binary tree traversal',
]

documents = [
    """def process_in_chunks():
    buffer = deque(maxlen=1000)
    for record in source_iterator:
        buffer.append(transform(record))
        if len(buffer) >= 1000:
            yield from buffer
            buffer.clear()""",
    """class LazyLoader:
    def __init__(self, source):
        self.generator = iter(source)
        self._cache = []

    def next_batch(self, size=100):
        while len(self._cache) < size:
            try:
                self._cache.append(next(self.generator))
            except StopIteration:
                break
        return self._cache.pop(0) if self._cache else None""",
    """def dfs_recursive(root):
    if not root:
        return []
    stack = []
    stack.extend(dfs_recursive(root.right))
    stack.append(root.val)
    stack.extend(dfs_recursive(root.left))
    return stack""",
]

input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qodo/Qodo-Embed-1-1.5B', trust_remote_code=True)
model = AutoModel.from_pretrained('Qodo/Qodo-Embed-1-1.5B', trust_remote_code=True)

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T) * 100
print(scores.tolist())
```
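As a small follow-up sketch (reusing the queries and scores variables from the example above), the score matrix can be reduced to the best-matching document per query.

```python
# Rows of `scores` correspond to queries, columns to documents
best_doc = scores.argmax(dim=1)
for query, doc_idx in zip(queries, best_doc.tolist()):
    print(f"{query!r} -> best matching document index: {doc_idx}")
```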