Use LLaMA to create embeddings
⚠️ This is not recommended for production. For server deployment, you should never run a model in Embedbase directly (contact us to learn more) ⚠️
LLaMA is a family of large language models released by Facebook AI Research (FAIR) in 2023.
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
In this example, we will implement a local embedder for Embedbase that uses a LLaMA model to create embeddings. Specifically, we're going to use Vicuna.
We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases. The cost of training Vicuna-13B is around $300. The training and serving code, along with an online demo, are publicly available for non-commercial use.
Installation
Install the required dependencies in a virtual environment:
virtualenv env
source env/bin/activate
pip install embedbase llama-cpp-python
Download a LLaMA model
We will use a 4-bit quantized version of the Vicuna model so that it can run on a CPU on consumer hardware.
⚠️ Use this model at your own risk. ⚠️
wget https://huggingface.co/eachadea/ggml-vicuna-7b-4bit/resolve/main/ggml-vicuna-7b-4bit-rev1.bin
Implement the Embedder & start Embedbase
Create a new file main.py with the following code:
from typing import List, Union

import llama_cpp
import uvicorn
from embedbase import get_app
from embedbase.database.memory_db import MemoryDatabase
from embedbase.embedding.base import Embedder
from llama_cpp import Llama


class LlamaEmbedder(Embedder):
    EMBEDDING_MODEL = "ggml-vicuna-7b-4bit-rev1.bin"

    def __init__(self, model: str = EMBEDDING_MODEL, **kwargs):
        super().__init__(**kwargs)
        # embedding=True puts llama.cpp in embedding mode
        self.model = Llama(model_path=model, embedding=True)
        # warm-up call to verify the model loads and can embed
        self.model.create_embedding("Hello world!")

    @property
    def dimensions(self) -> int:
        """
        Return the dimensions of the embeddings
        :return: dimensions of the embeddings
        """
        return llama_cpp.llama_n_embd(self.model.ctx)

    def is_too_big(self, text: str) -> bool:
        """
        Check if text is too big to be embedded,
        delegating the splitting UX to the caller
        :param text: text to check
        :return: True if text is too big, False otherwise
        """
        # rough heuristic: compares character count against the
        # model's context size, which is measured in tokens
        return len(text) > self.model.params.n_ctx

    async def embed(self, data: Union[List[str], str]) -> List[List[float]]:
        """
        Embed a list of texts, or a single text
        :param data: list of texts, or a single text
        :return: list of embeddings
        """
        # normalize a single string into a list so we never
        # iterate over it character by character
        if isinstance(data, str):
            data = [data]
        return [self.model.embed(e) for e in data]


embedder = LlamaEmbedder("ggml-vicuna-7b-4bit-rev1.bin")

app = (
    get_app()
    .use_embedder(embedder)
    .use_db(MemoryDatabase(dimensions=embedder.dimensions))
    .run()
)

if __name__ == "__main__":
    uvicorn.run("main:app")
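Before starting the server, you can sanity-check the embedder on its own from a Python shell or script. This is a minimal sketch, assuming main.py and the model file sit in the current directory (importing main will load the model, which takes a moment):

import asyncio

from main import embedder

# the 7B LLaMA architecture uses 4096-dimensional hidden states,
# so this should print 4096
print(embedder.dimensions)

# embed one sentence and check the vector length matches
vectors = asyncio.run(embedder.embed(["Hello world!"]))
print(len(vectors[0]))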
Start the Embedbase application with the following command:
python3 main.py
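Once uvicorn is running, the API listens on http://localhost:8000 (uvicorn's default port). As a quick smoke test, you can hit the interactive API docs that FastAPI-based apps like Embedbase expose by default; a sketch assuming the requests package is installed and the /docs route hasn't been disabled:

import requests

# FastAPI serves interactive docs at /docs by default;
# a 200 here means the server is up and routing requests
response = requests.get("http://localhost:8000/docs")
print(response.status_code)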
Test the endpoint
TypeScript
import { createClient } from 'embedbase-js'

const embedbase = createClient("http://localhost:8000")

const SENTENCES = [
  "The lion is the king of the savannah.",
  "The chimpanzee is a great ape.",
  "The elephant is the largest land animal.",
];
const DATASET_ID = "animals"

const add = async () => {
  const data = await embedbase
    .dataset(DATASET_ID)
    .batchAdd(SENTENCES.map((data) => ({ data })))
  console.log(data)
}

add()
You should get a similar response:
{
  "results": [
    {
      "data": "The lion is the king of the savannah.",
      "embedding": [...],
      "hash": ...,
      "metadata": null
    },
    {
      "data": "The chimpanzee is a great ape.",
      "embedding": [...],
      "hash": ...,
      "metadata": null
    },
    {
      "data": "The elephant is the largest land animal.",
      "embedding": [...],
      "hash": ...,
      "metadata": null
    }
  ]
}
Now let's try a search:
TypeScript
const search = async () => {
  const data = await embedbase
    .dataset(DATASET_ID)
    .search("Animal that lives in the savannah", { limit: 1 })
  console.log(data)
}

search()
You should get a similar response:
"The lion is the king of the savannah."