
What is LlamaIndex?

Tags: LlamaIndex · RAG · LLM · Nutri-AI · Groq · Hugging Face Embeddings

LlamaIndex is an open-source framework for building applications that connect your data (documents, PDFs, databases) with large language models (LLMs). It is not a model itself: it is the orchestration layer that lets you index documents, search them by meaning, and pass the relevant context to the LLM so it can answer.

It is mainly used for RAG (Retrieval-Augmented Generation): loading documents, creating vector indexes, retrieving relevant chunks, and passing them to the LLM to generate answers.


1. What LlamaIndex “includes”: LLM integrations

LlamaIndex does not bundle the models inside the package. Instead, it provides a unified interface for connecting to many external providers through optional packages (llama-index-llms-*); each provider is a separate integration.

Some LLM integrations it supports:

| Provider / type | Typical package | Example models |
| --- | --- | --- |
| Groq | llama-index-llms-groq | Llama 3.1, Mixtral |
| OpenAI | llama-index-llms-openai | GPT-4, GPT-3.5 |
| Anthropic | llama-index-llms-anthropic | Claude |
| Google | llama-index-llms-gemini | Gemini |
| Hugging Face | llama-index-llms-huggingface | Local / remote models |
| Cohere | llama-index-llms-cohere | Command R, etc. |
| Mistral | llama-index-llms-mistralai | Mistral, Mixtral |
| Azure OpenAI | llama-index-llms-azure-openai | GPT via Azure |
| Ollama | llama-index-llms-ollama | Local models |
| LlamaCPP | llama-index-llms-llama-cpp | Local GGUF models |
| LiteLLM | llama-index-llms-litellm | Many providers |

And more (Bedrock, Fireworks, etc.). The full list is in the LlamaIndex docs under “LLM integrations”.


2. Hugging Face embeddings

LlamaIndex lets you use embedding models from Hugging Face to turn text into vectors. Those vectors are used to build the index and do semantic search (find chunks relevant to a question).

  • Typical package: llama-index-embeddings-huggingface
  • The model runs locally (or on your server), with no Hugging Face API key needed for inference.
  • In Nutri-AI we use BAAI/bge-small-en-v1.5 to index PDFs and to vectorise the user’s question when searching.

3. Memory buffer (chat history)

LlamaIndex includes ChatMemoryBuffer and related classes to keep conversation history in memory during a chat session. That allows:

  • Follow-up questions (“What if I exercise a lot?”).
  • The engine to know what was said before (user and assistant).

In Nutri-AI the history is not stored on the backend: the frontend sends chat_history with each request, and the function _build_memory_from_history in ai_engine.py turns that list of messages into a LlamaIndex ChatMemoryBuffer for that single request. In other words, LlamaIndex provides the memory building blocks (ChatMemoryBuffer, ChatMessage, and MessageRole), and the application rebuilds the history from what the client sends.


4. What LlamaIndex is used for in Nutri-AI

All in ai_engine.py:

| Use in Nutri-AI | LlamaIndex module / class |
| --- | --- |
| LLM (generate answers) | llama_index.llms.groq → Groq (model llama-3.1-8b-instant) |
| Embeddings (vectorise text) | llama_index.embeddings.huggingface → HuggingFaceEmbedding (BAAI/bge-small-en-v1.5) |
| Load PDFs | llama_index.core → SimpleDirectoryReader, Document |
| Vector index and persistence | VectorStoreIndex, StorageContext, load_index_from_storage, Settings |
| Chat history (memory buffer) | ChatMemoryBuffer, ChatMessage, MessageRole |
| RAG chat engine | index.as_chat_engine with chat_mode="condense_plus_context" and memory |

Dependencies in requirements.txt:

  • llama-index
  • llama-index-core
  • llama-index-llms-groq
  • llama-index-embeddings-huggingface
  • llama-index-readers-file

main.py does not import LlamaIndex; it only calls ai_engine.chat().


This post summarised what LlamaIndex is, its LLM and embedding integrations, the memory buffer, and how it is used in Nutri-AI. For what RAG is and how it is implemented in Nutri-AI, see What is RAG and how it is implemented in Nutri-AI. For a step-by-step tutorial to add a RAG to your project, see How to add a RAG to your project.