How to Build a Code Assistant with Open-Source LLMs Using RAG Fine-Tuning

RAG fine-tuning combines code retrieval with model training to address two limitations of LLMs: outdated knowledge and hallucinations. In our experiments, fine-tuning Mistral 7B Instruct v0.2 on the Together AI platform produced models that achieve up to:

16% better accuracy than Claude 3 Opus
3.7x faster speed
150x cost reduction

Compared to GPT-4o, the models achieve:

19% quality improvement
1.1x faster speed
37.5x cost reduction

Figure 1. Comparing RAG fine-tuned Mistral 7B Instruct v0.2 to Claude 3 Opus and GPT-4o in quality, speed, and cost

The Challenge of Code Generation with LLMs

LLMs perform well across many applications, but they can hallucinate and rely on outdated information. Both problems are especially visible in code generation.

RAG - Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) addresses these limitations by adding a retrieval step to text generation. It involves two phases:

Indexing Phase: External knowledge sources are divided into chunks, converted into vectors, and stored in a vector database.
Querying Phase: Relevant information is retrieved and combined with the query for the generation model.
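The two phases can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in our experiments: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words term counts (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text, max_lines=4):
    """Split a document into chunks of at most max_lines lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def build_index(documents):
    """Indexing phase: chunk each document, vectorize each chunk, and store both."""
    index = []
    for doc in documents:
        for piece in chunk(doc):
            if piece.strip():
                index.append((piece, embed(piece)))
    return index


def retrieve(index, query, k=2):
    """Querying phase, step 1: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [piece for piece, _ in ranked[:k]]


def build_prompt(query, snippets):
    """Querying phase, step 2: combine the retrieved chunks with the query."""
    context = "\n\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The prompt returned by `build_prompt` is what the generation model would receive in place of the bare query.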

In collaboration with Morph Labs, we fine-tuned Mistral 7B Instruct v0.2 with this approach. The resulting models outperform GPT-4o and Claude 3 Opus in our evaluation.

Online Repository-Level Fine-Tuning with Retrieval

RAG fine-tuning improves code generation by retrieving relevant code snippets from a repository based on each query, providing contextual grounding during training.

Data

Morph Labs generated synthetic datasets for training and evaluation. Each sample contains a question, the correct answer, and code snippets retrieved via the Morph Code API.
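To make the shape of a sample concrete, here is a minimal sketch of how a question, its answer, and the retrieved snippets could be packed into one training record. The field names, the prompt layout, and the prompt/completion JSONL format are assumptions for illustration; the actual schema of the Morph Labs datasets is not described here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RagSample:
    """One training sample: a question, its correct answer, and retrieved code snippets."""
    question: str
    answer: str
    snippets: list = field(default_factory=list)

    def to_record(self):
        """Place the retrieved snippets ahead of the question (hypothetical prompt layout)."""
        context = "\n\n".join(self.snippets)
        prompt = f"Relevant code:\n{context}\n\nQuestion: {self.question}"
        return {"prompt": prompt, "completion": self.answer}


def to_jsonl(samples):
    """Serialize samples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(s.to_record()) for s in samples)
```

Because the snippets are part of the prompt at training time, the model learns to ground its answer in retrieved repository code rather than in memorized knowledge alone.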

Experiments

We conducted experiments on five codebases: Axolotl, DeepSpeed, vLLM, Mapbox, and WandB.

Fine-Tuning

We fine-tuned Mistral 7B Instruct v0.2 using the Together Fine-tuning API for four epochs, with a batch size of 12 and a learning rate of 4e-6.
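The hyperparameters can be captured in a small configuration object. The sketch below only records the values stated above and derives the number of optimizer steps for a given dataset size; the class name and the step calculation (which assumes a final partial batch still counts as a step) are illustrative, and no Together API call is shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters from the text; the class itself is a hypothetical container."""
    model: str = "Mistral 7B Instruct v0.2"  # display name, not a verified API model identifier
    n_epochs: int = 4
    batch_size: int = 12
    learning_rate: float = 4e-6

    def total_steps(self, n_samples: int) -> int:
        """Optimizer steps over all epochs, assuming the last partial batch is kept."""
        steps_per_epoch = math.ceil(n_samples / self.batch_size)
        return steps_per_epoch * self.n_epochs
```

For a hypothetical dataset of 1,200 samples, `FineTuneConfig().total_steps(1200)` gives 400 steps (100 per epoch over four epochs).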

Inference

During live query handling, the model retrieves the most relevant and recent code snippets via the Morph Code API before generating a response.
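The retrieve-then-generate flow at inference time can be sketched as a single function. The `retrieve` and `generate` callables are hypothetical placeholders for a code retrieval client and a call to the fine-tuned model; their signatures and the prompt layout are assumptions, not the Morph Code API or the Together API.

```python
from typing import Callable, List


def answer_query(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Fetch up to k code snippets for the query, then generate a grounded response."""
    snippets = retrieve(query, k)  # hypothetical retrieval client
    context = "\n\n".join(snippets)
    prompt = f"Relevant code:\n{context}\n\nQuestion: {query}"
    return generate(prompt)  # hypothetical call to the fine-tuned model
```

Keeping the prompt layout identical to the one used during training matters here: the model was fine-tuned to read retrieved snippets in a specific position, so inference should present them the same way.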

Results

We compared the RAG fine-tuned Mistral 7B Instruct v0.2 to other models using HitRate (%) as the metric. The fine-tuned model shows significant improvements.
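As an illustration of the metric, the sketch below computes HitRate as the percentage of responses judged to be hits. The exact hit criterion used in our evaluation is not spelled out here, so the default judge (a case-insensitive check that the reference string appears in the response) is an assumption, and `is_hit` can be replaced with any other judge.

```python
def hit_rate(predictions, references, is_hit=None):
    """Return the percentage of predictions judged as hits against their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    if is_hit is None:
        # Assumed default judge: the reference text appears in the model response.
        def is_hit(pred, ref):
            return ref.strip().lower() in pred.lower()
    hits = sum(1 for pred, ref in zip(predictions, references) if is_hit(pred, ref))
    return 100.0 * hits / len(predictions)
```

Running the same function over the responses of each model on the same samples yields directly comparable percentages.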

Effectiveness of RAG fine-tuned Mistral 7B Instruct v0.2 on five widely used open-source AI codebases, compared to the base model without fine-tuning, Claude 3 Opus, and GPT-4o

RAG fine-tuning significantly improves AI code assistants by providing repository-level context. Leveraging the Together API and the Morph Code API, we enhance the accuracy and applicability of LLM-generated code, making these models valuable tools for developers.