How to Build a code assistant with open-source LLMs using RAG Fine-tuning
RAG fine-tuning combines code retrieval with model training, addressing the limitations of outdated knowledge and hallucinations in LLMs. Our experiments with fine-tuning Mistral 7B Instruct v0.2 on Together AI Platform show that RAG fine-tuned models achieve up to 16% better accuracy than Claude 3 Opus, while offering 3.7x faster speed and an astounding 150x cost reduction.
RAG fine-tuning addresses limitations in LLMs, like outdated knowledge and hallucinations, by combining code retrieval with model training. Our experiments with fine-tuning Mistral 7B Instruct v0.2 on Together AI Platform show:
- 16% better accuracy than Claude 3 Opus
- 3.7x faster speed
- 150x cost reduction
Compared to GPT-4o, the models achieve:
- 19% quality improvement
- 1.1x faster speed
- 37.5x cost reduction
The Challenge of Code Generation with LLMs
LLMs excel in various applications but struggle with hallucinations and outdated information, especially in code generation.
RAG - Retrieval Augmented Generation
Retrieval-Augmented Generation (RAG) overcomes these limitations by incorporating retrieval methods into text generation. This involves:
- Indexing Phase: External knowledge sources are divided into chunks, converted into vectors, and stored in a vector database.
- Querying Phase: Relevant information is retrieved and combined with the query for the generation model.
Collaborating with Morph Labs, we fine-tuned Mistral 7B Instruct v0.2, resulting in models that surpass GPT-4o and Claude 3 Opus.
Online Repository-Level Fine-Tuning with Retrieval
RAG fine-tuning improves code generation by retrieving relevant code snippets from a repository based on each query, providing contextual grounding during training.
Data
Morph Labs generated synthetic datasets for training and evaluation. Each sample contains a question, the correct answer, and code snippets retrieved via the Morph Code API.
Experiments
We conducted experiments on five codebases: Axolotl, Deepspeed, vLLM, Mapbox, and WandB.
Fine-Tuning
Fine-tuned Mistral 7B Instruct v0.2 using the Together Fine-tuning API, with four epochs, batch size of 12, and learning rate of 4e-6.
Inference
During live query handling, the model retrieves the most relevant and recent code snippets via the Morph Code API before generating a response.
Results
Evaluated using HitRate (%), we compared the RAG fine-tuned Mistral 7B Instruct v0.2 to other models, showing significant improvements.
RAG fine-tuning significantly improves AI code assistants by providing repository-level context. Leveraging the Together API and the Morph Code API, we enhance the accuracy and applicability of LLM-generated code, making these models valuable tools for developers.