Large Language Models (LLMs), such as DeepSeek, are powerful tools for question answering, chatbots, and intelligent assistants. However, by default, they are trained on general-purpose data and may not know about your company’s specific policies, procedures, or internal knowledge base.
This is where fine-tuning with LoRA (Low-Rank Adaptation) comes in. Instead of retraining the whole model, which requires massive GPUs and days of compute, you can adapt DeepSeek efficiently using LoRA. This technique adds lightweight trainable layers to the model, keeping the original weights frozen. As a result, you can customize DeepSeek on a single GPU (such as an NVIDIA RTX 3090/4090 or A100) with as little as 12 GB VRAM.
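To make the idea concrete: LoRA keeps a pretrained weight matrix W frozen and learns only a low-rank update B·A on top of it. Here is a minimal NumPy sketch of that idea (the dimensions are illustrative, not DeepSeek's real ones):

# Minimal NumPy sketch of the LoRA idea (illustrative dimensions).
import numpy as np

d, r = 4096, 16                    # hidden size and LoRA rank (r << d)
W = np.random.randn(d, d)          # frozen pretrained weight, never updated
A = np.random.randn(r, d) * 0.01   # trainable down-projection
B = np.zeros((d, r))               # trainable up-projection (starts at zero)
alpha = 32                         # scaling factor (lora_alpha)

x = np.random.randn(d)
# Forward pass: frozen path plus the scaled low-rank update.
y = W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained: 2*d*r parameters instead of d*d.
print(f"trainable: {A.size + B.size:,} vs full: {W.size:,}")

Because only A and B receive gradients, the optimizer state and gradient memory shrink dramatically, which is what makes single-GPU fine-tuning feasible.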
In this tutorial, you’ll learn how to train DeepSeek with a custom knowledge base using LoRA on an Ubuntu 24.04 GPU server.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user with sudo privileges.
- NVIDIA drivers installed on your server.
Step 1 – Install Prerequisites
Before you begin training DeepSeek with a custom knowledge base, you need to prepare your Ubuntu 24.04 GPU server with the right dependencies.
Important: Make sure you are using a non-root user account with sudo privileges. If you haven’t created one yet, run:
adduser username
usermod -aG sudo username
su - username
1. First, update your system and install the basic packages:
sudo apt update
sudo apt install git wget curl python3 python3-venv python3-pip build-essential -y
2. Next, create a Python virtual environment to keep this project isolated.
python3 -m venv deepseek-env
source deepseek-env/bin/activate
Step 2 – Install Python Libraries
With your virtual environment activated, the next step is to install the Python libraries required for training DeepSeek with LoRA. These include PyTorch (with CUDA support), Hugging Face Transformers, and additional optimization libraries.
1. First, upgrade pip to the latest version.
pip install --upgrade pip
2. Now install PyTorch with CUDA 12.4 support. This ensures GPU acceleration is enabled on your Ubuntu 24.04 server.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
3. Next, install Hugging Face libraries and additional packages needed for LoRA fine-tuning.
pip install transformers datasets accelerate peft bitsandbytes sentencepiece
4. Once installed, verify that PyTorch detects your GPU:
python3 -c "import torch; print(torch.cuda.is_available())"
If everything is set up correctly, it should return:
True
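For a more detailed check, for example to confirm which GPU was detected and how much VRAM it has, you can run a short snippet like this (these are standard PyTorch CUDA APIs):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
    print(f"CUDA version (PyTorch build): {torch.version.cuda}")
else:
    print("No CUDA device detected - check your NVIDIA driver installation.")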
Step 3 – Create a Custom Dataset
To fine-tune DeepSeek with LoRA, you need a knowledge base dataset that represents the information you want the model to learn. This dataset could be your company handbook, IT policies, FAQs, or any domain-specific text.
1. Let’s create a small dataset using Python. Open a new file:
nano create_dataset.py
Add the following code:
# create_dataset.py
from datasets import Dataset

# Example knowledge base (company handbook, IT policy, FAQs)
documents = [
    {"text": "Welcome to ACME Corp. Our mission is to provide high quality products."},
    {"text": "All employees must use two-factor authentication when logging into company systems."},
    {"text": "Working hours are from 9 AM to 6 PM, Monday through Friday."},
    {"text": "If you forget your password, contact IT support via [email protected]."},
    {"text": "Our company handbook states that teamwork and innovation are core values."},
    {"text": "Employees are entitled to 20 days of paid vacation annually."},
    {"text": "To access the VPN, employees must install the Cisco AnyConnect client."},
    {"text": "For cybersecurity, do not share your credentials with anyone."},
    {"text": "Remote work is allowed up to 3 days per week."},
    {"text": "All company meetings are recorded and stored in the internal portal."}
]

dataset = Dataset.from_list(documents)
dataset.save_to_disk("custom_dataset")
print("✅ Sample knowledge base dataset saved in custom_dataset/")
2. Now run the script.
python3 create_dataset.py
If everything goes well, you should see:
✅ Sample knowledge base dataset saved in custom_dataset/
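Optionally, you can confirm the dataset loads back correctly before training. A quick sanity check might look like this:

from datasets import load_from_disk

dataset = load_from_disk("custom_dataset")
print(dataset)              # shows features and number of rows
print(dataset[0]["text"])   # prints the first document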
Step 4 – Load and Test the Base DeepSeek Model
Before applying LoRA fine-tuning, it’s important to test that the base DeepSeek model loads correctly on your GPU. This also confirms that quantization works and inference runs without issues.
1. Create a new file.
nano load_model.py
Add the following code:
# load_model.py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "deepseek-ai/deepseek-llm-7b-base"

# Use 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Test inference
inputs = tokenizer("Hello DeepSeek!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
2. Now run the script.
python3 load_model.py
If you hit any errors here, it’s likely your GPU does not have enough VRAM. Upgrade to an instance with more GPU memory.
If successful, you’ll see output similar to this (the exact text may vary):
Hello DeepSeek! How can I assist you today?
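If you want to see how much memory the quantized model actually occupies before committing to training, you can add a check like this at the end of load_model.py (get_memory_footprint() is a standard Transformers helper):

# Append to load_model.py after the model is loaded:
import torch

print(f"Model memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")
print(f"GPU memory allocated:   {torch.cuda.memory_allocated() / 1024**3:.2f} GB")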
Step 5 – Configure LoRA for Fine-Tuning
Now that the base DeepSeek model is working, the next step is to configure LoRA (Low-Rank Adaptation). LoRA adds small trainable layers to the model, making fine-tuning possible on a single GPU without retraining the entire 7B parameter model.
1. Create a new file.
nano configure_lora.py
Add the following code:
# configure_lora.py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "deepseek-ai/deepseek-llm-7b-base"

# Use 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",        # normalized float 4
    bnb_4bit_compute_dtype="float16"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load base model with quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"  # automatically balances GPU/CPU
)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Attach LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
2. Run the script.
python3 configure_lora.py
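The final line prints how many parameters LoRA actually trains, which should be a tiny fraction of the full model. You can sanity-check that number by hand: each targeted projection of shape (d_out, d_in) gains r × (d_in + d_out) LoRA parameters. A back-of-the-envelope estimate, assuming a 4096 hidden size and 30 decoder layers for the 7B model (verify the true values in the model’s config.json):

# Rough LoRA parameter count (assumed dimensions; check config.json).
hidden = 4096        # assumed hidden size
layers = 30          # assumed number of decoder layers
r = 16               # LoRA rank from LoraConfig
targets = 2          # q_proj and v_proj per layer

# Each square projection (hidden x hidden) gets A (r x hidden) and B (hidden x r).
per_module = r * (hidden + hidden)
total = per_module * targets * layers
print(f"~{total:,} trainable LoRA parameters")  # roughly 0.1% of 7B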
Step 6 – Train DeepSeek with LoRA
With LoRA configured, it’s time to fine-tune DeepSeek on your custom dataset. We’ll use the Hugging Face Trainer API to simplify training.
1. Create a new script.
nano train_lora.py
Add the following code:
# train_lora.py
from transformers import TrainingArguments, Trainer, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_from_disk
from peft import LoraConfig, get_peft_model

# Load dataset
dataset = load_from_disk("custom_dataset")

# Model name
model_name = "deepseek-ai/deepseek-llm-7b-base"

# Quantization config (4-bit to fit low-VRAM GPUs)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

# Tokenization function (with labels for loss computation)
def tokenize(batch):
    tokens = tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=256
    )
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = dataset.map(tokenize, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir="deepseek-lora",
    per_device_train_batch_size=1,   # keep per-step memory usage low
    gradient_accumulation_steps=8,   # simulate a larger batch
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=5,
    save_strategy="epoch"
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer
)

trainer.train()

# Save the final LoRA adapter to a clean path
output_dir = "deepseek-lora-final"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"✅ Final LoRA adapter saved to {output_dir}/")
2. Run the training script.
python3 train_lora.py
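Training on this tiny dataset takes only a few minutes, but a real knowledge base can run much longer. Because save_strategy="epoch" writes checkpoints under deepseek-lora/, an interrupted run can be resumed with the Trainer’s built-in flag, for example by changing the final training call like so:

# Resume from the most recent checkpoint in output_dir ("deepseek-lora")
trainer.train(resume_from_checkpoint=True)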
Step 7 – Evaluate the Fine-Tuned Model
Now that the LoRA training is complete, let’s test the fine-tuned DeepSeek model and see if it answers based on the custom knowledge base.
1. Create a new file.
nano evaluate_model.py
Add the following code:
# evaluate_model.py
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_name = "deepseek-ai/deepseek-llm-7b-base"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

# Load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load the tokenizer from the base model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load fine-tuned LoRA adapters from the clean local path
adapter_path = "deepseek-lora-final"
model = PeftModel.from_pretrained(base_model, adapter_path)

# Ask test questions
questions = [
    "What is ACME Corp's vacation policy?",
    "What are the official working hours?",
    "How do employees access the VPN?",
    "Who should I contact if I forget my password?"
]

for q in questions:
    inputs = tokenizer(q, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_length=100)
    print(f"\nQ: {q}\nA: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
2. Run the script.
python3 evaluate_model.py
If successful, you should see a generated answer for each question. Note that with a dataset this small and only three epochs of training, the completions can still resemble generic base-model text rather than grounded knowledge-base answers, as in the sample run below; a larger dataset and more training generally sharpen the results:
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00,  5.89s/it]
Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: What is ACME Corp's vacation policy?
A: What is ACME Corp's vacation policy? # 100 Days of Code - Log ### Day 0: May 1, 2020 **Today's Progress**: Started the 100 Days of Code challenge. **Thoughts:** I'm excited to get started. ### Day 1: May 2, 2020 **Today's Progress**: Started the 1

Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: What are the official working hours?
A: What are the official working hours? The official working hours are from 8:00 am to 5:00 pm. What are the working days? The working days are Monday to Friday. What are the official holidays? The official holidays are New Year's Day, Labor Day, Independence Day, Eid Al-Fitr, Eid Al-Adha, and the UAE National Day. What are the official holidays? The official holidays

Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: How do employees access the VPN?
A: How do employees access the VPN? Q: How to get the current user's IP address? I'm trying to get the current user's IP address. I've tried using the following code: $ip = $_SERVER['REMOTE_ADDR']; But it's not working. A: You can use $_SERVER['REMOTE_ADDR'] to get the IP address of the user. A: You can use $_SERVER

Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: Who should I contact if I forget my password?
A: Who should I contact if I forget my password? # 1.0.0 * Initial release
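For deployment, you may prefer a single standalone model instead of shipping the base model plus a separate adapter. PEFT’s merge_and_unload() folds the LoRA weights into the base model. Note that merging requires loading the base model in full or half precision rather than 4-bit, so this sketch needs considerably more memory than the quantized setup above:

# merge_adapter.py - sketch: fold the LoRA weights into the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_name = "deepseek-ai/deepseek-llm-7b-base"

# Load the base model in fp16 (not 4-bit) so the weights can be merged.
base = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
merged = PeftModel.from_pretrained(base, "deepseek-lora-final").merge_and_unload()

# Save a self-contained model that loads without PEFT.
merged.save_pretrained("deepseek-merged")
AutoTokenizer.from_pretrained(model_name).save_pretrained("deepseek-merged")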
Conclusion
In this tutorial, you learned how to fine-tune DeepSeek with a custom knowledge base using LoRA on Ubuntu 24.04 with GPU support. You started by preparing your environment, installing the required libraries, and creating a dataset that represents company-specific information. You then loaded the DeepSeek 7B base model with 4-bit quantization to save VRAM, configured LoRA adapters, and trained the model on your dataset.