Large Language Models (LLMs), such as DeepSeek, are powerful tools for question answering, chatbots, and intelligent assistants. However, by default, they are trained on general-purpose data and may not know about your company’s specific policies, procedures, or internal knowledge base.
This is where fine-tuning with LoRA (Low-Rank Adaptation) comes in. Instead of retraining the whole model, which requires massive GPUs and days of compute, you can adapt DeepSeek efficiently using LoRA. This technique adds lightweight trainable layers to the model, keeping the original weights frozen. As a result, you can customize DeepSeek on a single GPU (such as an NVIDIA RTX 3090/4090 or A100) with as little as 12 GB VRAM.
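To make the idea concrete: LoRA keeps a pretrained weight matrix W frozen and learns only a low-rank update B·A on top of it. Here is a minimal NumPy sketch of that idea (the dimensions are illustrative, not DeepSeek's real ones):

# Minimal NumPy sketch of the LoRA idea (illustrative dimensions).
import numpy as np

d, r = 4096, 16                    # hidden size and LoRA rank (r << d)
W = np.random.randn(d, d)          # frozen pretrained weight, never updated
A = np.random.randn(r, d) * 0.01   # trainable down-projection
B = np.zeros((d, r))               # trainable up-projection (starts at zero)
alpha = 32                         # scaling factor (lora_alpha)

x = np.random.randn(d)
# Forward pass: frozen path plus the scaled low-rank update.
y = W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained: 2*d*r parameters instead of d*d.
print(f"trainable: {A.size + B.size:,} vs full: {W.size:,}")

Because only A and B receive gradients, the optimizer state and gradient memory shrink dramatically, which is what makes single-GPU fine-tuning feasible.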
In this tutorial, you’ll learn how to train DeepSeek with a custom knowledge base using LoRA on an Ubuntu 24.04 GPU server.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user with sudo privileges.
- NVIDIA drivers installed on your server.
Step 1 – Install Prerequisites
Before you begin training DeepSeek with a custom knowledge base, you need to prepare your Ubuntu 24.04 GPU server with the right dependencies.
Important: Make sure you are using a non-root user account with sudo privileges. If you haven’t created one yet, run:
adduser username
usermod -aG sudo username
su - username
1. First, update your system and install the basic packages:
sudo apt update
sudo apt install git wget curl python3 python3-venv python3-pip build-essential -y
2. Next, create a Python virtual environment to keep this project isolated.
python3 -m venv deepseek-env
source deepseek-env/bin/activate
Step 2 – Install Python Libraries
With your virtual environment activated, the next step is to install the Python libraries required for training DeepSeek with LoRA. These include PyTorch (with CUDA support), Hugging Face Transformers, and additional optimization libraries.
1. First, upgrade pip to the latest version.
pip install --upgrade pip
2. Now install PyTorch with CUDA 12.4 support. This ensures GPU acceleration is enabled on your Ubuntu 24.04 server.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
3. Next, install Hugging Face libraries and additional packages needed for LoRA fine-tuning.
pip install transformers datasets accelerate peft bitsandbytes sentencepiece
4. Once installed, verify that PyTorch detects your GPU:
python3 -c "import torch; print(torch.cuda.is_available())"
If everything is set up correctly, it should return:
True
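For a more detailed check, for example to confirm which GPU was detected and how much VRAM it has, you can run a short snippet like this (these are standard PyTorch CUDA APIs):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
    print(f"CUDA version (PyTorch build): {torch.version.cuda}")
else:
    print("No CUDA device detected - check your NVIDIA driver installation.")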
Step 3 – Create a Custom Dataset
To fine-tune DeepSeek with LoRA, you need a knowledge base dataset that represents the information you want the model to learn. This dataset could be your company handbook, IT policies, FAQs, or any domain-specific text.
1. Let’s create a small dataset using Python. Open a new file:
nano create_dataset.py
Add the following code:
# create_dataset.py
from datasets import Dataset

# Example knowledge base (company handbook, IT policy, FAQs)
documents = [
    {"text": "Welcome to ACME Corp. Our mission is to provide high quality products."},
    {"text": "All employees must use two-factor authentication when logging into company systems."},
    {"text": "Working hours are from 9 AM to 6 PM, Monday through Friday."},
    {"text": "If you forget your password, contact IT support via [email protected]."},
    {"text": "Our company handbook states that teamwork and innovation are core values."},
    {"text": "Employees are entitled to 20 days of paid vacation annually."},
    {"text": "To access the VPN, employees must install the Cisco AnyConnect client."},
    {"text": "For cybersecurity, do not share your credentials with anyone."},
    {"text": "Remote work is allowed up to 3 days per week."},
    {"text": "All company meetings are recorded and stored in the internal portal."}
]

dataset = Dataset.from_list(documents)
dataset.save_to_disk("custom_dataset")
print("✅ Sample knowledge base dataset saved in custom_dataset/")
2. Now run the script.
python3 create_dataset.py
If everything goes well, you should see:
✅ Sample knowledge base dataset saved in custom_dataset/
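Optionally, you can confirm the dataset loads back correctly before training. A quick sanity check might look like this:

from datasets import load_from_disk

dataset = load_from_disk("custom_dataset")
print(dataset)              # shows features and number of rows
print(dataset[0]["text"])   # prints the first document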
Step 4 – Load and Test the Base DeepSeek Model
Before applying LoRA fine-tuning, it’s important to test that the base DeepSeek model loads correctly on your GPU. This also confirms that quantization works and inference runs without issues.
1. Create a new file.
nano load_model.py
Add the following code:
# load_model.py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "deepseek-ai/deepseek-llm-7b-base"

# Use 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Test inference
inputs = tokenizer("Hello DeepSeek!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
2. Now run the script.
python3 load_model.py
If you hit any errors here, it’s likely your GPU does not have enough VRAM. Upgrade to an instance with more GPU memory.
If successful, you’ll see output similar to this (the exact text may vary):
Hello DeepSeek! How can I assist you today?
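If you want to see how much memory the quantized model actually occupies before committing to training, you can add a check like this at the end of load_model.py (get_memory_footprint() is a standard Transformers helper):

# Append to load_model.py after the model is loaded:
import torch

print(f"Model memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")
print(f"GPU memory allocated:   {torch.cuda.memory_allocated() / 1024**3:.2f} GB")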
Step 5 – Configure LoRA for Fine-Tuning
Now that the base DeepSeek model is working, the next step is to configure LoRA (Low-Rank Adaptation). LoRA adds small trainable layers to the model, making fine-tuning possible on a single GPU without retraining the entire 7B parameter model.
1. Create a new file.
nano configure_lora.py
Add the following code:
# configure_lora.py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "deepseek-ai/deepseek-llm-7b-base"

# Use 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",        # normalized float 4
    bnb_4bit_compute_dtype="float16"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load base model with quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"  # automatically balances GPU/CPU
)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Attach LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
2. Run the script.
python3 configure_lora.py
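The final line prints how many parameters LoRA actually trains, which should be a tiny fraction of the full model. You can sanity-check that number by hand: each targeted projection of shape (d_out, d_in) gains r × (d_in + d_out) LoRA parameters. A back-of-the-envelope estimate, assuming a 4096 hidden size and 30 decoder layers for the 7B model (verify the true values in the model’s config.json):

# Rough LoRA parameter count (assumed dimensions; check config.json).
hidden = 4096        # assumed hidden size
layers = 30          # assumed number of decoder layers
r = 16               # LoRA rank from LoraConfig
targets = 2          # q_proj and v_proj per layer

# Each square projection (hidden x hidden) gets A (r x hidden) and B (hidden x r).
per_module = r * (hidden + hidden)
total = per_module * targets * layers
print(f"~{total:,} trainable LoRA parameters")  # roughly 0.1% of 7B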
Step 6 – Train DeepSeek with LoRA
With LoRA configured, it’s time to fine-tune DeepSeek on your custom dataset. We’ll use the Hugging Face Trainer API to simplify training.
1. Create a new script.
nano train_lora.py
Add the following code:
# train_lora.py
from transformers import TrainingArguments, Trainer, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_from_disk
from peft import LoraConfig, get_peft_model

# Load dataset
dataset = load_from_disk("custom_dataset")

# Model name
model_name = "deepseek-ai/deepseek-llm-7b-base"

# Quantization config (4-bit to fit low-VRAM GPUs)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

# Tokenization function (with labels for loss computation)
def tokenize(batch):
    tokens = tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=256
    )
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = dataset.map(tokenize, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir="deepseek-lora",
    per_device_train_batch_size=1,   # keep per-step memory usage low
    gradient_accumulation_steps=8,   # simulate a larger batch
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=5,
    save_strategy="epoch"
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer
)

trainer.train()

# Save the final LoRA adapter to a clean path
output_dir = "deepseek-lora-final"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"✅ Final LoRA adapter saved to {output_dir}/")
2. Run the training script.
python3 train_lora.py
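Training on this tiny dataset takes only a few minutes, but a real knowledge base can run much longer. Because save_strategy="epoch" writes checkpoints under deepseek-lora/, an interrupted run can be resumed with the Trainer’s built-in flag, for example by changing the final training call like so:

# Resume from the most recent checkpoint in output_dir ("deepseek-lora")
trainer.train(resume_from_checkpoint=True)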
Step 7 – Evaluate the Fine-Tuned Model
Now that the LoRA training is complete, let’s test the fine-tuned DeepSeek model and see if it answers based on the custom knowledge base.
1. Create a new file.
nano evaluate_model.py
Add the following code:
# evaluate_model.py
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_name = "deepseek-ai/deepseek-llm-7b-base"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

# Load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load the tokenizer from the base model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load fine-tuned LoRA adapters from the clean local path
adapter_path = "deepseek-lora-final"
model = PeftModel.from_pretrained(base_model, adapter_path)

# Ask test questions
questions = [
    "What is ACME Corp's vacation policy?",
    "What are the official working hours?",
    "How do employees access the VPN?",
    "Who should I contact if I forget my password?"
]

for q in questions:
    inputs = tokenizer(q, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_length=100)
    print(f"\nQ: {q}\nA: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
2. Run the script.
python3 evaluate_model.py
If successful, you should see a generated answer for each question. Note that with a dataset this small and only three epochs of training, the completions can still resemble generic base-model text rather than grounded knowledge-base answers, as in the sample run below; a larger dataset and more training generally sharpen the results:
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00,  5.89s/it]
Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: What is ACME Corp's vacation policy?
A: What is ACME Corp's vacation policy? # 100 Days of Code - Log ### Day 0: May 1, 2020 **Today's Progress**: Started the 100 Days of Code challenge. **Thoughts:** I'm excited to get started. ### Day 1: May 2, 2020 **Today's Progress**: Started the 1

Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: What are the official working hours?
A: What are the official working hours? The official working hours are from 8:00 am to 5:00 pm. What are the working days? The working days are Monday to Friday. What are the official holidays? The official holidays are New Year's Day, Labor Day, Independence Day, Eid Al-Fitr, Eid Al-Adha, and the UAE National Day. What are the official holidays? The official holidays

Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: How do employees access the VPN?
A: How do employees access the VPN? Q: How to get the current user's IP address? I'm trying to get the current user's IP address. I've tried using the following code: $ip = $_SERVER['REMOTE_ADDR']; But it's not working. A: You can use $_SERVER['REMOTE_ADDR'] to get the IP address of the user. A: You can use $_SERVER

Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.

Q: Who should I contact if I forget my password?
A: Who should I contact if I forget my password? # 1.0.0 * Initial release
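For deployment, you may prefer a single standalone model instead of shipping the base model plus a separate adapter. PEFT’s merge_and_unload() folds the LoRA weights into the base model. Note that merging requires loading the base model in full or half precision rather than 4-bit, so this sketch needs considerably more memory than the quantized setup above:

# merge_adapter.py - sketch: fold the LoRA weights into the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_name = "deepseek-ai/deepseek-llm-7b-base"

# Load the base model in fp16 (not 4-bit) so the weights can be merged.
base = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
merged = PeftModel.from_pretrained(base, "deepseek-lora-final").merge_and_unload()

# Save a self-contained model that loads without PEFT.
merged.save_pretrained("deepseek-merged")
AutoTokenizer.from_pretrained(model_name).save_pretrained("deepseek-merged")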
Conclusion
In this tutorial, you learned how to fine-tune DeepSeek with a custom knowledge base using LoRA on Ubuntu 24.04 with GPU support. You started by preparing your environment, installing the required libraries, and creating a dataset that represents company-specific information. You then loaded the DeepSeek 7B base model with 4-bit quantization to save VRAM, configured LoRA adapters, and trained the model on your dataset.