Object detection is one of the most powerful applications of computer vision, and YOLOv5 (You Only Look Once, version 5) has become one of the most popular frameworks to accomplish this. It is fast, lightweight, and accurate, making it suitable for both research and real-world deployment.

In this tutorial, you’ll learn how to train YOLOv5 on a custom dataset using an Ubuntu 24.04 GPU server.

Prerequisites

  • An Ubuntu 24.04 server with an NVIDIA GPU.
  • A non-root user with sudo privileges.
  • NVIDIA drivers installed on your server.
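
You can confirm that the GPU and driver are working before you begin:

nvidia-smi

If this prints a table showing your GPU model and driver version, you're ready to proceed.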

Step 1 – Install Dependencies

Before training YOLOv5, you need to prepare the environment by installing Python, Git, and other required tools. We’ll also create a virtual environment so that your YOLOv5 setup remains clean and isolated from system packages.

1. Update the package index, then install Python 3, pip, virtual environment support, Git, and unzip.

sudo apt update
sudo apt install python3 python3-pip python3-venv git unzip -y

2. It’s a good practice to work in a virtual environment. This avoids conflicts with system-wide libraries.

python3 -m venv yolov5-env
source yolov5-env/bin/activate

After activation, you should see (yolov5-env) at the start of your terminal prompt.

3. Inside the virtual environment, upgrade pip to the latest version.

pip install --upgrade pip

Step 2 – Download and Set Up YOLOv5

With the dependencies ready, the next step is to download the official YOLOv5 repository and install its requirements.

1. Use git to clone the YOLOv5 source code.

git clone https://github.com/ultralytics/yolov5.git

2. Navigate to the downloaded directory.

cd yolov5

3. Inside the YOLOv5 folder, install all required Python packages.

pip install -r requirements.txt

This command installs PyTorch, OpenCV, and the other libraries needed for training and inference. Since you’re running on a GPU server, make sure PyTorch installs with CUDA support; on Linux the default pip wheels include CUDA, but it’s worth verifying.
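
You can confirm that PyTorch detects the GPU; torch.cuda.is_available() is part of PyTorch's standard API:

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If this prints True, PyTorch can use your GPU. If it prints False, install a CUDA-enabled build by following the instructions at https://pytorch.org.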

Step 3 – Download the Road Sign Dataset from Kaggle

Now that YOLOv5 is installed, we need a dataset to train our model. For this tutorial, we’ll use the Road Sign Detection dataset from Kaggle, which contains 877 images with annotated bounding boxes in Pascal VOC XML format.

1. Install the Kaggle CLI tool.

pip install kaggle

2. Go to your Kaggle account → Account Settings → Create New API Token.

3. This will download a kaggle.json file (your API credentials). Transfer this JSON file to your server.

4. Copy the file to the location the Kaggle CLI expects and restrict its permissions.

mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

5. Make a dedicated folder to store datasets.

mkdir datasets

6. Now download the dataset using Kaggle CLI.

kaggle datasets download -d andrewmvd/road-sign-detection

7. Unzip the downloaded dataset to the datasets directory.

unzip road-sign-detection.zip -d datasets/roadsign

8. After extracting, your dataset should look like this:

datasets/roadsign/
 ├── annotations/   # Pascal VOC XML annotations
 └── images/        # 877 images

Step 4 – Convert Pascal VOC Annotations to YOLO Format

The Road Sign dataset annotations are provided in Pascal VOC XML format, but YOLOv5 requires annotations in YOLO text format (.txt files). Each .txt file corresponds to one image and contains the class ID and normalized bounding box coordinates.
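
Each line in a label file has the form class_id x_center y_center width height, with all coordinates normalized to the image dimensions. For example, a line such as the following (the values are illustrative) marks a speedlimit sign, since speedlimit is class index 2 in this tutorial's class list:

2 0.515 0.420 0.310 0.250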

1. Create a new conversion script.

nano convert_voc_to_yolo.py

Paste the following code:

import os
import xml.etree.ElementTree as ET

# Classes in the dataset; the index of each class becomes its YOLO class ID
classes = ["trafficlight", "stop", "speedlimit", "crosswalk"]

input_dir = "datasets/roadsign/annotations"
output_dir = "datasets/roadsign/labels"
os.makedirs(output_dir, exist_ok=True)

def convert_bbox(size, box):
    """Convert a VOC box (xmin, xmax, ymin, ymax) to normalized
    YOLO format (x_center, y_center, width, height)."""
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

for file in os.listdir(input_dir):
    if not file.endswith(".xml"):
        continue
    tree = ET.parse(os.path.join(input_dir, file))
    root = tree.getroot()

    size = root.find("size")
    w, h = int(size.find("width").text), int(size.find("height").text)

    out_path = os.path.join(output_dir, file.replace(".xml", ".txt"))
    with open(out_path, "w") as out_file:
        for obj in root.iter("object"):
            cls = obj.find("name").text
            if cls not in classes:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find("bndbox")
            b = (float(xmlbox.find("xmin").text), float(xmlbox.find("xmax").text),
                 float(xmlbox.find("ymin").text), float(xmlbox.find("ymax").text))
            bb = convert_bbox((w, h), b)
            out_file.write(f"{cls_id} {' '.join(str(a) for a in bb)}\n")

2. Execute the script to generate YOLO-format labels.

python3 convert_voc_to_yolo.py

After the script finishes, datasets/roadsign/labels/ will contain one .txt file per annotated image.

Step 5 – Split the Dataset into Training and Validation

To train a YOLOv5 model effectively, you need to split your dataset into training (80%) and validation (20%) sets. Training images are used to fit the model, while validation images evaluate its performance during training.

1. Make subdirectories for training and validation images and labels.

mkdir -p datasets/roadsign/images/train datasets/roadsign/images/val
mkdir -p datasets/roadsign/labels/train datasets/roadsign/labels/val

2. Create a split script.

nano split_dataset.py

Add the following code.

import os, random, shutil

image_dir = "datasets/roadsign/images"
label_dir = "datasets/roadsign/labels"

train_img = "datasets/roadsign/images/train"
val_img = "datasets/roadsign/images/val"
train_lbl = "datasets/roadsign/labels/train"
val_lbl = "datasets/roadsign/labels/val"

# Shuffle with a fixed seed so the split is reproducible
random.seed(42)
files = [f for f in os.listdir(image_dir) if f.endswith(".png")]
random.shuffle(files)

# 80% train / 20% validation
split = int(0.8 * len(files))
train_files, val_files = files[:split], files[split:]

def copy_files(file_list, dest_img, dest_lbl):
    """Copy each image and its matching label file into the destination folders."""
    for f in file_list:
        base = os.path.splitext(f)[0]
        shutil.copy(os.path.join(image_dir, f), os.path.join(dest_img, f))
        shutil.copy(os.path.join(label_dir, base + ".txt"), os.path.join(dest_lbl, base + ".txt"))

copy_files(train_files, train_img, train_lbl)
copy_files(val_files, val_img, val_lbl)

3. Run the script.

python3 split_dataset.py

This will automatically shuffle and copy images with their corresponding label files into the correct folders.

4. After splitting, your dataset should look like this (the script copies rather than moves files, so the originals also remain at the top level of images/ and labels/):

datasets/roadsign/
 ├── images/
 │    ├── train/
 │    └── val/
 └── labels/
      ├── train/
      └── val/
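
As a quick sanity check, you can count the files in each split; the image and label counts should match, and with 877 images you should see roughly 701 training and 176 validation files. This is a minimal check assuming the directory layout shown above:

python3 - <<'EOF'
import os
for split in ("train", "val"):
    imgs = len(os.listdir(f"datasets/roadsign/images/{split}"))
    lbls = len(os.listdir(f"datasets/roadsign/labels/{split}"))
    print(f"{split}: {imgs} images, {lbls} labels")
EOF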

Step 6 – Create Dataset Configuration File

YOLOv5 requires a YAML configuration file that tells the training script where to find images, how many classes exist, and what the class names are. Without this, YOLOv5 won’t know how to load your dataset.

Create a dataset configuration file.

nano datasets/roadsign/roadsign.yaml

Add the following lines.

train: datasets/roadsign/images/train
val: datasets/roadsign/images/val

nc: 4
names: ['trafficlight', 'stop', 'speedlimit', 'crosswalk']

Note: Make sure the file paths (datasets/roadsign/images/train and datasets/roadsign/images/val) are correct relative to the YOLOv5 root folder. If you placed your dataset in a different location, adjust the paths accordingly.
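
Relative paths depend on the directory you launch training from, so if you run the scripts from somewhere other than the YOLOv5 root, absolute paths are safer. For example, assuming the repository was cloned to /root/yolov5 (as in the logs later in this tutorial):

train: /root/yolov5/datasets/roadsign/images/train
val: /root/yolov5/datasets/roadsign/images/val

nc: 4
names: ['trafficlight', 'stop', 'speedlimit', 'crosswalk']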

Step 7 – Train YOLOv5 Model

Now that the dataset is prepared and the configuration file is ready, you can train YOLOv5 on the road sign dataset.

Start training with the following command. Here --img sets the input image size, --batch the batch size, --epochs the number of training epochs, --weights the pretrained checkpoint to fine-tune (yolov5s.pt is the small variant), and --device 0 selects the first GPU.

python3 train.py --img 640 --batch 16 --epochs 100 \
--data datasets/roadsign/roadsign.yaml --weights yolov5s.pt --device 0

Output.

Validating runs/train/exp2/weights/best.pt...
Fusing layers... 
Model summary: 157 layers, 7020913 parameters, 0 gradients, 15.8 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 6/6 [00:01<00:00,  3.32it/s]
                   all        176        275      0.936      0.916      0.936      0.791
          trafficlight        176         46      0.897      0.804      0.845      0.598
                  stop        176         21      0.953      0.976      0.986      0.915
            speedlimit        176        160       0.98          1      0.995       0.91
             crosswalk        176         48      0.914      0.883      0.916      0.741
Results saved to runs/train/exp2
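
YOLOv5 also writes TensorBoard logs under runs/train/ during training. If you'd like to watch the loss curves and metrics live, you can start TensorBoard in a second terminal (it is listed in YOLOv5's requirements.txt; if it's missing, install it with pip install tensorboard):

tensorboard --logdir runs/train --host 0.0.0.0 --port 6006

Then open http://your_server_ip:6006 in a browser.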

Step 8 – Validate the Model

Once training is complete, you should validate the model to check how well it performs on unseen data. YOLOv5 provides a built-in validation script that uses your best trained weights.

Now, run the validation.

python3 val.py --weights runs/train/exp2/weights/best.pt --data datasets/roadsign/roadsign.yaml --img 640

Output.

val: data=datasets/roadsign/roadsign.yaml, weights=['runs/train/exp2/weights/best.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v7.0-430-g459d8bf0 Python-3.12.3 torch-2.8.0+cu128 CUDA:0 (NVIDIA A40-8Q, 8005MiB)

Fusing layers... 
Model summary: 157 layers, 7020913 parameters, 0 gradients, 15.8 GFLOPs
val: Scanning /root/yolov5/datasets/roadsign/labels/val.cache... 176 images, 0 backgrounds, 0 corrupt: 100%|██████████| 176/176 [00:00

The metrics in the validation output mean the following:

P (Precision) - How many of the detected objects were correct.
R (Recall) - How many actual objects were detected.
mAP50 - Mean Average Precision at IoU threshold 0.5 (higher is better).
mAP50-95 - Average precision across IoU thresholds from 0.5 to 0.95 (a stricter measure).

Step 9 – Run Inference on Road Signs

After training and validation, it’s time to see the model in action. YOLOv5 provides a detect.py script to run inference on images, videos, or even live camera streams.

Use your trained weights to detect objects in the validation set.

python3 detect.py --weights runs/train/exp2/weights/best.pt \
--source datasets/roadsign/images/val --img 640

Output.

Results saved in runs/detect/exp2/.
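
detect.py also accepts videos and camera streams as the --source argument. For example (video.mp4 is a placeholder filename; 0 selects the first attached camera):

python3 detect.py --weights runs/train/exp2/weights/best.pt --source video.mp4
python3 detect.py --weights runs/train/exp2/weights/best.pt --source 0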

Step 10 – Visualize Inference Results with Matplotlib

Running YOLOv5 with detect.py saves results to disk, but sometimes you may want to display predictions directly in Python for analysis or integration into an application. We can do this with Matplotlib.

1. Create an inference script.

nano inference_plot.py

Paste the following code:

import torch
import matplotlib.pyplot as plt

# Load the custom-trained YOLOv5 model via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'custom',
                       path='runs/train/exp2/weights/best.pt', force_reload=True)

# Run inference on an image
img_path = 'datasets/roadsign/images/val/road107.png'  # change to any test image
results = model(img_path)

# Results summary
results.print()   # class, confidence, bbox
results.save()    # saves annotated image to runs/detect/exp*

# Display inside Python using matplotlib
img = results.render()[0]  # render() returns a list of annotated images (RGB order)
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis("off")
plt.savefig("prediction.png")  # also write to disk, useful on a headless server
plt.show()

2. Run the script.

python3 inference_plot.py

Output.

Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /root/.cache/torch/hub/master.zip
YOLOv5 🚀 2025-9-11 Python-3.12.3 torch-2.8.0+cu128 CUDA:0 (NVIDIA A40-8Q, 8005MiB)

Fusing layers... 
Model summary: 157 layers, 7020913 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape... 
/root/.cache/torch/hub/ultralytics_yolov5_master/models/common.py:906: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with amp.autocast(autocast):
image 1/1: 400x320 1 speedlimit
Speed: 22.3ms pre-process, 83.1ms inference, 119.0ms NMS per image at shape (1, 3, 640, 512)
Saved 1 image to runs/detect/exp2

Open the generated image at /root/yolov5/runs/detect/exp2/road107.jpg. You’ll see your road sign image with bounding boxes like:

🟥 trafficlight
🟦 stop
🟨 speedlimit
🟩 crosswalk
Each box has a confidence score.

You’ve now visualized YOLOv5 predictions directly inside Python. This method is useful for debugging, generating plots for reports, or integrating YOLOv5 inference into larger applications.
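
If you need the raw predictions rather than rendered images, the Detections object returned by the hub model can also export them as a pandas DataFrame. A minimal sketch, reusing the results object from the script above:

# One row per detection: xmin, ymin, xmax, ymax, confidence, class, name
df = results.pandas().xyxy[0]
print(df)

# Example: keep only high-confidence detections
print(df[df["confidence"] > 0.5][["name", "confidence"]])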

Conclusion

In this tutorial, you learned how to train YOLOv5 on a custom dataset using an Ubuntu 24.04 GPU server. You set up the environment, prepared the dataset, converted the annotations, and split the data into training and validation sets, then trained, validated, and tested a YOLOv5 model on the Road Sign Detection dataset.

You now have a working YOLOv5 object detection pipeline. The workflow is flexible: you can replace the dataset with your own images and annotations to train YOLOv5 on any custom detection problem.