With the increasing number of cyber threats, organizations need robust Network Intrusion Detection Systems (NIDS) to monitor and protect their networks. Traditional signature-based detection methods are often ineffective against zero-day attacks and sophisticated threats. Machine Learning (ML) offers a powerful alternative by learning patterns from network traffic data and detecting anomalies in real time.

This article demonstrates how to implement a machine learning-based Network Intrusion Detection System (NIDS) on an Ubuntu 24.04 GPU server using TensorFlow for enhanced performance.

Prerequisites

  • An Ubuntu 24.04 server with an NVIDIA GPU.
  • A non-root user with sudo privileges.
  • NVIDIA drivers installed.

Step 1: Setting Up the Environment

1. First, install the necessary Python packages.

sudo apt install python3 python3-pip python3-venv -y

2. Next, create and activate a virtual environment to isolate our project dependencies.

python3 -m venv nids-env
source nids-env/bin/activate

3. Install the necessary machine learning and networking packages.

pip install tensorflow scikit-learn pandas numpy scapy matplotlib seaborn joblib tqdm

4. Verify that TensorFlow can access your GPU.

python3 -c "import tensorflow as tf; print('TensorFlow GPU:', bool(tf.config.list_physical_devices('GPU')))"

Output.

TensorFlow GPU: True

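Note: If this prints False or raises an error, reinstall TensorFlow together with its bundled CUDA libraries (the NVIDIA driver itself must already be installed, as listed in the prerequisites):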

python3 -m pip install 'tensorflow[and-cuda]'

Step 2: Preparing the Dataset

We will use the CIC-IDS-2017 dataset from Kaggle, which contains labeled network traffic with various types of attacks.

1. Create the main project folders.

mkdir -p ~/nids-project/{datasets,models,scripts,data}

2. Download the Network Intrusion dataset (CIC-IDS-2017) from Kaggle.

3. Transfer the downloaded dataset to your server:

scp Downloads/archive.zip root@server-ip:/root/nids-project/datasets/

4. Unzip the dataset:

cd ~/nids-project/datasets
unzip archive.zip

Step 3: Data Preprocessing

1. Create a preprocessing script to clean and prepare the data.

cd ~/nids-project
nano scripts/preprocess.py

Add the following code:

import pandas as pd
import numpy as np
import os
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
import joblib

# Configure paths
DATASET_DIR = os.path.expanduser('~/nids-project/datasets')
MODEL_DIR = os.path.expanduser('~/nids-project/models')
DATA_DIR = os.path.expanduser('~/nids-project/data')

def load_data():
    """Load and merge all dataset files with custom labels"""
    file_mapping = {
        'Monday': 'BENIGN',
        'Tuesday': 'BruteForce',
        'Wednesday': 'DoS',
        'Thursday-WorkingHours': 'WebAttack',
        'Thursday-Afternoon': 'Infiltration',
        'Friday-WorkingHours': 'DDoS',
        'Friday-Afternoon-PortScan': 'PortScan',
        'Friday-Afternoon': 'Botnet'
    }

    combined = pd.DataFrame()

    for pattern, label in file_mapping.items():
        for file in os.listdir(DATASET_DIR):
            if pattern in file and file.endswith('.csv'):
                file_path = os.path.join(DATASET_DIR, file)
                print(f"📂 Loading {file} as '{label}'")
                df = pd.read_csv(file_path, low_memory=False)
                df['label'] = label
                combined = pd.concat([combined, df], ignore_index=True)

    return combined

def main():
    print("🚀 Loading and merging dataset files...")
    df = load_data()

    if 'label' not in df.columns:
        raise ValueError("❌ Label column not found in the dataset.")

    # Extract label column first
    labels = df['label']

    # Keep only numeric features
    df = df.select_dtypes(include=['number'])

    # Detect and report columns with inf/-inf
    inf_cols = df.columns[np.isinf(df).any()]
    if len(inf_cols) > 0:
        print(f"⚠️ Columns with inf/-inf values: {inf_cols.tolist()}")

    # Replace inf/-inf with NaN and fill NaNs with 0
    df.replace([np.inf, -np.inf], np.nan, inplace=True)
    df.fillna(0, inplace=True)

    # Final safety check
    if not np.isfinite(df.values).all():
        raise ValueError("❌ Data still contains non-finite values after cleaning.")

    # Prepare features and encoded labels
    X = df
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(labels)

    # Create directories
    os.makedirs(MODEL_DIR, exist_ok=True)
    os.makedirs(DATA_DIR, exist_ok=True)

    # Fit and save MinMaxScaler
    scaler = MinMaxScaler()
    scaler.fit(X)
    joblib.dump(scaler, os.path.join(MODEL_DIR, 'scaler.pkl'))

    # Save processed data and label encoder
    joblib.dump((X, y), os.path.join(DATA_DIR, 'processed.pkl'))
    joblib.dump(label_encoder, os.path.join(MODEL_DIR, 'label_encoder.pkl'))

    print(f"✅ Preprocessing complete. Data saved to:\n- {DATA_DIR}/processed.pkl\n- {MODEL_DIR}/scaler.pkl\n- {MODEL_DIR}/label_encoder.pkl")

if __name__ == "__main__":
    main()

2. Run the preprocessing script.

python3 scripts/preprocess.py

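This script loads and merges the CSV files, keeps the numeric columns, replaces infinite and missing values with 0, fits a MinMaxScaler, and saves the cleaned data, the scaler, and the label encoder for training. You should see output similar to the following: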

🚀 Loading and merging dataset files...
📂 Loading Monday-WorkingHours.pcap_ISCX.csv as 'BENIGN'
📂 Loading Tuesday-WorkingHours.pcap_ISCX.csv as 'BruteForce'
📂 Loading Wednesday-workingHours.pcap_ISCX.csv as 'DoS'
📂 Loading Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv as 'WebAttack'
📂 Loading Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv as 'WebAttack'
📂 Loading Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv as 'DDoS'
📂 Loading Friday-WorkingHours-Morning.pcap_ISCX.csv as 'DDoS'
📂 Loading Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv as 'DDoS'
⚠️ Columns with inf/-inf values: ['Flow Bytes/s', ' Flow Packets/s']
✅ Preprocessing complete. Data saved to:
- /root/nids-project/data/processed.pkl
- /root/nids-project/models/scaler.pkl
- /root/nids-project/models/label_encoder.pkl
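
In the run above, both Thursday files are labeled 'WebAttack' and all three Friday files are labeled 'DDoS'. The substring patterns `'Thursday-WorkingHours'` and `'Friday-WorkingHours'` match every file for that day, while `'Thursday-Afternoon'`, `'Friday-Afternoon-PortScan'`, and `'Friday-Afternoon'` match none of the file names. A minimal sketch of a stricter matcher follows, assuming the file names shown in that output. The fragment list and the helper name `label_for_file` are illustrative and not part of the original script, and assigning 'Botnet' to the Friday morning file is an assumption.

```python
# Fragment -> label pairs, compared case-insensitively. Each fragment is specific
# enough to match exactly one of the file names printed by preprocess.py.
# Assumption: the Friday morning capture is the one meant to carry 'Botnet'.
FILE_LABELS = [
    ("monday", "BENIGN"),
    ("tuesday", "BruteForce"),
    ("wednesday", "DoS"),
    ("morning-webattacks", "WebAttack"),
    ("afternoon-infilteration", "Infiltration"),  # spelling as in the file name
    ("friday-workinghours-morning", "Botnet"),
    ("afternoon-portscan", "PortScan"),
    ("afternoon-ddos", "DDoS"),
]


def label_for_file(filename):
    """Return the label of the first fragment found in filename, or None if nothing matches."""
    name = filename.lower()
    for fragment, label in FILE_LABELS:
        if fragment in name:
            return label
    return None


if __name__ == "__main__":
    for f in [
        "Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv",
        "Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv",
        "Friday-WorkingHours-Morning.pcap_ISCX.csv",
    ]:
        print(f, "->", label_for_file(f))
```

In `load_data()`, a helper like this could replace the `pattern in file` test: iterate over the CSV files once and skip any file for which it returns `None`. Every flow in a file still receives the same label, so the labels remain per-file rather than per-flow.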

Step 4: Model Training

1. Create a training script to build our intrusion detection model.

nano scripts/train.py

Add the following code.

import os
import joblib
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Configure paths
DATA_DIR = os.path.expanduser('~/nids-project/data')
MODEL_DIR = os.path.expanduser('~/nids-project/models')

# Load processed data
X, y = joblib.load(os.path.join(DATA_DIR, 'processed.pkl'))

# Load and apply the scaler
scaler = joblib.load(os.path.join(MODEL_DIR, 'scaler.pkl'))
X_scaled = scaler.transform(X)

# Build and compile the model
model = Sequential([
    Dense(256, activation='relu', input_shape=(X_scaled.shape[1],)),
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dense(len(np.unique(y)), activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_scaled, y, epochs=20, validation_split=0.2)

# Save the model
model.save(os.path.join(MODEL_DIR, 'nids_model.h5'))

print(f"✅ Model training complete and saved to {MODEL_DIR}/nids_model.h5")

2. Run the training script.

python3 scripts/train.py

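This trains a fully connected neural network (two hidden layers with dropout) for 20 epochs to classify network traffic, holding out 20% of the data for validation. TensorFlow runs the training on the GPU when it detects one.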
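
Keras takes the `validation_split` fraction from the end of the input arrays before shuffling, and `processed.pkl` stores rows in the order the files were loaded, so the validation set may contain only the labels that were loaded last. A minimal sketch of a shuffled, per-class split that could be used instead is shown below. The helper name `stratified_indices` and the fixed seed are illustrative assumptions, not part of the original script.

```python
import random
from collections import defaultdict


def stratified_indices(labels, val_fraction=0.2, seed=42):
    """Split row indices into (train, val) so each class contributes about val_fraction of its rows to val."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = int(round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    # Shuffle again so classes are interleaved within each split.
    rng.shuffle(train_idx)
    rng.shuffle(val_idx)
    return train_idx, val_idx


if __name__ == "__main__":
    y_demo = [0] * 10 + [1] * 5 + [2] * 5
    train_idx, val_idx = stratified_indices(y_demo)
    print(len(train_idx), len(val_idx))
```

In `train.py`, the indices could then be passed explicitly, for example `model.fit(X_scaled[train_idx], y[train_idx], validation_data=(X_scaled[val_idx], y[val_idx]), epochs=20)`, in place of `validation_split=0.2`.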

Step 5: Real-time Detection

1. Create a detection script to monitor network traffic.

nano scripts/detect.py

Add the following code.

import os
import time
import numpy as np
import pandas as pd
from scapy.all import sniff, IP, TCP
from collections import deque, defaultdict
import joblib
import tensorflow as tf

# Paths
MODEL_DIR = os.path.expanduser('~/nids-project/models')

# Load model and preprocessing artifacts
model = tf.keras.models.load_model(os.path.join(MODEL_DIR, 'nids_model.h5'))
scaler = joblib.load(os.path.join(MODEL_DIR, 'scaler.pkl'))
label_encoder = joblib.load(os.path.join(MODEL_DIR, 'label_encoder.pkl'))

# Track packet rate for flood detection
packet_log = deque(maxlen=2000)

# Track port hits for each source IP
port_tracker = defaultdict(set)

# Extract features with expected column names
def extract_features(packet):
    if IP in packet:
        ip_layer = packet[IP]
        features = {
            'ip_len': ip_layer.len,
            'ip_ttl': ip_layer.ttl,
            'payload_len': len(ip_layer.payload)
        }

        if TCP in packet:
            tcp_layer = packet[TCP]
            features['tcp_flags'] = int(tcp_layer.flags)
            features['src_port'] = tcp_layer.sport
            features['dst_port'] = tcp_layer.dport
            features['proto'] = 6  # TCP protocol number

        # Pad missing features
        expected_columns = scaler.feature_names_in_
        df = pd.DataFrame([features])
        for col in expected_columns:
            if col not in df.columns:
                df[col] = 0
        return df[expected_columns]
    return None

# Main detection function
def process_packet(packet):
    global packet_log, port_tracker
    now = time.time()

    # Track packet rate for flood detection
    packet_log.append(now)
    recent_packets = [t for t in packet_log if now - t < 2]
    packet_rate = len(recent_packets)
    if packet_rate > 50:
        print(f"🚨 ALERT: High packet rate detected! ({packet_rate} packets/2s) [Potential Flood]")

    # Track destination ports per source IP for port scan detection
    if IP in packet and TCP in packet:
        src_ip = packet[IP].src
        dst_port = packet[TCP].dport
        port_tracker[src_ip].add(dst_port)

        if len(port_tracker[src_ip]) > 100:
            print(f"🚨 ALERT: Possible port scan from {src_ip} - hit ports: {sorted(port_tracker[src_ip])}")

    # ML-based detection
    features_df = extract_features(packet)
    if features_df is not None:
        scaled = scaler.transform(features_df)
        pred = model.predict(scaled, verbose=0)
        confidence = np.max(pred)
        pred_index = np.argmax(pred)
        label = label_encoder.inverse_transform([pred_index])[0]

        if label != 'BENIGN' and confidence > 0.9:
            print(f"🚨 ALERT: Intrusion detected - {label} (Confidence: {confidence:.2f})")
        else:
            print(f"✅ Normal traffic - {label} (Confidence: {confidence:.2f})")

# Start sniffing
if __name__ == "__main__":
    print("🛡️ Starting NIDS with ML, Flood, and Port Scan Detection...")
    sniff(iface="lo", prn=process_packet, store=False)

2. Run the detection system.

python3 scripts/detect.py

This script monitors network traffic in real-time, utilizing both rule-based detection (for flood attacks and port scans) and our trained machine learning model to identify potential intrusions.
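
The flood check in `process_packet` counts timestamps from the last two seconds, while `port_tracker` keeps accumulating ports for as long as the script runs, so a host that stays active may eventually cross the port threshold without scanning. The sketch below applies a time window to both checks. The class names and window lengths are illustrative assumptions, and timestamps are passed in explicitly so the logic can be exercised without live traffic.

```python
from collections import defaultdict, deque


class RateWindow:
    """Count events seen within the last `window` seconds."""

    def __init__(self, window=2.0):
        self.window = window
        self.times = deque()

    def add(self, now):
        """Record an event at time `now` and return the count inside the window."""
        self.times.append(now)
        # Drop timestamps that have aged out of the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        return len(self.times)


class PortScanWindow:
    """Track distinct destination ports per source IP within the last `window` seconds."""

    def __init__(self, window=10.0):
        self.window = window
        self.last_seen = defaultdict(dict)  # src_ip -> {dst_port: last timestamp}

    def add(self, src_ip, dst_port, now):
        """Record a hit and return how many distinct ports src_ip touched inside the window."""
        ports = self.last_seen[src_ip]
        ports[dst_port] = now
        # Forget ports that were last hit outside the window.
        for port in [p for p, t in ports.items() if now - t >= self.window]:
            del ports[port]
        return len(ports)


if __name__ == "__main__":
    rate = RateWindow(window=2.0)
    print([rate.add(t) for t in (0.0, 0.5, 1.0, 2.5)])

    scan = PortScanWindow(window=10.0)
    for port, t in ((22, 0.0), (80, 1.0), (443, 11.5)):
        print(scan.add("10.0.0.5", port, t))
```

In `process_packet`, `rate.add(now)` could stand in for the list comprehension over `packet_log`, and `scan.add(src_ip, dst_port, now)` for the update to `port_tracker`.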

3. Open another terminal and run a port scan.

nmap -sS -p 1-1000 localhost

4. Return to the first terminal. The detected scan appears in output similar to the following:

✅ Normal traffic - BruteForce (Confidence: 0.25)
🚨 ALERT: Possible port scan from 127.0.0.1 - hit ports: [21, 22, 23, 25, 53, 80, 110, 111, 113, 139, 143, 199, 256, 443, 445, 554, 587, 993, 995, 63630]
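
`extract_features` keeps only the columns listed in `scaler.feature_names_in_` and fills the rest with 0, so a per-packet field reaches the model only if its name exactly matches a training column. A plain-dict sketch of that alignment step is shown below. `align_features` is a hypothetical helper, and the column names in the demo are examples: two appear in the preprocessing output, while `' Destination Port'` is assumed.

```python
def align_features(features, expected_columns):
    """Order values like expected_columns (0 for gaps) and report which keys were discarded."""
    expected = set(expected_columns)
    row = [features.get(col, 0) for col in expected_columns]
    dropped = sorted(key for key in features if key not in expected)
    return row, dropped


if __name__ == "__main__":
    expected_columns = [" Destination Port", "Flow Bytes/s", " Flow Packets/s"]

    # Names as produced in detect.py: none match, so the row is all zeros.
    print(align_features({"ip_len": 60, "ip_ttl": 64, "dst_port": 443}, expected_columns))

    # A key that matches a training column is kept in its position.
    print(align_features({" Destination Port": 443}, expected_columns))
```

With the feature names used in `detect.py` (`ip_len`, `ip_ttl`, `dst_port`, and so on), every value is dropped unless the training columns use the same names, which would be consistent with the low-confidence prediction in the sample output.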

Conclusion

This article showed how to build a Network Intrusion Detection System (NIDS) using machine learning on Ubuntu 24.04 with GPU acceleration. The system combines rule-based checks for floods and port scans with a neural network classifier trained on the CIC-IDS-2017 dataset. GPU acceleration can speed up both training and inference, which helps when working with a dataset of this size.