Anomalies are data points that deviate significantly from the majority of observations. Detecting them early can prevent fraud, identify cyberattacks, and flag unusual system behavior before it causes damage. This process is known as anomaly detection, and it plays a key role in fields such as banking (credit card fraud), cybersecurity (intrusion detection), and healthcare (disease pattern discovery).
Traditionally, anomaly detection methods run on CPUs, but with today’s growing datasets, CPU-based processing can become slow and inefficient. That’s where GPU acceleration comes in. Using NVIDIA’s RAPIDS cuML library, we can run anomaly detection on the GPU, which can cut processing time substantially on large datasets. The actual speedup depends on the algorithm, the size of the data, and the hardware.
In this tutorial, you’ll learn how to perform anomaly detection on an Ubuntu 24.04 GPU server using Python and RAPIDS cuML.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user or a user with sudo privileges.
- NVIDIA drivers installed on the server.
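To confirm the driver prerequisite before going further, a small check like the one below may help. It is a minimal sketch that assumes the `nvidia-smi` tool shipped with the NVIDIA driver is on the PATH; the helper name `nvidia_smi_summary` is made up for this example.

```python
import shutil
import subprocess


def nvidia_smi_summary():
    """Return the GPU list printed by `nvidia-smi -L`, or None if unavailable."""
    exe = shutil.which("nvidia-smi")  # None when the tool is not on the PATH
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # A non-zero exit code usually means the driver is not loaded correctly.
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    summary = nvidia_smi_summary()
    print(summary if summary else "nvidia-smi not found or no GPU reported")
```

If the function returns None, the driver installation is worth revisiting before installing cuML.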
Step 1 – Setting Up Python Environment
Before we start coding anomaly detection, we need to prepare a proper Python environment on our Ubuntu 24.04 GPU server.
1. Install Python tools.
sudo apt install -y python3 python3-pip python3-venv
2. Create a project directory.
mkdir anomaly-detection && cd anomaly-detection
3. Create and activate a virtual environment.
python3 -m venv .venv
source .venv/bin/activate
4. Upgrade pip and install dependencies.
pip install --upgrade pip
pip install numpy pandas matplotlib seaborn joblib scikit-learn
5. Finally, install cuML (RAPIDS library for GPU-accelerated ML).
pip install cuml-cu12 --extra-index-url=https://pypi.nvidia.com
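After the installs finish, it can be useful to check that every package is visible from inside the virtual environment. The sketch below only checks that each top-level module can be located; it does not import the modules or exercise the GPU. The helper name `missing_packages` is hypothetical, and `sklearn` is the import name of scikit-learn.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["numpy", "pandas", "matplotlib", "seaborn", "joblib", "sklearn", "cuml"]
    missing = missing_packages(required)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it with the virtual environment activated; any name it lists needs to be installed again with pip.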
Step 2 – Building a Synthetic Anomaly Detection Example
To understand how anomaly detection works with GPU acceleration, we’ll first create a synthetic dataset. This allows us to clearly visualize the difference between normal data points and anomalies before applying the method to a real-world dataset.
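The scripts in this tutorial rely on DBSCAN, which groups points in dense regions into clusters and labels everything else as noise (-1). To make that labeling concrete, here is a minimal pure-Python sketch of the textbook algorithm. It is an illustration only, not cuML's implementation, and it assumes that a point counts as one of its own neighbors. The function names are made up for this example.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point, keep expanding
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
           (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
           (10.0, -10.0)]
    print(dbscan(pts, eps=0.5, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (10, -10) has no neighbors within eps, so it keeps the -1 label. That is the label the tutorial scripts treat as an anomaly.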
1. Create a Python script.
nano anomaly_detection_gpu.py
Add the following code.
# anomaly_detection_gpu.py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from cuml.cluster import DBSCAN  # GPU-based DBSCAN
import joblib

# ------------------------------
# Step 1: Generate synthetic data
# ------------------------------
rng = np.random.RandomState(42)

# Normal points: two dense blobs of 200 points each, centered at (2, 2) and (-2, -2)
X = 0.3 * rng.randn(200, 2)
X = np.r_[X + 2, X - 2]

# Outliers: 20 points drawn uniformly from the square [-6, 6] x [-6, 6]
outliers = rng.uniform(low=-6, high=6, size=(20, 2))
X = np.r_[X, outliers]

df = pd.DataFrame(X, columns=["feature1", "feature2"])

# ------------------------------
# Step 2: Train DBSCAN on GPU
# ------------------------------
# eps = neighborhood size, min_samples = density threshold
db = DBSCAN(eps=0.5, min_samples=5)
labels = db.fit_predict(df[["feature1", "feature2"]].values)

# Anomalies = label -1
df["anomaly"] = (labels == -1).astype(int)

print("Anomaly counts:")
print(df["anomaly"].value_counts())

# ------------------------------
# Step 3: Visualization
# ------------------------------
plt.figure(figsize=(8, 6))
sns.scatterplot(
    x="feature1", y="feature2",
    hue="anomaly", style="anomaly",
    palette={0: "blue", 1: "red"},
    data=df
)
plt.title("GPU Accelerated Anomaly Detection with cuML DBSCAN")
plt.savefig("anomaly_results.png")
plt.show()

# ------------------------------
# Step 4: Save labels
# ------------------------------
# Save the labels only (DBSCAN has no predict step for new data,
# so the labels are the useful output)
joblib.dump(labels, "dbscan_labels.pkl")
print("Labels saved as dbscan_labels.pkl")
2. Run the script.
python3 anomaly_detection_gpu.py
You should see output similar to:
Anomaly counts:
anomaly
0    402
1     18
Name: count, dtype: int64
Labels saved as dbscan_labels.pkl
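This indicates that DBSCAN assigned 402 of the 420 points to a cluster and labeled 18 as noise (anomalies). The count does not have to match the 20 injected outliers, because a uniformly drawn point can land inside or next to a dense cluster and be absorbed into it.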
3. The script saves a visualization as anomaly_results.png. Open it and you’ll see:
- Blue points – Normal data
- Red points – Detected anomalies
At this point, you have a working GPU-accelerated anomaly detection pipeline on synthetic data. Next, we’ll apply the same workflow to a real-world dataset: credit card fraud detection.
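The values eps=0.5 and min_samples=5 in the synthetic script were picked by hand. A common heuristic for choosing eps is the k-distance curve: compute each point's distance to its k-th nearest neighbor, sort those distances, and look for the "elbow" where they start to rise sharply. The sketch below is a pure-Python illustration with a hypothetical helper name, `k_distances`. It uses a brute-force O(n²) loop, so it only suits small datasets like the 420-point example.

```python
import math


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor (itself excluded)."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    print(k_distances(pts, k=1))  # [1.0, 1.0, 1.0, 8.0]
```

In the toy output, the jump from 1.0 to 8.0 marks the isolated point, so an eps a little above 1.0 would keep the three close points together and leave the fourth as noise. With the synthetic data from the script, the input could be `df[["feature1", "feature2"]].values.tolist()` with k set near min_samples.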
Step 3 – Applying to Real-World Dataset (Credit Card Fraud)
After testing on synthetic data, let’s scale up to a real-world dataset: the Credit Card Fraud Detection dataset. This dataset contains 284,807 transactions with 29 features, making it an excellent benchmark for anomaly detection.
1. Open a new Python file.
nano anomaly_creditcard_gpu.py
Add the following code.
import pandas as pd
from sklearn.datasets import fetch_openml
from cuml.cluster import DBSCAN
import joblib

# ------------------------------
# Step 1: Load Credit Card Fraud dataset
# ------------------------------
print("Downloading dataset...")
data = fetch_openml(name="creditcard", version=1, as_frame=True)
df = data.data

print("Dataset shape:", df.shape)

# ------------------------------
# Step 2: Train DBSCAN on GPU
# ------------------------------
db = DBSCAN(eps=3.0, min_samples=10)  # tune eps for better results
labels = db.fit_predict(df.values)

# Anomalies are labeled as -1
df["anomaly"] = (labels == -1).astype(int)

print("Anomaly counts:")
print(df["anomaly"].value_counts())

# ------------------------------
# Step 3: Save results
# ------------------------------
joblib.dump(labels, "creditcard_dbscan_labels.pkl")
df[["anomaly"]].to_csv("creditcard_anomalies.csv", index=False)

print("Labels saved as creditcard_dbscan_labels.pkl")
print("Anomaly flags saved to creditcard_anomalies.csv")
2. Run the script.
python3 anomaly_creditcard_gpu.py
You should see output similar to the following.
Downloading dataset...
Dataset shape: (284807, 29)
Anomaly counts:
anomaly
0    218091
1     66716
Name: count, dtype: int64
Labels saved as creditcard_dbscan_labels.pkl
Anomaly flags saved to creditcard_anomalies.csv
Explanation:
- Dataset shape confirms the dataset size (284,807 rows × 29 features).
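- Anomaly counts show how many transactions were flagged as suspicious (label 1) vs. normal (label 0). Here 66,716 of 284,807 transactions (about 23%) were labeled as noise. A share that large suggests that eps=3.0 and min_samples=10 are starting values to tune rather than a finished fraud detector.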
- creditcard_dbscan_labels.pkl – cluster labels for all transactions.
- creditcard_anomalies.csv – CSV file with anomaly flags (useful for further analysis in Excel, Pandas, or BI tools).
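The anomaly counts alone do not show whether the flagged transactions are actually fraudulent. When ground-truth labels are available, precision and recall give a quick sanity check. The sketch below uses a hypothetical helper, `precision_recall`, that takes two equal-length sequences of 0/1 values. How to obtain the true labels is an assumption: with `fetch_openml`, they would normally be in `data.target` and may need converting to integers.

```python
def precision_recall(flags, truth):
    """Return (precision, recall) for 0/1 anomaly flags against 0/1 true labels."""
    if len(flags) != len(truth):
        raise ValueError("flags and truth must have the same length")
    tp = sum(1 for f, t in zip(flags, truth) if f == 1 and t == 1)  # flagged, truly fraud
    fp = sum(1 for f, t in zip(flags, truth) if f == 1 and t == 0)  # flagged, not fraud
    fn = sum(1 for f, t in zip(flags, truth) if f == 0 and t == 1)  # fraud that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    flags = [1, 1, 0, 0, 1]
    truth = [1, 0, 0, 1, 1]
    precision, recall = precision_recall(flags, truth)
    print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
```

In the credit card script, a call along the lines of `precision_recall(df["anomaly"].tolist(), data.target.astype(int).tolist())` would be one way to use it, assuming the target column parses as integers. Low precision would indicate that eps or min_samples needs further tuning.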
Conclusion
In this tutorial, you learned how to set up an Ubuntu 24.04 GPU server for anomaly detection with Python, build a synthetic dataset to test the workflow, and then scale the approach to a real-world credit card fraud dataset. With RAPIDS cuML’s DBSCAN, the clustering and anomaly detection ran on the GPU, which can be much faster than CPU-based methods on large datasets.