Neural networks power many everyday technologies, including face recognition, chatbots, and recommendation systems. Most people build them with frameworks like TensorFlow or PyTorch, but those frameworks hide the underlying mechanics that make them work. To truly understand how a neural network learns, it helps to build one from scratch.
In this guide, we’ll create a basic neural network using CuPy, a NumPy-compatible Python library that runs array operations on the GPU. You’ll see how the network makes predictions, how it improves itself through backpropagation, and how to train it on an Ubuntu GPU server.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user with sudo privileges.
- NVIDIA drivers installed.
- CUDA Toolkit 12.x installed.
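Before moving on, you can confirm the driver and toolkit are visible. The exact versions reported will vary with your setup, and if nvcc is not on your PATH it typically lives under /usr/local/cuda/bin:
nvidia-smi
nvcc --version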
Step 1: Set Up the Environment
First, let’s set up our Python environment. We’ll create a virtual environment to keep our dependencies isolated.
1. Install Python and pip if not already present.
sudo apt update
sudo apt install python3 python3-pip python3-venv -y
2. Create and activate a virtual environment.
python3 -m venv nn-env
source nn-env/bin/activate
3. Install required packages.
pip install cupy-cuda12x matplotlib
The virtual environment ensures that our project dependencies do not conflict with system-wide Python packages. The cupy-cuda12x package is specifically optimized for CUDA 12.x.
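As a quick sanity check, you can confirm that CuPy can see the GPU before writing any network code. This is an optional snippet, not one of the tutorial's files:
import cupy as cp

print("CUDA devices:", cp.cuda.runtime.getDeviceCount())  # number of visible GPUs
x = cp.arange(10)                                          # array allocated on the GPU
print((x * 2).get())                                       # .get() copies the result back to the CPU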
Step 2: Neural Network Implementation
1. Create a new file, nn.py, with our neural network class. This implementation handles both forward and backward propagation.
nano nn.py
Add the following code:
import cupy as cp

class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        cp.random.seed(42)
        # Hidden-layer weights and biases
        self.wh = cp.random.uniform(size=(input_size, hidden_size))
        self.bh = cp.zeros((1, hidden_size))
        # Output-layer weights and biases
        self.wo = cp.random.uniform(size=(hidden_size, output_size))
        self.bo = cp.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + cp.exp(-x))

    def sigmoid_derivative(self, x):
        # Expects x to already be a sigmoid output: sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
        return x * (1 - x)

    def forward(self, X):
        self.hidden_input = cp.dot(X, self.wh) + self.bh
        self.hidden_output = self.sigmoid(self.hidden_input)
        self.final_input = cp.dot(self.hidden_output, self.wo) + self.bo
        self.output = self.sigmoid(self.final_input)
        return self.output

    def backward(self, X, y, learning_rate):
        # Output-layer error and gradient
        error = y - self.output
        d_output = error * self.sigmoid_derivative(self.output)
        # Propagate the error back to the hidden layer
        error_hidden = cp.dot(d_output, self.wo.T)
        d_hidden = error_hidden * self.sigmoid_derivative(self.hidden_output)
        # Gradient-descent updates for weights and biases
        self.wo += cp.dot(self.hidden_output.T, d_output) * learning_rate
        self.bo += cp.sum(d_output, axis=0, keepdims=True) * learning_rate
        self.wh += cp.dot(X.T, d_hidden) * learning_rate
        self.bh += cp.sum(d_hidden, axis=0, keepdims=True) * learning_rate
        # Mean squared error for this pass
        return cp.mean(cp.square(error))

    def save_model(self, path="model_weights"):
        import os
        os.makedirs(path, exist_ok=True)
        cp.save(f"{path}/wh.npy", self.wh)
        cp.save(f"{path}/bh.npy", self.bh)
        cp.save(f"{path}/wo.npy", self.wo)
        cp.save(f"{path}/bo.npy", self.bo)

    def load_model(self, path="model_weights"):
        self.wh = cp.load(f"{path}/wh.npy")
        self.bh = cp.load(f"{path}/bh.npy")
        self.wo = cp.load(f"{path}/wo.npy")
        self.bo = cp.load(f"{path}/bo.npy")
The class includes methods for forward propagation, backpropagation, and model persistence. The sigmoid activation function provides non-linearity to our network.
2. Run the script to confirm the file has no syntax errors. Because nn.py only defines the class, the command produces no output; the training and testing scripts in the next steps import it.
python3 nn.py
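If you would like the script to print something when run directly, one option is to append a small sanity check at the bottom of nn.py. This block is an optional addition, not required by the rest of the tutorial:
if __name__ == "__main__":
    # Run one forward pass on dummy data to confirm the shapes line up
    nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1)
    sample = cp.array([[0.0, 1.0]])
    print("Sample prediction:", nn.forward(sample).get())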
Step 3: Training the Network
1. Create train.py to train our network on the XOR problem. This classic problem helps validate our implementation.
nano train.py
Add the following code:
import cupy as cp
import matplotlib.pyplot as plt
from nn import SimpleNeuralNetwork

# XOR dataset
X = cp.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = cp.array([[0], [1], [1], [0]])

# Initialize model
nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1)

# Training parameters
epochs = 10000
lr = 0.1
losses = []

# Training loop
for epoch in range(epochs):
    nn.forward(X)
    loss = nn.backward(X, y, lr)
    losses.append(cp.asnumpy(loss))  # Convert to NumPy for plotting
    if epoch % 1000 == 0:
        print(f"Epoch {epoch} – Loss: {float(loss):.5f}")

# Save model weights
nn.save_model()
print("Training complete. Weights saved in 'model_weights/'.")

# Plot training loss
plt.plot(losses)
plt.title("Training Loss over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.grid(True)
plt.savefig("training_loss.png")
plt.show()
The XOR problem is not linearly separable, which makes it a good test of whether the hidden layer is actually learning. The script tracks the loss at every epoch and saves a plot of it to training_loss.png.
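If your server is headless (no display attached), plt.show() may do nothing or emit a warning. One optional workaround is to select a non-interactive backend at the top of train.py, before pyplot is imported:
import matplotlib
matplotlib.use("Agg")  # render figures to files only; no display required
import matplotlib.pyplot as plt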
2. Execute the training script with:
python3 train.py
This command runs the training loop that optimizes the network weights. The loss may plateau near 0.25 for the first few thousand epochs before dropping sharply, as in the sample output below.
Epoch 0 – Loss: 0.35371
Epoch 1000 – Loss: 0.25000
Epoch 2000 – Loss: 0.24955
Epoch 3000 – Loss: 0.24767
Epoch 4000 – Loss: 0.23425
Epoch 5000 – Loss: 0.17842
Epoch 6000 – Loss: 0.06636
Epoch 7000 – Loss: 0.02008
Epoch 8000 – Loss: 0.01001
Epoch 9000 – Loss: 0.00636
Training complete. Weights saved in 'model_weights/'.
Step 4: Testing the Network
1. Create test.py to evaluate the trained model and verify whether the network learned the XOR function correctly:
nano test.py
Add the following code.
import cupy as cp
from nn import SimpleNeuralNetwork
# XOR input
X = cp.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
# Load model
nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.load_model()
# Predict
output = nn.forward(X)
print("Predicted Output:")
print(cp.round(output, 3).get()) # Move to CPU and print
2. Run the test script to see the predictions:
python3 test.py
The output shows how well the network approximates the XOR function. Predictions close to the true targets (0, 1, 1, 0) indicate successful learning.
Predicted Output:
[[0.07 ]
[0.935]
[0.935]
[0.07 ]]
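If you want a pass/fail check rather than raw probabilities, you can threshold the predictions at 0.5 and compare them with the expected XOR outputs. This is an optional extension of test.py:
# Expected XOR targets
y = cp.array([[0], [1], [1], [0]])

# Threshold the sigmoid outputs at 0.5 to get binary predictions
predictions = (output > 0.5).astype(cp.int32)
accuracy = float(cp.mean((predictions == y).astype(cp.float32)))
print(f"Accuracy: {accuracy * 100:.1f}%")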
Conclusion
In this tutorial, we implemented a neural network from scratch using CuPy on an Ubuntu GPU server, from environment setup through forward and backward propagation, training on the XOR problem, and testing the saved weights. A network this small will not show dramatic speedups over a CPU implementation, but because CuPy mirrors the NumPy API, the same code scales to much larger matrices and batch sizes, where GPU acceleration genuinely pays off.
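As a rough illustration of where the GPU starts to matter, you can time the same matrix multiplication in NumPy and CuPy. This is a sketch rather than a rigorous benchmark; exact numbers depend on your hardware, and cp.cuda.Stream.null.synchronize() is needed because CuPy kernels run asynchronously:
import time
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.rand(n, n).astype(np.float32)
a_gpu = cp.asarray(a_cpu)

start = time.time()
np.dot(a_cpu, a_cpu)
print(f"NumPy (CPU): {time.time() - start:.3f} s")

cp.dot(a_gpu, a_gpu)                  # warm-up call to initialize GPU kernels
cp.cuda.Stream.null.synchronize()
start = time.time()
cp.dot(a_gpu, a_gpu)
cp.cuda.Stream.null.synchronize()     # wait for the GPU to finish before stopping the timer
print(f"CuPy (GPU): {time.time() - start:.3f} s")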