Time series forecasting has countless applications, from predicting stock prices to anticipating energy demand. In this tutorial, you’ll learn how to train a Transformer model to forecast daily temperatures using weather data on an Ubuntu 24.04 GPU server. Transformers, originally designed for natural language processing, have proven to be powerful for sequential data tasks, including time series prediction.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user or a user with sudo privileges.
- NVIDIA drivers installed on your server.
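Before starting, you can confirm the driver is working:
nvidia-smi
This prints a table listing your GPU model, the driver version, and the supported CUDA version.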
Step 1 – Set Up the Python Environment
We’ll begin by preparing a clean Python environment for our project. This ensures all dependencies are isolated and avoids conflicts with other packages on your system.
1. First, update the package index and install Python, venv, pip, and Git.
sudo apt update
sudo apt install -y python3 python3-venv python3-pip git
2. Next, create a dedicated virtual environment for the Transformer weather forecasting project.
python3 -m venv weather-transformer-env
source weather-transformer-env/bin/activate
3. Upgrade pip inside the environment to avoid installation issues.
pip install --upgrade pip
4. Now install PyTorch with CUDA 12.1 support for GPU acceleration, along with other required packages.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers datasets scikit-learn matplotlib pandas
5. Finally, verify that your GPU is accessible to PyTorch.
python3 -c "import torch; print(torch.cuda.is_available())"
If everything is set up correctly, you should see:
True
This means your GPU is ready for training the Transformer model.
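Optionally, confirm which GPU PyTorch detected:
python3 -c "import torch; print(torch.cuda.get_device_name(0))"
This prints the model name of the first CUDA device.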
Step 2 – Prepare a Weather Dataset
Before training the Transformer model, we need time series data. In this example, we’ll generate a synthetic weather dataset that simulates 2,000 days of daily temperature readings. This lets you focus on building and training the model without having to source and clean real-world data first.
1. Create a new file called prepare_weather_data.py.
nano prepare_weather_data.py
Add the following code.
import pandas as pd
import numpy as np
# Generate synthetic daily temperature data
np.random.seed(42)
dates = pd.date_range(start="2015-01-01", periods=2000)
temperatures = 20 + 10 * np.sin(np.linspace(0, 20, 2000)) + np.random.randn(2000) * 2
df = pd.DataFrame({"date": dates, "temperature": temperatures})
df.to_csv("weather_data.csv", index=False)
print("Weather dataset saved as weather_data.csv")
This script:
- Creates a date range starting from January 1, 2015.
- Uses a sine wave with added noise to mimic seasonal temperature variations.
- Stores the result in a CSV file called weather_data.csv.
2. Run the script.
python3 prepare_weather_data.py
At this point, you have a complete CSV file ready for loading into PyTorch. Later, you can easily swap this with real weather data from sources like NOAA or Meteostat by keeping the same column structure.
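For instance, if you export a daily-temperature CSV from one of those sources, a short snippet like the one below reshapes it into the same two columns. This is only a sketch: the file name (station_export.csv) and source column names (DATE, TAVG) are placeholders you would replace with whatever your download actually contains.
import pandas as pd

# Hypothetical raw export; adjust the file and column names to your source
raw = pd.read_csv("station_export.csv", parse_dates=["DATE"])
df = raw.rename(columns={"DATE": "date", "TAVG": "temperature"})[["date", "temperature"]]
df = df.dropna().sort_values("date")
df.to_csv("weather_data.csv", index=False)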
Step 3 – Build a PyTorch Dataset for Sliding Windows
Our Transformer model needs sequential input. For time series forecasting, this means feeding it a fixed-length window of past temperature readings and asking it to predict the next value.
We’ll create a custom PyTorch Dataset that takes a sequence length and returns (input_window, target_value) pairs.
1. Create a file named dataset_loader.py.
nano dataset_loader.py
Add the following code.
import torch
import pandas as pd
from torch.utils.data import Dataset
class WeatherDataset(Dataset):
    """Yields (past window, next value) pairs from a 1-D series."""
    def __init__(self, series, seq_length=30):
        self.seq_length = seq_length
        self.series = torch.tensor(series, dtype=torch.float32)

    def __len__(self):
        # One sample per position that still has a full window plus a target
        return len(self.series) - self.seq_length

    def __getitem__(self, idx):
        return (
            self.series[idx:idx + self.seq_length],  # input window
            self.series[idx + self.seq_length]       # next-day target
        )

if __name__ == "__main__":
    df = pd.read_csv("weather_data.csv")
    dataset = WeatherDataset(df["temperature"].values)
    print("Sample Input:", dataset[0][0])
    print("Sample Target:", dataset[0][1])
2. Run the script.
python3 dataset_loader.py
You should see:
Sample Input: tensor([20.9934, 19.8235, 21.4955, 23.3462, 19.9318, 20.0318, 23.7584, 22.2346,
19.8606, 21.9844, 20.0720, 20.1669, 21.6816, 17.4704, 17.9463, 20.3705,
19.5683, 22.3212, 19.9751, 19.0649, 24.9190, 21.6341, 22.3184, 19.4314,
21.2894, 22.6971, 20.2701, 23.4200, 21.5636, 22.2775])
Sample Target: tensor(21.7532)
Now the dataset is ready to feed into the Transformer model during training.
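If you want to double-check what the training loop will receive, here is a quick optional sanity check of the batch shapes, assuming the files created so far:
from torch.utils.data import DataLoader
import pandas as pd
from dataset_loader import WeatherDataset

df = pd.read_csv("weather_data.csv")
loader = DataLoader(WeatherDataset(df["temperature"].values), batch_size=32, shuffle=True)

x, y = next(iter(loader))
print(x.shape)  # torch.Size([32, 30]): 32 windows of 30 days each
print(y.shape)  # torch.Size([32]): one next-day target per window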
Step 4 – Define a Lightweight Transformer Model
With the dataset ready, we can design a Transformer-based neural network to handle our time series forecasting task. Transformers excel at capturing long-range dependencies, which makes them a strong choice for sequential data like weather patterns.
Create a file named transformer_model.py.
nano transformer_model.py
Add the following code.
import torch
import torch.nn as nn
class TransformerWeatherForecast(nn.Module):
    def __init__(self, seq_length=30, d_model=64, nhead=4, num_layers=3):
        super().__init__()
        # Project each scalar temperature to a d_model-dimensional embedding
        self.input_proj = nn.Linear(1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Map the final time step's representation to a single prediction
        self.fc_out = nn.Linear(d_model, 1)

    def forward(self, x):
        x = x.unsqueeze(-1)             # [batch, seq_len, 1]
        x = self.input_proj(x)          # [batch, seq_len, d_model]
        x = self.transformer_encoder(x)
        out = self.fc_out(x[:, -1, :])  # predict from the last time step
        return out
This PyTorch model, TransformerWeatherForecast, uses a Transformer encoder to learn temporal patterns in time series data. It projects input values to a higher dimension, processes them with multi-head attention, and outputs a prediction from the last time step.
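One simplification worth flagging: this model has no positional encoding, so the encoder has no explicit notion of where each day sits in the window. That is acceptable for a first pass, but if you want to extend the model, a standard sinusoidal positional encoding is the usual addition. The sketch below is an optional module, not part of the code above; you would create it in __init__ and apply it right after self.input_proj(x) in forward.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    def __init__(self, d_model, max_len=500):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))   # [1, max_len, d_model]

    def forward(self, x):
        # x: [batch, seq_len, d_model]; add the encoding for each position
        return x + self.pe[:, :x.size(1)]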
Step 5 – Train the Model
Now that we have both the dataset and model, we can train the Transformer to predict the next day’s temperature from the previous 30 days.
1. Create a file named train.py.
nano train.py
Add the following code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
import pandas as pd
from dataset_loader import WeatherDataset
from transformer_model import TransformerWeatherForecast
# Hyperparameters
SEQ_LENGTH = 30
BATCH_SIZE = 32
EPOCHS = 20
LR = 0.001
# Load dataset
df = pd.read_csv("weather_data.csv")
dataset = WeatherDataset(df["temperature"].values, seq_length=SEQ_LENGTH)
# Train-test split
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
train_ds, test_ds = random_split(dataset, [train_size, test_size])
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE)
# Model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TransformerWeatherForecast(seq_length=SEQ_LENGTH).to(device)
# Loss & optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# Training loop
for epoch in range(EPOCHS):
    model.train()
    total_loss = 0
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        output = model(x)
        loss = criterion(output.squeeze(), y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch [{epoch+1}/{EPOCHS}], Loss: {total_loss/len(train_loader):.4f}")
# Save model
torch.save(model.state_dict(), "transformer_weather_forecast.pth")
print("Model saved as transformer_weather_forecast.pth")
This code trains a Transformer-based model for weather temperature forecasting:
- Imports libraries & modules – Uses PyTorch, pandas, and custom WeatherDataset & TransformerWeatherForecast.
- Sets hyperparameters – Defines sequence length, batch size, epochs, and learning rate.
- Loads and prepares data – Reads temperature data from weather_data.csv and creates a dataset of input sequences.
- Splits dataset – Uses 80% for training and 20% for testing with PyTorch DataLoader.
- Initializes model – Loads the Transformer model on GPU if available.
- Defines loss & optimizer – Uses Mean Squared Error (MSE) loss and Adam optimizer.
- Training loop – Iterates over epochs, performs forward/backward passes, updates weights, and logs loss.
- Saves the model – Stores the trained weights as transformer_weather_forecast.pth.
2. Run the training script.
python3 train.py
Output (your exact loss values may differ):
Epoch [1/20], Loss: 259.9079
Epoch [2/20], Loss: 159.8631
Epoch [3/20], Loss: 92.6229
Epoch [4/20], Loss: 62.2326
Epoch [5/20], Loss: 54.8255
Epoch [6/20], Loss: 55.8948
Epoch [7/20], Loss: 54.4462
Epoch [8/20], Loss: 53.6575
Epoch [9/20], Loss: 54.1364
Epoch [10/20], Loss: 53.8897
Epoch [11/20], Loss: 54.0865
Epoch [12/20], Loss: 53.5845
Epoch [13/20], Loss: 53.2018
Epoch [14/20], Loss: 54.0120
Epoch [15/20], Loss: 53.6179
Epoch [16/20], Loss: 54.2151
Epoch [17/20], Loss: 54.2123
Epoch [18/20], Loss: 53.6800
Epoch [19/20], Loss: 51.3220
Epoch [20/20], Loss: 54.1718
Model saved as transformer_weather_forecast.pth
You’ll notice the loss drops sharply in the first few epochs and then plateaus around 53–54. The initial drop shows the model is fitting the overall level of the series; the plateau suggests it has stopped improving, which we will diagnose in the next step.
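Note that train.py builds test_loader but never uses it. If you also want a held-out loss after training, here are a few lines you could append to the end of train.py, reusing the variables already defined there:
# Optional: average MSE on the 20% held-out split
model.eval()
test_loss = 0
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        test_loss += criterion(model(x).squeeze(), y).item()
print(f"Test Loss: {test_loss/len(test_loader):.4f}")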
Step 6 – Evaluate the Model with Quick Predictions
After training, it’s time to check how well the model performs on unseen data. We’ll load the saved weights and run predictions for a few sequences from the dataset.
1. Create a file named evaluate.py.
nano evaluate.py
Add the following code.
import torch
from torch.utils.data import DataLoader
import pandas as pd
from dataset_loader import WeatherDataset
from transformer_model import TransformerWeatherForecast
SEQ_LENGTH = 30
df = pd.read_csv("weather_data.csv")
dataset = WeatherDataset(df["temperature"].values, seq_length=SEQ_LENGTH)
loader = DataLoader(dataset, batch_size=1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TransformerWeatherForecast(seq_length=SEQ_LENGTH).to(device)
model.load_state_dict(torch.load("transformer_weather_forecast.pth", map_location=device))
model.eval()

with torch.no_grad():
    for i, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        pred = model(x)
        print(f"Actual: {y.item():.2f}°C, Predicted: {pred.item():.2f}°C")
        if i == 10:
            break
2. Run the evaluation script.
python3 evaluate.py
Output (your values will depend on the trained weights):
Actual: 21.75°C, Predicted: 21.09°C
Actual: 26.76°C, Predicted: 21.09°C
Actual: 23.12°C, Predicted: 21.09°C
Actual: 21.13°C, Predicted: 21.09°C
Actual: 24.98°C, Predicted: 21.09°C
Actual: 20.99°C, Predicted: 21.09°C
Actual: 23.94°C, Predicted: 21.09°C
Actual: 19.70°C, Predicted: 21.09°C
Actual: 21.05°C, Predicted: 21.09°C
Actual: 24.20°C, Predicted: 21.09°C
Actual: 25.37°C, Predicted: 21.09°C
Here, the model predicts almost the same value (about 21.09°C) for every window: it has learned the overall mean of the series but not the variation around it. Typical remedies are normalizing the input data, adding positional encoding to the model from Step 4, training for more epochs, or lowering the learning rate.
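As a first improvement, normalizing the temperatures to zero mean and unit variance usually makes it easier for the model to fit the fluctuations rather than just the level. Below is a minimal sketch using scikit-learn (installed in Step 1); for a rigorous setup you would fit the scaler on the training split only, and you must invert the scaling when reporting predictions in °C:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("weather_data.csv")
scaler = StandardScaler()
# Shown on the full series for brevity; fit on the training portion in practice
scaled = scaler.fit_transform(df[["temperature"]]).flatten()

# Train on `scaled` instead of the raw values, then convert predictions back:
# pred_celsius = scaler.inverse_transform(pred.cpu().numpy().reshape(-1, 1))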
Conclusion
In this tutorial, you learned how to build, train, and evaluate a Transformer model for time series forecasting using PyTorch on an Ubuntu 24.04 GPU server. We walked through setting up the environment, generating a synthetic weather dataset, creating a custom PyTorch Dataset for sliding windows, defining a lightweight Transformer architecture, training it with GPU acceleration, and evaluating its predictions.