Recurrent Neural Networks (RNNs) are powerful deep learning models, particularly well-suited for sequence labeling tasks such as named entity recognition, part-of-speech tagging, or speech recognition. When combined with GPU acceleration, RNNs can process sequences much faster than on CPUs alone.
In this guide, we’ll walk through setting up an Ubuntu 24.04 GPU server for sequence labeling tasks using RNNs with Python and TensorFlow/Keras.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user with sudo privileges.
- NVIDIA drivers installed.
- Compatible NVIDIA CUDA Toolkit and cuDNN library installed.
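Before you begin, you can confirm that the driver and GPU are visible by running nvidia-smi, which should list your GPU along with the driver and CUDA versions.
nvidia-smi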
Step 1: Setting Up the Environment
First, let’s install the necessary system packages and create a Python virtual environment.
1. Install required system packages.
sudo apt update
sudo apt install -y python3-pip python3-dev python3-venv build-essential libcupti-dev git wget unzip
2. Create and activate a Python virtual environment.
python3 -m venv rnn_env
source rnn_env/bin/activate
3. Upgrade pip and install the required Python packages.
pip install --upgrade pip
pip install tensorflow numpy pandas matplotlib scikit-learn seqeval
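Note: the plain tensorflow wheel relies on the system CUDA Toolkit and cuDNN from the prerequisites. If TensorFlow fails to find them in the next step, one alternative is the variant that bundles its own CUDA libraries (quote the brackets so your shell does not expand them):
pip install "tensorflow[and-cuda]"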
4. Let’s check that TensorFlow can detect your GPU.
python3 -c "import tensorflow as tf; print('Num GPUs:', len(tf.config.list_physical_devices('GPU')))"
Output.
Num GPUs: 1
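By default, TensorFlow claims most of the GPU's memory as soon as it initializes. If you share the GPU with other processes, you can optionally enable memory growth near the top of your scripts; this is a minimal sketch using the standard tf.config API.
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)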
Step 2: Prepare the Project Structure
1. Create a directory structure for our sequence labeling project.
mkdir -p ~/sequence_labeling/{data,models,utils}
cd ~/sequence_labeling
2. Download the dataset. We’ll use the CoNLL-2003 dataset for this example.
wget https://data.deepai.org/conll2003.zip -P data/
3. Extract the downloaded file to the data directory.
unzip data/conll2003.zip -d data/
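Each file in the archive uses the CoNLL format: one token per line followed by its part-of-speech tag, chunk tag, and NER tag, with a blank line separating sentences (a line such as EU NNP B-NP B-ORG, where the final column is the NER tag our loader reads). You can peek at the first few lines to confirm the layout.
head -n 8 data/train.txt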
Step 3: Create Data Loading Utilities
In this section, we will create helper functions to load and preprocess the CoNLL dataset.
1. Create a data loader script to handle the dataset.
nano utils/data_loader.py
Add the following code.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def load_conll_data(file_path):
    """Read a CoNLL-format file into parallel lists of token and label sequences."""
    tokens, labels = [], []
    with open(file_path, 'r', encoding='utf-8') as f:
        current_tokens, current_labels = [], []
        for line in f:
            line = line.strip()
            if not line:
                # A blank line marks the end of a sentence
                if current_tokens:
                    tokens.append(current_tokens)
                    labels.append(current_labels)
                    current_tokens, current_labels = [], []
            else:
                parts = line.split()
                current_tokens.append(parts[0])   # the token is the first column
                current_labels.append(parts[-1])  # the NER tag is the last column
    return tokens, labels

def prepare_data(train_path, test_path):
    train_tokens, train_labels = load_conll_data(train_path)
    test_tokens, test_labels = load_conll_data(test_path)

    # Create vocabulary and label mappings; reserve 0 for padding and 1 for unknown words
    word2idx = {w: i + 2 for i, w in enumerate(set(w for s in train_tokens for w in s))}
    word2idx['<PAD>'] = 0
    word2idx['<UNK>'] = 1
    label2idx = {l: i for i, l in enumerate(set(l for s in train_labels for l in s))}
    idx2label = {i: l for l, i in label2idx.items()}

    # Convert tokens and labels to indices
    train_sequences = [[word2idx.get(w, word2idx['<UNK>']) for w in s] for s in train_tokens]
    train_labels = [[label2idx[l] for l in s] for s in train_labels]
    test_sequences = [[word2idx.get(w, word2idx['<UNK>']) for w in s] for s in test_tokens]
    test_labels = [[label2idx[l] for l in s] for s in test_labels]

    # Pad all sequences to the length of the longest training sentence
    max_len = max(len(s) for s in train_sequences)
    train_sequences = pad_sequences(train_sequences, maxlen=max_len, padding='post')
    train_labels = pad_sequences(train_labels, maxlen=max_len, padding='post')
    test_sequences = pad_sequences(test_sequences, maxlen=max_len, padding='post')
    test_labels = pad_sequences(test_labels, maxlen=max_len, padding='post')

    # One-hot encode the labels
    num_classes = len(label2idx)
    train_labels = [to_categorical(i, num_classes=num_classes) for i in train_labels]
    test_labels = [to_categorical(i, num_classes=num_classes) for i in test_labels]

    return {
        'word2idx': word2idx,
        'label2idx': label2idx,
        'idx2label': idx2label,
        'train_sequences': train_sequences,
        'train_labels': train_labels,
        'test_sequences': test_sequences,
        'test_labels': test_labels,
        'max_len': max_len,
        'num_classes': num_classes,
        'vocab_size': len(word2idx)
    }
This module reads the CoNLL files, maps tokens and labels to integer indices, pads every sequence to the length of the longest training sentence, and one-hot encodes the labels.
2. The module only defines functions and is imported by the scripts in the following steps, so running it directly produces no output. You can optionally run a quick sanity check, as shown below.
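The following one-liner (a quick check; it assumes the archive extracted to data/train.txt and data/test.txt) loads the dataset and prints a few statistics.
python3 -c "from utils.data_loader import prepare_data; d = prepare_data('data/train.txt', 'data/test.txt'); print('train sentences:', len(d['train_sequences']), '| vocab size:', d['vocab_size'], '| label classes:', d['num_classes'])"
If the paths are correct, this prints the number of training sentences along with the vocabulary and label counts.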
Step 4: Build and Train the RNN Model
Here we build a BiLSTM-based sequence labeling model and train it using our processed data.
1. Create a training script.
nano train.py
Add the following training code.
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from utils.data_loader import prepare_data
import os
import numpy as np
import pickle

def build_rnn_model(vocab_size, max_len, num_classes):
    input_layer = Input(shape=(max_len,))
    embedding = Embedding(input_dim=vocab_size, output_dim=128)(input_layer)
    lstm = Bidirectional(LSTM(units=64, return_sequences=True))(embedding)
    output = TimeDistributed(Dense(num_classes, activation='softmax'))(lstm)
    model = Model(inputs=input_layer, outputs=output)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

def main():
    # Prepare data
    data = prepare_data('data/train.txt', 'data/test.txt')

    # Build model
    model = build_rnn_model(data['vocab_size'], data['max_len'], data['num_classes'])
    model.summary()

    # Train model
    history = model.fit(
        np.array(data['train_sequences']),
        np.array(data['train_labels']),
        validation_data=(np.array(data['test_sequences']), np.array(data['test_labels'])),
        batch_size=32,
        epochs=10
    )

    # Save model and metadata
    os.makedirs('models', exist_ok=True)
    model.save('models/rnn_sequence_labeler.h5')
    with open('models/metadata.pkl', 'wb') as f:
        pickle.dump({
            'word2idx': data['word2idx'],
            'label2idx': data['label2idx'],
            'idx2label': data['idx2label'],
            'max_len': data['max_len']
        }, f)
    print("Model training complete and saved to models/ directory")

if __name__ == "__main__":
    main()
This script builds a BiLSTM sequence labeler: the bidirectional LSTM reads each sentence in both directions, so every token's prediction can draw on both left and right context, and the TimeDistributed dense layer emits a label probability distribution for each timestep.
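One optional refinement, not included in the script above: because every sentence is padded to max_len, the model also trains and reports accuracy on padding positions. A minimal way to have Keras ignore them, assuming word index 0 is reserved for <PAD> as in our data loader, is to enable masking on the embedding layer.
# Hypothetical variant of the embedding line in build_rnn_model:
# mask_zero=True makes downstream layers and metrics skip timesteps whose input is 0 (<PAD>)
embedding = Embedding(input_dim=vocab_size, output_dim=128, mask_zero=True)(input_layer)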
2. Run the training script.
python3 train.py
Output.
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 113) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ embedding (Embedding) │ (None, 113, 128) │ 3,024,128 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ bidirectional (Bidirectional) │ (None, 113, 128) │ 98,816 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ time_distributed (TimeDistributed) │ (None, 113, 9) │ 1,161 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 3,124,105 (11.92 MB)
Trainable params: 3,124,105 (11.92 MB)
Non-trainable params: 0 (0.00 B)
Epoch 1/10
I0000 00:00:1744894143.190118 11843 cuda_dnn.cc:529] Loaded cuDNN version 90700
1/1 ━━━━━━━━━━━━━━━━━━━━ 4s 4s/step - accuracy: 0.6667 - loss: 1.0917 - val_accuracy: 0.2222 - val_loss: 1.1006
Epoch 2/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 62ms/step - accuracy: 0.6667 - loss: 1.0773 - val_accuracy: 0.3333 - val_loss: 1.1023
Epoch 3/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - accuracy: 0.6667 - loss: 1.0628 - val_accuracy: 0.2222 - val_loss: 1.1041
Epoch 4/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step - accuracy: 0.6667 - loss: 1.0481 - val_accuracy: 0.2222 - val_loss: 1.1062
Epoch 5/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - accuracy: 0.6667 - loss: 1.0328 - val_accuracy: 0.2222 - val_loss: 1.1084
Epoch 6/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.6667 - loss: 1.0168 - val_accuracy: 0.2222 - val_loss: 1.1110
Epoch 7/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.6667 - loss: 0.9998 - val_accuracy: 0.2222 - val_loss: 1.1139
Epoch 8/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step - accuracy: 0.6667 - loss: 0.9818 - val_accuracy: 0.2222 - val_loss: 1.1173
Epoch 9/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - accuracy: 0.6667 - loss: 0.9627 - val_accuracy: 0.2222 - val_loss: 1.1212
Epoch 10/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 62ms/step - accuracy: 0.6667 - loss: 0.9424 - val_accuracy: 0.2222 - val_loss: 1.1257
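To confirm that training is actually running on the GPU, you can watch utilization from a second terminal while train.py runs.
watch -n 1 nvidia-smi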
Step 5: Evaluate the Model
This section assesses the performance of the trained model on the test data, using accuracy and classification reports.
1. Create an evaluation script.
nano test.py
Add the following code.
import tensorflow as tf
import numpy as np
import pickle
from sklearn.metrics import classification_report
from seqeval.metrics import classification_report as seqeval_report
from utils.data_loader import prepare_data

def load_model_and_metadata():
    model = tf.keras.models.load_model('models/rnn_sequence_labeler.h5')
    with open('models/metadata.pkl', 'rb') as f:
        metadata = pickle.load(f)
    return model, metadata

def evaluate_model():
    model, metadata = load_model_and_metadata()

    # Reload test data
    data = prepare_data('data/train.txt', 'data/test.txt')

    # Predict on test data
    y_pred = model.predict(np.array(data['test_sequences']))
    y_pred = np.argmax(y_pred, axis=-1)
    y_true = np.argmax(np.array(data['test_labels']), axis=-1)

    # Convert indices to labels for each sequence
    test_sequences = np.array(data['test_sequences'])
    true_labels, pred_labels = [], []
    for i in range(len(y_true)):
        true_seq, pred_seq = [], []
        for j in range(len(y_true[i])):
            # Skip padded positions (word index 0 is the <PAD> token)
            if test_sequences[i][j] == 0:
                continue
            true_seq.append(metadata['idx2label'].get(int(y_true[i][j]), 'O'))
            pred_seq.append(metadata['idx2label'].get(int(y_pred[i][j]), 'O'))
        true_labels.append(true_seq)
        pred_labels.append(pred_seq)

    # Print token-level classification report
    print("Token-level Classification Report:")
    flat_true = [label for seq in true_labels for label in seq]
    flat_pred = [label for seq in pred_labels for label in seq]
    print(classification_report(flat_true, flat_pred))

    # Print entity-level report (seqeval scores whole entity spans)
    print("\nSequence-level Classification Report:")
    print(seqeval_report(true_labels, pred_labels))

if __name__ == "__main__":
    evaluate_model()
This script reloads the saved model, skips padded positions, and prints two reports on the test set: a token-level report from scikit-learn and an entity-level report from seqeval.
2. Run the evaluation.
python3 test.py
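The two reports measure different things: scikit-learn scores each token independently, while seqeval only credits an entity when the whole span is correct. A small illustrative check with hypothetical labels (not from our dataset):
python3 -c "from seqeval.metrics import f1_score; print(f1_score([['B-ORG', 'I-ORG', 'O']], [['B-ORG', 'O', 'O']]))"
Two of the three token labels match, yet seqeval should report an F1 of 0.0, because the predicted ORG span covers only part of the true entity.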
Step 6: Create a Prediction Script
1. Finally, let’s create a script to make predictions on new text.
nano predict.py
Add the following code.
import tensorflow as tf
import numpy as np
import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences

def load_model_and_metadata():
    model = tf.keras.models.load_model('models/rnn_sequence_labeler.h5')
    with open('models/metadata.pkl', 'rb') as f:
        metadata = pickle.load(f)
    return model, metadata

def predict_sequence(text):
    model, metadata = load_model_and_metadata()
    tokens = text.split()
    # Map words unseen during training to the <UNK> index
    sequence = [metadata['word2idx'].get(w, metadata['word2idx']['<UNK>']) for w in tokens]
    padded = pad_sequences([sequence], maxlen=metadata['max_len'], padding='post')
    prediction = model.predict(padded)
    prediction = np.argmax(prediction, axis=-1)[0]
    # zip truncates the padded prediction back to the original token count
    return [(token, metadata['idx2label'].get(idx, 'O')) for token, idx in zip(tokens, prediction)]

if __name__ == "__main__":
    while True:
        text = input("\nEnter text to analyze (or 'quit' to exit): ")
        if text.lower() == 'quit':
            break
        results = predict_sequence(text)
        print("\nPredicted labels:")
        for token, label in results:
            print(f"{token}: {label}")
This script allows users to enter a sentence and receive predicted labels for each word using the trained RNN model.
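For scripted, non-interactive use, you can also import predict_sequence directly; this one-liner (assuming training has completed and the models/ directory exists) labels a single sentence:
python3 -c "from predict import predict_sequence; print(predict_sequence('Microsoft is based in USA'))"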
2. Test the prediction script.
python3 predict.py
You will be asked to enter the text to analyze.
Enter text to analyze (or 'quit' to exit): Microsoft is based in USA
Type “Microsoft is based in USA” and press Enter. You should see output similar to the following.
I0000 00:00:1744894809.978962 13089 cuda_dnn.cc:529] Loaded cuDNN version 90700
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step
Predicted labels:
Microsoft: B-ORG
is: O
based: O
in: O
USA: B-LOC
Conclusion
In this guide, you learned how to set up a GPU-powered RNN model for sequence labeling tasks on an Ubuntu 24.04 server, walking through environment setup, data loading, model training, evaluation, and prediction. This pipeline can be extended with techniques such as CRF layers, attention mechanisms, or transformer-based models to achieve even greater accuracy.