Deploying deep learning models outside of powerful cloud servers is now easier than ever, thanks to TensorFlow Lite (TFLite). When you convert your Keras model to TFLite, you unlock the ability to run AI-powered apps on mobile devices, Raspberry Pi, and a wide range of edge hardware.
Many developers and data scientists want to use the same trained model across different platforms, but traditional TensorFlow models are often too large and resource-intensive for these environments.
That’s where TensorFlow Lite comes in: it gives you smaller, faster models that retain much of the original model’s accuracy.
In this guide, you’ll learn how to take a Keras model, convert it step-by-step to the TFLite format, and test it, all on an Ubuntu 24.04 GPU server.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user with sudo privileges.
- NVIDIA drivers installed on your server (you can verify this with the command shown below).
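If you're not sure the drivers are working, you can check with nvidia-smi, which prints the driver version and lists the detected GPUs.
nvidia-smi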
Step 1: Install TensorFlow
First, refresh the package index so you install the latest available packages.
sudo apt update -y
Next, install Python and other required dependencies.
sudo apt install python3 python3-venv python3-pip -y
Create and activate a Python virtual environment.
python3 -m venv keras2tflite-env
source keras2tflite-env/bin/activate
Upgrade pip to the latest version.
pip install --upgrade pip
Install TensorFlow.
pip install tensorflow
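Optionally, confirm that TensorFlow imports correctly and can see your GPU (the exact version printed depends on what pip installed).
python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"
If the GPU list comes back empty, you may need the CUDA-enabled pip variant (pip install 'tensorflow[and-cuda]') or a system-wide CUDA installation that matches your driver.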
Step 2: Create and Save a Keras Model
In this section, you’ll build and save a simple Keras model for demonstration.
Create a create_keras_model.py file.
nano create_keras_model.py
Add the following code.
import tensorflow as tf
from tensorflow import keras

# Build a basic model (MNIST-style, for simplicity)
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

# Compile the model (no need to train for conversion demo)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Save the Keras model in HDF5 format
model.save('my_keras_model.h5')
print("Keras model saved as 'my_keras_model.h5'")
This script builds a basic neural network and saves it to a file named my_keras_model.h5. You can replace this with your own trained model if needed.
Now, run the script.
python3 create_keras_model.py
Output.
Keras model saved as 'my_keras_model.h5'
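The model you just saved is untrained, which is fine for demonstrating the conversion workflow. If you'd rather convert a model that makes meaningful predictions, you could optionally fit it on MNIST before the model.save() call in create_keras_model.py; a minimal sketch, assuming the server can download the MNIST dataset:
# Optional: train briefly on MNIST (place this before model.save())
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train / 255.0  # scale pixel values to the [0, 1] range
model.fit(x_train, y_train, epochs=1, batch_size=128)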
Step 3: Convert Keras Model to TensorFlow Lite
Now, you’ll convert the Keras model to the TensorFlow Lite format.
Create a Python script.
nano convert_to_tflite.py
Add the following code.
import tensorflow as tf

# Load the Keras model you saved
model = tf.keras.models.load_model('my_keras_model.h5')

# Convert to TFLite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save as a .tflite file
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

print("TFLite model saved as 'model.tflite'")
This code loads your Keras .h5 model, converts it to TFLite, and saves it as model.tflite. This is the model you’ll deploy to mobile or edge devices.
Run the script.
python3 convert_to_tflite.py
Output.
TFLite model saved as 'model.tflite'
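The converter isn’t limited to .h5 files. If your model lives in TensorFlow’s SavedModel format (a directory) instead, you can point the converter at it directly with from_saved_model; a minimal sketch, where 'saved_model_dir' is a placeholder path:
import tensorflow as tf

# Convert a SavedModel directory instead of an .h5 file (path is illustrative)
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
tflite_model = converter.convert()
with open('model_from_savedmodel.tflite', 'wb') as f:
    f.write(tflite_model)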
Step 4: Optimize Model with Quantization
If you want your model to be even smaller and faster, you can optimize it with quantization.
Create a script.
nano convert_with_quantization.py
Add the following code.
import tensorflow as tf

# Load the Keras model again
model = tf.keras.models.load_model('my_keras_model.h5')

# Enable quantization during conversion
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_tflite_model = converter.convert()

# Save quantized model
with open('model_quant.tflite', 'wb') as f:
    f.write(quant_tflite_model)

print("Quantized TFLite model saved as 'model_quant.tflite'")
Setting tf.lite.Optimize.DEFAULT enables dynamic-range quantization, which stores the model’s weights as 8-bit integers. This reduces the model’s size and can improve inference speed, usually with minimal accuracy loss. The output file is model_quant.tflite.
Run the script.
python3 convert_with_quantization.py
Output.
Quantized TFLite model saved as 'model_quant.tflite'
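To see how much space quantization saved, compare the two converted files; exact sizes will vary, but dynamic-range quantization typically shrinks a float model like this one to roughly a quarter of its original size.
ls -lh model.tflite model_quant.tflite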
Step 5: Run Inference with TFLite Interpreter
You should always test your TFLite model to ensure it works as expected. Here’s how to do a quick check.
Let’s create a script to test the model.
nano test_tflite_inference.py
Add the following code.
import numpy as np
import tensorflow as tf
# Load the TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
# Get details for input and output
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Create a dummy input (batch size 1, 28x28 zeros)
input_data = np.zeros((1, 28, 28), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
# Run inference
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print("TFLite model output:", output_data)
This script loads your TFLite model and runs it with dummy data to make sure everything works. If you see a probability array as output, your conversion was successful!
Run the script.
python3 test_tflite_inference.py
Output.
TFLite model output: [[0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]]
The output shows the model’s predicted probabilities for each of the 10 classes. Each value here is 0.1 because the model is untrained and the all-zero dummy input reaches the softmax as all-zero logits, which yields a uniform distribution.
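For a stronger check than an all-zero input, you can also compare the TFLite model’s output against the original Keras model on the same data; a minimal sketch (the random test input and tolerance are just for illustration):
import numpy as np
import tensorflow as tf

# Load both the original Keras model and the converted TFLite model
keras_model = tf.keras.models.load_model('my_keras_model.h5')
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run both models on the same random input
sample = np.random.rand(1, 28, 28).astype(np.float32)
keras_out = keras_model.predict(sample, verbose=0)
interpreter.set_tensor(input_details[0]['index'], sample)
interpreter.invoke()
tflite_out = interpreter.get_tensor(output_details[0]['index'])

# The two outputs should agree closely for an unquantized conversion
print("Outputs match:", np.allclose(keras_out, tflite_out, atol=1e-5))
On the target device itself, the lightweight tflite-runtime package exposes the same Interpreter API, so a test script like this carries over with little change.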
Conclusion
You’ve just walked through the complete process of converting a Keras model to TensorFlow Lite, right from installation and setup to final testing on Ubuntu 24.04. With these steps, you can confidently deploy your AI solutions to mobile phones, single-board computers, and embedded devices.