Apache Spark is a free and open-source analytics engine for data engineering, data science, and machine learning workloads, running on anything from a single server to a large cluster. It is a distributed processing system that uses in-memory caching to run processing tasks on very large data sets quickly. Some of the key features of Apache Spark are shown below.

  • Fault tolerance
  • Dynamic in nature
  • Lazy evaluation
  • Real-time stream processing
  • Speed
  • Reusability
  • Advanced analytics
  • In-memory computing

This post will show you how to install Apache Spark on Fedora.

Step 1 – Install Java JDK

Apache Spark is written in Scala and runs on the Java Virtual Machine, so a Java JDK must be installed on your server. You can install OpenJDK 11 with the following command.

dnf install java-11-openjdk-devel -y

Once Java is installed, you can verify the Java installation using the following command.

java --version

Output:

openjdk 11.0.15 2022-04-19
OpenJDK Runtime Environment 18.9 (build 11.0.15+10)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.15+10, mixed mode, sharing)
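
Spark's startup scripts use the java binary found on your PATH, but some tooling also expects the JAVA_HOME variable to be set. If you want it defined system-wide, here is a minimal sketch; the symlink below is the usual Fedora location for OpenJDK 11, so verify it on your system first.

# Optional: pin JAVA_HOME for all users; confirm the path with: readlink -f $(which java)
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' > /etc/profile.d/java.sh
source /etc/profile.d/java.sh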

Step 2 – Install Apache Spark

First, download the Apache Spark release archive (3.4.1 at the time of writing) using the wget command.

wget https://dlcdn.apache.org/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
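
Optionally, verify the integrity of the download. Apache publishes a SHA-512 checksum file next to each release archive; compute the local digest and compare it against the published value.

# Compare this digest with the contents of
# https://dlcdn.apache.org/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz.sha512
sha512sum spark-3.4.1-bin-hadoop3.tgz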

Once the download completes, extract the archive with the following command.

tar -xvf spark-3.4.1-bin-hadoop3.tgz

Next, move the extracted directory to the /opt directory.

mv spark-3.4.1-bin-hadoop3 /opt/spark
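
Optionally, add the Spark binaries to your PATH so that commands such as spark-shell and spark-submit work from any directory. A minimal sketch using a profile.d snippet:

# Make Spark's bin and sbin directories available to all login shells
echo 'export SPARK_HOME=/opt/spark' > /etc/profile.d/spark.sh
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> /etc/profile.d/spark.sh
source /etc/profile.d/spark.sh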

Next, add a dedicated user to run Spark, then give it ownership of the Spark directory.

useradd spark
chown -R spark:spark /opt/spark
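
Since this account exists only to run the Spark daemons, you can optionally give it a non-interactive shell so it cannot be used for logins:

# The systemd units below start Spark directly, so the spark user needs no login shell
usermod -s /sbin/nologin spark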

Step 3 – Create Systemd Service Files

Next, create a systemd service file to manage the Spark master service.

nano /etc/systemd/system/spark-master.service

Add the following configuration.

[Unit]
Description=Apache Spark Master
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target

Save and close the file.
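
By default, the master binds to the machine's hostname and listens on port 7077. If you want to pin the bind address instead, Spark reads conf/spark-env.sh at startup; a minimal sketch (replace your-server-ip with your server's address):

# Create spark-env.sh from the bundled template and set the master bind address
cp /opt/spark/conf/spark-env.sh.template /opt/spark/conf/spark-env.sh
echo 'SPARK_MASTER_HOST=your-server-ip' >> /opt/spark/conf/spark-env.sh

Next, create a service file for the Spark worker. (Spark 3.x replaced the old "slave" terminology; the bundled scripts are now start-worker.sh and stop-worker.sh.)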

nano /etc/systemd/system/spark-worker.service

Add the following configuration.

[Unit]
Description=Apache Spark Worker
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-worker.sh spark://your-server-ip:7077
ExecStop=/opt/spark/sbin/stop-worker.sh

[Install]
WantedBy=multi-user.target

Save and close the file.
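
By default, a worker offers all of the machine's cores and most of its memory to Spark jobs. If you want to cap its resources, you can append the worker variables to the same conf/spark-env.sh created earlier; for example:

# Limit this worker to 2 cores and 2 GB of RAM (adjust to your hardware)
echo 'SPARK_WORKER_CORES=2' >> /opt/spark/conf/spark-env.sh
echo 'SPARK_WORKER_MEMORY=2g' >> /opt/spark/conf/spark-env.sh

Next, reload the systemd daemon so it picks up the new unit files.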

systemctl daemon-reload

Next, start and enable the Spark master service.

systemctl start spark-master
systemctl enable spark-master

You can check the Spark master status using the following command.

systemctl status spark-master

Output:

● spark-master.service - Apache Spark Master
     Loaded: loaded (/etc/systemd/system/spark-master.service; disabled; vendor preset: disabled)
     Active: active (running) since Sat 2023-06-24 04:26:29 EDT; 4s ago
    Process: 16172 ExecStart=/opt/spark/sbin/start-master.sh (code=exited, status=0/SUCCESS)
   Main PID: 16184 (java)
      Tasks: 34 (limit: 9497)
     Memory: 209.8M
        CPU: 17.803s
     CGroup: /system.slice/spark-master.service
             └─16184 /usr/lib/jvm/java-11-openjdk-11.0.15.0.10-1.fc34.x86_64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Mast>

Jun 24 04:26:25 fedora systemd[1]: Starting Apache Spark Master...
Jun 24 04:26:25 fedora start-master.sh[16178]: starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-spark-org.apache.spark.deploy.master.Ma>
Jun 24 04:26:29 fedora systemd[1]: Started Apache Spark Master.
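
If the service fails to start, inspect the systemd journal and Spark's own log directory:

# Show the last 50 journal entries for the master, then list Spark's log files
journalctl -u spark-master --no-pager -n 50
ls -l /opt/spark/logs/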

Step 4 – Access Spark Web Interface

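At this point, Spark is installed and the master's web UI is listening on port 8080. If firewalld is active (the Fedora default), you may need to open the port before the dashboard is reachable from other machines:

# Permanently open the master UI port, then apply the change
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --reload
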
Now, open your web browser and access Apache Spark using the URL http://your-server-ip:8080. You should see the Spark dashboard on the following screen.

apache spark dashboard

Now, start and enable the Spark worker service with the following command.

systemctl start spark-worker
systemctl enable spark-worker

Now, go back to the Spark web interface and reload the page. You should see that a new worker node has been added to the cluster.

see added worker node
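
To confirm that the cluster actually executes jobs, you can submit the SparkPi example that ships with Spark. A minimal sketch, assuming the default Scala 2.12 build of Spark 3.4.1 (adjust the jar name if your build differs) and replacing your-server-ip:

# Run the bundled SparkPi example against the standalone master with 100 partitions
/opt/spark/bin/spark-submit \
  --master spark://your-server-ip:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar 100

The driver output should end with a line similar to "Pi is roughly 3.14...", and the completed application will appear in the web interface.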

Conclusion

In this tutorial, you learned how to install Apache Spark on Fedora. You can now start building your machine learning and data science projects and run them on Apache Spark. Try deploying Apache Spark on dedicated server hosting from Atlantic.Net!