Atlantic.Net Blog

How to Install and Setup Apache Spark on Debian 10

Hitesh Jethva
by Atlantic.Net (273 posts) under Dedicated Server Hosting, Tutorials

Apache Spark is a free, open-source, general-purpose framework for cluster computing. It is designed for speed and is used for workloads ranging from machine learning to stream processing to complex SQL queries. It can analyze large datasets across multiple computers, processing the data in parallel. Apache Spark provides APIs for multiple programming languages including Java, Python, R, and Scala. It also includes higher-level tools such as GraphX, Spark SQL, MLlib, and more.

In this post, we will show you how to install and configure Apache Spark on Debian 10.

Prerequisites

  • A fresh Debian 10 server on the Atlantic.Net Cloud Platform
  • A root password configured on your server

Step 1 – Create Atlantic.Net Cloud Server

First, log in to your Atlantic.Net Cloud Server. Create a new server, choosing Debian 10 as the operating system with at least 2GB RAM. Connect to your Cloud Server via SSH and log in using the credentials highlighted at the top of the page.

Once you are logged in to your Debian 10 server, run the following command to refresh the package index, so that the packages installed in the following steps come from the latest available versions.

apt-get update -y

Step 2 – Install Java

Before starting, you will need to install Java to run Apache Spark. You can install it using the following command:

apt-get install default-jdk -y

After installing Java, verify the Java installation using the following command:

java --version

You should see the following output:

openjdk 11.0.11 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-post-Debian-1deb10u1)
OpenJDK 64-Bit Server VM (build 11.0.11+9-post-Debian-1deb10u1, mixed mode, sharing)
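
Spark 3.1.x runs on Java 8 or Java 11, so it can be useful to check the major version in a script rather than by eye. The following is a minimal sketch; the helper name java_major is an assumption made for this example and is not part of Java or Spark.

```shell
# Hypothetical helper: print the major version for strings such as
# "11.0.11" (modern scheme) or "1.8.0_292" (legacy scheme used by Java 8).
java_major() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    # Drop everything from the first '.', '_', '+' or '-' onwards.
    printf '%s\n' "${v%%[._+-]*}"
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; the version is the quoted token on line 1.
    raw=$(java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/')
    echo "Detected Java major version: $(java_major "$raw")"
else
    echo "java was not found on PATH"
fi
```

With the OpenJDK build shown above, the detected major version should be 11.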

Step 3 – Install Scala

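Next, install Scala. The Spark binary distribution ships with its own Scala libraries, so the system package is mainly useful for running standalone Scala code alongside Spark. You can install it using the following command: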

apt-get install scala -y

Once Scala is installed, verify the installation using the following command:

scala -version

You should get the following output:

Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Step 4 – Install Apache Spark

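First, download Apache Spark from its official website. This guide uses version 3.1.2, which was the latest release at the time of writing. If the URL below no longer resolves, note that older releases are usually moved to archive.apache.org/dist/spark/. You can download it with the following command: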

wget https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
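
Before extracting the archive, you may want to check its integrity. Apache publishes a .sha512 file next to each release archive; the sketch below compares a file against an expected digest. The function name verify_sha512 and the SPARK_SHA512 variable are assumptions made for this example, and you would copy the expected digest yourself from the published checksum file.

```shell
# Hypothetical helper: succeed only if the SHA-512 digest of file $1
# equals the expected hex digest $2 (case and whitespace are ignored).
verify_sha512() {
    actual=$(sha512sum "$1" | awk '{print $1}')
    wanted=$(printf '%s' "$2" | tr -d ' \n' | tr 'A-F' 'a-f')
    [ "$actual" = "$wanted" ]
}

archive="spark-3.1.2-bin-hadoop3.2.tgz"
expected="${SPARK_SHA512:-}"   # assumed to hold the digest you copied

if [ -f "$archive" ] && [ -n "$expected" ]; then
    if verify_sha512 "$archive" "$expected"; then
        echo "Checksum OK"
    else
        echo "Checksum mismatch, do not use this archive"
    fi
else
    echo "Archive or expected digest not available; skipping check"
fi
```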

Once the download is completed, extract the downloaded file with the following command:

tar -xvzf spark-3.1.2-bin-hadoop3.2.tgz

Next, move the extracted directory to /opt:

mv spark-3.1.2-bin-hadoop3.2 /opt/spark

Next, define the SPARK_HOME environment variable and add the Spark bin and sbin directories to your PATH so that the Spark scripts can be run from any directory.

You can define them inside the ~/.bashrc file:

nano ~/.bashrc

Add the following lines:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Save and close the file, then apply the new environment variables to your current shell with the following command:

source ~/.bashrc
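
To confirm that the variables took effect, you can check whether both Spark directories are on your PATH. This is a small sketch; the helper name path_contains is an assumption made for this example, and /opt/spark is used as the default only because it is the location chosen earlier in this guide.

```shell
# Hypothetical helper: succeed if directory $1 is an exact entry in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to the install location used in this guide if SPARK_HOME is unset.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

for dir in "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
    if path_contains "$dir"; then
        echo "OK: $dir is on PATH"
    else
        echo "Missing: $dir is not on PATH"
    fi
done
```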

Step 5 – Start Apache Spark Cluster

At this point, Apache Spark is installed. You can now start the Apache Spark master process using the following command:

start-master.sh

You should get the following output:

starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-debian10.out

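By default, the Apache Spark master serves its web interface on port 8080 and accepts worker connections on port 7077. You can check that the web interface is listening with the following command: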

ss -tunelp | grep 8080

You should get the following output:

tcp   LISTEN 0      1                                   *:8080            *:*    users:(("java",pid=5931,fd=302)) ino:24026 sk:9 v6only:0 <->                   
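
If you want to run the same check from a script, one option is to filter the ss output for a listening socket on a given port. The sketch below assumes a helper named has_listener (a name chosen for this example) that reads ss output on standard input.

```shell
# Hypothetical helper: read `ss` output on stdin and succeed if any
# LISTEN line has a field ending in ":<port>".
has_listener() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ (":" port "$")) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

if command -v ss >/dev/null 2>&1; then
    if ss -ltn | has_listener 8080; then
        echo "Something is listening on port 8080"
    else
        echo "Nothing is listening on port 8080 yet"
    fi
fi
```

Matching on the colon before the port number avoids false positives, for example port 80 matching a line for port 8080.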

Step 6 – Start Apache Spark Worker Process

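Next, start the Apache Spark worker process and point it at the master. In the command below, debian10 is the hostname of the server used in this guide, so replace it with your own hostname, and 7077 is the default master port: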

start-worker.sh spark://debian10:7077

You should get the following output:

starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-debian10.out
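
Because the master URL depends on the hostname of your server, it can be convenient to build it rather than type it. The following is a minimal sketch; the function name master_url is an assumption made for this example, and it assumes the master runs on the default port 7077 unless a second argument is given.

```shell
# Hypothetical helper: print a Spark master URL.
#   $1 = host (defaults to this machine's node name, as printed by `uname -n`)
#   $2 = port (defaults to 7077, the standalone master's default port)
master_url() {
    host="${1:-$(uname -n)}"
    port="${2:-7077}"
    printf 'spark://%s:%s\n' "$host" "$port"
}

echo "Worker would connect to: $(master_url)"

# To start the worker with the generated URL, you could run:
# start-worker.sh "$(master_url)"
```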

Step 7 – Access Apache Spark

You can now access the Apache Spark web interface at http://your-server-ip:8080. The dashboard lists the master URL, the registered workers, and any running applications.
[Image: Apache Spark Dashboard]

Step 8 – Access Apache Spark Shell

Apache Spark also provides an interactive Scala shell that you can use to run Spark code from the command line. You can start it using the following command:

spark-shell

Once the shell starts, you should see output similar to the following:

Spark context Web UI available at http://debian10:4040
Spark context available as 'sc' (master = local[*], app id = local-1627197681924).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/
         
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.11)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
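
To test the cluster without typing Scala interactively, you can leave the shell with :quit and submit the SparkPi example that ships with Spark. The sketch below is one way to do that from a regular terminal; the helper name find_examples_jar is an assumption made for this example, and it assumes the master is running on this machine on port 7077.

```shell
# Hypothetical helper: print the path of the bundled examples jar
# under the Spark directory $1, or fail if none is found.
find_examples_jar() {
    for jar in "$1"/examples/jars/spark-examples_*.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Fall back to the install location used in this guide if SPARK_HOME is unset.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

if command -v spark-submit >/dev/null 2>&1 && jar_path=$(find_examples_jar "$SPARK_HOME"); then
    # Estimate Pi using 10 partitions on the standalone master.
    spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master "spark://$(uname -n):7077" \
        "$jar_path" 10
else
    echo "spark-submit or the examples jar was not found; skipping"
fi
```

While the job runs, it should appear under running applications in the web interface on port 8080.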

If you want to stop the Apache Spark master, run the following command:

stop-master.sh

To stop the Apache Spark worker, run the following command:

stop-worker.sh

Conclusion

Congratulations! You have successfully installed and configured Apache Spark on Debian 10. This guide will help you to perform basic tests before you start configuring a Spark cluster and performing advanced actions. Try it on your dedicated server today!
