Tutorial on installing Apache Spark on Ubuntu 20.04

888u

Last updated: 2024-02-01, edited by 888u

Apache Spark was very popular in academic research for several years, though interest there seems to have cooled recently: most of the obvious research questions appear to have been studied, and attention has shifted to newer distributed frameworks. This article explains how to install Apache Spark on the Ubuntu 20.04 Linux operating system. It covers installing Java, installing Apache Spark, and accessing the Apache Spark web interface.

Apache Spark is an open source, general-purpose, multi-language analytics engine for large-scale data processing. It runs on a single node or on multiple nodes and uses the RAM in the cluster to run fast queries over large amounts of data. It provides batch data processing and real-time streaming, and supports high-level APIs in languages such as Python, SQL, Scala, Java, and R. Its in-memory design lets it hold the data being queried directly in the main memory of the cluster nodes.

1. Install Java

Update system packages:

$ sudo apt update

Install Java:

$ sudo apt install default-jdk -y

Confirm Java installation:

$ java -version
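Spark 3.2 is documented to run on Java 8 and 11, so it can be worth checking the major version rather than only reading the output by eye. The following is a minimal sketch: the helper name `java_major` is made up for this example, and the parsing assumes the usual `version "x.y.z"` line that OpenJDK prints.

```shell
# Hypothetical helper: reduce a Java version string to its major number,
# e.g. "11.0.21" -> 11, "1.8.0_392" -> 8 (legacy 1.x numbering scheme).
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # strip the legacy "1." prefix
  esac
  echo "${v%%[._-]*}"     # keep everything before the first ".", "_" or "-"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted field.
  ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  echo "Java major version: $(java_major "$ver")"
else
  echo "java not found on PATH"
fi
```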

2. Install Apache Spark

Install necessary packages:

$ sudo apt install curl mlocate git scala -y

Download Apache Spark. The latest version is listed at https://spark.apache.org/downloads.html; this tutorial uses version 3.2.0 from the Apache archive:

$ curl -O https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz
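If you expect to switch versions later, the download URL can be built from variables instead of being typed out. This is only a sketch: the variable names and the `download_spark` wrapper are invented here, and not every version and Hadoop profile combination exists on the archive, so check the listing before changing them.

```shell
# Version and Hadoop profile taken from the command above; change both
# together, since not every combination is published.
SPARK_VERSION="3.2.0"
HADOOP_PROFILE="hadoop3.2"
SPARK_TGZ="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_TGZ}"

# Hypothetical wrapper: -f fails on HTTP errors instead of saving an error
# page, -L follows redirects, -O keeps the remote file name.
download_spark() {
  curl -fLO "$SPARK_URL"
}

echo "Would download: $SPARK_URL"
# download_spark   # uncomment to fetch the archive into the current directory
```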

Extract the Spark archive:

$ sudo tar xvf spark-3.2.0-bin-hadoop3.2.tgz

Create installation directory:

$ sudo mkdir /opt/spark

Move files to the installation directory:

$ sudo mv spark-3.2.0-bin-hadoop3.2/* /opt/spark
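The extract and move steps can also be combined into a single tar call. The sketch below demonstrates the idea on a throwaway archive in a temporary directory so that it runs without the real download; the `spark-demo` names are placeholders, not part of Spark.

```shell
# Throwaway stand-in for the real archive so the sketch can run anywhere.
work=$(mktemp -d)
mkdir -p "$work/spark-demo/bin"
echo 'placeholder' > "$work/spark-demo/bin/spark-shell"
tar -czf "$work/spark-demo.tgz" -C "$work" spark-demo

# --strip-components=1 drops the leading "spark-demo/" directory, so the
# contents land directly in the target (a scratch dir here, not /opt/spark).
dest="$work/opt-spark"
mkdir -p "$dest"
tar -xzf "$work/spark-demo.tgz" -C "$dest" --strip-components=1

ls "$dest/bin"
```

On a real system the equivalent would presumably be `sudo tar -xzf spark-3.2.0-bin-hadoop3.2.tgz -C /opt/spark --strip-components=1`, run after /opt/spark has been created.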

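Modify directory permissions so that your user can read and write the installation. Mode 777 is world-writable, which is fine on a single-user test machine but too permissive on a shared server: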

$ sudo chmod -R 777 /opt/spark

Edit your ~/.bashrc and add the Apache Spark directories to your PATH. The file belongs to your own user, so sudo is not needed:

$ nano ~/.bashrc

Add the following two lines of code to the end of the file:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
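Instead of editing the file by hand, the two export lines can be appended from the shell in a way that is safe to run more than once. This is a sketch, not a required step: `add_line_once` and `RC_FILE` are names invented for the example, and it writes to a scratch file unless you point `RC_FILE` at your real ~/.bashrc.

```shell
# Hypothetical helper: append a line to a file unless an identical line exists.
# grep -x matches whole lines, -F treats the text literally, -q stays silent.
add_line_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Scratch file by default; set RC_FILE="$HOME/.bashrc" to apply it for real.
RC_FILE="${RC_FILE:-$(mktemp)}"

# Single quotes keep $PATH and $SPARK_HOME unexpanded, so they are resolved
# when the file is sourced, not when the line is written.
add_line_once "$RC_FILE" 'export SPARK_HOME=/opt/spark'
add_line_once "$RC_FILE" 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin'
# Repeating a call leaves the file unchanged.
add_line_once "$RC_FILE" 'export SPARK_HOME=/opt/spark'

cat "$RC_FILE"
```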

Save the file and reload it so the changes take effect in the current shell:

$ source ~/.bashrc
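A quick way to confirm that the reload worked is to check whether both Spark directories are now on the PATH. The sketch below assumes the /opt/spark location used in this tutorial; `path_has` is a helper written for this example.

```shell
SPARK_HOME="${SPARK_HOME:-/opt/spark}"   # default matches the install directory above

# Hypothetical helper: succeed if colon-separated list $1 contains directory $2.
path_has() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

for dir in "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
  if path_has "$PATH" "$dir"; then
    echo "on PATH: $dir"
  else
    echo "NOT on PATH: $dir"
  fi
done
```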

Start a standalone master server:

$ start-master.sh

Find the master URL on the dashboard at http://ServerIPaddress:8080. It is shown near the top of the page and looks like this:

URL: spark://my-server-development:7077

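Start the Apache Spark worker process, replacing spark://ubuntu:7077 with the master URL shown on your dashboard. In Spark 3.1 and later the script is named start-worker.sh; start-slave.sh is the older, deprecated name.

$ start-worker.sh spark://ubuntu:7077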
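The master URL can usually be guessed without opening the dashboard, because the standalone start scripts should default to the machine's hostname and port 7077 when `SPARK_MASTER_HOST` and `SPARK_MASTER_PORT` are unset. The sketch below relies on that assumption, and `master_url` is a made-up helper; the URL shown on the dashboard remains the authoritative value.

```shell
# Hypothetical helper: build the master URL from a host and an optional port.
# With no host argument it falls back to SPARK_MASTER_HOST, then `hostname -f`.
master_url() {
  host="${1:-${SPARK_MASTER_HOST:-$(hostname -f 2>/dev/null || echo localhost)}}"
  port="${2:-${SPARK_MASTER_PORT:-7077}}"
  echo "spark://${host}:${port}"
}

url=$(master_url)
echo "Master URL guess: $url"

# Start the worker only if the script is on PATH (Spark 3.1+ name;
# older releases call it start-slave.sh).
if command -v start-worker.sh >/dev/null 2>&1; then
  start-worker.sh "$url"
fi
```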

3. Access the Apache Spark Web interface

Open a browser and enter http://ServerIPaddress:8080 in the address bar to reach the Spark master web interface. For example:

http://192.0.2.10:8080
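On a server without a browser, the same check can be made from the command line. This is a rough sketch: `http_status` and `UI_HOST` are names chosen for the example, and it defaults to localhost on the assumption that it runs on the master itself.

```shell
# Hypothetical helper: print the HTTP status code for a URL, or 000 if the
# connection fails. -s silences progress, -o discards the body, -w prints
# the status code, --max-time bounds the wait.
http_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

UI_HOST="${UI_HOST:-localhost}"   # or your server's IP, e.g. 192.0.2.10
UI_URL="http://${UI_HOST}:8080"

if command -v curl >/dev/null 2>&1; then
  code=$(http_status "$UI_URL" || true)
  echo "$UI_URL -> HTTP $code"    # 200 suggests the master UI is up
fi
```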

At this point, you have installed Apache Spark on your server. You can now access the main dashboard to start managing your cluster.
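To confirm that the cluster actually runs jobs, one option is to submit the SparkPi example that ships with the binary distribution. The sketch below assumes the /opt/spark layout from this tutorial and uses spark://ubuntu:7077 as a placeholder master URL; it skips the job if spark-submit or the examples jar cannot be found.

```shell
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
MASTER_URL="${MASTER_URL:-spark://ubuntu:7077}"   # placeholder; use the URL from your dashboard

# The examples jar name includes the Scala and Spark versions, so match it
# with a glob instead of hard-coding it.
examples_jar=""
for f in "$SPARK_HOME"/examples/jars/spark-examples_*.jar; do
  if [ -f "$f" ]; then
    examples_jar="$f"
    break
  fi
done

if command -v spark-submit >/dev/null 2>&1 && [ -n "$examples_jar" ]; then
  # SparkPi estimates pi; the trailing 10 is the number of slices
  # (partitions) the work is split into.
  spark-submit \
    --master "$MASTER_URL" \
    --class org.apache.spark.examples.SparkPi \
    "$examples_jar" 10
else
  echo "spark-submit or the examples jar was not found; skipping the test job"
fi
```

If the job runs, it should appear under the completed applications on the dashboard.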


All rights reserved by 888u unless otherwise stated.