Last updated: 2024-02-01. Edited by 888u.
Apache Spark was very popular in academia for several years; interest has cooled more recently as most of the core research problems have been studied and attention has shifted to newer distributed frameworks. This article is a tutorial on installing Apache Spark on the Ubuntu 20.04 Linux operating system, covering installing Java, installing Apache Spark, and accessing the Apache Spark web interface.

Apache Spark is an open-source, general-purpose, multi-language analytics engine for large-scale data processing. It runs on a single node or across a cluster, keeping data in RAM to answer queries over large datasets quickly. It supports both batch processing and real-time stream processing, and offers high-level APIs in Python, SQL, Scala, Java, and R. Its in-memory design lets it cache queries and data directly in the main memory of the cluster nodes.
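To give a feel for those high-level APIs, here is a minimal sketch: once Spark is installed (the steps below), a one-line PySpark job can be piped into the interactive shell that ships with Spark. This assumes pyspark is on your PATH:

```shell
# Pipe a tiny job into the PySpark shell: parallelize the numbers 1..100
# across the cluster and sum them; 5050 appears in the shell output.
echo 'print(sc.parallelize(range(1, 101)).sum())' | pyspark
```

Here sc is the SparkContext that the pyspark shell creates automatically on startup.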
1. Install Java
Update system packages:
$ sudo apt update
Install Java:
$ sudo apt install default-jdk -y
Confirm the Java installation:
$ java -version
2. Install Apache Spark
Install necessary packages:
$ sudo apt install curl mlocate git scala -y
Download Apache Spark. The latest version can be found at https://spark.apache.org/downloads.html
$ curl -O https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz
Extract the Spark archive:
$ sudo tar xvf spark-3.2.0-bin-hadoop3.2.tgz
Create the installation directory:
$ sudo mkdir /opt/spark
Move the extracted files to the installation directory:
$ sudo mv spark-3.2.0-bin-hadoop3.2/* /opt/spark
Change the directory permissions:
$ sudo chmod -R 777 /opt/spark
Edit the .bashrc configuration file and add the Apache Spark installation directory to the system path:
$ sudo nano ~/.bashrc
Add the following two lines to the end of the file:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Save the file and apply the changes:
$ source ~/.bashrc
Start a standalone master server:
$ start-master.sh
Find your Spark master URL on the dashboard by visiting http://ServerIPaddress:8080. It will look something like this:
URL: spark://my-server-development:7077
Start the Apache Spark worker process, replacing spark://ubuntu:7077 with your own master URL. (In newer Spark releases, start-slave.sh has been renamed start-worker.sh; the old name still works as a deprecated alias.)
$ start-slave.sh spark://ubuntu:7077
3. Access the Apache Spark Web interface
Enter http://ServerIPaddress:8080 in your browser's address bar to open the Apache Spark web interface. For example:
http://192.0.2.10:8080
At this point, you have installed Apache Spark on your server. You can now open the main dashboard and start managing your cluster.
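As a final smoke test — a sketch assuming the /opt/spark layout used above and Spark's default log file naming — you can recover the master URL from the master's startup log and submit the SparkPi example that ships with Spark to the cluster:

```shell
#!/bin/sh
# Extract the master URL (spark://host:7077) from the master's startup log.
MASTER_URL=$(grep -oh 'spark://[^ ]*' /opt/spark/logs/spark-*-org.apache.spark.deploy.master.Master-*.out | head -n 1)
echo "Master URL: $MASTER_URL"

# Submit the bundled SparkPi example jar (path matches the 3.2.0 release);
# on success it prints a line like "Pi is roughly 3.14...".
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master "$MASTER_URL" \
  /opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar 10
```

If the worker registered correctly, the completed application also appears on the dashboard at port 8080.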