HDFS Cluster Environment Setup
1. Download, extract, and rename Hadoop (on spark-master-01)
$ cd /dahy
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
$ tar xvfz hadoop-3.3.3.tar.gz
$ mv hadoop-3.3.3 hadoop3
2. Configure hadoop-env.sh (on spark-master-01)
$ vi /dahy/hadoop3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/dahy/jdk8
3. Configure core-site.xml (on spark-master-01)
$ vi /dahy/hadoop3/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark-master-01:9000/</value>
    </property>
    <!-- Static web UI user.... --> <!-- Without it the web UI fails with: Permission denied: user=dr.who, access=WRITE, inode="/":spark:supergroup:drwxr-xr-x -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>spark</value>
    </property>
</configuration>
4. Configure hdfs-site.xml (on spark-master-01)
$ vi /dahy/hadoop3/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///dahy/hadoop3/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///dahy/hadoop3/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///dahy/hadoop3/dfs/namesecondary</value>
    </property>
</configuration>
5. Configure yarn-site.xml (on spark-master-01)
$ vi /dahy/hadoop3/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark-master-01</value>
    </property>
    <!-- If the ResourceManager web port needs changing.... --> <!-- Helps stop attacks such as YARN get-shell exploits -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <!--
        <value>spark-master-01:8088</value>
        -->
        <value>spark-master-01:8188</value>
    </property>
</configuration>
6. Configure workers (on spark-master-01)
$ vi /dahy/hadoop3/etc/hadoop/workers
spark-worker-01
spark-worker-02
spark-worker-03
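The start scripts reach each host listed in workers over SSH, so passwordless SSH from spark-master-01 to every worker is typically required before start-dfs.sh will work. A minimal sketch, assuming the dahy account exists on all hosts:

```shell
# On spark-master-01: create a key once (skip if ~/.ssh/id_rsa already exists)
# and push it to each worker listed in the workers file.
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
for host in spark-worker-01 spark-worker-02 spark-worker-03; do
  ssh-copy-id dahy@"${host}"   # prompts for the password once per host
done
```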
7. Copy the jdk8 and hadoop3 directories to the three worker servers
scp -r /dahy/jdk8 dahy@spark-worker-01:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-02:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-03:/dahy/
scp -r /dahy/hadoop3 dahy@spark-worker-01:/dahy/
scp -r /dahy/hadoop3 dahy@spark-worker-02:/dahy/
scp -r /dahy/hadoop3 dahy@spark-worker-03:/dahy/
# Copy the jps command setup to each worker (then run: source ~/.profile)
scp -r ~/.profile dahy@spark-worker-01:~/
scp -r ~/.profile dahy@spark-worker-02:~/
scp -r ~/.profile dahy@spark-worker-03:~/
8. Format the filesystem (first time only)
$ /dahy/hadoop3/bin/hdfs namenode -format
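If the format succeeded, the metadata directory set by dfs.namenode.name.dir above should now contain a current/VERSION file; a quick sanity check:

```shell
# The name directory is created by the format step.
ls /dahy/hadoop3/dfs/name/current/
# VERSION records the clusterID/namespaceID of the newly formatted filesystem.
cat /dahy/hadoop3/dfs/name/current/VERSION
```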
SPARK Cluster Setup
1. Configure spark-env.sh (on spark-master-01)
$ cp /dahy/spark3/conf/spark-env.sh.template /dahy/spark3/conf/spark-env.sh
$ vi /dahy/spark3/conf/spark-env.sh
JAVA_HOME=/dahy/jdk8
2. Configure workers (on spark-master-01)
$ cp /dahy/spark3/conf/workers.template /dahy/spark3/conf/workers
$ vi /dahy/spark3/conf/workers
spark-worker-01
spark-worker-02
spark-worker-03
3. Copy to the workers
scp -r /dahy/jdk8 dahy@spark-worker-01:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-02:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-03:/dahy/
scp -r /dahy/spark3 dahy@spark-worker-01:/dahy/
scp -r /dahy/spark3 dahy@spark-worker-02:/dahy/
scp -r /dahy/spark3 dahy@spark-worker-03:/dahy/
HDFS Commands
1. Start and stop
$ /dahy/hadoop3/sbin/start-dfs.sh
$ /dahy/hadoop3/sbin/stop-dfs.sh
2. Web UI
http://spark-master-01:9870
3. Processes
NameNode : master
SecondaryNameNode : master
DataNode : worker
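One way to confirm the daemons landed on the right hosts after start-dfs.sh is jps; a sketch, assuming jps is on the PATH via the ~/.profile copied earlier:

```shell
# On spark-master-01: expect NameNode and SecondaryNameNode in the output.
jps
# On each worker: expect DataNode.
ssh dahy@spark-worker-01 'source ~/.profile; jps'
```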
4. List files
$ /dahy/hadoop3/bin/hdfs dfs -ls /
5. Create a directory
$ /dahy/hadoop3/bin/hdfs dfs -mkdir -p /test/cli
6. Upload a file
$ /dahy/hadoop3/bin/hdfs dfs -put -f /dahy/jdk8 /test/cli/
7. Download a file
$ /dahy/hadoop3/bin/hdfs dfs -get /test/cli /dahy/
8. Move and remove
$ /dahy/hadoop3/bin/hdfs dfs -mv /test/cli/jdk8/jre /test/cli/jdk8/../
$ /dahy/hadoop3/bin/hdfs dfs -rm -r -f /test
YARN Configuration Files (on spark-master-01)
$ mkdir -p /dahy/spark3/conf2
$ vi /dahy/spark3/conf2/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark-master-01:9000/</value>
    </property>
</configuration>
$ vi /dahy/spark3/conf2/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark-master-01</value>
    </property>
    <!-- If the ResourceManager web port needs changing.... --> <!-- Helps stop attacks such as YARN get-shell exploits -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <!--
        <value>spark-master-01:8088</value>
        -->
        <value>spark-master-01:8188</value>
    </property>
</configuration>
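This conf2 directory takes effect when YARN_CONF_DIR points at it while launching Spark, e.g.:

```shell
# Run spark-shell on YARN using the conf2 configuration directory.
cd /dahy/spark3
YARN_CONF_DIR=/dahy/spark3/conf2 ./bin/spark-shell \
    --master yarn --executor-memory 4G --executor-cores 4 --num-executors 3
```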
$ vi /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml
<configuration>
....
    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <!--<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>-->
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
        <description>
            The ResourceCalculator implementation to be used to compare
            Resources in the scheduler.
            The default i.e. DefaultResourceCalculator only uses Memory while
            DominantResourceCalculator uses dominant-resource to compare
            multi-dimensional resources such as Memory, CPU etc.
        </description>
    </property>
....
</configuration>
Copy capacity-scheduler.xml to the workers
scp -r /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml dahy@spark-worker-01:/dahy/hadoop3/etc/hadoop/
scp -r /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml dahy@spark-worker-02:/dahy/hadoop3/etc/hadoop/
scp -r /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml dahy@spark-worker-03:/dahy/hadoop3/etc/hadoop/
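The scheduler change is only picked up when the daemons (re)start, so restart YARN after copying the file:

```shell
# Restart YARN so the DominantResourceCalculator setting takes effect.
/dahy/hadoop3/sbin/stop-yarn.sh
/dahy/hadoop3/sbin/start-yarn.sh
```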
SPARK Environment Configuration
Configure spark-env.sh
$ vi /dahy/spark3/conf/spark-env.sh
JAVA_HOME=/dahy/jdk8
SPARK_MASTER_PORT=7177 # default: 7077
SPARK_MASTER_WEBUI_PORT=8180 # default: 8080
SPARK_WORKER_PORT=7178 # default: random
SPARK_WORKER_WEBUI_PORT=8181 # default: 8081
SPARK_WORKER_CORES=8 # default: all available
SPARK_WORKER_MEMORY=8G # default: machine's total RAM minus 1 GiB
SPARK_PUBLIC_DNS=${HOSTNAME}
SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=5"
SPARK_LOCAL_DIRS=/dahy/spark3/local
scp -r /dahy/spark3/conf/spark-env.sh dahy@spark-worker-01:/dahy/spark3/conf/
scp -r /dahy/spark3/conf/spark-env.sh dahy@spark-worker-02:/dahy/spark3/conf/
scp -r /dahy/spark3/conf/spark-env.sh dahy@spark-worker-03:/dahy/spark3/conf/
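Because the master port was moved off the default 7077, clients must name port 7177 explicitly; a minimal check:

```shell
# Start the standalone cluster, then attach a shell to the custom master port.
/dahy/spark3/sbin/start-all.sh
/dahy/spark3/bin/spark-shell --master spark://spark-master-01:7177
```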
Category | Start | Stop | Web UI | Processes
HDFS | /dahy/hadoop3/sbin/start-dfs.sh | /dahy/hadoop3/sbin/stop-dfs.sh | http://spark-master-01:9870 | NameNode, SecondaryNameNode, DataNode
YARN | /dahy/hadoop3/sbin/start-yarn.sh (Spark on YARN: $ YARN_CONF_DIR=/dahy/spark3/conf2 ./bin/spark-shell --master yarn --executor-memory 4G --executor-cores 4 --num-executors 3) | /dahy/hadoop3/sbin/stop-yarn.sh | http://spark-master-01:8188 (default port: 8088); Spark app web UI: http://spark-master-01:4040 | ResourceManager, NodeManager, YarnCoarseGrainedExecutorBackend
STANDALONE | /dahy/spark3/sbin/start-all.sh (or /dahy/spark3/sbin/start-master.sh --port 7177 --webui-port 8180 --host ${HOSTNAME} and /dahy/spark3/sbin/start-worker.sh spark://spark-master-01:7177 --cores 8 --memory 8G --host ${HOSTNAME}) | /dahy/spark3/sbin/stop-all.sh (or stop-master.sh / stop-worker.sh) | http://spark-master-01:8180 | Master, Worker, CoarseGrainedExecutorBackend
Zeppelin | /dahy/zeppelin0/bin/zeppelin-daemon.sh restart | | 9090 | |
History server | /dahy/spark3/sbin/start-history-server.sh | /dahy/spark3/sbin/stop-history-server.sh | http://spark-master-01:18080/ | HistoryServer
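The history server only lists applications that wrote event logs, which this setup does not configure. A minimal sketch; the /spark-logs HDFS directory is an assumption, not part of the original setup:

```shell
# Hypothetical event-log setup for the history server
# (the /spark-logs path is an assumption).
/dahy/hadoop3/bin/hdfs dfs -mkdir -p /spark-logs
cat >> /dahy/spark3/conf/spark-defaults.conf <<'EOF'
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://spark-master-01:9000/spark-logs
spark.history.fs.logDirectory    hdfs://spark-master-01:9000/spark-logs
EOF
```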