
Hadoop Cluster Setup

by nothing-error 2022. 12. 28.

HDFS Cluster Setup

 

1. Download, extract, and rename Hadoop (on spark-master-01)

$ cd /dahy
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
$ tar xvfz hadoop-3.3.3.tar.gz
$ mv hadoop-3.3.3 hadoop3
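
As an optional sanity check, the hadoop script can print its version even before any cluster configuration (assuming a JDK is already reachable via PATH or JAVA_HOME):

$ /dahy/hadoop3/bin/hadoop version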

 

 

2. Configure hadoop-env.sh (on spark-master-01)

$ vi /dahy/hadoop3/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/dahy/jdk8
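
This should point at the same JDK path that every node will have. A quick check that the path is valid:

$ /dahy/jdk8/bin/java -version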

 

3. Configure core-site.xml (on spark-master-01)

$ vi /dahy/hadoop3/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark-master-01:9000/</value>
    </property>
    <!-- Static user for the web UI; without this, HDFS web operations fail with: Permission denied: user=dr.who, access=WRITE, inode="/":spark:supergroup:drwxr-xr-x -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>spark</value>
    </property>
</configuration>
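
fs.defaultFS is what every HDFS client and daemon uses to find the NameNode. An optional check that the value is being picked up:

$ /dahy/hadoop3/bin/hdfs getconf -confKey fs.defaultFS
# prints hdfs://spark-master-01:9000/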

 

 

 

4. Configure hdfs-site.xml (on spark-master-01)

$ vi /dahy/hadoop3/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///dahy/hadoop3/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///dahy/hadoop3/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///dahy/hadoop3/dfs/namesecondary</value>
    </property>
</configuration>
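
These directories hold the NameNode metadata, the DataNode blocks, and the SecondaryNameNode checkpoints. The format step (step 8) and the daemons normally create them on first run, but pre-creating them is a cheap way to catch permission problems early (optional):

$ mkdir -p /dahy/hadoop3/dfs/{name,data,namesecondary}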

 

5. Configure yarn-site.xml (on spark-master-01)

$ vi /dahy/hadoop3/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark-master-01</value>
    </property>
    <!-- Change the ResourceManager web UI port if needed; moving off the default also blunts attacks such as get-shell(YARN) -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <!--
        <value>spark-master-01:8088</value>
        -->
        <value>spark-master-01:8188</value>
    </property>
</configuration>
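
Once YARN is up (start-yarn.sh, covered at the end of this post), a quick way to confirm the ResourceManager answers on the new port (assumes curl is installed):

$ curl -s -o /dev/null -w "%{http_code}\n" http://spark-master-01:8188/cluster
# expect 200 if the web UI is listening on 8188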

 

 

6. Configure workers (on spark-master-01)

$ vi /dahy/hadoop3/etc/hadoop/workers

spark-worker-01
spark-worker-02
spark-worker-03
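
start-dfs.sh reads this file and launches a DataNode on each host over SSH, so passwordless SSH from the master to every worker has to work. An optional loop to verify:

$ for h in spark-worker-01 spark-worker-02 spark-worker-03; do ssh dahy@$h hostname; done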

 

7. Copy the jdk8 and hadoop3 directories to the three workers

# copy jdk8 and hadoop3 to each worker

scp -r /dahy/jdk8 dahy@spark-worker-01:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-02:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-03:/dahy/

scp -r /dahy/hadoop3 dahy@spark-worker-01:/dahy/
scp -r /dahy/hadoop3 dahy@spark-worker-02:/dahy/
scp -r /dahy/hadoop3 dahy@spark-worker-03:/dahy/



# copy the jps setup (.profile) to each worker, then run source ~/.profile there

scp -r ~/.profile dahy@spark-worker-01:~/
scp -r ~/.profile dahy@spark-worker-02:~/
scp -r ~/.profile dahy@spark-worker-03:~/
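
The scp commands above can also be collapsed into one loop; a sketch, assuming the same three hostnames:

$ for h in spark-worker-01 spark-worker-02 spark-worker-03; do
    scp -r /dahy/jdk8 /dahy/hadoop3 dahy@$h:/dahy/
    scp ~/.profile dahy@$h:~/
  done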

 

8. Format the filesystem (once, at initial setup)

$ /dahy/hadoop3/bin/hdfs namenode -format
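
Formatting assigns the cluster a fresh clusterID, which is why it should run only once: reformatting later gives the NameNode a new ID that the existing DataNodes (which keep the old ID under dfs/data) will refuse to join. You can see the generated ID in the VERSION file:

$ cat /dahy/hadoop3/dfs/name/current/VERSION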

 

Spark Cluster Setup

1. Configure spark-env.sh (on spark-master-01)

$ cp /dahy/spark3/conf/spark-env.sh.template /dahy/spark3/conf/spark-env.sh
$ vi /dahy/spark3/conf/spark-env.sh

JAVA_HOME=/dahy/jdk8

2. Configure workers (on spark-master-01)

$ cp /dahy/spark3/conf/workers.template /dahy/spark3/conf/workers
$ vi /dahy/spark3/conf/workers

spark-worker-01
spark-worker-02
spark-worker-03

3. Copy to the workers

scp -r /dahy/jdk8 dahy@spark-worker-01:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-02:/dahy/
scp -r /dahy/jdk8 dahy@spark-worker-03:/dahy/

scp -r /dahy/spark3 dahy@spark-worker-01:/dahy/
scp -r /dahy/spark3 dahy@spark-worker-02:/dahy/
scp -r /dahy/spark3 dahy@spark-worker-03:/dahy/

 

HDFS Commands

1. Start and stop

$ /dahy/hadoop3/sbin/start-dfs.sh

$ /dahy/hadoop3/sbin/stop-dfs.sh

 

2. Web UI

http://spark-master-01:9870

 

3. Processes

NameNode: master

SecondaryNameNode: master

DataNode: workers
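
A quick way to confirm the right processes landed on the right machines after start-dfs.sh (the full path is used because .profile is not sourced over non-interactive SSH):

$ jps    # on spark-master-01: NameNode, SecondaryNameNode
$ for h in spark-worker-01 spark-worker-02 spark-worker-03; do ssh dahy@$h /dahy/jdk8/bin/jps; done    # DataNode on each worker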

 

4. List

$ /dahy/hadoop3/bin/hdfs dfs -ls /

5. Create a directory
$ /dahy/hadoop3/bin/hdfs dfs -mkdir -p /test/cli

6. Upload a file
$ /dahy/hadoop3/bin/hdfs dfs -put -f /dahy/jdk8 /test/cli/


7. Download a file
$ /dahy/hadoop3/bin/hdfs dfs -get /test/cli /dahy/

8. Move and remove
$ /dahy/hadoop3/bin/hdfs dfs -mv /test/cli/jdk8/jre /test/cli/jdk8/../
$ /dahy/hadoop3/bin/hdfs dfs -rm -r -f /test
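
Putting these together, a minimal end-to-end smoke test (upload a file, read it back, clean up):

$ echo "hello hdfs" > /tmp/hello.txt
$ /dahy/hadoop3/bin/hdfs dfs -mkdir -p /test/cli
$ /dahy/hadoop3/bin/hdfs dfs -put -f /tmp/hello.txt /test/cli/
$ /dahy/hadoop3/bin/hdfs dfs -cat /test/cli/hello.txt
$ /dahy/hadoop3/bin/hdfs dfs -rm -r -f /test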

 

 

 

YARN Configuration Files (on spark-master-01)

$ mkdir -p /dahy/spark3/conf2

$ vi /dahy/spark3/conf2/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark-master-01:9000/</value>
    </property>
</configuration>

$ vi /dahy/spark3/conf2/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark-master-01</value>
    </property>
    <!-- Change the ResourceManager web UI port if needed; moving off the default also blunts attacks such as get-shell(YARN) -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <!--
        <value>spark-master-01:8088</value>
        -->
        <value>spark-master-01:8188</value>
    </property>
</configuration>

 

$ vi /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml

 

<configuration>
....
    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <!--<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>-->
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
        <description>
	The ResourceCalculator implementation to be used to compare
	Resources in the scheduler.
	The default i.e. DefaultResourceCalculator only uses Memory while
	DominantResourceCalculator uses dominant-resource to compare
	multi-dimensional resources such as Memory, CPU etc.
        </description>
    </property>
....
</configuration>

 

Copy capacity-scheduler.xml to the workers

scp -r /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml dahy@spark-worker-01:/dahy/hadoop3/etc/hadoop/
scp -r /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml dahy@spark-worker-02:/dahy/hadoop3/etc/hadoop/
scp -r /dahy/hadoop3/etc/hadoop/capacity-scheduler.xml dahy@spark-worker-03:/dahy/hadoop3/etc/hadoop/
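
With YARN running, an easy smoke test is to submit the bundled SparkPi example through spark-submit (a sketch; the exact examples jar filename depends on your Spark build, hence the glob):

$ YARN_CONF_DIR=/dahy/spark3/conf2 /dahy/spark3/bin/spark-submit \
      --master yarn \
      --class org.apache.spark.examples.SparkPi \
      /dahy/spark3/examples/jars/spark-examples_*.jar 100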

 

 

 

Spark Environment Settings

Configure spark-env.sh

$ vi /dahy/spark3/conf/spark-env.sh

JAVA_HOME=/dahy/jdk8
SPARK_MASTER_PORT=7177  # default: 7077
SPARK_MASTER_WEBUI_PORT=8180  # default: 8080
SPARK_WORKER_PORT=7178  # default: random
SPARK_WORKER_WEBUI_PORT=8181  # default: 8081
SPARK_WORKER_CORES=8  # default: all available
SPARK_WORKER_MEMORY=8G  # default: machine's total RAM minus 1 GiB
SPARK_PUBLIC_DNS=${HOSTNAME}
SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=5"
SPARK_LOCAL_DIRS=/dahy/spark3/local

scp -r /dahy/spark3/conf/spark-env.sh dahy@spark-worker-01:/dahy/spark3/conf/
scp -r /dahy/spark3/conf/spark-env.sh dahy@spark-worker-02:/dahy/spark3/conf/
scp -r /dahy/spark3/conf/spark-env.sh dahy@spark-worker-03:/dahy/spark3/conf/
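
SPARK_LOCAL_DIRS points at scratch space for shuffle and spill files; pre-creating it on every node avoids permission surprises when the first job runs (optional, Spark can usually create it itself):

$ mkdir -p /dahy/spark3/local
$ for h in spark-worker-01 spark-worker-02 spark-worker-03; do ssh dahy@$h mkdir -p /dahy/spark3/local; done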

 

 

 

Summary: start/stop commands, web UIs, and processes

[HDFS]
Start: /dahy/hadoop3/sbin/start-dfs.sh
Stop: /dahy/hadoop3/sbin/stop-dfs.sh
Web UI: http://spark-master-01:9870
Processes: NameNode, SecondaryNameNode, DataNode

[YARN]
Start: /dahy/hadoop3/sbin/start-yarn.sh
Stop: /dahy/hadoop3/sbin/stop-yarn.sh
Web UI: http://spark-master-01:8188 (default port: 8088)
Processes: ResourceManager, NodeManager, YarnCoarseGrainedExecutorBackend
Spark shell on YARN:
$ YARN_CONF_DIR=/dahy/spark3/conf2 ./bin/spark-shell --master yarn --executor-memory 4G --executor-cores 4 --num-executors 3
Spark App web UI: http://spark-master-01:4040

[Standalone]
Start: /dahy/spark3/sbin/start-all.sh, or individually:
$ /dahy/spark3/sbin/start-master.sh --port 7177 --webui-port 8180 --host ${HOSTNAME}
$ /dahy/spark3/sbin/start-worker.sh spark://spark-master-01:7177 --cores 8 --memory 8G --host ${HOSTNAME}
Stop: /dahy/spark3/sbin/stop-all.sh (or stop-master.sh / stop-worker.sh)
Web UI: http://spark-master-01:8180
Processes: Master, Worker, CoarseGrainedExecutorBackend

[Zeppelin]
Start/stop: /dahy/zeppelin0/bin/zeppelin-daemon.sh restart
Web UI: http://spark-master-01:9090

[History server]
Start: /dahy/spark3/sbin/start-history-server.sh
Stop: /dahy/spark3/sbin/stop-history-server.sh
Web UI: http://spark-master-01:18080/
Process: HistoryServer

 
