1. Recommended Versions
Component | Version |
CentOS | V7.5 |
Java | V1.8 |
Hadoop | V2.7.6 |
Hive | V2.3.3 |
MySQL | V5.7 |
Spark | V2.3.1 |
Scala | V2.12.6 |
Flume | V1.8.0 |
Sqoop | V1.4.5 |
2. Download Links
JDK:
Hadoop:
Hive:
Spark:
Scala:
Flume:
HBase:
Sqoop:
3. Set the IP Address
# Temporary IP (lost after a reboot)
$ ifconfig eth0 192.168.116.100 netmask 255.255.255.0
$ vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static #####
HWADDR=00:0C:29:3C:BF:E7
IPV6INIT=yes
NM_CONTROLLED=yes
ONBOOT=yes ###
TYPE=Ethernet ##
UUID=ce22eeca-ecde-4536-8cc2-ef0dc36d4a8c
IPADDR=192.168.116.100 ###
NETMASK=255.255.255.0 ###
GATEWAY=192.168.116.2 ###
DNS1=219.141.136.10 ###
# Restart the network service
$ service network restart
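To confirm the change took effect, a quick check (note: on CentOS 7 the interface is often named ens33 rather than eth0, so use whatever name ip addr reports):
$ ip addr show                 # the interface should now carry 192.168.116.100
$ ping -c 3 192.168.116.2      # the gateway configured above should answer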
4. Install Basic CentOS Packages
$ yum install net-tools.x86_64 vim* wget.x86_64 ntp -y
5. Set the Hostname
$ vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
# Edit the hosts file
$ vi /etc/hosts
# Add a new line (note: use your own machine's IP, e.g. 192.168.116.100)
192.168.116.100 master
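Editing /etc/sysconfig/network is the CentOS 6 convention; on CentOS 7 the hostname is managed by systemd, so it is worth setting it explicitly and then checking that the new name resolves through /etc/hosts:
$ hostnamectl set-hostname master   # CentOS 7 way of setting the hostname permanently
$ hostname                          # should print: master
$ ping -c 3 master                  # should resolve to 192.168.116.100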
6. Disable the Firewall
$ systemctl stop firewalld.service #stop firewalld (CentOS 7)
$ systemctl disable firewalld.service #keep firewalld from starting at boot
$ service iptables status #check the firewall status (CentOS 6 / iptables)
$ service iptables stop #stop the firewall
$ chkconfig iptables --list #check whether iptables starts at boot
$ chkconfig iptables off #disable iptables at boot
7. Passwordless SSH
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
# Verify the configuration
$ ssh master
8. Install the JDK
1. Extract the JDK
# Create the installation directory
$ mkdir -p /home/hadoop/opt
# Extract
$ tar -zxvf jdk-8u181-linux-x64.tar.gz -C /home/hadoop/opt
2. Add Java to the environment variables
$ vim /etc/profile
# Append at the end of the file
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
# Reload the configuration
$ source /etc/profile
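A quick check that the JDK is now on the PATH:
$ java -version    # should report java version "1.8.0_181"
$ echo $JAVA_HOME  # should print /home/hadoop/opt/jdk1.8.0_181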
9. Install Hadoop
1. Extract
# Create the software download directory
$ mkdir -p /home/hadoop/opt
$ cd /home/hadoop/opt
# Extract
$ tar -zxvf hadoop-2.7.6.tar.gz -C /home/hadoop/opt
2. Configure environment variables
$ vi /etc/profile
# Append at the end of the file
export HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.6
export HADOOP_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export YARN_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload the configuration
$ source /etc/profile
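A quick sanity check that the Hadoop binaries are reachable through the new PATH:
$ hadoop version   # should report Hadoop 2.7.6
$ which hdfs       # should resolve to /home/hadoop/opt/hadoop-2.7.6/bin/hdfs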
3. Edit /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/hadoop-env.sh
$ vi /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/hadoop-env.sh
Change this line to: export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
4. Create the HDFS data directories under the Hadoop installation directory, then switch to the configuration directory
$ mkdir -p /home/hadoop/opt/hadoop-2.7.6/hdfs_tmp
$ mkdir -p /home/hadoop/opt/hadoop-2.7.6/hdfs/name
$ mkdir -p /home/hadoop/opt/hadoop-2.7.6/hdfs/data
$ cd /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/
5. Edit the configuration files
$ vi core-site.xml
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/home/hadoop/opt/hadoop-2.7.6/hdfs_tmp</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
</configuration>
$ vi hdfs-site.xml
<configuration>
  <property><name>dfs.namenode.name.dir</name><value>/home/hadoop/opt/hadoop-2.7.6/hdfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/home/hadoop/opt/hadoop-2.7.6/hdfs/data</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>master:9001</value></property>
  <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
</configuration>
$ cp mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>master:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>master:19888</value></property>
</configuration>
$ vi yarn-site.xml (the YARN settings below assume a machine with 4 GB of RAM)
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
  <property><name>yarn.resourcemanager.address</name><value>master:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>master:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>master:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>master:8033</value></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
  <property><name>yarn.log-aggregation.retain-seconds</name><value>86400</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>master:8088</value></property>
  <property><name>yarn.log.server.url</name><value>http://master:19888/jobhistory/logs</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>3072</value></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>3072</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>1</value></property>
  <property><name>mapreduce.map.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.map.java.opts</name><value>-Xmx819m</value></property>
  <property><name>mapreduce.reduce.memory.mb</name><value>2048</value></property>
  <property><name>mapreduce.reduce.java.opts</name><value>-Xmx1638m</value></property>
  <property><name>yarn.app.mapreduce.am.resource.mb</name><value>2048</value></property>
  <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx1638m</value></property>
  <property><name>mapreduce.task.io.sort.mb</name><value>409</value></property>
  <property><name>mapreduce.job.ubertask.enable</name><value>true</value></property>
  <property><name>yarn.nodemanager.pmem-check-enabled</name><value>false</value></property>
  <property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
</configuration>
$ vi slaves
master
6. Format HDFS and start the cluster
# Format the NameNode
$ cd /home/hadoop/opt/hadoop-2.7.6/
$ bin/hdfs namenode -format
# Start NameNode and DataNode
$ sbin/start-dfs.sh
# Stop NameNode and DataNode
$ sbin/stop-dfs.sh
# Start the YARN services
$ sbin/start-yarn.sh
# Stop the YARN services
$ sbin/stop-yarn.sh
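After start-dfs.sh and start-yarn.sh, jps is a quick way to confirm that all daemons came up on this single node (process IDs will differ):
$ jps
# Expected: NameNode, DataNode, SecondaryNameNode  (from start-dfs.sh)
#           ResourceManager, NodeManager           (from start-yarn.sh)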
# Test HDFS and MapReduce
$ cd /home/hadoop/opt/hadoop-2.7.6/
$ bin/hdfs dfs -mkdir /input
$ bin/hdfs dfs -mkdir /test
$ bin/hdfs dfs -put etc/hadoop /input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar grep /input/hadoop /output 'dfs[a-z.]+'
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /input/hadoop /output_wc
# View the job output on HDFS
Option 1:
$ bin/hdfs dfs -get /output output
$ cat output/*
Option 2:
$ bin/hdfs dfs -cat /output/*
7. Web UIs
NameNode: http://192.168.116.100:50070/ (the default NameNode web UI port in Hadoop 2.x)
ResourceManager : http://192.168.116.100:8088/
10. Install Hive
1. Install MySQL
Refer to:
2. Extract
$ tar -zxvf apache-hive-2.3.3-bin.tar.gz -C /home/hadoop/opt
3. Configure environment variables
$ vi /etc/profile
# Append at the end of the file
export HIVE_HOME=/home/hadoop/opt/apache-hive-2.3.3-bin
export PATH=$PATH:$HIVE_HOME/bin
# Reload the configuration
$ source /etc/profile
4. Edit hive-site.xml under $HIVE_HOME/conf
$ cd /home/hadoop/opt/apache-hive-2.3.3-bin/conf
$ cp hive-default.xml.template hive-site.xml
$ vi hive-site.xml
Set the following properties:
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://192.168.116.100:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value></property>
<property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.jdbc.Driver</value></property>
<property><name>javax.jdo.option.ConnectionUserName</name><value>root</value></property>
<property><name>javax.jdo.option.ConnectionPassword</name><value>root</value></property>
<property><name>hive.metastore.uris</name><value>thrift://master:9083</value></property>
<property><name>hive.server2.thrift.bind.host</name><value>192.168.116.100</value></property>
<property><name>hive.server2.thrift.port</name><value>10000</value></property>
$ cp hive-env.sh.template hive-env.sh
$ vi hive-env.sh
# Set HADOOP_HOME
HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.6
5. Add the MySQL JDBC driver
Download mysql-connector-java-5.1.39-bin.jar and copy it into /home/hadoop/opt/apache-hive-2.3.3-bin/lib.
6. From Hive 2.0 onwards the metastore schema has to be initialized explicitly:
$ schematool -dbType mysql -initSchema
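If the initialization succeeds, schematool finishes with a completion message and a metastore database named hive (taken from the ConnectionURL above) appears in MySQL. A quick check, assuming the root/root credentials configured in hive-site.xml:
$ mysql -uroot -proot -e "show databases;"          # the list should now contain 'hive'
$ mysql -uroot -proot -e "use hive; show tables;"   # metastore tables such as DBS and TBLS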
7. Test demo
# Load data and create a table
# hive_data.txt:
1,test01,23,address01
2,test02,45,address02
3,test03,8,addresss01
$ hive
hive> create table test(id string, name string, addr string) row format delimited fields terminated by ',';
hive> LOAD DATA LOCAL INPATH '/home/hadoop/opt/hive_data.txt' INTO TABLE test;
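A small query to confirm the load worked (hive -e runs a statement without opening the interactive shell):
$ hive -e "select * from test;"          # should print the loaded rows
$ hive -e "select count(*) from test;"   # launches a MapReduce job and should print 3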
8. Remote connections to Hive
# Edit Hadoop's core-site.xml
$ vi /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/core-site.xml
# Add the following properties. Hive is run as root here; if HiveServer2 reports an impersonation error for another user, add matching hadoop.proxyuser.<user>.* entries, as in the hadoop examples below.
<property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hadoop.groups</name><value>hadoop</value><description>Allow the superuser to impersonate any member of the group hadoop</description></property>
<property><name>hadoop.proxyuser.hadoop.hosts</name><value>192.168.116.100,127.0.0.1,localhost</value><description>The superuser can connect only from these hosts to impersonate a user</description></property>
9. Start the services
$ nohup hive --service metastore > metastore.log 2>&1 &
$ nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
# Test
$ beeline
beeline> !connect jdbc:hive2://localhost:10000 user pwd
sql> show databases;
11. Install HBase
1. Configure environment variables (extraction omitted)
$ vi /etc/profile
export HBASE_HOME=/home/hadoop/opt/hbase-1.2.6
export PATH=$HBASE_HOME/bin:$PATH
2. Configure hbase-env.sh
$ cd /home/hadoop/opt/hbase-1.2.6/conf
$ vi hbase-env.sh
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
export HBASE_MANAGES_ZK=true
3. Configure hbase-site.xml
$ cd /home/hadoop/opt/hbase-1.2.6/conf
$ vi hbase-site.xml
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://master:9000/hbase</value></property>
  <property><name>hbase.master.info.port</name><value>60010</value></property>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>
4. Start HBase
$ cd /home/hadoop/opt/hbase-1.2.6/bin
$ ./start-hbase.sh
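A quick check that HBase came up (with hbase.cluster.distributed left at its default, the Master, RegionServer and the ZooKeeper managed by HBASE_MANAGES_ZK=true all run inside one JVM, so jps may show only HMaster):
$ jps            # an HMaster process should now be listed
# The master web UI configured above should also respond: http://192.168.116.100:60010/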
5. HBase shell
$ cd /home/hadoop/opt/hbase-1.2.6/bin
$ hbase shell
Operation | Command |
Create a table | create '<table>', '<column family 1>', '<column family 2>', '<column family N>' |
Insert a record | put '<table>', '<row key>', '<column family:column>', '<value>' |
Read a record | get '<table>', '<row key>' |
Count the records in a table | count '<table>' |
Delete a record | delete '<table>', '<row key>', '<column family:column>' |
Drop a table | the table must be disabled first: step 1 disable '<table>', step 2 drop '<table>' |
Scan all records | scan '<table>' |
Scan all values of one column | scan '<table>', {COLUMNS=>'<column family>:<column>'} |
Update a record | put the new value again; it overwrites the old one |
# Create table 'users' with three column families: user_id, address, info
> create 'users','user_id','address','info'
# List all tables
> list
# Describe a table
> describe 'users'
# Create another table
> create 'users_tmp','user_id','address','info'
# Drop a table
> disable 'users_tmp'
> drop 'users_tmp'
# Insert data
> put 'users','xiaoming','info:age','24'
> put 'users','xiaoming','info:birthday','1987-06-17'
> put 'users','xiaoming','info:company','alibaba'
> put 'users','xiaoming','address:contry','china'
> put 'users','xiaoming','address:province','zhejiang'
> put 'users','xiaoming','address:city','hangzhou'
> put 'users','zhangyifei','info:birthday','1987-4-17'
> put 'users','zhangyifei','info:favorite','movie'
> put 'users','zhangyifei','info:company','alibaba'
> put 'users','zhangyifei','address:contry','china'
> put 'users','zhangyifei','address:province','guangdong'
> put 'users','zhangyifei','address:city','jieyang'
> put 'users','zhangyifei','address:town','xianqiao'
# Get all data for one row key
> get 'users','xiaoming'
# Get all data for one row key and one column family
> get 'users','xiaoming','info'
# Get one column of one column family for a row key
> get 'users','xiaoming','info:age'
# Update a record
> put 'users','xiaoming','info:age','29'
> get 'users','xiaoming','info:age'
> put 'users','xiaoming','info:age','30'
> get 'users','xiaoming','info:age'
# Get versioned cell data
> get 'users','xiaoming',{COLUMN=>'info:age',VERSIONS=>1}
> get 'users','xiaoming',{COLUMN=>'info:age',VERSIONS=>2}
> get 'users','xiaoming',{COLUMN=>'info:age',VERSIONS=>3}
# Get a specific version of a cell
> get 'users','xiaoming',{COLUMN=>'info:age',TIMESTAMP=>1364874937056}
# Full table scan
> scan 'users'
# Delete the 'info:age' cell of row xiaoming
> delete 'users','xiaoming','info:age'
> get 'users','xiaoming'
# Delete an entire row
> deleteall 'users','xiaoming'
# Count the rows in the table
> count 'users'
# Truncate the table
> truncate 'users'
# Check table status (exists / enabled / disabled)
> exists 'users'
> is_enabled 'users'
> is_disabled 'users'
# Exit the shell
> quit
12. Install Flume
1. Extract
$ tar -zxvf apache-flume-1.8.0-bin.tar.gz
2. Configure Flume environment variables
$ vi /etc/profile
export FLUME_HOME=/home/hadoop/opt/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin
3. Edit the Flume configuration files
$ cd /home/hadoop/opt/apache-flume-1.8.0-bin/conf
$ cp flume-env.sh.template flume-env.sh
$ cp flume-conf.properties.template flume-conf.properties
$ vi flume-env.sh
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
# Verify the installation
$ flume-ng version
4. Example 1
1) Create the configuration file example.conf
$ yum -y install xinetd telnet telnet-server
$ mkdir -p /home/hadoop/opt/testdata
$ cd /home/hadoop/opt/testdata
$ vi example.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
# a1.sources.r1.bind = 192.168.116.100
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2) Start the agent
$ flume-ng agent -c /home/hadoop/opt/apache-flume-1.8.0-bin/conf -f /home/hadoop/opt/testdata/example.conf -n a1 -Dflume.root.logger=INFO,console
3) Send a message from the client
$ telnet localhost 44444
5. Example 2
1) Prepare the data file
$ mkdir -p /home/hadoop/opt/testdata/avro
$ cd /home/hadoop/opt/testdata/avro
$ vi avro_data.txt
1,test01,23,address01
2,test02,45,address02
3,test03,8,addresss01
2) Create spool1.conf
$ cd /home/hadoop/opt/testdata
$ vi spool1.conf
# Name the components on this agent
# agent name, plus the source, channel and sink names
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Define the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/opt/testdata/avro
# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
# Define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y%m%d
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Do not roll files based on event count
a1.sinks.k1.hdfs.rollCount = 0
# Roll a new file on HDFS when the current one reaches 128 MB
a1.sinks.k1.hdfs.rollSize = 134217728
# Roll a new file on HDFS every 60 seconds
a1.sinks.k1.hdfs.rollInterval = 60
# Wire source, channel and sink together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
3) Run the demo
# Start the agent
$ flume-ng agent -c /home/hadoop/opt/apache-flume-1.8.0-bin/conf -f /home/hadoop/opt/testdata/spool1.conf -n a1
# In a new terminal, drop a data file into the spooling directory
$ cp /home/hadoop/opt/testdata/avro/avro_data.txt.COMPLETED /home/hadoop/opt/testdata/avro/avro_data04.txt
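Once the spooling-directory source picks the new file up, Flume renames it with a .COMPLETED suffix and the HDFS sink writes under the date-stamped path configured above; a quick check:
$ ls /home/hadoop/opt/testdata/avro/   # avro_data04.txt should be renamed *.COMPLETED
$ hdfs dfs -ls /flume/                 # one subdirectory per day, e.g. /flume/20180801
$ hdfs dfs -cat /flume/*/events-*      # should show the rows from avro_data.txt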
13. Install Sqoop
1. Extract
$ tar -zxvf sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz
2. Configure environment variables
$ mv sqoop-1.4.5.bin__hadoop-2.0.4-alpha sqoop-1.4.5
$ vim /etc/profile
export SQOOP_HOME=/home/hadoop/opt/sqoop-1.4.5
export PATH=$PATH:$SQOOP_HOME/bin
$ cd /home/hadoop/opt/sqoop-1.4.5/conf
$ cp sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/opt/hadoop-2.7.6
export HADOOP_MAPRED_HOME=/home/hadoop/opt/hadoop-2.7.6
export HIVE_HOME=/home/hadoop/opt/apache-hive-2.3.3-bin
export HBASE_HOME=/home/hadoop/opt/hbase-1.2.6
3. Copy the MySQL JDBC driver into /home/hadoop/opt/sqoop-1.4.5/lib
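With the driver in place, a quick connectivity check (assuming the same MySQL credentials used in the import command below, root / '123456'):
$ sqoop list-databases --connect jdbc:mysql://192.168.116.100:3306/ --username root --password '123456'
# The output should list the MySQL databases, including 'hive' and (after the next step) 'test'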
4. Create a test table in MySQL
create database test;
use test;
create table data(id varchar(32), name varchar(32), addr varchar(32));
insert into data(id,name,addr) values('test01','23','address01');
insert into data(id,name,addr) values('test02','45','address02');
insert into data(id,name,addr) values('test03','8','address01');
5. Import and export commands
1) Copy sqoop-1.4.5.jar into the lib directory
$ cd /home/hadoop/opt/sqoop-1.4.5
$ cp sqoop-1.4.5.jar lib/
2) Run the commands
# Import the MySQL table into Hive
$ sqoop import --connect jdbc:mysql://192.168.116.100:3306/test?characterEncoding=utf-8 --username root --password '123456' --table data --hive-import --create-hive-table --hive-table hivetest --fields-terminated-by ',' -m 1 --hive-overwrite
# Check the data imported into Hive
$ hdfs dfs -cat /user/hive/warehouse/hivetest/part-m-00000
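Note that sqoop export writes into an existing MySQL table, so the target table used below has to be created first; a minimal sketch, assuming the same column layout as the data table:
use test;
create table dataFromHDFS(id varchar(32), name varchar(32), addr varchar(32));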
# Export from HDFS back to MySQL
$ sqoop export --connect jdbc:mysql://192.168.116.100:3306/test --username root --password '123456' --table dataFromHDFS --export-dir /user/hive/warehouse/hivetest/part-m-00000 --input-fields-terminated-by ','
14. Install Scala
1. Extract
$ cd /home/hadoop/opt
$ tar -zxvf scala-2.12.6.tgz
2. Configure environment variables
$ vi /etc/profile
export SCALA_HOME=/home/hadoop/opt/scala-2.12.6
# SPARK_HOME points at the Spark installation set up in the next section
export SPARK_HOME=/home/hadoop/opt/spark-2.3.1-bin-hadoop2.7
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
$ source /etc/profile
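A quick check that Scala is on the PATH:
$ scala -version   # should report Scala 2.12.6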
15. Install Spark
1. Edit $SPARK_HOME/conf/spark-env.sh (extraction omitted)
$ cd /home/hadoop/opt/spark-2.3.1-bin-hadoop2.7/conf
$ cp spark-env.sh.template spark-env.sh
$ hdfs dfs -mkdir -p /spark/historyLog
$ vi spark-env.sh
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
export HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.6
export SCALA_HOME=/home/hadoop/opt/scala-2.12.6
export SPARK_HOME=/home/hadoop/opt/spark-2.3.1-bin-hadoop2.7
export HADOOP_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export YARN_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export SPARK_WORKER_MEMORY=1G
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=100 -Dspark.history.fs.logDirectory=hdfs://master:9000/spark/historyLog"
2. Edit $SPARK_HOME/conf/spark-defaults.conf
$ cp spark-defaults.conf.template spark-defaults.conf
$ vi spark-defaults.conf
spark.master                      yarn
spark.submit.deployMode           cluster
spark.yarn.historyServer.address  master:18080
spark.history.ui.port             18080
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs://master:9000/spark/historyLog
spark.history.fs.logDirectory     hdfs://master:9000/spark/historyLog
spark.eventLog.compress           true
spark.executor.instances          1
spark.worker.cores                1
spark.worker.memory               1G
spark.serializer                  org.apache.spark.serializer.KryoSerializer
3. Start Spark
$ cd /home/hadoop/opt/spark-2.3.1-bin-hadoop2.7/sbin
$ ./start-all.sh
$ cd /home/hadoop/opt/spark-2.3.1-bin-hadoop2.7/
$ bin/spark-submit --master spark://master:7077 --deploy-mode client --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.3.1.jar 100
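Two optional checks once the example job finishes: the standalone master web UI listens on port 8080 by default, and the history server (which reads the hdfs://master:9000/spark/historyLog directory configured above) has to be started separately:
$ sbin/start-history-server.sh
# Standalone master UI:  http://192.168.116.100:8080/
# History server UI:     http://192.168.116.100:18080/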