ZooKeeper + Kafka + Storm cluster setup

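First, you need three test machines. ZooKeeper elects its leader by majority vote, so an odd number of servers is recommended, and three is the usual minimum for an ensemble that can survive one failure. (For testing, several instances on one machine with different ports also work.)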

192.168.12.28

192.168.12.151

192.168.12.152

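Environment and versions

jdk: java version "1.7.0_79"

os: fedora --x86_64-22-3

zookeeper: 3.4.6

kafka: 2.11-0.9.0.0 (Kafka 0.9.0.0 built for Scala 2.11)

storm: 0.10.0

Convention: a line of plus signs (+++++) separates configuration file contents from the surrounding text.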

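1. Setting up the ZooKeeper cluster

First download the package from the Apache ZooKeeper project.

Documentation: http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html

Downloads: http://www.apache.org/dyn/closer.cgi/zookeeper/

3.4.6 URL: http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

Download the package to each test machine and unpack it: tar -zxvf zookeeper-3.4.6.tar.gz

Then go into the conf directory and configure zoo.cfg as follows: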

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/zookeeper-3.4.6/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients

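# Clients here include Kafka, Storm, and anything else that connects to ZooKeeper.
# The limit applies per client IP, so do not set it too low, or deployment or client connections may fail.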
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

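## Addresses of the ZooKeeper ensemble members. The first port (2888) is used for communication between the servers,
## the second (3888) for leader election. The client port, the communication port, and the election port
## must all be different, otherwise startup fails with an "address already in use" error.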
server.1=192.168.12.28:2888:3888
server.2=192.168.12.151:2888:3888
server.3=192.168.12.152:2888:3888

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Apply this configuration to all three test machines.
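One step the configuration above relies on but does not show: in a multi-server ensemble each ZooKeeper server reads its own id from a file named myid inside dataDir, and that id has to match the N of its server.N line. A minimal sketch of how the file could be written, assuming a POSIX shell; write_myid is a hypothetical helper name, not a ZooKeeper tool:

```shell
#!/bin/sh
# write_myid CFG HOST_IP  (hypothetical helper)
# Finds the server.N line for HOST_IP in zoo.cfg and writes N to <dataDir>/myid.
write_myid() {
    cfg="$1"
    host_ip="$2"
    # Escape the dots so sed matches the IP literally.
    ip_re=$(printf '%s' "$host_ip" | sed 's/\./\\./g')
    # dataDir=... names the directory where ZooKeeper looks for myid.
    data_dir=$(sed -n 's/^dataDir=//p' "$cfg")
    # server.N=IP:2888:3888 -> N
    id=$(sed -n "s/^server\.\([0-9][0-9]*\)=${ip_re}:.*/\1/p" "$cfg")
    if [ -z "$data_dir" ] || [ -z "$id" ]; then
        echo "missing dataDir or server entry for $host_ip in $cfg" >&2
        return 1
    fi
    mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}
```

On the first machine this would be called as write_myid conf/zoo.cfg 192.168.12.28 from the ZooKeeper directory, which with the zoo.cfg above writes 1 to /usr/local/zookeeper-3.4.6/data/myid.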

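Go to the bin directory and start ZooKeeper:

./zkServer.sh start

Check the cluster status:

./zkServer.sh status

"Mode: leader" means this node is the leader; otherwise it reports follower.

If the leader goes down, the cluster automatically elects a new one.

Repeat these steps on all three machines.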
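To check all three nodes from one place, the status check can be scripted. The sketch below assumes the nc utility is installed and uses ZooKeeper's stat four-letter command on the client port; parse_mode and cluster_modes are hypothetical helper names.

```shell
#!/bin/sh
# parse_mode: read `zkServer.sh status` or `stat` output on stdin and
# print only the mode (leader, follower, standalone).
parse_mode() {
    sed -n 's/^Mode: *//p'
}

# cluster_modes HOST...: print "host mode" for each ZooKeeper node.
# Assumes nc is available and every node listens on client port 2181.
cluster_modes() {
    for host in "$@"; do
        mode=$(echo stat | nc -w 2 "$host" 2181 | parse_mode)
        echo "$host ${mode:-unreachable}"
    done
}
```

Calling cluster_modes 192.168.12.28 192.168.12.151 192.168.12.152 on a healthy ensemble should list exactly one leader and two followers.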

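Connect to the ZooKeeper cluster with the command-line client (any running node in the cluster will do):

./zkCli.sh -server 192.168.12.28:2181

List the root znode:

ls /

Create a znode (ZooKeeper's counterpart of a directory) with some data:

create /test "this is test dir"

That completes the ZooKeeper cluster setup.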

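For reference, here are some of the Kafka broker configuration parameters (set in server.properties):

Parameter	Description	Default / notes
broker.id	Integer id of the broker	Suggested: derive it from the IP
log.dirs	Directory where Kafka stores its message files	Default /tmp/kafka-logs
port	Port on which the broker accepts producer messages
zookeeper.connect	ZooKeeper connection string	Format: ip1:port,ip2:port,ip3:port
message.max.bytes	Maximum size of a single message
num.network.threads	Number of threads the broker uses to handle network requests	3 if unset; some sample server.properties files ship with 2
num.io.threads	Number of I/O threads the broker uses to execute requests	8 if unset; some sample server.properties files ship with 2, which can be increased as needed
queued.max.requests	Number of requests that may queue up waiting for the I/O threads	Default 500
host.name	Hostname of the broker	Default null; set it to the host's IP, otherwise consumers without matching hosts entries will have trouble
num.partitions	Default number of partitions per topic	Default 1
log.retention.hours	Hours a message is kept before it is deleted	Default 168 (one week)
auto.create.topics.enable	Whether topics may be created automatically by clients	Default true; false is recommended
default.replication.factor	Number of copies kept of each message	Default 1 (no replication); changing it is recommended
num.replica.fetchers	Number of I/O threads used to copy messages from the leader to followers	Default 1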

2. Setting up the Kafka cluster

Documentation: http://kafka.apache.org/documentation.html#quickstart

Downloads:

tar -xzf kafka_2.11-0.9.0.0.tgz

Edit config/server.properties:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

# The id of the broker. This must be set to a unique integer for each broker.

## Must be unique across the cluster

broker.id=0

############################# Socket Server Settings #############################

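# Clients must connect using exactly this address. Writing the same address in a different form (for example hostname instead of IP) can cause producer and consumer errors.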

listeners=PLAINTEXT://192.168.12.28:9092

# The port the socket server listens on

## Port that clients use to connect to Kafka

#port=9092

# Hostname the broker will bind to. If not set, the server will bind to all interfaces

#host.name=localhost

# Hostname the broker will advertise to producers and consumers. If not set, it uses the

# value for "host.name" if configured. Otherwise, it will use the value returned from

# java.net.InetAddress.getCanonicalHostName().

#advertised.host.name=<hostname routable by clients>

# The port to publish to ZooKeeper for clients to use. If this is not set,

# it will publish the same port that the broker binds to.

#advertised.port=<port accessible by clients>

# The number of threads handling network requests

num.network.threads=3

# The number of threads doing disk I/O

num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server

socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server

socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)

socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files

# Do not point this at the machine's temporary directory, otherwise startup may fail with errors

log.dirs=/usr/local/kafka_2.11-0.9.0.0/data

# The default number of log partitions per topic. More partitions allow greater

# parallelism for consumption, but this will also result in more files across

# the brokers.

num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.

# This value is recommended to be increased for installations with data dirs located in RAID array.

num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync

# the OS cache lazily. The following configurations control the flush of data to disk.

# There are a few important trade-offs here:

# 1. Durability: Unflushed data may be lost if you are not using replication.

# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.

# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.

# The settings below allow one to configure the flush policy to flush data after a period of time or

# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk

#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush

#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can

# be set to delete segments after a period of time, or after a given size has accumulated.

# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens

# from the end of the log.

# The minimum age of a log file to be eligible for deletion

log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining

# segments don't drop below log.retention.bytes.

#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.

log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according

# to the retention policies

log.retention.check.interval.ms=300000

# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.

# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.

log.cleaner.enable=false

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).

# This is a comma separated host:port pairs, each corresponding to a zk

# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".

# You can also append an optional chroot string to the urls to specify the

# root directory for all kafka znodes.

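## Kafka depends on ZooKeeper, which stores Kafka's metadata, configuration, consumer read offsets, and so on

zookeeper.connect=192.168.12.28:2181,192.168.12.151:2181,192.168.12.152:2181

# Timeout in ms for connecting to zookeeper

zookeeper.connection.timeout.ms=6000

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Apply this configuration to all three test machines. Note: broker.id must be unique on each machine, and listeners must carry each machine's own address.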
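Since only broker.id and listeners differ between the three machines, the per-host file can be derived from one template. A minimal sketch, assuming the broker id is taken from the last octet of the IP (unique here: 28, 151, 152) and the port stays 9092; render_broker_config is a hypothetical helper name:

```shell
#!/bin/sh
# render_broker_config TEMPLATE IP  (hypothetical helper)
# Prints TEMPLATE with broker.id and listeners rewritten for the given IP.
render_broker_config() {
    template="$1"
    ip="$2"
    id="${ip##*.}"   # last octet, e.g. 151 for 192.168.12.151
    sed -e "s/^broker\.id=.*/broker.id=${id}/" \
        -e "s|^listeners=.*|listeners=PLAINTEXT://${ip}:9092|" \
        "$template"
}
```

For example, render_broker_config server.properties 192.168.12.151 > server-151.properties produces a file that can be copied to the second machine. All other settings are passed through unchanged.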

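Go into the bin directory.

Start Kafka (the argument is the path to the Kafka configuration file; if startup fails, the error is reported immediately):

./kafka-server-start.sh ../config/server.properties

Start Kafka on all three machines.

Test the Kafka cluster:

First create a topic named testtopic:

./kafka-topics.sh --create --zookeeper 192.168.12.28:2181 --replication-factor 1 --partitions 1 --topic testtopic

View the details of the topic just created:

./kafka-topics.sh --zookeeper 192.168.12.28:2181 --describe --topic testtopic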

====================================================================

Topic:testtopic PartitionCount:1 ReplicationFactor:1 Configs:

Topic: testtopic Partition: 0 Leader: 4 Replicas: 4 Isr: 4

====================================================================

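Partition: the partition number

Leader: the node responsible for reads and writes of the given partition

Replicas: the list of nodes that replicate this partition's log

Isr: the "in-sync" replicas, that is, the subset of Replicas that is currently alive and caught up, and therefore eligible to become Leader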
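The relationship between Replicas and Isr can be checked mechanically: a partition whose Isr list is shorter than its Replicas list has a replica that is down or lagging. A minimal sketch that filters --describe output for such partitions, assuming the output keeps the "Topic: ... Partition: ... Replicas: ... Isr: ..." layout shown above; under_replicated is a hypothetical helper name:

```shell
#!/bin/sh
# under_replicated: read `kafka-topics.sh --describe` output on stdin and
# print "topic partition" for every partition whose Isr list is shorter
# than its Replicas list.
under_replicated() {
    awk '$3 == "Partition:" {
        r = 0; s = 0
        for (i = 1; i < NF; i++) {
            if ($i == "Replicas:") r = split($(i + 1), a, ",")
            if ($i == "Isr:")      s = split($(i + 1), b, ",")
        }
        if (s < r) print $2, $4
    }'
}
```

It would be used as ./kafka-topics.sh --zookeeper 192.168.12.28:2181 --describe | under_replicated. With a replication factor of 1, as in the test topic, the two lists always have the same length, so the filter only becomes informative on topics with more than one replica.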

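The bin/kafka-console-producer.sh and bin/kafka-console-consumer.sh scripts that ship with Kafka can be used to verify publishing and consuming messages (the commands below are run from the Kafka installation directory).
In one terminal, start a producer that writes messages to the testtopic topic created above:

bin/kafka-console-producer.sh --broker-list 192.168.12.28:9092,192.168.12.151:9092,192.168.12.152:9092 --topic testtopic

In another terminal, start a consumer that subscribes to the same testtopic topic:

bin/kafka-console-consumer.sh --zookeeper 192.168.12.28:2181,192.168.12.151:2181,192.168.12.152:2181 --from-beginning --topic testtopic

Type message lines into the producer terminal and press Enter (one line is one message); the consumer terminal then shows each message as it is consumed.
You can also use Kafka's Producer and Consumer Java APIs to implement message production and consumption in code.

That completes the Kafka cluster setup (see the documentation for the full list of configuration parameters).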

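3. Setting up the Storm cluster

Documentation: http://storm.apache.org/documentation.html

Downloads: http://storm.apache.org/downloads.html

0.10.0: http://124.202.164.11/files/4168000007207070/mirrors.cnnic.cn/apache/storm/apache-storm-0.10.0/apache-storm-0.10.0.tar.gz

tar -zxvf apache-storm-0.10.0.tar.gz

cd apache-storm-0.10.0/conf

Edit storm.yaml:

1) storm.zookeeper.servers: Storm depends on ZooKeeper, so list the addresses of the ZooKeeper cluster here. If the ZooKeeper cluster does not use the default port, the storm.zookeeper.port option is needed as well.

2) storm.local.dir: the local disk directory in which the Nimbus and Supervisor processes keep a small amount of state, such as jars and confs. Create the directory beforehand and give it sufficient access permissions, then configure it in storm.yaml, for example:

storm.local.dir: "/home/admin/storm/workdir"

3) java.library.path: the load path for the native libraries Storm uses (ZMQ and JZMQ). The default is "/usr/local/lib:/opt/local/lib:/usr/lib". ZMQ and JZMQ are normally installed under /usr/local/lib, so this usually needs no configuration. (Storm 0.10.0 uses Netty as its default transport, so the native libraries are normally not needed at all.)

4) nimbus.host: the address of the Nimbus machine of the Storm cluster (a single point of failure). Every Supervisor worker node needs to know which machine is Nimbus in order to download the jars, confs, and other files of the topologies.

5) supervisor.slots.ports: for each Supervisor worker node, this sets how many workers the node can run. Every worker occupies its own port for receiving messages, and this option defines which ports workers may use. By default each node can run 4 workers, on ports 6700, 6701, 6702, and 6703, for example:

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703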

+++++++++++++++++++++++++++++++++++++++++++++++

########### These MUST be filled in for a storm configuration

storm.zookeeper.servers:

- "192.168.12.28"

- "192.168.12.151"

- "192.168.12.152"

nimbus.host: "192.168.12.28"

storm.local.dir: "/usr/local/apache-storm-0.10.0/workdata"

supervisor.slots.ports:

- 6700

- 6701

# ##### These may optionally be filled in:

## List of custom serializations

# topology.kryo.register:

# - org.mycompany.MyType

# - org.mycompany.MyType2: org.mycompany.MyType2Serializer

## List of custom kryo decorators

# topology.kryo.decorators:

# - org.mycompany.MyDecorator

## Locations of the drpc servers

# drpc.servers:

# - "server1"

# - "server2"

## Metrics Consumers

# topology.metrics.consumer.register:

# - class: "backtype.storm.metric.LoggingMetricsConsumer"

# parallelism.hint: 1

# - class: "org.mycompany.MyMetricsConsumer"

# parallelism.hint: 1

# argument:

# - endpoint: "metrics-collector.mycompany.org"

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

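The configuration can be identical on all three test machines. Now start Storm:

cd ../bin/

Start the master node (only on the machine configured as nimbus.host):

./storm nimbus

Start a worker node:

./storm supervisor

Start Storm's built-in monitoring UI, reachable at host:8080:

./storm ui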
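Each of these commands stays in the foreground of the terminal that started it. For a test cluster, one way to keep them running after logout is a small wrapper around nohup. This is only a sketch: start_bg and the LOG_DIR variable are hypothetical names, and a production setup would more likely use a process supervisor.

```shell
#!/bin/sh
# start_bg NAME COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND detached from the terminal, logging to $LOG_DIR/NAME.out
# and recording the process id in $LOG_DIR/NAME.pid (LOG_DIR defaults to /tmp).
start_bg() {
    name="$1"
    shift
    log_dir="${LOG_DIR:-/tmp}"
    nohup "$@" > "$log_dir/$name.out" 2>&1 < /dev/null &
    echo "$!" > "$log_dir/$name.pid"
}
```

From the bin directory this would be used as start_bg nimbus ./storm nimbus on the master, and start_bg supervisor ./storm supervisor on each worker node. The recorded pid can later be passed to kill to stop the daemon.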

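That completes the Storm cluster setup.

4. Kafka + Storm integration

I just noticed that this part was never written; I will add it over the weekend.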
