Diagram of Flume's basic working mechanism
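Installing Flume
Flume official website: https://flume.apache.org/
Download the version you need from the official website, upload it to the server, and extract it into the appropriate directory.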
Basic Flume configuration
Configuration for reading all files in a specified directory
# Name the three components
ag1.sources = source1
ag1.sinks = sink1
ag1.channels = channel1

# Configure the source
ag1.sources.source1.type = spooldir
ag1.sources.source1.spoolDir = /root/log/
ag1.sources.source1.fileSuffix = .FINISHED
ag1.sources.source1.deserializer.maxLineLength = 5120

# Configure the sink
ag1.sinks.sink1.type = hdfs
ag1.sinks.sink1.hdfs.path = hdfs://hdp-01:9000/access_log/%y-%m-%d/%H-%M
ag1.sinks.sink1.hdfs.filePrefix = app_log
ag1.sinks.sink1.hdfs.fileSuffix = .log
ag1.sinks.sink1.hdfs.batchSize = 100
ag1.sinks.sink1.hdfs.fileType = DataStream
ag1.sinks.sink1.hdfs.writeFormat = Text
## roll: rules that control when to switch to a new output file
## roll by file size (bytes)
ag1.sinks.sink1.hdfs.rollSize = 512000
## roll by number of events
ag1.sinks.sink1.hdfs.rollCount = 1000000
## roll by time interval (seconds)
ag1.sinks.sink1.hdfs.rollInterval = 60
## rules for generating output directories
ag1.sinks.sink1.hdfs.round = true
ag1.sinks.sink1.hdfs.roundValue = 10
ag1.sinks.sink1.hdfs.roundUnit = minute
ag1.sinks.sink1.hdfs.useLocalTimeStamp = true

# Configure the channel
ag1.channels.channel1.type = memory
## maximum number of events held in the channel
ag1.channels.channel1.capacity = 500000
## maximum number of events per Flume transaction: 600
ag1.channels.channel1.transactionCapacity = 600

# Bind the source and sink to the channel
ag1.sources.source1.channels = channel1
ag1.sinks.sink1.channel = channel1
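With round = true, roundValue = 10 and roundUnit = minute, the timestamp used for the %y-%m-%d/%H-%M escapes in hdfs.path is rounded down to a 10-minute boundary, so all events from the same 10-minute window land in the same directory. The Python sketch below only mimics that rule to show which directory an event ends up in; it is not Flume code, the function name hdfs_dir is hypothetical, and it assumes useLocalTimeStamp = true (the agent's local clock is used) and leaves out the hdfs://hdp-01:9000 prefix.

```python
from datetime import datetime


def hdfs_dir(ts: datetime, round_value: int = 10) -> str:
    """Hypothetical helper: mimic hdfs.path with round=true, roundUnit=minute.

    Assumes useLocalTimeStamp=true, so ts is the agent's local time.
    """
    # Round the minute down to the highest multiple of roundValue
    rounded = ts.replace(
        minute=ts.minute - ts.minute % round_value, second=0, microsecond=0
    )
    # %y (2-digit year), %m, %d, %H, %M match the escapes used in hdfs.path
    return "/access_log/" + rounded.strftime("%y-%m-%d/%H-%M")


print(hdfs_dir(datetime(2024, 5, 17, 13, 47, 12)))  # /access_log/24-05-17/13-40
```

An event written at 13:47:12 on 2024-05-17 would go to /access_log/24-05-17/13-40. With roundValue = 1, as in the tail configuration, every minute gets its own directory.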
Configuration for reading a specified log file via tail -F
# tail-hdfs.conf: collect data with the tail command and sink it to HDFS
# Start command: bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
########
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/app_weichat_login.log

# Describe the sink
# (property keys must use the agent name a1 and the sink name k1 declared above)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hdp20-01:9000/app_weichat_login_log/%y-%m-%d/%H-%M
a1.sinks.k1.hdfs.filePrefix = weichat_log
a1.sinks.k1.hdfs.fileSuffix = .dat
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollSize = 100
a1.sinks.k1.hdfs.rollCount = 1000000
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
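The three roll settings act as independent thresholds: the HDFS sink closes the current file and starts a new one once any enabled threshold is reached, and setting a value to 0 disables that threshold. The sketch below is a simplified, hypothetical model of that decision (RollPolicy is not a Flume class); its defaults mirror the tail configuration. With rollSize = 100, a file rolls after only about 100 bytes, so expect many small files in HDFS.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    """Hypothetical model of the HDFS sink's roll thresholds (0 disables one)."""

    roll_size: int = 100          # hdfs.rollSize, in bytes
    roll_count: int = 1_000_000   # hdfs.rollCount, in events
    roll_interval: int = 60       # hdfs.rollInterval, in seconds

    def should_roll(self, bytes_written: int, events_written: int,
                    seconds_open: float) -> bool:
        """Return True as soon as any enabled threshold has been reached."""
        if self.roll_size > 0 and bytes_written >= self.roll_size:
            return True
        if self.roll_count > 0 and events_written >= self.roll_count:
            return True
        if self.roll_interval > 0 and seconds_open >= self.roll_interval:
            return True
        return False


policy = RollPolicy()
print(policy.should_roll(120, 2, 5.0))   # True: 120 bytes >= rollSize
print(policy.should_roll(50, 1, 10.0))   # False: no threshold reached yet
```

In the spooldir configuration (rollSize = 512000), the 60-second interval or the 512000-byte size limit will usually trigger first, since 1000000 events is a very high count.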
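Add the configuration file to the conf directory of the installation, then adapt the configurations above to your own environment following the comments.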
Running Flume to collect data
Change into the Flume installation (root) directory.
bin/flume-ng agent -c conf -f <config file name> -n ag1
ag1 is the agent name used in the configuration file, i.e. the prefix of every property.
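The -n value must match the property prefix in the file: Flume uses only the properties whose prefix is that agent name, so properties written under another prefix are not applied to this agent. In the tail-hdfs.conf example this is what happens if the sink is written as agent1.sinks.sink1.* while the agent is started with -n a1. Below is a hypothetical lint helper (check_flume_conf is not part of Flume) that flags such mismatches and components declared without a type. It assumes one agent per file and simple key = value lines without line continuations.

```python
def check_flume_conf(conf_text: str, agent: str) -> list[str]:
    """Hypothetical lint helper, not part of Flume: report wiring mistakes."""
    props: dict[str, str] = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not key = value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    problems: list[str] = []
    # Every property should start with the agent name passed to -n
    for key in props:
        prefix = key.split(".", 1)[0]
        if prefix != agent:
            problems.append(f"{key}: prefix '{prefix}' is not agent '{agent}'")
    # Every declared source, sink and channel needs a type
    for kind in ("sources", "sinks", "channels"):
        for name in props.get(f"{agent}.{kind}", "").split():
            if f"{agent}.{kind}.{name}.type" not in props:
                problems.append(f"{agent}.{kind}.{name} has no type")
    return problems


conf = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = exec
agent1.sinks.sink1.type = hdfs
a1.channels.c1.type = memory
"""
for problem in check_flume_conf(conf, "a1"):
    print(problem)
```

Run against that snippet, the helper reports both the agent1 prefix and the fact that the declared sink k1 has no type.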
bin/flume-ng agent -c conf -f <config file name> -n ag1 -Dflume.root.logger=INFO,console
Setting flume.root.logger this way sets the log level to INFO and prints the logs to the console.