追求更好的体验,本文采用 docker 方式。
环境部署
详细可参考 server 官方 docker 镜像和 client 官方 docker 镜像
启动 Server
$ docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
启动 Client
$ docker run -it --rm --link some-clickhouse-server:clickhouse-server yandex/clickhouse-client --host clickhouse-server
附一:启动自定义 Server
如果需要自定义 Server 的启动参数,则:
$ docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 -v /path/to/your/config.xml:/etc/clickhouse-server/config.xml yandex/clickhouse-server
附二:其它安装包
参考官方文档
- Ubuntu: https://clickhouse.yandex/docs/en/getting_started/index.html
- From Source: https://clickhouse.yandex/docs/en/getting_started/index.html
- CentOS rpm: https://github.com/Altinity/clickhouse-rpm-install
- Gentoo : https://github.com/kmeaw/clickhouse-overlay
基本 SQL
select 1
:) select 1 SELECT 1 ┌─1─┐ │ 1 │ └───┘ 1 rows in set. Elapsed: 0.004 sec.
select now
:) select now() SELECT now() ┌───────────────now()─┐ │ 2017-11-24 02:34:19 │ └─────────────────────┘ 1 rows in set. Elapsed: 0.003 sec.
AMPLab 测试
如官方文档,
下载数据
-
申请 S3 账号:会有两笔预授权的 1$ 账单,据说 3-5 天后会返还。
-
下载数据
s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/tiny/ /User/abeffect/tmp/tiny/
建表
CREATE TABLE rankings_tiny ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_tiny ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192);
导入数据
#!/bin/sh
for i in tiny/rankings/*.deflate; do
echo $i;
printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" | cat - $i | gzip -dc > ${i}.tmp
cat ${i}.tmp | wc -l
cat ${i}.tmp | clickhouse-client --host server-ip --query="INSERT INTO rankings_tiny FORMAT CSV";
done
for i in tiny/uservisits/*.deflate; do
echo $i;
printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" | cat - $i | gzip -dc > ${i}.tmp
cat ${i}.tmp | clickhouse-client --host server-ip --query="INSERT INTO uservisits_tiny FORMAT CSV";
done
查询
SELECT pageURL, pageRank FROM rankings_tiny WHERE pageRank > 300 ┌─pageURL───────────────────────────────────────────────┬─pageRank─┐ │ nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi │ 665 │ │ nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi │ 665 │ └───────────────────────────────────────────────────────┴──────────┘ ┌─pageURL──────────────────────────────────────────────────────────────────────────────────────┬─pageRank─┐ │ dfqlfhuuwxhhjvdtwljahxbwhqkutigdfcuiogfooprtmomeyysquqlzvzmrxhkttrbumglgwixsovxoqisibirdlgtc │ 483 │ │ pucpwgjykbcelcevdmzuahojzjxgmpgojfehhabasvbzbzpczqcubfbfekv │ 397 │ │ ffygkvsklpmup │ 332 │ └──────────────────────────────────────────────────────────────────────────────────────────────┴──────────┘ 5 rows in set. Elapsed: 0.007 sec. Processed 1.32 thousand rows, 90.28 KB (189.33 thousand rows/s., 12.95 MB/s.)
SELECT substring(sourceIP, 1, 8), sum(adRevenue) FROM uservisits_tiny GROUP BY substring(sourceIP, 1, 8) LIMIT 5 ┌─substring(sourceIP, 1, 8)─┬──────sum(adRevenue)─┐ │ 227.120. │ 0.7206825017929077 │ │ 33.157.1 │ 0.9653052091598511 │ │ 134.195. │ 0.08809900283813477 │ │ 172.239. │ 0.2393447756767273 │ │ 19.26.11 │ 0.13361740112304688 │ └───────────────────────────┴─────────────────────┘ 5 rows in set. Elapsed: 0.102 sec. Processed 10.00 thousand rows, 262.95 KB (98.18 thousand rows/s., 2.58 MB/s.)
SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank) AS pageRank FROM rankings_tiny ALL INNER JOIN ( SELECT sourceIP, destinationURL AS pageURL, adRevenue FROM uservisits_tiny WHERE (visitDate > '1980-01-01') AND (visitDate < '1980-04-01') ) USING (pageURL) GROUP BY sourceIP ORDER BY totalRevenue DESC LIMIT 1 ┌─sourceIP───────┬──────totalRevenue─┬─pageRank─┐ │ 242.170.245.71 │ 1.132317066192627 │ 26 │ └────────────────┴───────────────────┴──────────┘ 1 rows in set. Elapsed: 0.009 sec. Processed 1.38 thousand rows, 95.39 KB (150.26 thousand rows/s., 10.41 MB/s.)
进一步阅读
- DB-Engines Ranking: 目前上升非常快
- Berkeley BigData Benchmark: 相关 OLAP 产品的评测
- clickhouse rpm install: 安装 rpm 格式的 client 和 server,文中测试部分的 client 即通过此方法安装
- clickhouse jdbc: clickhouse 的 jdbc 驱动
- tabix.ui: clickhouse gui 界面, 无需安装,直接进入
- ClickHouse 之简单性能测试: 美国民用航班的数据,从 1987 年到 2017 年,有 1.7 亿条的测试 case
- ClickHouse vs Amazon RedShift Benchmark
- 常见的 benchmark:
- HiBench: Intel 开发,针对 Hadoop
- Berkeley BigData Bench:AMPLab 开发,针对 Spark
- Hadoop GridMix:Hadoop 自带
- BigBench:Teradata、多伦多大学、InfoSizing、Oracle 开发
- BigDataBenchmark:中科院研发
- TPC-DS: 针对 SQL on Hadoop
- 其它 Malstone、Cloud Harmony、YCSB、SWIM、LinkBench、DFSIO、Hive performance Benchmark(Pavlo)等等
欢迎来到这里!
我们正在构建一个小众社区,大家在这里相互信任,以平等 • 自由 • 奔放的价值观进行分享交流。最终,希望大家能够找到与自己志同道合的伙伴,共同成长。
注册 关于