ClickHouse 的分布式引擎

分布式引擎

分布式引擎可以将集群操作透明成单机操作。

集群不支持水平扩展，必须修改配置文件。
分布式表中的子表必须是最终表，不能仍然是分布式表。
配置可以动态生效，服务不需要重启。

开启

通过修改配置文件，示例如下


<remote_servers>
    <logs>
        <shard>
            <!-- Optional. Shard weight when writing data. By default, 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. By default, false - write data to all of the replicas. -->
            <internal_replication>false</internal_replication>
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>2</weight>
            <internal_replication>false</internal_replication>
            <replica>
                <host>example01-02-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-02-2</host>
                <port>9000</port>
            </replica>
        </shard>
    </logs>
</remote_servers>

账号

可以设置用户名密码


            <replica>
                <host>example01-02-2</host>
                <port>9000</port>
                <user>default</user>
                <password></password>
            </replica>

副本

随机读做任意一个副本，失败则读下一个副本，可以通过 load_balancing 来配置读取策略

分片

可以为不同的分片配置不同的副本数，当然也可以都配置相同的副本数。

分片表达式：即一个 hash 函数，从一条日志得到一个 hash 值。可以有：

随机 hash: rand()
按用户 hash: UserID，可以简化用户级别的 IN 和 JOIN 操作
按用户来均匀 hash: intHash64(UserID)，防止 UserID 不均匀

使用 Sharding schema 的场景

数据量超级大
使用指定 key 的 IN or JOIN 时，如果指定 key 正好是 sharding key，则可以使用 Local IN or JOIN，不必用 GLOBAL IN or GLOBAL JOIN。

超大数据量场景

两层 sharding 方式

将 cluster 分成不同的 layers(可以类比为 index), 每个 layers 包含不同的 shards.
每一个 client 的数据对应于一个 layer
shards 按需加入 layer
分布式表建立在 layer 上
每一个分布式表只有一个 shard(?)

In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, as we’ve done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into “layers”, where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. Distributed tables are created for each layer, and a single shared distributed table is created for global queries.

写数据的方式

有两种写数据的方式：

将数据写入最终表：这种方式最灵活。
将数据写入分布式表：需要指定 sharding key 集合。

数据写入 shard 时，可以设置每个 shard 的权重。

内部复制 internal_replication

true 选项

描述：数据只写入一个 replica
场景：最终表为 replicated tables，表自身带复制。

false 选项

描述：数据写入所有的 replica
场景：分布式引擎来控制复制

比较：false 时，时间长了，数据会有少量差异，无法保证数据的最终一致性。

异步写入

数据是异步写入集群，落在磁盘的文件在目录 /var/lib/clickhouse/data/database/table/ 下.

在硬件问题引起重启时，如果数据有损坏，会转移到 broken 目录下，不再使用.

感想

表复制

数据写入一个节点，通过 ZK 通知到其它节点，来从写入节点同步写入的数据。所以可以配置无限个备份。

参考

distributed

记一次 Clickhouse 数据写入成功，但是却无数据问题

[图片] 背景在一次数据排查时，无意间发现数据对不齐。结合代码及日志分析发现都没问题，且有插入操作，但是 Clickhouse 却查不到改条记录。匪夷所思.... 排查过程最后断定大概率是 Clickhouse 的问题，难道 Clickhouse 存在数据丢失的情况？进一步分析丢失数据部分时间分布发现并无任何规律， ..

Mac 使用 Docker-compose 安装部署 Clickhouse 的经验和问题

最近尝试在 Macbook 上部署 Clickhouse 学习使用，在此记录下相关经验和问题安装部署 Clickhouse 主要针对 Linux 系统进行开发，尽管也有针对 Mac OS 的安装包，但官方文档中说不推荐使用。因此，为了最接近原生 Linux 版本的 Clickhouse 运行的效果，尝试使用 Dock ..

ClickHouse 中的公共表表达式 CTE

[图片] 什么是公共表表达式(CTE) ？在本文中，学习如何在 ClickHouse 数据库中使用 CTE，并通过示例跟踪用例在下列情况下使用 CTE 很方便: 当一个请求可以获得数据，并且其大小适合内存空间时需要多次使用此查询的结果创建递归查询额外的好处是提高了 SQL 查询的可读性。 CTE 与临时表和嵌套 ..

ClickHouse 的 Buffer 引擎

背景 [链接]，指把数据先写入内存 Buffer 表，再周期性的刷入磁盘表中。读取数据时，会同时从 Buffer 表和磁盘表读取。示例先给例子 CREATE TABLE merge.hits_buffer AS merge.hits ENGINE = Buffer(merge, hits, 16, 10, 100 ..

ClickHouse 聚合函数

背景 ClickHouse 中内置许多标准 SQL 外的函数。这里记录一下其中[链接]的学习笔记。聚合函数常用函数 count 返回记录条数。如 SELECT count() FROM table 注：如果求 COUNT(DISTINCT x)，则使用 uniq 函数 any(x) 返回遇到的第一个值备注：待 ..

欢迎来到这里！

我们正在构建一个小众社区，大家在这里相互信任，以平等 • 自由 • 奔放的价值观进行分享交流。最终，希望大家能够找到与自己志同道合的伙伴，共同成长。

关于