1.2 Why HTAP Matters


1.2 Why HTAP Matters (An Introduction to HTAP Databases)

Speaker

Xiaoyu Ma(马晓宇)

Senior Technical Director - Real-time analytics and SQL meta

Tech lead@Quantcast / Netease / PingCAP

Big Data / Distributed database

Before we begin

  • Context: As the need for real-time analytics and HTAP rises, this session introduces the concept of HTAP.
  • Goal: After this session, the audience will have a basic idea of what HTAP is.
  • Outline:
    • Database evolution
    • What the term HTAP means
    • Why HTAP is needed and how it helps you
    • TiDB HTAP architecture
    • Real-world scenarios
  • Lab requirements (if needed): N/A

Part I: What is HTAP

  • An overview of HTAP
  • Goal
  • Subtopics
    • What is HTAP
    • AP & TP Databases
    • Why HTAP and how it helps you
    • Technical difficulties
  • Key points
  • Review of goal

What is HTAP

  • Term coined by Gartner
  • HTAP (Hybrid Transactional/Analytical Processing) is a very simple concept: TP and AP in one system
  • TP = Transactional Processing
    • Row format, updated in real time
    • High concurrency and consistency; each operation touches only a few rows
    • Current data
  • AP = Analytical Processing
    • Columnar format, updated in batches
    • Low concurrency; each query processes a large batch of data
    • Historical data
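
To make the row-versus-columnar distinction concrete, here is a minimal Python sketch. The table, its column names, and the two helper functions are invented for illustration and are not taken from any real database engine; the sketch only shows why a point update suits a row layout while an aggregate suits a columnar layout.

```python
from typing import Dict, List

# Row format: each record is stored together, so reading or updating
# one order touches a single record (the TP access pattern).
row_store: List[dict] = [
    {"order_id": 1, "customer": "alice", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "alice", "amount": 25},
]

# Columnar format: each column is stored together, so an aggregate over
# one column scans only that column (the AP access pattern).
column_store: Dict[str, list] = {
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "amount": [30, 45, 25],
}


def update_amount_row(order_id: int, new_amount: int) -> None:
    """TP-style point update: locate one row and change it in place."""
    for row in row_store:
        if row["order_id"] == order_id:
            row["amount"] = new_amount
            return
    raise KeyError(order_id)


def total_amount_columnar() -> int:
    """AP-style aggregate: scan a single column, ignoring the others."""
    return sum(column_store["amount"])
```

In this sketch the two layouts are independent copies of the same data, so an update applied through `update_amount_row` is not visible to `total_amount_columnar`. A real HTAP system has to keep both forms in sync.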

HTAP is a term coined by Gartner, a well-known market analysis and research firm.

Traditional data platform

121TraditionalDataPlatform.png

Data in the data warehouse is not updated in a timely manner, and the architecture is complex.

Why HTAP

The boundary between TP and AP is now blurry

  • TP-ish AP use cases
    • Comprehensive query platforms that serve reports and highly concurrent short queries at the same time
  • AP-ish TP use cases
    • Analyzing and optimizing online transactional business in real time
    • Real-time data services across business units (BUs)

How HTAP helps you

  • HTAP databases shine
    • Simplify architecture
    • Lower maintenance cost
    • Empower real-time scenarios
    • Improve business agility

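HTAP simplifies the architecture, lowers operations cost, and supports real-time analysis and decision-making.

Case study: sales data platform

122SalesDataPlatform.png

This platform must provide both TP and AP capabilities.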

Difficulties

  • Meeting the requirements from both sides is hard
    • Scalability
      • Building a distributed AP database is relatively easy; building a distributed TP database is hard
    • TP/AP at the same time
      • Supporting both storage forms
      • Avoiding workload interference
    • Seamless integration
      • Data synchronization
      • Fresh data
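
The data synchronization and freshness difficulties can be illustrated with a toy model. The sketch below is an assumption-laden simplification and does not describe how any particular database replicates data: `RowStore`, `ColumnarReplica`, and all of their methods are hypothetical names. A primary row store appends every write to a change log, and a columnar replica replays that log later, so the replica lags until it catches up.

```python
from typing import Dict, List, Tuple


class RowStore:
    """Toy primary store: applies writes and appends them to a change log."""

    def __init__(self) -> None:
        self.rows: Dict[int, int] = {}        # key -> value
        self.log: List[Tuple[int, int]] = []  # ordered (key, value) changes

    def write(self, key: int, value: int) -> int:
        self.rows[key] = value
        self.log.append((key, value))
        return len(self.log)  # log position of this write


class ColumnarReplica:
    """Toy analytical replica: replays the change log after the fact."""

    def __init__(self, source: RowStore) -> None:
        self.source = source
        self.applied = 0             # number of log entries replayed so far
        self.keys: List[int] = []    # column of keys
        self.values: List[int] = []  # column of values

    def lag(self) -> int:
        """Freshness gap: log entries written but not yet replayed."""
        return len(self.source.log) - self.applied

    def catch_up(self, up_to: int) -> None:
        """Replay log entries until position `up_to` has been applied."""
        while self.applied < up_to:
            key, value = self.source.log[self.applied]
            if key in self.keys:
                self.values[self.keys.index(key)] = value
            else:
                self.keys.append(key)
                self.values.append(value)
            self.applied += 1

    def consistent_sum(self) -> int:
        """Read that first catches up to the log position at read time."""
        self.catch_up(len(self.source.log))
        return sum(self.values)
```

A short usage example: after three writes on the primary, the replica reports a lag of 3 and holds no data. Calling `consistent_sum()` replays the log first, so the result reflects every write made before the read.

```python
primary = RowStore()
replica = ColumnarReplica(primary)
primary.write(1, 10)
primary.write(2, 20)
primary.write(1, 15)
print(replica.lag())             # 3
print(replica.consistent_sum())  # 35
```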

Part II: How HTAP helps you

  • An overview of TiDB HTAP
  • Goal
  • Subtopics
    • TiDB HTAP introduction
    • Real world scenarios
  • Key points
  • Review of goal

TiDB HTAP

  • A scalable database
  • Built for strict transactional use cases
  • Proven in core financial business
  • Equipped with powerful analytical engines
  • Natural fit for data hubs and real-time data applications

What's new in TiDB 4.0 HTAP

  • Real-time updatable columnar engine
  • Scalable row-wise and columnar engines
    • Separated machines, no interference
    • Consistent data replication
  • Vectorized engine
  • Smart selection between row and column formats

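An updatable columnar storage engine was added.
The row-store engine and the columnar engine can use separate server resources, so they do not interfere with each other.
Data is replicated consistently (and asynchronously) from the row store to the column store.
The optimizer automatically chooses whether to use the row store or the column store.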
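
The idea of an optimizer choosing between row and column formats can be sketched with a toy cost model. This is not TiDB's actual optimizer: `QueryProfile`, the cost functions, and the `COLUMN_SCAN_STARTUP` constant are all assumptions made up for illustration. The sketch only captures the intuition that a row store reads whole rows while a column store reads just the needed columns but pays a fixed cost to start a scan.

```python
from dataclasses import dataclass

# Assumed fixed overhead of starting a columnar scan (arbitrary units).
COLUMN_SCAN_STARTUP = 1000.0


@dataclass
class QueryProfile:
    """Hypothetical summary of a query, as an optimizer might estimate it."""
    estimated_rows: int  # rows the query is expected to read
    columns_needed: int  # columns referenced by the query
    table_columns: int   # total columns in the table


def row_cost(q: QueryProfile) -> float:
    # A row store reads every column of each row it touches.
    return float(q.estimated_rows * q.table_columns)


def column_cost(q: QueryProfile) -> float:
    # A column store reads only the needed columns, plus a startup cost.
    return COLUMN_SCAN_STARTUP + float(q.estimated_rows * q.columns_needed)


def choose_engine(q: QueryProfile) -> str:
    """Pick the cheaper engine under this toy cost model."""
    return "row" if row_cost(q) <= column_cost(q) else "column"
```

Under these assumed costs, a point lookup that reads one row of a 10-column table goes to the row engine, while an aggregate that reads two columns over a million rows goes to the columnar engine.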

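TiDB 4.0 architecture

123NewTiDB4.0Architecture.png

Real-world case 1: a one-stop TP + AP application

124TPAPOneStop.png

The architecture is simplified: one system replaces two, and the data stays fresh.

Real-world case 2: real-time data warehouse

125RealTimeDW.png

The system absorbs data changes from different business systems and supports real-time business analytics.

Comprehensive data platform

126ComprehensiveDataPlatform.png

TiSpark can span multiple data platforms.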
