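Tencent TBase is a high-performance HTAP database developed in-house by Tencent. It provides high-performance OLTP and OLAP capabilities while guaranteeing scalable, globally consistent distributed transactions (ACID), offering users a strongly consistent distributed database service and a high-performance data warehouse service. It addresses the limited scalability of traditional databases, the difficulty of keeping transactions strictly consistent once data is sharded, data security, and cross-region disaster recovery, and it also provides high-performance transaction processing, data governance, and mixed-workload support.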
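For OLTP, TBase implements globally consistent distributed transactions with MVCC + a global clock + 2PC + SSI, and it introduces many performance optimizations to reduce the overhead of global transactions. On a small cluster, TBase can deliver more than 3 million TPMTotal of transaction throughput (measured with the industry-standard TPC-C benchmark).
Transactions complete within milliseconds
TBase already serves benchmark customers in multiple industries. Internally it supports massive-data services such as WeChat Advertising, WeChat Pay, and Tencent Maps. A single transaction completes within milliseconds, and TBase has sustained a 50-fold growth in WeChat Pay transactions.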
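TBase is a relational database cluster platform that provides write reliability and data synchronization across multiple master nodes. You can deploy TBase on one or more hosts, and TBase data is stored across multiple physical hosts. Tables are stored in one of two ways, distributed or replicated. When a SQL query is sent to TBase, TBase automatically issues the query to the data nodes and collects the final result.
TBase uses a distributed cluster architecture (see the figure below). The architecture is shared-nothing: nodes are independent of one another and each processes its own data. Processed results may be aggregated at the upper layer or passed between nodes, and the processing units communicate over network protocols, which gives better parallelism and scalability. This also means that a TBase cluster can be deployed on ordinary x86 servers.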
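The three main modules of TBase are briefly described below:
Coordinator: coordinator node (CN for short)
The entry point for application access, responsible for data distribution and query planning. Multiple CNs are peers, and each provides the same database view. Functionally, a CN stores only the system's global metadata and no actual business data.
Datanode: data node (DN for short)
Each DN stores a shard of the business data. Functionally, a DN executes the requests dispatched by the coordinator nodes.
GTM: Global Transaction Manager
Manages the cluster's transaction information as well as the cluster's global objects, such as sequences.
Next, let's see how to build a TBase cluster environment starting from source code.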
Note: the tbase user must be created on every machine where the TBase cluster will be installed.
mkdir /data
useradd -d /data/tbase tbase
git clone https://github.com/Tencent/TBase
cd ${SOURCECODE_PATH}
rm -rf ${INSTALL_PATH}/tbase_bin_v2.0
chmod +x configure*
./configure --prefix=${INSTALL_PATH}/tbase_bin_v2.0 --enable-user-switch --with-openssl --with-ossp-uuid CFLAGS=-g
make clean
make -sj
make install
chmod +x contrib/pgxc_ctl/make_signature
cd contrib
make -sj
make install
In the environment used in this article, the two variables above are set as follows:
${SOURCECODE_PATH}=/data/tbase/TBase-master
${INSTALL_PATH}=/data/tbase/install
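The following builds a cluster on two servers with 1 GTM master, 1 GTM slave, 2 CN masters (CN masters are peers, so no CN slaves are needed), 2 DN masters, and 2 DN slaves. This is the minimal configuration with disaster-recovery capability.
Machine 1: 10.215.147.158
Machine 2: 10.240.138.159
The cluster plan is as follows: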
Configure SSH mutual trust between the machines; refer to a guide on Linux SSH mutual-trust configuration.
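The text only points to a general guide, so here is a minimal sketch of the usual procedure. It assumes the tbase user already exists on both machines and that the standard OpenSSH tools ssh-keygen and ssh-copy-id are installed. The function name print_ssh_trust_steps is hypothetical, and it only prints the commands so they can be reviewed before being run on each machine.

```shell
# Print (do not run) the commands that set up passwordless SSH for one user
# across a list of hosts. Review the output, then run it on every machine.
print_ssh_trust_steps() {
    user=$1
    shift
    # Generate a key pair once per machine, with an empty passphrase.
    echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    for host in "$@"; do
        # Append the local public key to the remote authorized_keys file.
        echo "ssh-copy-id ${user}@${host}"
    done
}

print_ssh_trust_steps tbase 10.215.147.158 10.240.138.159
```

Running the printed commands as the tbase user on both machines should let each machine reach the other (and itself) without a password prompt, which pgxc_ctl relies on when it deploys and starts remote nodes.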
Configure the environment variables on every machine in the cluster:
[tbase@TENCENT64 ~]$ vim ~/.bashrc
export TBASE_HOME=/data/tbase/install/tbase_bin_v2.0
export PATH=$TBASE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$TBASE_HOME/lib:${LD_LIBRARY_PATH}
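After logging in again, it can be worth confirming that the variables took effect before moving on. The following is a small sketch of such a check; the helper names path_contains and check_tbase_env are hypothetical and are not part of TBase.

```shell
# Return 0 when directory $1 is one of the colon-separated entries of $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the TBase bin directory is visible on PATH.
check_tbase_env() {
    if [ -z "${TBASE_HOME:-}" ]; then
        echo "TBASE_HOME is not set"
        return 1
    fi
    if path_contains "$TBASE_HOME/bin" "$PATH"; then
        echo "ok: $TBASE_HOME/bin is on PATH"
    else
        echo "missing: $TBASE_HOME/bin is not on PATH"
        return 1
    fi
}
```

Calling check_tbase_env in a fresh shell on each machine gives a one-line answer, which is easier to compare across hosts than reading the full PATH.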
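With the basic environment in place, cluster initialization can begin. For convenience, TBase provides a dedicated configuration and operations tool, pgxc_ctl, to help users quickly build and manage a cluster. First, write the IP addresses, ports, and directories of the nodes described above into the configuration file pgxc_ctl.conf.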
[tbase@TENCENT64 ~]$ mkdir /data/tbase/pgxc_ctl
[tbase@TENCENT64 ~]$ cd /data/tbase/pgxc_ctl
[tbase@TENCENT64 ~/pgxc_ctl]$ vim pgxc_ctl.conf
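The pgxc_ctl.conf file below is written according to the IP addresses, ports, database directories, and binary directories planned above. In practice, adjust it to match your own environment.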
#!/bin/bash
pgxcInstallDir=/data/tbase/install/tbase_bin_v2.0
pgxcOwner=tbase
defaultDatabase=postgres
pgxcUser=$pgxcOwner
tmpDir=/tmp
localTmpDir=$tmpDir
configBackup=n
configBackupHost=pgxc-linker
configBackupDir=$HOME/pgxc
configBackupFile=pgxc_ctl.bak

#---- GTM ----------
gtmName=gtm
gtmMasterServer=10.215.147.158
gtmMasterPort=50001
gtmMasterDir=/data/tbase/data/gtm
gtmExtraConfig=none
gtmMasterSpecificExtraConfig=none
gtmSlave=y
gtmSlaveServer=10.240.138.159
gtmSlavePort=50001
gtmSlaveDir=/data/tbase/data/gtm
gtmSlaveSpecificExtraConfig=none

#---- Coordinators -------
coordMasterDir=/data/tbase/data/coord
coordArchLogDir=/data/tbase/data/coord_archlog
coordNames=(cn001 cn002 )
coordPorts=(30004 30004 )
poolerPorts=(31110 31110 )
coordPgHbaEntries=(0.0.0.0/0)
coordMasterServers=(10.215.147.158 10.240.138.159)
coordMasterDirs=($coordMasterDir $coordMasterDir)
coordMaxWALsernder=2
coordMaxWALSenders=($coordMaxWALsernder $coordMaxWALsernder )
coordSlave=n
coordSlaveSync=n
coordArchLogDirs=($coordArchLogDir $coordArchLogDir)
coordExtraConfig=coordExtraConfig
cat > $coordExtraConfig <$coordExtraPgHba < $datanodeExtraConfig < $datanodeExtraPgHba <
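Because pgxc_ctl.conf is a bash script in which per-node settings are parallel arrays (coordNames, coordPorts, coordMasterServers), a mismatch in array length is an easy mistake to make when editing it. The sketch below is one possible pre-flight check, not a pgxc_ctl feature. The function name check_coord_arrays is hypothetical, it needs bash to be installed, and it sources the file, so any commands in the file (such as the cat redirections that write the extra-config files) are executed as a side effect.

```shell
# Check that the parallel coordinator arrays in a pgxc_ctl.conf-style file
# all have the same, non-zero number of entries. The file is sourced in a
# separate bash process, so only use this on a configuration you trust.
check_coord_arrays() {
    conf=$1
    bash -c '
        source "$1" >/dev/null 2>&1
        n=${#coordNames[@]}
        if [ "$n" -gt 0 ] &&
           [ "$n" -eq "${#coordPorts[@]}" ] &&
           [ "$n" -eq "${#coordMasterServers[@]}" ]; then
            echo "ok: $n coordinators"
        else
            echo "mismatch: names=$n ports=${#coordPorts[@]} servers=${#coordMasterServers[@]}"
            exit 1
        fi
    ' _ "$conf"
}
```

For example, check_coord_arrays /data/tbase/pgxc_ctl/pgxc_ctl.conf could be run before deploy all. The same pattern extends to the datanode arrays if the configuration defines them in the same style.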
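After the configuration file has been prepared on one node, the binary package must be deployed in advance to the machines that host all the nodes. This can be done with the pgxc_ctl tool by running the deploy all command.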
[tbase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl
/usr/bin/bash
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************
Current directory: /data/tbase/pgxc_ctl
PGXC deploy all
Deploying Postgres-XL components to all the target servers.
Prepare tarball to deploy ...
Deploying to the server 10.215.147.158.
Deploying to the server 10.240.138.159.
Deployment done.

Log in to every node and check that the binary package was distributed correctly:

[tbase@TENCENT64 ~/install]$ ls /data/tbase/install/tbase_bin_v2.0
bin  include  lib  share
[tbase@TENCENT64 ~]$ pgxc_ctl
/usr/bin/bash
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************
Current directory: /data/tbase/pgxc_ctl
PGXC init all
Initialize GTM master
....
....
Initialize datanode slave dn001
Initialize datanode slave dn002
mkdir: cannot create directory '/data1/tbase': Permission denied
chmod: cannot access '/data1/tbase/data/dn001': No such file or directory
pg_ctl: directory "/data1/tbase/data/dn001" does not exist
pg_basebackup: could not create directory "/data1/tbase": Permission denied
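When initialization fails, the terminal usually prints the error log; read the cause and correct the configuration accordingly. Alternatively, inspect the error logs under /data/tbase/pgxc_ctl/pgxc_log to track down mistakes in the configuration file.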
[tbase@TENCENT64 ~]$ ll ~/pgxc_ctl/pgxc_log/
total 184
-rw-rw-r-- 1 tbase tbase 81123 Nov 13 17:22 14105_pgxc_ctl.log
-rw-rw-r-- 1 tbase tbase  2861 Nov 13 17:58 15762_pgxc_ctl.log
-rw-rw-r-- 1 tbase tbase 14823 Nov 14 07:59 16671_pgxc_ctl.log
-rw-rw-r-- 1 tbase tbase  2721 Nov 13 16:52 18891_pgxc_ctl.log
-rw-rw-r-- 1 tbase tbase  1409 Nov 13 16:20 22603_pgxc_ctl.log
-rw-rw-r-- 1 tbase tbase 60043 Nov 13 16:33 28932_pgxc_ctl.log
-rw-rw-r-- 1 tbase tbase 15671 Nov 14 07:57 6849_pgxc_ctl.log
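With several log files in the directory, the most recent one is usually the one that matters. The sketch below picks the newest *_pgxc_ctl.log file (the naming seen in the listing above) and prints lines that look like the failures from the init all run. The function names and the list of search patterns are assumptions chosen for illustration, and the exact log format may differ.

```shell
# Print the path of the most recently modified pgxc_ctl log in a directory.
latest_pgxc_log() {
    dir=$1
    # ls -t sorts newest first; keep only the first entry.
    ls -t "$dir"/*_pgxc_ctl.log 2>/dev/null | head -n 1
}

# Show error-looking lines (with line numbers) from the newest log.
show_log_errors() {
    log=$(latest_pgxc_log "$1")
    [ -n "$log" ] || { echo "no pgxc_ctl logs in $1"; return 1; }
    grep -inE 'error|denied|does not exist|could not' "$log"
}
```

For the setup in this article the call would be show_log_errors /data/tbase/pgxc_ctl/pgxc_log.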
Run the pgxc_ctl tool and execute the clean all command to delete the files that were already initialized, fix pgxc_ctl.conf, and then run init all again to restart initialization.
[tbase@TENCENT64 ~]$ pgxc_ctl
/usr/bin/bash
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************
Current directory: /data/tbase/pgxc_ctl
PGXC clean all

[tbase@TENCENT64 ~]$ pgxc_ctl
/usr/bin/bash
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************
Current directory: /data/tbase/pgxc_ctl
PGXC init all
Initialize GTM master
EXECUTE DIRECT ON (dn002) 'ALTER NODE dn002 WITH (TYPE=''datanode'', HOST=''10.240.138.159'', PORT=40004, PREFERRED)';
EXECUTE DIRECT
EXECUTE DIRECT ON (dn002) 'SELECT pgxc_pool_reload()';
 pgxc_pool_reload
------------------
 t
(1 row)
Done.
When you see the output above, the cluster is up. You can also check the cluster status with the pgxc_ctl monitor all command.
[tbase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl
/usr/bin/bash
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************
Current directory: /data/tbase/pgxc_ctl
PGXC monitor all
Running: gtm master
Not running: gtm slave
Running: coordinator master cn001
Running: coordinator master cn002
Running: datanode master dn001
Running: datanode slave dn001
Running: datanode master dn002
Not running: datanode slave dn002
In general, if strong synchronous mode is not configured, a failure of the GTM slave or of a DN slave does not affect access.
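For scripted health checks, the monitor all output can be reduced to just the nodes that are down. The sketch below assumes each status is printed on its own line in the "Running: ..." / "Not running: ..." form shown above. The function name list_not_running is hypothetical.

```shell
# Read `monitor all` output on stdin and print the nodes reported as down.
# Exit status is 0 when every node is running, 1 otherwise.
list_not_running() {
    awk '
        /^Not running: / { sub(/^Not running: /, ""); print; down = 1 }
        END { exit down ? 1 : 0 }
    '
}
```

Fed the monitor output from this article, it would print "gtm slave" and "datanode slave dn002" and return a non-zero status, which a cron job or wrapper script could use to raise an alert.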
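Accessing a TBase cluster is essentially the same as accessing a standalone PostgreSQL instance, and the cluster can be reached through any CN. For example, connect to a CN and select from the pgxc_node table to view the cluster topology (with the current configuration, slaves are not shown in pgxc_node). A concrete example using psql from the Linux command line follows: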
[tbase@TENCENT64 ~/pgxc_ctl]$ psql -h 10.215.147.158 -p 30004 -d postgres -U tbase
psql (PostgreSQL 10.0 TBase V2)
Type "help" for help.

postgres=# \d
Did not find any relations.
postgres=# select * from pgxc_node;
 node_name | node_type | node_port |   node_host    | nodeis_primary | nodeis_preferred |  node_id   | node_cluster_name
-----------+-----------+-----------+----------------+----------------+------------------+------------+-------------------
 gtm       | G         |     50001 | 10.215.147.158 | t              | f                |  428125959 | tbase_cluster
 cn001     | C         |     30004 | 10.215.147.158 | f              | f                | -264077367 | tbase_cluster
 cn002     | C         |     30004 | 10.240.138.159 | f              | f                | -674870440 | tbase_cluster
 dn001     | D         |     40004 | 10.215.147.158 | t              | t                | 2142761564 | tbase_cluster
 dn002     | D         |     40004 | 10.240.138.159 | f              | f                |  -17499968 | tbase_cluster
(5 rows)
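TBase uses datanode groups to make node management more flexible. A default group is required before the cluster can be used, so it must be created in advance; normally all datanodes are added to the default group. In addition, to make data distribution more flexible, TBase adds an intermediate logical layer, called sharding, that maintains the mapping from data records to physical nodes. The sharding therefore also has to be created in advance, with the following commands: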
postgres=# create default node group default_group with (dn001,dn002);
CREATE NODE GROUP
postgres=# create sharding group to group default_group;
CREATE SHARDING GROUP
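The idea of an intermediate logical layer between records and physical nodes can be illustrated with a toy model: a key first maps to a logical shard, and a separate shard map assigns each shard to a datanode. This is only a conceptual sketch. The shard count of 8, the modulo used in place of a hash function, and the alternating shard map are all assumptions made for illustration and do not reflect TBase's actual values or algorithm.

```shell
# Toy model of the record -> shard -> node indirection.
# SHARD_COUNT and the modulo "hash" are illustrative, not TBase's values.
SHARD_COUNT=8
# One node name per shard id, in order 0..7 (hypothetical assignment).
SHARD_MAP="dn001 dn002 dn001 dn002 dn001 dn002 dn001 dn002"

# Map a numeric key to a logical shard id.
shard_of() {
    echo $(( $1 % SHARD_COUNT ))
}

# Look up the node that currently owns the key's shard.
node_of() {
    shard=$(shard_of "$1")
    # Pick field number shard+1 from the space-separated map.
    echo "$SHARD_MAP" | cut -d ' ' -f $(( shard + 1 ))
}

for id in 1 2 11; do
    echo "id=$id shard=$(shard_of "$id") node=$(node_of "$id")"
done
```

The point of the indirection is that moving data to a new node only requires changing entries in the shard map; the key-to-shard step stays the same, so existing keys do not need to be rehashed.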
At this point, the cluster can be used just like a standalone database.
postgres=# create database test;
CREATE DATABASE
postgres=# create user test with password 'test';
CREATE ROLE
postgres=# alter database test owner to test;
ALTER DATABASE
postgres=# \c test test
You are now connected to database "test" as user "test".
test=> create table foo(id bigint, str text) distribute by shard(id);
CREATE TABLE
test=> insert into foo values(1, 'tencent'), (2, 'shenzhen');
COPY 2
test=> select * from foo;
 id |   str
----+----------
  1 | tencent
  2 | shenzhen
(2 rows)
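The session above creates a distributed (sharded) table. For comparison, the sketch below emits SQL for one sharded table and one replicated table, the second storage mode mentioned at the start of the article. The distribute by shard clause is taken from the example above; the distribute by replication clause is an assumption based on TBase's Postgres-XL lineage and should be checked against the TBase documentation before use. The table names orders and region and the function name demo_tables_sql are hypothetical.

```shell
# Emit SQL that creates one sharded and one replicated table.
# "distribute by replication" is an assumed clause; verify it for your
# TBase version before relying on it.
demo_tables_sql() {
    cat <<'EOF'
create table orders(id bigint, note text) distribute by shard(id);
create table region(code int, name text) distribute by replication;
EOF
}

# Usage (not run here):
#   demo_tables_sql | psql -h 10.215.147.158 -p 30004 -d test -U test
```

A replicated table keeps a full copy on every datanode, which typically suits small, rarely updated lookup tables, while large tables are spread across nodes by their shard key.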
Use the pgxc_ctl stop all command to stop the cluster. stop all can be followed by -m fast or -m immediate to control how each node is stopped.
PGXC stop all -m fast
Stopping all the coordinator masters.
Stopping coordinator master cn001.
Stopping coordinator master cn002.
Done.
Stopping all the datanode slaves.
Stopping datanode slave dn001.
Stopping datanode slave dn002.
pg_ctl: PID file "/data/tbase/data/dn002/postmaster.pid" does not exist
Is server running?
Stopping all the datanode masters.
Stopping datanode master dn001.
Stopping datanode master dn002.
Done.
Stop GTM slave
waiting for server to shut down..... done
server stopped
Stop GTM master
waiting for server to shut down.... done
server stopped
PGXC monitor all
Not running: gtm master
Not running: gtm slave
Not running: coordinator master cn001
Not running: coordinator master cn002
Not running: datanode master dn001
Not running: datanode slave dn001
Not running: datanode master dn002
Not running: datanode slave dn002
Use the pgxc_ctl start all command to start the cluster.
[tbase@TENCENT64 ~]$ pgxc_ctl
/usr/bin/bash
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************
Current directory: /data/tbase/pgxc_ctl
PGXC start all
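This document is only a brief guide showing how to build a complete TBase cluster step by step from source code. Later articles will cover the use of TBase features, tuning, troubleshooting, and more.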
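To get the Tencent TBase GitHub open-source address, search for and follow the official WeChat account "腾讯云数据库" (Tencent Cloud Database) and reply "开源" (open source).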