0 TBase_Quick_Start.EN

What's TBase

TBase is a relational database cluster platform that provides write reliability and multi node data synchronization. You can configure TBase on one or more hosts and store TBase data on multiple physical hosts. There are two ways to store data tables: distributed or replicated. When sending query SQL to TBase, TBase will automatically send query statements to data nodes and get the final results.

TBase is a distributed cluster architecture (as shown in the figure below), which is a distributed share nothing architecture mode, each node is independent and processes its own data. After processing, the results may be summarized to the upper layer or transferred between nodes. Each processing unit communicates with each other through the network protocol, with better parallel processing and expansion ability, which also means that the TBase cluster can be deployed on the basic x86 server.

TBase architecture

Here are three modules of TBase

Coordinator Node: (CN for short)

Application access portal is responsible for data distribution and query planning. Multiple nodes are in the same location, and each node provides the same database view. In terms of function, CN only stores the global metadata of the system, not the actual user data.
Datanode Node: (DN for short)

Each DN stores the partition of user data. In function, DN node is responsible for completing the request of executing CN distribution.
GTM Node: (Global Transaction Manager)

GTM is responsible for the management of cluster transaction information, as well as the global objects of the cluster, such as sequence.

Next, let's show how to build a TBase cluster environment from the source code

TBase source code compilation and installation

create user

Note: all machines that need to install TBase cluster need to create
```
 mkdir /data
 useradd -d /data/TBase TBase
```
get source code

git clone https://github.com/Tencent/TBase

source code compilation

cd ${SOURCECODE_PATH}
 rm -rf ${INSTALL_PATH}/TBase_bin_v2.0
 chmod +x configure*
 ./configure --prefix=${INSTALL_PATH}/TBase_bin_v2.0  --enable-user-switch --with-openssl  --with-ossp-uuid CFLAGS=-g
 make clean
 make -sj
 make install
 chmod +x contrib/pgxc_ctl/make_signature
 cd contrib
 make -sj
 make install

In this paper, the above two parameters are as follows

 ${SOURCECODE_PATH}=/data/TBase/TBase-master
 ${INSTALL_PATH}=/data/TBase/install

cluster installation

cluster planning

Next, set up a cluster of 1 GTM master, 1 GTM standby, 2 CN master (the CN master is equivalent, so there is no need to standy CN), 2 DN master, and 2 DN standby on two servers. This cluster is the minimum configuration with disaster tolerance capability.

 host1：10.215.147.158
 host2：10.240.138.159

planning is as follows：

node name	IP	data directory
GTM master	10.215.147.158	/data/TBase/data/gtm
GTM slave	10.240.138.159	/data/TBase/data/gtm
CN1	10.215.147.158	/data/TBase/data/coord
CN2	10.240.138.159	/data/TBase/data/coord
DN1 master	10.215.147.158	/data/TBase/data/dn001
DN1 slave	10.240.138.159	/data/TBase/data/dn001
DN2 master	10.240.138.159	/data/TBase/data/dn002
DN2 slave	10.215.147.158	/data/TBase/data/dn002

Sketch Map:

TBase Deploy Sketch Map

SSH mutual trust configuration between machines

Reference resources: Linux ssh mutual trust

Environment variable configuration

All machines in the cluster need to be configured

 [TBase@TENCENT64 ~]$ vim ~/.bashrc
 export TBase_HOME=/data/TBase/install/TBase_bin_v2.0
 export PATH=$TBase_HOME/bin:$PATH
 export LD_LIBRARY_PATH=$TBase_HOME/lib:${LD_LIBRARY_PATH}

Above, the required basic environment has been configured, and you can enter the cluster initialization stage. For the convenience of users, TBase provides special configuration and operation tools: pgxc_ctl to help users quickly build and manage clusters. Here, you need to write the IP, port and data directory of the nodes mentioned above into the configuration file pgxc_ctl.conf

Initialization configuration file pgxc_ctl.conf

 [TBase@TENCENT64 ~]$ mkdir /data/TBase/pgxc_ctl
 [TBase@TENCENT64 ~]$ cd /data/TBase/pgxc_ctl
 [TBase@TENCENT64 ~/pgxc_ctl]$ vim pgxc_ctl.conf

The following shows the pgxc_ctl.conf file content written using the IP, port, database directory, binary directory and other planning values described above. In practice, we only need to configure it according to our own actual situation.

 #!/bin/bash

 pgxcInstallDir=/data/TBase/install/TBase_bin_v2.0
 pgxcOwner=TBase
 defaultDatabase=postgres
 pgxcUser=$pgxcOwner
 tmpDir=/tmp
 localTmpDir=$tmpDir
 configBackup=n
 configBackupHost=pgxc-linker
 configBackupDir=$HOME/pgxc
 configBackupFile=pgxc_ctl.bak
 
 
 #---- GTM ----------
 gtmName=gtm
 gtmMasterServer=10.215.147.158
 gtmMasterPort=50001
 gtmMasterDir=/data/TBase/data/gtm
 gtmExtraConfig=none
 gtmMasterSpecificExtraConfig=none
 gtmSlave=y
 gtmSlaveServer=10.240.138.159
 gtmSlavePort=50001
 gtmSlaveDir=/data/TBase/data/gtm
 gtmSlaveSpecificExtraConfig=none
 
 #---- Coordinators -------
 coordMasterDir=/data/TBase/data/coord
 coordArchLogDir=/data/TBase/data/coord_archlog
 
 coordNames=(cn001 cn002 )
 coordPorts=(30004 30004 )
 poolerPorts=(31110 31110 )
 coordPgHbaEntries=(0.0.0.0/0)
 coordMasterServers=(10.215.147.158 10.240.138.159)
 coordMasterDirs=($coordMasterDir $coordMasterDir)
 coordMaxWALsernder=2
 coordMaxWALSenders=($coordMaxWALsernder $coordMaxWALsernder )
 coordSlave=n
 coordSlaveSync=n
 coordArchLogDirs=($coordArchLogDir $coordArchLogDir)
 
 coordExtraConfig=coordExtraConfig
 cat > $coordExtraConfig <<EOF
 #================================================
 # Added to all the coordinator postgresql.conf
 # Original: $coordExtraConfig
 
 include_if_exists = '/data/TBase/global/global_TBase.conf'
 
 wal_level = replica
 wal_keep_segments = 256 
 max_wal_senders = 4
 archive_mode = on 
 archive_timeout = 1800 
 archive_command = 'echo 0' 
 log_truncate_on_rotation = on 
 log_filename = 'postgresql-%M.log' 
 log_rotation_age = 4h 
 log_rotation_size = 100MB
 hot_standby = on 
 wal_sender_timeout = 30min 
 wal_receiver_timeout = 30min 
 shared_buffers = 1024MB 
 max_pool_size = 2000
 log_statement = 'ddl'
 log_destination = 'csvlog'
 logging_collector = on
 log_directory = 'pg_log'
 listen_addresses = '*'
 max_connections = 2000
 
 EOF
 
 coordSpecificExtraConfig=(none none)
 coordExtraPgHba=coordExtraPgHba
 cat > $coordExtraPgHba <<EOF
 
 local   all             all                                     trust
 host    all             all             0.0.0.0/0               trust
 host    replication     all             0.0.0.0/0               trust
 host    all             all             ::1/128                 trust
 host    replication     all             ::1/128                 trust
 
 
 EOF
 
 
 coordSpecificExtraPgHba=(none none)
 coordAdditionalSlaves=n	
 cad1_Sync=n
 
 #---- Datanodes ---------------------
 dn1MstrDir=/data/TBase/data/dn001
 dn2MstrDir=/data/TBase/data/dn002
 dn1SlvDir=/data/TBase/data/dn001
 dn2SlvDir=/data/TBase/data/dn002
 dn1ALDir=/data/TBase/data/datanode_archlog
 dn2ALDir=/data/TBase/data/datanode_archlog
 
 primaryDatanode=dn001
 datanodeNames=(dn001 dn002)
 datanodePorts=(40004 40004)
 datanodePoolerPorts=(41110 41110)
 datanodePgHbaEntries=(0.0.0.0/0)
 datanodeMasterServers=(10.215.147.158 10.240.138.159)
 datanodeMasterDirs=($dn1MstrDir $dn2MstrDir)
 dnWALSndr=4
 datanodeMaxWALSenders=($dnWALSndr $dnWALSndr)
 
 datanodeSlave=y
 datanodeSlaveServers=(10.240.138.159 10.215.147.158)
 datanodeSlavePorts=(50004 54004)
 datanodeSlavePoolerPorts=(51110 51110)
 datanodeSlaveSync=n
 datanodeSlaveDirs=($dn1SlvDir $dn2SlvDir)
 datanodeArchLogDirs=($dn1ALDir/dn001 $dn2ALDir/dn002)
 
 datanodeExtraConfig=datanodeExtraConfig
 cat > $datanodeExtraConfig <<EOF
 #================================================
 # Added to all the coordinator postgresql.conf
 # Original: $datanodeExtraConfig
 
 include_if_exists = '/data/TBase/global/global_TBase.conf'
 listen_addresses = '*' 
 wal_level = replica 
 wal_keep_segments = 256 
 max_wal_senders = 4
 archive_mode = on 
 archive_timeout = 1800 
 archive_command = 'echo 0' 
 log_directory = 'pg_log' 
 logging_collector = on 
 log_truncate_on_rotation = on 
 log_filename = 'postgresql-%M.log' 
 log_rotation_age = 4h 
 log_rotation_size = 100MB
 hot_standby = on 
 wal_sender_timeout = 30min 
 wal_receiver_timeout = 30min 
 shared_buffers = 1024MB 
 max_connections = 4000 
 max_pool_size = 4000
 log_statement = 'ddl'
 log_destination = 'csvlog'
 wal_buffers = 1GB
 
 EOF
 
 datanodeSpecificExtraConfig=(none none)
 datanodeExtraPgHba=datanodeExtraPgHba
 cat > $datanodeExtraPgHba <<EOF
 
 local   all             all                                     trust
 host    all             all             0.0.0.0/0               trust
 host    replication     all             0.0.0.0/0               trust
 host    all             all             ::1/128                 trust
 host    replication     all             ::1/128                 trust
 
 
 EOF
 
 
 datanodeSpecificExtraPgHba=(none none)
 
 datanodeAdditionalSlaves=n
 walArchive=n

Distribute binary package

After writing the configuration file, you need to deploy the binary package to the physical machine where all nodes are located. This can be done by executing the deploy all command with pgxc_ctl tool.

 [TBase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl 
 /usr/bin/bash
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
 Finished reading configuration.
    ******** PGXC_CTL START ***************
 
 Current directory: /data/TBase/pgxc_ctl
 PGXC deploy all
 Deploying Postgres-XL components to all the target servers.
 Prepare tarball to deploy ... 
 Deploying to the server 10.215.147.158.
 Deploying to the server 10.240.138.159.
 Deployment done.

Log in to all nodes and check whether the binary package is distributed

 [TBase@TENCENT64 ~/install]$ ls /data/TBase/install/TBase_bin_v2.0
 bin  include  lib  share

execute init all command to complete cluster initialization

 [TBase@TENCENT64 ~]$ pgxc_ctl
 /usr/bin/bash
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
 Finished reading configuration.
    ******** PGXC_CTL START ***************
 
 Current directory: /data/TBase/pgxc_ctl
 PGXC init all
 Initialize GTM master
 ....
 ....
 Initialize datanode slave dn001
 Initialize datanode slave dn002
 mkdir: cannot create directory '/data1/TBase': Permission denied
 chmod: cannot access '/data1/TBase/data/dn001': No such file or directory
 pg_ctl: directory "/data1/TBase/data/dn001" does not exist
 pg_basebackup: could not create directory "/data1/TBase": Permission denied

Installation error handling

Generally, if there is an error in initializing the cluster, the terminal will print out the error log. You can look up the error reason and change the configuration, or through the error log in '/data/TBase/pgxc_ctl/pgxc_log' path to check the error in the configuration file

 [TBase@TENCENT64 ~]$ ll ~/pgxc_ctl/pgxc_log/
 total 184
 -rw-rw-r-- 1 TBase TBase 81123 Nov 13 17:22 14105_pgxc_ctl.log
 -rw-rw-r-- 1 TBase TBase  2861 Nov 13 17:58 15762_pgxc_ctl.log
 -rw-rw-r-- 1 TBase TBase 14823 Nov 14 07:59 16671_pgxc_ctl.log
 -rw-rw-r-- 1 TBase TBase  2721 Nov 13 16:52 18891_pgxc_ctl.log
 -rw-rw-r-- 1 TBase TBase  1409 Nov 13 16:20 22603_pgxc_ctl.log
 -rw-rw-r-- 1 TBase TBase 60043 Nov 13 16:33 28932_pgxc_ctl.log
 -rw-rw-r-- 1 TBase TBase 15671 Nov 14 07:57 6849_pgxc_ctl.log

By running pgxc_ctl tool, execute clean all command to delete the initialized file. Then modify the pgxc_ctl.conf file，and execute the init all command to reinitialize.

 [TBase@TENCENT64 ~]$ pgxc_ctl
 /usr/bin/bash
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
 Finished reading configuration.
    ******** PGXC_CTL START ***************
 
 Current directory: /data/TBase/pgxc_ctl
 PGXC clean all
 
 
 [TBase@TENCENT64 ~]$ pgxc_ctl
 /usr/bin/bash
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
 Finished reading configuration.
    ******** PGXC_CTL START ***************
 
 Current directory: /data/TBase/pgxc_ctl
 PGXC init all
 Initialize GTM master
 EXECUTE DIRECT ON (dn002) 'ALTER NODE dn002 WITH (TYPE=''datanode'', 	HOST=''10.240.138.159'', PORT=40004, PREFERRED)';
 EXECUTE DIRECT
 EXECUTE DIRECT ON (dn002) 'SELECT pgxc_pool_reload()';
  pgxc_pool_reload 
 ------------------
  t
 (1 row)

 Done.

Show cluster status

When the above output is found, the cluster is OK. In addition, you can show the cluster status through the monitor all command of the pgxc_ctl tool

 [TBase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl
 /usr/bin/bash
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
 Finished reading configuration.
    ******** PGXC_CTL START ***************
 
 Current directory: /data/TBase/pgxc_ctl
 PGXC monitor all
 Running: gtm master
 Not running: gtm slave
 Running: coordinator master cn001
 Running: coordinator master cn002
 Running: datanode master dn001
 Running: datanode slave dn001
 Running: datanode master dn002
 Not running: datanode slave dn002

If the replication mode between the master and slave nodes is not synchronous replication (meaning asynchronous replication), the failure of GTM salve and DN slave will not affect the access.

cluster access There is basically no difference between accessing TBase cluster and PostgreSQL. We can access database cluster through any CN: for example, we can view the topology of the cluster by connecting CN node to select pgxc_node table (the standby node will not be shown in pgxc_node under the current configuration).
The example of accessing through psql under the Linux command line is as follows:

 [TBase@TENCENT64 ~/pgxc_ctl]$ psql -h 10.215.147.158 -p 30004 -d postgres -U TBase
 psql (PostgreSQL 10.0 TBase V2)
 Type "help" for help.

 postgres=# \d
 Did not find any relations.
 postgres=# select * from pgxc_node;
  node_name | node_type | node_port |   node_host    | nodeis_primary | nodeis_preferred |  node_id   | node_cluster_name 
 -----------+-----------+-----------+----------------+----------------+------------------+------------+-------------------
  gtm       | G         |     50001 | 10.215.147.158 | t              | f                |  428125959 | TBase_cluster
  cn001     | C         |     30004 | 10.215.147.158 | f              | f                | -264077367 | TBase_cluster
  cn002     | C         |     30004 | 10.240.138.159 | f              | f                | -674870440 | TBase_cluster
  dn001     | D         |     40004 | 10.215.147.158 | t              | t                | 2142761564 | TBase_cluster
  dn002     | D         |     40004 | 10.240.138.159 | f              | f                |  -17499968 | TBase_cluster
 (5 rows)

Before using the database, you need to create the default group and sharding group

TBase uses datanode group to increase the management flexibility of nodes. A default group is required to be used, so it needs to be created in advance. In general, all datanode nodes will be added to the default group.

On the other hand, in order to increase the flexibility of TBase data distribution, an intermediate logic layer is added to maintain the mapping of data records to physical nodes, which is called sharding. Therefore, sharding needs to be created in advance. The command is as follows:

 postgres=# create default node group default_group  with (dn001,dn002);
 CREATE NODE GROUP
 postgres=# create sharding group to group default_group;
 CREATE SHARDING GROUP

Create database, user, table, insert/delete/update/select, etc

Now you can access the database cluster just like you can use a stand-alone database

 postgres=# create database test;
 CREATE DATABASE
 postgres=# create user test with password 'test';
 CREATE ROLE
 postgres=# alter database test owner to test;
 ALTER DATABASE
 postgres=# \c test test
 You are now connected to database "test" as user "test".
 test=> create table foo(id bigint, str text) distribute by shard(id);
 CREATE TABLE
 test=> insert into foo values(1, 'tencent'), (2, 'shenzhen');
 COPY 2
 test=> select * from foo;
  id |   str    
 ----+----------
   1 | tencent
   2 | shenzhen
 (2 rows)

stop cluster

Stop the cluster through the stop all command of pgxc_ctl tool.
The stop all command can be followed by the parameter -m fast or -m immediate to decide how to stop each node

 PGXC stop all -m fast
 Stopping all the coordinator masters.
 Stopping coordinator master cn001.
 Stopping coordinator master cn002.
 Done.
 Stopping all the datanode slaves.
 Stopping datanode slave dn001.
 Stopping datanode slave dn002.
 pg_ctl: PID file "/data/TBase/data/dn002/postmaster.pid" does not exist
 Is server running?
 Stopping all the datanode masters.
 Stopping datanode master dn001.
 Stopping datanode master dn002.
 Done.
 Stop GTM slave
 waiting for server to shut down..... done
 server stopped
 Stop GTM master
 waiting for server to shut down.... done
 server stopped
 PGXC monitor all
 Not running: gtm master
 Not running: gtm slave
 Not running: coordinator master cn001
 Not running: coordinator master cn002
 Not running: datanode master dn001
 Not running: datanode slave dn001
 Not running: datanode master dn002
 Not running: datanode slave dn002

start cluster

Start the cluster through the start all command of pgxc_ctl tool.

 [TBase@TENCENT64 ~]$ pgxc_ctl
 /usr/bin/bash
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
 Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
 Finished reading configuration.
    ******** PGXC_CTL START ***************
 
 Current directory: /data/TBase/pgxc_ctl
 PGXC start all

Concluding remarks This document is just a quick start guide for users, demonstrating how to build a complete TBase cluster step by step from the source code. Refer to other documents for TBase features, use, optimization, problem location, etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0 TBase_Quick_Start.EN

What's TBase

TBase source code compilation and installation

Clone this wiki locally