Skip to content

0 TBase_Quick_Start.EN

JennyJennyChen edited this page Mar 9, 2021 · 1 revision

What's TBase

TBase is a relational database cluster platform that provides write reliability and multi node data synchronization. You can configure TBase on one or more hosts and store TBase data on multiple physical hosts. There are two ways to store data tables: distributed or replicated. When sending query SQL to TBase, TBase will automatically send query statements to data nodes and get the final results.

TBase is a distributed cluster architecture (as shown in the figure below), which is a distributed share nothing architecture mode, each node is independent and processes its own data. After processing, the results may be summarized to the upper layer or transferred between nodes. Each processing unit communicates with each other through the network protocol, with better parallel processing and expansion ability, which also means that the TBase cluster can be deployed on the basic x86 server.

TBase architecture

Here are three modules of TBase

  • Coordinator Node: (CN for short)

    Application access portal is responsible for data distribution and query planning. Multiple nodes are in the same location, and each node provides the same database view. In terms of function, CN only stores the global metadata of the system, not the actual user data.

  • Datanode Node: (DN for short)

    Each DN stores the partition of user data. In function, DN node is responsible for completing the request of executing CN distribution.

  • GTM Node: (Global Transaction Manager)

    GTM is responsible for the management of cluster transaction information, as well as the global objects of the cluster, such as sequence.

Next, let's show how to build a TBase cluster environment from the source code

TBase source code compilation and installation

  • create user

    Note: all machines that need to install TBase cluster need to create

     mkdir /data
     useradd -d /data/TBase TBase
    
  • get source code

    git clone https://github.com/Tencent/TBase

  • source code compilation

    cd ${SOURCECODE_PATH}
     rm -rf ${INSTALL_PATH}/TBase_bin_v2.0
     chmod +x configure*
     ./configure --prefix=${INSTALL_PATH}/TBase_bin_v2.0  --enable-user-switch --with-openssl  --with-ossp-uuid CFLAGS=-g
     make clean
     make -sj
     make install
     chmod +x contrib/pgxc_ctl/make_signature
     cd contrib
     make -sj
     make install
    

    In this paper, the above two parameters are as follows

     ${SOURCECODE_PATH}=/data/TBase/TBase-master
     ${INSTALL_PATH}=/data/TBase/install
    
  • cluster installation

    • cluster planning

    Next, set up a cluster of 1 GTM master, 1 GTM standby, 2 CN master (the CN master is equivalent, so there is no need to standy CN), 2 DN master, and 2 DN standby on two servers. This cluster is the minimum configuration with disaster tolerance capability.

     host1:10.215.147.158
     host2:10.240.138.159
    

    planning is as follows:

    node name IP data directory
    GTM master 10.215.147.158 /data/TBase/data/gtm
    GTM slave 10.240.138.159 /data/TBase/data/gtm
    CN1 10.215.147.158 /data/TBase/data/coord
    CN2 10.240.138.159 /data/TBase/data/coord
    DN1 master 10.215.147.158 /data/TBase/data/dn001
    DN1 slave 10.240.138.159 /data/TBase/data/dn001
    DN2 master 10.240.138.159 /data/TBase/data/dn002
    DN2 slave 10.215.147.158 /data/TBase/data/dn002

    Sketch Map:

    TBase Deploy Sketch Map

    • SSH mutual trust configuration between machines

    Reference resources: Linux ssh mutual trust

    • Environment variable configuration

    All machines in the cluster need to be configured

     [TBase@TENCENT64 ~]$ vim ~/.bashrc
     export TBase_HOME=/data/TBase/install/TBase_bin_v2.0
     export PATH=$TBase_HOME/bin:$PATH
     export LD_LIBRARY_PATH=$TBase_HOME/lib:${LD_LIBRARY_PATH}

    Above, the required basic environment has been configured, and you can enter the cluster initialization stage. For the convenience of users, TBase provides special configuration and operation tools: pgxc_ctl to help users quickly build and manage clusters. Here, you need to write the IP, port and data directory of the nodes mentioned above into the configuration file pgxc_ctl.conf

    • Initialization configuration file pgxc_ctl.conf
     [TBase@TENCENT64 ~]$ mkdir /data/TBase/pgxc_ctl
     [TBase@TENCENT64 ~]$ cd /data/TBase/pgxc_ctl
     [TBase@TENCENT64 ~/pgxc_ctl]$ vim pgxc_ctl.conf

    The following shows the pgxc_ctl.conf file content written using the IP, port, database directory, binary directory and other planning values described above. In practice, we only need to configure it according to our own actual situation.

     #!/bin/bash
    
     pgxcInstallDir=/data/TBase/install/TBase_bin_v2.0
     pgxcOwner=TBase
     defaultDatabase=postgres
     pgxcUser=$pgxcOwner
     tmpDir=/tmp
     localTmpDir=$tmpDir
     configBackup=n
     configBackupHost=pgxc-linker
     configBackupDir=$HOME/pgxc
     configBackupFile=pgxc_ctl.bak
     
     
     #---- GTM ----------
     gtmName=gtm
     gtmMasterServer=10.215.147.158
     gtmMasterPort=50001
     gtmMasterDir=/data/TBase/data/gtm
     gtmExtraConfig=none
     gtmMasterSpecificExtraConfig=none
     gtmSlave=y
     gtmSlaveServer=10.240.138.159
     gtmSlavePort=50001
     gtmSlaveDir=/data/TBase/data/gtm
     gtmSlaveSpecificExtraConfig=none
     
     #---- Coordinators -------
     coordMasterDir=/data/TBase/data/coord
     coordArchLogDir=/data/TBase/data/coord_archlog
     
     coordNames=(cn001 cn002 )
     coordPorts=(30004 30004 )
     poolerPorts=(31110 31110 )
     coordPgHbaEntries=(0.0.0.0/0)
     coordMasterServers=(10.215.147.158 10.240.138.159)
     coordMasterDirs=($coordMasterDir $coordMasterDir)
     coordMaxWALsernder=2
     coordMaxWALSenders=($coordMaxWALsernder $coordMaxWALsernder )
     coordSlave=n
     coordSlaveSync=n
     coordArchLogDirs=($coordArchLogDir $coordArchLogDir)
     
     coordExtraConfig=coordExtraConfig
     cat > $coordExtraConfig <<EOF
     #================================================
     # Added to all the coordinator postgresql.conf
     # Original: $coordExtraConfig
     
     include_if_exists = '/data/TBase/global/global_TBase.conf'
     
     wal_level = replica
     wal_keep_segments = 256 
     max_wal_senders = 4
     archive_mode = on 
     archive_timeout = 1800 
     archive_command = 'echo 0' 
     log_truncate_on_rotation = on 
     log_filename = 'postgresql-%M.log' 
     log_rotation_age = 4h 
     log_rotation_size = 100MB
     hot_standby = on 
     wal_sender_timeout = 30min 
     wal_receiver_timeout = 30min 
     shared_buffers = 1024MB 
     max_pool_size = 2000
     log_statement = 'ddl'
     log_destination = 'csvlog'
     logging_collector = on
     log_directory = 'pg_log'
     listen_addresses = '*'
     max_connections = 2000
     
     EOF
     
     coordSpecificExtraConfig=(none none)
     coordExtraPgHba=coordExtraPgHba
     cat > $coordExtraPgHba <<EOF
     
     local   all             all                                     trust
     host    all             all             0.0.0.0/0               trust
     host    replication     all             0.0.0.0/0               trust
     host    all             all             ::1/128                 trust
     host    replication     all             ::1/128                 trust
     
     
     EOF
     
     
     coordSpecificExtraPgHba=(none none)
     coordAdditionalSlaves=n	
     cad1_Sync=n
     
     #---- Datanodes ---------------------
     dn1MstrDir=/data/TBase/data/dn001
     dn2MstrDir=/data/TBase/data/dn002
     dn1SlvDir=/data/TBase/data/dn001
     dn2SlvDir=/data/TBase/data/dn002
     dn1ALDir=/data/TBase/data/datanode_archlog
     dn2ALDir=/data/TBase/data/datanode_archlog
     
     primaryDatanode=dn001
     datanodeNames=(dn001 dn002)
     datanodePorts=(40004 40004)
     datanodePoolerPorts=(41110 41110)
     datanodePgHbaEntries=(0.0.0.0/0)
     datanodeMasterServers=(10.215.147.158 10.240.138.159)
     datanodeMasterDirs=($dn1MstrDir $dn2MstrDir)
     dnWALSndr=4
     datanodeMaxWALSenders=($dnWALSndr $dnWALSndr)
     
     datanodeSlave=y
     datanodeSlaveServers=(10.240.138.159 10.215.147.158)
     datanodeSlavePorts=(50004 54004)
     datanodeSlavePoolerPorts=(51110 51110)
     datanodeSlaveSync=n
     datanodeSlaveDirs=($dn1SlvDir $dn2SlvDir)
     datanodeArchLogDirs=($dn1ALDir/dn001 $dn2ALDir/dn002)
     
     datanodeExtraConfig=datanodeExtraConfig
     cat > $datanodeExtraConfig <<EOF
     #================================================
     # Added to all the coordinator postgresql.conf
     # Original: $datanodeExtraConfig
     
     include_if_exists = '/data/TBase/global/global_TBase.conf'
     listen_addresses = '*' 
     wal_level = replica 
     wal_keep_segments = 256 
     max_wal_senders = 4
     archive_mode = on 
     archive_timeout = 1800 
     archive_command = 'echo 0' 
     log_directory = 'pg_log' 
     logging_collector = on 
     log_truncate_on_rotation = on 
     log_filename = 'postgresql-%M.log' 
     log_rotation_age = 4h 
     log_rotation_size = 100MB
     hot_standby = on 
     wal_sender_timeout = 30min 
     wal_receiver_timeout = 30min 
     shared_buffers = 1024MB 
     max_connections = 4000 
     max_pool_size = 4000
     log_statement = 'ddl'
     log_destination = 'csvlog'
     wal_buffers = 1GB
     
     EOF
     
     datanodeSpecificExtraConfig=(none none)
     datanodeExtraPgHba=datanodeExtraPgHba
     cat > $datanodeExtraPgHba <<EOF
     
     local   all             all                                     trust
     host    all             all             0.0.0.0/0               trust
     host    replication     all             0.0.0.0/0               trust
     host    all             all             ::1/128                 trust
     host    replication     all             ::1/128                 trust
     
     
     EOF
     
     
     datanodeSpecificExtraPgHba=(none none)
     
     datanodeAdditionalSlaves=n
     walArchive=n
    
    • Distribute binary package

    After writing the configuration file, you need to deploy the binary package to the physical machine where all nodes are located. This can be done by executing the deploy all command with pgxc_ctl tool.

     [TBase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl 
     /usr/bin/bash
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
     Finished reading configuration.
        ******** PGXC_CTL START ***************
     
     Current directory: /data/TBase/pgxc_ctl
     PGXC deploy all
     Deploying Postgres-XL components to all the target servers.
     Prepare tarball to deploy ... 
     Deploying to the server 10.215.147.158.
     Deploying to the server 10.240.138.159.
     Deployment done.

    Log in to all nodes and check whether the binary package is distributed

     [TBase@TENCENT64 ~/install]$ ls /data/TBase/install/TBase_bin_v2.0
     bin  include  lib  share	
    
    • execute init all command to complete cluster initialization
     [TBase@TENCENT64 ~]$ pgxc_ctl
     /usr/bin/bash
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
     Finished reading configuration.
        ******** PGXC_CTL START ***************
     
     Current directory: /data/TBase/pgxc_ctl
     PGXC init all
     Initialize GTM master
     ....
     ....
     Initialize datanode slave dn001
     Initialize datanode slave dn002
     mkdir: cannot create directory '/data1/TBase': Permission denied
     chmod: cannot access '/data1/TBase/data/dn001': No such file or directory
     pg_ctl: directory "/data1/TBase/data/dn001" does not exist
     pg_basebackup: could not create directory "/data1/TBase": Permission denied
    • Installation error handling

    Generally, if there is an error in initializing the cluster, the terminal will print out the error log. You can look up the error reason and change the configuration, or through the error log in '/data/TBase/pgxc_ctl/pgxc_log' path to check the error in the configuration file

     [TBase@TENCENT64 ~]$ ll ~/pgxc_ctl/pgxc_log/
     total 184
     -rw-rw-r-- 1 TBase TBase 81123 Nov 13 17:22 14105_pgxc_ctl.log
     -rw-rw-r-- 1 TBase TBase  2861 Nov 13 17:58 15762_pgxc_ctl.log
     -rw-rw-r-- 1 TBase TBase 14823 Nov 14 07:59 16671_pgxc_ctl.log
     -rw-rw-r-- 1 TBase TBase  2721 Nov 13 16:52 18891_pgxc_ctl.log
     -rw-rw-r-- 1 TBase TBase  1409 Nov 13 16:20 22603_pgxc_ctl.log
     -rw-rw-r-- 1 TBase TBase 60043 Nov 13 16:33 28932_pgxc_ctl.log
     -rw-rw-r-- 1 TBase TBase 15671 Nov 14 07:57 6849_pgxc_ctl.log

    By running pgxc_ctl tool, execute clean all command to delete the initialized file. Then modify the pgxc_ctl.conf file,and execute the init all command to reinitialize.

     [TBase@TENCENT64 ~]$ pgxc_ctl
     /usr/bin/bash
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
     Finished reading configuration.
        ******** PGXC_CTL START ***************
     
     Current directory: /data/TBase/pgxc_ctl
     PGXC clean all
     
     
     [TBase@TENCENT64 ~]$ pgxc_ctl
     /usr/bin/bash
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
     Finished reading configuration.
        ******** PGXC_CTL START ***************
     
     Current directory: /data/TBase/pgxc_ctl
     PGXC init all
     Initialize GTM master
     EXECUTE DIRECT ON (dn002) 'ALTER NODE dn002 WITH (TYPE=''datanode'', 	HOST=''10.240.138.159'', PORT=40004, PREFERRED)';
     EXECUTE DIRECT
     EXECUTE DIRECT ON (dn002) 'SELECT pgxc_pool_reload()';
      pgxc_pool_reload 
     ------------------
      t
     (1 row)
    
     Done.
  • Show cluster status

    When the above output is found, the cluster is OK. In addition, you can show the cluster status through the monitor all command of the pgxc_ctl tool

     [TBase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl
     /usr/bin/bash
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
     Finished reading configuration.
        ******** PGXC_CTL START ***************
     
     Current directory: /data/TBase/pgxc_ctl
     PGXC monitor all
     Running: gtm master
     Not running: gtm slave
     Running: coordinator master cn001
     Running: coordinator master cn002
     Running: datanode master dn001
     Running: datanode slave dn001
     Running: datanode master dn002
     Not running: datanode slave dn002

    If the replication mode between the master and slave nodes is not synchronous replication (meaning asynchronous replication), the failure of GTM salve and DN slave will not affect the access.

  • cluster access There is basically no difference between accessing TBase cluster and PostgreSQL. We can access database cluster through any CN: for example, we can view the topology of the cluster by connecting CN node to select pgxc_node table (the standby node will not be shown in pgxc_node under the current configuration).
    The example of accessing through psql under the Linux command line is as follows:

     [TBase@TENCENT64 ~/pgxc_ctl]$ psql -h 10.215.147.158 -p 30004 -d postgres -U TBase
     psql (PostgreSQL 10.0 TBase V2)
     Type "help" for help.
    
     postgres=# \d
     Did not find any relations.
     postgres=# select * from pgxc_node;
      node_name | node_type | node_port |   node_host    | nodeis_primary | nodeis_preferred |  node_id   | node_cluster_name 
     -----------+-----------+-----------+----------------+----------------+------------------+------------+-------------------
      gtm       | G         |     50001 | 10.215.147.158 | t              | f                |  428125959 | TBase_cluster
      cn001     | C         |     30004 | 10.215.147.158 | f              | f                | -264077367 | TBase_cluster
      cn002     | C         |     30004 | 10.240.138.159 | f              | f                | -674870440 | TBase_cluster
      dn001     | D         |     40004 | 10.215.147.158 | t              | t                | 2142761564 | TBase_cluster
      dn002     | D         |     40004 | 10.240.138.159 | f              | f                |  -17499968 | TBase_cluster
     (5 rows)
    • Before using the database, you need to create the default group and sharding group

    TBase uses datanode group to increase the management flexibility of nodes. A default group is required to be used, so it needs to be created in advance. In general, all datanode nodes will be added to the default group.

    On the other hand, in order to increase the flexibility of TBase data distribution, an intermediate logic layer is added to maintain the mapping of data records to physical nodes, which is called sharding. Therefore, sharding needs to be created in advance. The command is as follows:

     postgres=# create default node group default_group  with (dn001,dn002);
     CREATE NODE GROUP
     postgres=# create sharding group to group default_group;
     CREATE SHARDING GROUP
    • Create database, user, table, insert/delete/update/select, etc

    Now you can access the database cluster just like you can use a stand-alone database

     postgres=# create database test;
     CREATE DATABASE
     postgres=# create user test with password 'test';
     CREATE ROLE
     postgres=# alter database test owner to test;
     ALTER DATABASE
     postgres=# \c test test
     You are now connected to database "test" as user "test".
     test=> create table foo(id bigint, str text) distribute by shard(id);
     CREATE TABLE
     test=> insert into foo values(1, 'tencent'), (2, 'shenzhen');
     COPY 2
     test=> select * from foo;
      id |   str    
     ----+----------
       1 | tencent
       2 | shenzhen
     (2 rows)
  • stop cluster

    Stop the cluster through the stop all command of pgxc_ctl tool.
    The stop all command can be followed by the parameter -m fast or -m immediate to decide how to stop each node

     PGXC stop all -m fast
     Stopping all the coordinator masters.
     Stopping coordinator master cn001.
     Stopping coordinator master cn002.
     Done.
     Stopping all the datanode slaves.
     Stopping datanode slave dn001.
     Stopping datanode slave dn002.
     pg_ctl: PID file "/data/TBase/data/dn002/postmaster.pid" does not exist
     Is server running?
     Stopping all the datanode masters.
     Stopping datanode master dn001.
     Stopping datanode master dn002.
     Done.
     Stop GTM slave
     waiting for server to shut down..... done
     server stopped
     Stop GTM master
     waiting for server to shut down.... done
     server stopped
     PGXC monitor all
     Not running: gtm master
     Not running: gtm slave
     Not running: coordinator master cn001
     Not running: coordinator master cn002
     Not running: datanode master dn001
     Not running: datanode slave dn001
     Not running: datanode master dn002
     Not running: datanode slave dn002
  • start cluster

    Start the cluster through the start all command of pgxc_ctl tool.

     [TBase@TENCENT64 ~]$ pgxc_ctl
     /usr/bin/bash
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Installing pgxc_ctl_bash script as /data/TBase/pgxc_ctl/pgxc_ctl_bash.
     Reading configuration using /data/TBase/pgxc_ctl/pgxc_ctl_bash --home /data/TBase/pgxc_ctl --configuration /data/TBase/pgxc_ctl/pgxc_ctl.conf
     Finished reading configuration.
        ******** PGXC_CTL START ***************
     
     Current directory: /data/TBase/pgxc_ctl
     PGXC start all
  • Concluding remarks This document is just a quick start guide for users, demonstrating how to build a complete TBase cluster step by step from the source code. Refer to other documents for TBase features, use, optimization, problem location, etc.