-
Notifications
You must be signed in to change notification settings - Fork 82
MySQL Adapter
-
Before quick start:
- Follow the instructions on this
page
to install sbt. - Install Docker (version at least 1.10) on the local machine;
- Clone Cloudberry codebase from github.
~> git clone https://github.com/ISG-ICS/cloudberry.git
Suppose the repository is cloned to the folder
~/cloudberry
. - Follow the instructions on this
-
Step 1: run
script/dockerRunxxxDB.sh
: insert data into MySQL~/> cd ~/cloudberry/examples/twittermap ~/twittermap> ./script/dockerRunMySqlDB.sh
The 2nd command will download and run a prebuilt MySQL docker container. The process will take 2-5 minutes or even longer, depending on your network speed.
-
Step 2: setup config file in Cloudberry: Copy
twittermap/web/cloudberry-mysqldb.conf
's content intocloudberry/neo/conf/application.conf
and cover the original codes. -
Step 3: run cloudberry
~/> cd ~/cloudberry/cloudberry ~/cloudberry> sbt "project neo" "run"
-
Step 4: change config file in Twittermap: change
# sqlDB = MySQL
intosqlDB = MySQL
, this change would:- Let the Twittermap register MySQLMigration's schema, i.e.,schema in cloudberry when using MySQL DB, (file
MySqlMigration_20170810
might be renamed intocloudberryDDL
in the future). - Use
common/services-mysql.js
instead when sending queries to cloudberry middleware. The apply ofservices-mysql.js
is set inviews/twittermap/main.scala.html
.
- Let the Twittermap register MySQLMigration's schema, i.e.,schema in cloudberry when using MySQL DB, (file
-
Step 5: run Twittermap
~/> cd ~/cloudberry/examples/twittermap ~/twittermap> sbt "project web" "run 9001"
-
Step 6: MySqlMigration file would register schemas by sending them to cloudberry, the
services-mysql.js
file would send send request and get response from cloudberry middleware. -
Step 7: MySqlGenerator would translate the request query into MySQL's query, and send it to MySqlConn, MySqlConn would send these queries to MySQL DB. The result would be send back to Twittermap. The website would be visited on http://localhost:9001 now
In file ./script/ingestMySqlTwitterToLocalCluster.sh
, you'll the see the schema of sample data's schema in MySQL DB:
CREATE TABLE IF NOT EXISTS `twitter_ds_tweet` (
`place.full_name` varchar(255) DEFAULT NULL,
`place.bounding_box` varchar(255) DEFAULT NULL,
`place.country_code` varchar(255) DEFAULT NULL,
`user.friends_count` bigint(20) NOT NULL,
`user.description` text NOT NULL,
`favorite_count` bigint(20) NOT NULL,
`geo_tag.countyID` bigint(20) DEFAULT NULL,
`user.location` varchar(255) NOT NULL,
`place.type` varchar(255) DEFAULT NULL,
`geo_tag.cityName` varchar(255) DEFAULT NULL,
`user.id` double NOT NULL,
`geo_tag.stateName` varchar(255) DEFAULT NULL,
`geo_tag.cityID` double DEFAULT NULL,
`is_retweet` double NOT NULL,
`text` text NOT NULL,
`user.screen_name` varchar(255) NOT NULL,
`retweet_count` bigint(20) NOT NULL,
`place.country` varchar(255) DEFAULT NULL,
`in_reply_to_user` bigint(20) NOT NULL,
`user.statues_count` varchar(255) NOT NULL,
`id` bigint(20) NOT NULL,
`coordinate` point DEFAULT NULL,
`place.id` varchar(255) DEFAULT NULL,
`in_reply_to_status` bigint(20) NOT NULL,
`geo_tag.stateID` bigint(20) DEFAULT NULL,
`create_at` datetime NOT NULL,
`user.create_at` datetime NOT NULL,
`place.name` varchar(255) DEFAULT NULL,
`lang` varchar(255) NOT NULL,
`user.lang` varchar(255) NOT NULL,
`user.name` varchar(255) NOT NULL,
`geo_tag.countyName` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `create_at` (`create_at`),
FULLTEXT KEY `idx` (`text`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-
Flattened data: We've flattened the hierarchy records, and use their name to demonstrate their original hierarchy relation. Like "user.id" denoted the id for user.
-
Leave out array columns: Since the format of
array
is not supported in MySqlDB, we left out columnhashtags
anduser_mentions
from MySqlDB. Hence the Twittermap's query would also exclude theHashtagRequest
. -
Full text search: add full text search index on column
text
.
We changed the original sample data sample.adm.gz
into its json format: sample.json
to read each record via php. This process can be done via running twittermap/script/reconstructAdmSampleTweetToJson.sh
(requires python environment).
In file ./script/dockerRunMySqlDB.sh
, we use image mysql:latest
for setting up MySQL database.
Also in file ./script/dockerRunMySqlDB.sh
, we use image spittet/php-mysql
docker container to inject records from ./script/sample.json
into mysql-container
via php. And publish mysql-container
to port 6603.
Remember to pass a Gateway from dockerRunMySql.sh to ingestMySqlSampleTweetToLocalCluster.sh as mysql connection's host, it might be different under different circumstances.
After you have installed the docker container for the first time, you can restart it using the following command in case the docker is shutdown:
~> docker start mysql-container
If you want to shutdown the docker container later, you can use the following command:
~> docker stop mysql-container
- SQLGenerator: an basic abstract class for all SQL family generators.
- MySqlGenerator: an extended class from SQLGenerator for MySQL DB.
- PostgreSqlGenerator: an extended class from SQLGenerator for PostgreSQL DB.
-
SQLGenerator can cover up most translation functions for SQL family.
-
MySqlGenerator contains (also for PostgreSqlGenerator):
- quote, truncate, fullTextMatch, geoAsText, pointGetCoord (p.s.: geo type is very special and complex in MySQL).
- parseCreate
- fieldType2MySQLType
- parseUpsertMeta
- initExprMap
- parseTextRelation
- parseUnnest
- parseGroupByFunc (case Interval)
- parseGeoCell
These values and functions are different in most DB of SQL, so we specify them out into their own generator. And most other functions (like parseQuery, parseAppend, parseLookup … ) is currently sharable for MySQL & PostgreSQL.
-
Unsupported fieldTypes in MySqlGenerator:
The schema send to the generator should not contain any
Bag
,Hierarchy
, orRecord
fields. -
MetaData in SQLGenerator:
Considering the particularity of MetaData, we specially use function
parseUpsertMeta
to insert json type records into theberry.meta
table.
- The connector only differs in the url, but you might have to add another dependency when adding a new sql database, because the driver would be different during the connection.
- The former is for MySql's sample tweet schema, the latter is for PostgreSQL. For the reason that MySQL doesn't support BagField(in middleware) or Array type(in database), we left the
hashtag
anduser_mentions
out inTwitterDataStoreWithoutHashtag
. PostgreSQL supports Array though.
-
You need to add a match case in Migration_20160814, which would send the metaData's query of creation.
-
Also, in class
Cloudberry
, a case should be added for config.AsterixLang. It would decide which DB, generator, connector the back-end would use.
Contains the cloudberry's table's schema, register the schema each time the Twittermap starts up.
The schema for twiiter (in SqlDB is named twitter_ds_tweet
) should be the same as back-end's TwitterDataStore's schema.
Sends request to middleware. in this js file, we removed byHashtagRequest, for its field is not supported in MySQL.
Change web/conf/application.conf:
# sqlDB = MySQL
into
sqlDB = MySQL
Copy the twittermap/web/cloudberry-mysql.conf
's content into cloudberry/neo/conf/application.conf
and cover the original codes. These two files differs in asterixdb.lang
and its accordance url.