Configure GSN for performance

The GSN architecture is designed with performance in mind (see Publications). However, the following settings can significantly improve performance, and we invite you to tune them to the characteristics of your deployment. We focus on the DBMS and the Web server, the two central components of the architecture that deserve special attention.

##DBMS

The database is a central component of GSN: it is used both for processing data (window processing, user queries) and for storing the processed data. In GSN, the software component that handles communication with the database is called a Storage Manager. The latest version of GSN can run several Storage Manager instances, each dealing with a different database. This improves both flexibility (a Storage Manager can be defined per Virtual Sensor) and performance. For example, a fast in-memory DBMS (such as H2) can be used for window processing and a standard MySQL server for storing the data.

Each Storage Manager uses a pool of database connections, and this pool can handle a fixed number of concurrent active connections. Increasing the size of this pool can improve performance but typically consumes more memory.

These parameters are defined in the build.xml file.

    <property name="max-db-connections" value="8"/>
    <property name="max-sliding-db-connections" value="8"/>
    ...
    <property name="maxMemoryUsage" value="128m"/>
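
When tuning a deployment it can help to check which values are currently set. The sketch below reads Ant `<property>` elements with Python's standard XML parser. The embedded build file is a hypothetical minimal example: it contains only the three properties quoted above, wrapped in the `<project>` root element of an Ant build file. A real GSN build.xml contains much more.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal build file (assumption): only the three properties
# quoted above, inside the <project> root element of an Ant build file.
BUILD_XML = """
<project name="gsn">
    <property name="max-db-connections" value="8"/>
    <property name="max-sliding-db-connections" value="8"/>
    <property name="maxMemoryUsage" value="128m"/>
</project>
"""

def read_properties(xml_text):
    """Return {name: value} for every <property name=... value=...> element."""
    root = ET.fromstring(xml_text)
    return {
        prop.get("name"): prop.get("value")
        for prop in root.iter("property")
        if prop.get("name") is not None and prop.get("value") is not None
    }

settings = read_properties(BUILD_XML)
for name in ("max-db-connections", "max-sliding-db-connections", "maxMemoryUsage"):
    print(name, "=", settings[name])
```

To inspect a real file, pass its content to `read_properties` instead of the embedded string.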

Keep in mind that DBMS-specific settings can still be added to your JDBC URLs. The example below sets the cache size of an H2 database (H2 expresses CACHE_SIZE in KB, so 131072 corresponds to 128 MB). Refer to your DB vendor's documentation for the available settings.

    <storage ... url="jdbc:h2:h2db/mydb;CACHE_SIZE=131072" />

##Web server

Another central component of GSN is the Web server, which handles the requests that render the UI, download data, and serve remote wrappers. We are currently using the Jetty Web server configured with a thread pool. Raising the maximum size of this pool can improve performance when GSN is accessed by a large number of users, where a user is each rendered UI and each remote wrapper. This setting can be found in the build.xml file.

    <property name="max-servlets" value="20"/>

##Query Performance

You can use the following Ant task to evaluate the performance of your setup.

    ant eval-queries

This command first retrieves the list of virtual sensors from the GSN instance and then generates queries for the /multidata servlet, such as the following:

    http://localhost:22001/multidata?vs[0]=hist_imis_zer_3&field[0]=All&download_mode=inline&download_format=csv&nb=SPECIFIED&nb_value=50000
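
To reproduce a single query by hand, the URL can be assembled from its parts. The following is a minimal sketch, not the code used by the Ant task: the function name `multidata_url` is hypothetical, and the query parameters are simply the ones visible in the example URL above.

```python
from urllib.parse import urlencode

def multidata_url(base_url, sensor, nb_value):
    """Build a /multidata CSV query for one virtual sensor, limited to nb_value tuples.

    Only the parameters shown in the example URL are used; any other
    parameter the servlet may accept is outside the scope of this sketch.
    """
    params = [
        ("vs[0]", sensor),
        ("field[0]", "All"),
        ("download_mode", "inline"),
        ("download_format", "csv"),
        ("nb", "SPECIFIED"),
        ("nb_value", str(nb_value)),
    ]
    # safe="[]" keeps the square brackets readable instead of percent-encoding them.
    return base_url.rstrip("/") + "/multidata?" + urlencode(params, safe="[]")

print(multidata_url("http://localhost:22001", "hist_imis_zer_3", 50000))
```

With these arguments the function returns the example URL shown above.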

This will produce the following output.

    [java] ...
    [java] ------ GSN Queries Result --------
    [java] | URL: http://montblanc.slf.ch:22001
    [java] | Eval duration: 22.239 [s]
    [java] | Nb Queries   : 20
    [java] | Tuples       : sum:98637.000, min:3637.000, max:5000.000, mean:4931.850, var:92888.450 [no unit]
    [java] | Fields       : sum:335.000, min:8.000, max:31.000, mean:16.750, var:44.197 [no unit]
    [java] | Raw Data     : sum:93.686, min:2.190, max:8.290, mean:4.684, var:2.939 [MB]
    [java] | Download time: sum:186.678, min:2.372, max:19.134, mean:9.334, var:19.111 [s]
    [java] | Tuple Rate   : sum:14125.635, min:261.315, max:2107.926, mean:706.282, var:221517.436 [tuple/s]
    [java] | Field Rate   : sum:42.686, min:.748, max:6.745, mean:2.134, var:1.659 [field/s]
    [java] | Data Rate    : sum:12.299, min:.221, max:2.198, mean:.615, var:.179 [MB/s]
    [java] -----------------------------------
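
When comparing several runs, it is convenient to turn the statistics lines of this report into numbers. The sketch below is a hypothetical helper (it is not part of GSN) that assumes the line layout shown above: a metric name, a list of `key:value` pairs, and a unit in square brackets. Lines that do not follow this layout, such as the URL line, are skipped.

```python
import re

# Matches e.g. "| Tuples : sum:98637.000, min:3637.000, ... [no unit]"
LINE_RE = re.compile(
    r"\|\s*(?P<metric>[^:]+?)\s*:\s*(?P<stats>(?:\w+:[\d.]+,?\s*)+)\[(?P<unit>[^\]]+)\]"
)
STAT_RE = re.compile(r"(\w+):([\d.]+)")

def parse_report(text):
    """Return {metric: {"unit": ..., "sum": ..., "min": ..., ...}}."""
    report = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # header, URL, duration and separator lines
        stats = {key: float(value) for key, value in STAT_RE.findall(match.group("stats"))}
        report[match.group("metric")] = {"unit": match.group("unit"), **stats}
    return report

# A few lines copied from the report above.
SAMPLE = """
[java] | URL: http://montblanc.slf.ch:22001
[java] | Eval duration: 22.239 [s]
[java] | Tuples       : sum:98637.000, min:3637.000, max:5000.000, mean:4931.850, var:92888.450 [no unit]
[java] | Field Rate   : sum:42.686, min:.748, max:6.745, mean:2.134, var:1.659 [field/s]
[java] | Data Rate    : sum:12.299, min:.221, max:2.198, mean:.615, var:.179 [MB/s]
"""

report = parse_report(SAMPLE)
for metric, values in report.items():
    print(f"{metric}: mean {values['mean']} {values['unit']}")
```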

The parameters of the experiment can be set in the build.xml file:

* nbQueries: the number of queries to execute.
* nbThreads: the maximum number of queries executed in parallel.
* maxQuerySize: the maximum number of StreamElements retrieved per query.
* gsnUrl: the URL (host and port) of the GSN instance to test.

###Example

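| Requests | Tuples | Tuple rate [tuple/s] |
|---|---|---|
| 1 | 3125 | 53879 |
| 1 | 6250 | 67934 |
| 1 | 12500 | 66844 |
| 1 | 25000 | 29585 |
| 1 | 50000 | 20169 |
| 1 | 100000 | 17206 |
| 1 | 200000 | 16433 |
| 2 | 3125 | 47712 |
| 2 | 6250 | 59300 |
| 2 | 12500 | 47984 |
| 2 | 25000 | 23505 |
| 2 | 50000 | 16835 |
| 2 | 100000 | 16236 |
| 2 | 200000 | 16208 |
| 4 | 3125 | 33341 |
| 4 | 6250 | 16277 |
| 4 | 12500 | 12686 |
| 4 | 25000 | 10265 |
| 4 | 50000 | 10054 |
| 4 | 100000 | 9548 |
| 4 | 200000 | 9368 |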

##Data Insertion Performance

###Example: Distributed setup

In this setup, the main storage database runs on a remote host. Using a local Storage Manager for processing should then reduce network access and thus improve performance.

TBD

###Example: Local Setup

* 178 Virtual Sensors
* One Input Stream (csv wrapper), ~10 Output Fields, storage-size="1", sampling-rate="1"
* Simple select queries
* Server characteristics: Intel(R) Xeon(R) CPU E5430 @2.66GHz / 4GB RAM
* MySQL Server version: 5.0.51a-3ubuntu5.4 (Ubuntu), url1: jdbc:mysql://localhost/timothee, url2: jdbc:mysql://localhost/timotheesliding
* H2 v1.1.116, url: jdbc:h2:mem:s;DB_CLOSE_DELAY=-1

Once all the virtual sensors are loaded, we wait 1 minute and count the number of elements in the data storage DB. We repeat the count 5 minutes later and compute the difference; dividing it by the elapsed time gives the insertion rate.

We use the following query to compute the total number of elements in the storage db:

    SELECT NOW(), table_schema, sum(table_rows) FROM information_schema.TABLES WHERE table_schema = 'timothee' AND table_name LIKE 'hist_imis_%' GROUP BY table_schema;
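
The insertion rate then follows from two results of this query. The sketch below shows the arithmetic; the function name and the two samples are hypothetical values chosen for illustration, not measurements from this experiment.

```python
from datetime import datetime

def insertion_rate(first, second):
    """Elements inserted per second between two (timestamp, total_rows) samples.

    Each sample mirrors the NOW() and sum(table_rows) columns of the query.
    """
    (t0, n0), (t1, n1) = first, second
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("the second sample must be taken after the first")
    return (n1 - n0) / elapsed

# Hypothetical samples taken 5 minutes apart (illustrative values only).
first = (datetime(2010, 1, 1, 12, 1, 0), 1_000_000)
second = (datetime(2010, 1, 1, 12, 6, 0), 1_198_000)
print(f"{insertion_rate(first, second):.0f} elt/s")  # prints "660 elt/s"
```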

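| Max Memory | Pool Size | Storage DB | Sliding DB | Insertion rate [elt/s] |
|---|---|---|---|---|
| 128MB | 8 | MySQL | - | 660 |
| 128MB | 8 | MySQL | MySQL | 635 |
| 128MB | 8 | MySQL | H2 | 390 |
| 128MB | 16 | MySQL | - | 760 |
| 128MB | 16 | MySQL | MySQL | 730 |
| 128MB | 16 | MySQL | H2 | 410 |
| 256MB | 8 | MySQL | - | 675 |
| 256MB | 8 | MySQL | MySQL | 640 |
| 256MB | 8 | MySQL | H2 | 605 |
| 512MB | 8 | MySQL | - | 720 |
| 512MB | 8 | MySQL | MySQL | 690 |
| 512MB | 8 | MySQL | H2 | 725 |
| 512MB | 16 | MySQL | - | 770 |
| 512MB | 16 | MySQL | MySQL | 730 |
| 512MB | 16 | MySQL | H2 | 755 |
| 2048MB | 8 | MySQL | - | 710 |
| 2048MB | 8 | MySQL | MySQL | 640 |
| 2048MB | 8 | MySQL | H2 | 830 |
| 2048MB | 16 | MySQL | - | 770 |
| 2048MB | 16 | MySQL | MySQL | 685 |
| 2048MB | 16 | MySQL | H2 | 840 |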

Observations

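* In-memory processing based on H2 needs a large amount of memory to outperform MySQL: in the table above it is slower at 128MB and 256MB, roughly on par at 512MB, and clearly faster at 2048MB.
* The overhead of splitting the Storage Manager (a separate MySQL sliding database) stays around 10% or less: between 4% and 11% in the table above.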
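
These observations can be checked against the table. The sketch below recomputes, for each configuration, the relative change in insertion rate compared to the run without a sliding DB. The rates are copied from the table above; the helper name `relative_change` is hypothetical. With these figures, the split MySQL setup loses between about 4% and 11%, the last configuration (2048MB, pool of 16) being the only one slightly above 10%.

```python
# Insertion rates [elt/s] copied from the table above:
# (max memory, pool size) -> {sliding DB: rate}, "-" meaning no sliding DB.
RATES = {
    ("128MB", 8): {"-": 660, "MySQL": 635, "H2": 390},
    ("128MB", 16): {"-": 760, "MySQL": 730, "H2": 410},
    ("256MB", 8): {"-": 675, "MySQL": 640, "H2": 605},
    ("512MB", 8): {"-": 720, "MySQL": 690, "H2": 725},
    ("512MB", 16): {"-": 770, "MySQL": 730, "H2": 755},
    ("2048MB", 8): {"-": 710, "MySQL": 640, "H2": 830},
    ("2048MB", 16): {"-": 770, "MySQL": 685, "H2": 840},
}

def relative_change(baseline, value):
    """Change of value relative to baseline, in percent (negative = slower)."""
    return (value - baseline) / baseline * 100.0

for (memory, pool), rates in RATES.items():
    split = relative_change(rates["-"], rates["MySQL"])
    h2 = relative_change(rates["-"], rates["H2"])
    print(f"{memory:>6} pool={pool:<2} split MySQL: {split:+6.1f}%  H2: {h2:+6.1f}%")
```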
