Configure GSN for performance

ebiiii edited this page Oct 6, 2014 · 2 revisions

The GSN architecture is designed with performance in mind (see Publications). However, the following settings can significantly improve performance, and we invite you to tune them to the characteristics of your deployment. We focus on the DBMS and the Web server, the two central components of the architecture that deserve special attention.

##DBMS

The database is a central component of GSN: it is used both for processing data (window processing, user queries) and for storing the processed data. In GSN, the software component that handles communication with the database is called a Storage Manager. The latest version of GSN can handle several Storage Manager instances, each dealing with a different database. This improves both flexibility (GSN can define a Storage Manager per Virtual Sensor) and performance. For example, you could use a fast in-memory DBMS (such as H2) for window processing and a standard MySQL database for storing the data.

Each Storage Manager uses a pool of database connections, and this pool can handle a fixed number of concurrent active connections. Increasing the size of this pool can improve the performance of the system but will typically consume more memory.

These parameters are defined in the build.xml file.

    <property name="max-db-connections" value="8"/>
    <property name="max-sliding-db-connections" value="8"/>
    ...
    <property name="maxMemoryUsage" value="128m"/>
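
To illustrate why the pool size caps concurrency, here is a minimal sketch of a fixed-size pool. It is not GSN code: `BoundedPool`, `run_workload`, and the fake `conn-N` connections are hypothetical names used only for this illustration.

```python
import queue
import threading
import time


class BoundedPool:
    """Toy pool: at most `size` connections are handed out at once."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")  # stand-in for a real DB connection

    def acquire(self):
        return self._free.get()  # blocks while every connection is in use

    def release(self, conn):
        self._free.put(conn)


def run_workload(pool_size, nb_tasks, task_seconds=0.05):
    """Run nb_tasks concurrent fake queries; return the peak number in flight."""
    pool = BoundedPool(pool_size)
    lock = threading.Lock()
    active = 0
    peak = 0

    def task():
        nonlocal active, peak
        conn = pool.acquire()
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(task_seconds)  # pretend to run a query
        with lock:
            active -= 1
        pool.release(conn)

    threads = [threading.Thread(target=task) for _ in range(nb_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


if __name__ == "__main__":
    # 32 concurrent tasks never use more than 8 connections at a time.
    print(run_workload(pool_size=8, nb_tasks=32))
```

Tasks beyond the pool size simply wait for a free connection, which is why a larger pool can help under load, at the cost of the memory each open connection holds.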

Keep in mind that DBMS-specific settings can still be added to your JDBC URLs. The example below sets the cache size on an H2 DBMS. Refer to your DB vendor's documentation for the available settings.

    <storage ... url="jdbc:h2:h2db/mydb;CACHE_SIZE=131072" />
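
H2 expresses `CACHE_SIZE` in KB, so the value 131072 above corresponds to a 128 MB cache. The sketch below shows that arithmetic and how settings are appended to an H2 URL as `;KEY=VALUE` pairs. The helper names `h2_url` and `cache_size_kb` are hypothetical and not part of GSN or H2.

```python
def cache_size_kb(megabytes):
    """Convert a cache size in MB to the KB value that H2 expects."""
    return megabytes * 1024


def h2_url(path, **settings):
    """Build an H2 JDBC URL, appending each setting as ;KEY=VALUE."""
    parts = [f"jdbc:h2:{path}"]
    parts += [f"{key}={value}" for key, value in settings.items()]
    return ";".join(parts)


url = h2_url("h2db/mydb", CACHE_SIZE=cache_size_kb(128))
print(url)  # jdbc:h2:h2db/mydb;CACHE_SIZE=131072
```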

##Web server

Another central component of GSN is the Web server, which handles the calls that render the UI, download data, and serve remote wrapper requests. We currently use the Jetty Web server, configured with a thread pool. Increasing the maximum size of this pool can improve performance when GSN is accessed by a large number of users, where each rendered UI and each remote wrapper counts as one user. This setting can be found in the build.xml file.

    <property name="max-servlets" value="20"/>

##Query Performance

You can use the following Ant task to evaluate the performance of your setup.

    ant eval-queries

This command first retrieves the list of virtual sensors from the GSN instance and then generates queries for the /multidata servlet, such as the following:

    http://localhost:22001/multidata?vs[0]=hist_imis_zer_3&field[0]=All&download_mode=inline&download_format=csv&nb=SPECIFIED&nb_value=50000
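
If you want to issue a similar query by hand, the URL can be assembled as in the sketch below. The parameter names are taken from the example URL above. The function name `multidata_url` is hypothetical, and the sketch only builds the URL without sending the request.

```python
from urllib.parse import urlencode


def multidata_url(base, sensor, nb_value, field="All"):
    """Build a /multidata query URL for one virtual sensor."""
    params = {
        "vs[0]": sensor,
        "field[0]": field,
        "download_mode": "inline",
        "download_format": "csv",
        "nb": "SPECIFIED",
        "nb_value": nb_value,  # maximum number of tuples to retrieve
    }
    # safe="[]" keeps the square brackets unescaped, as in the example URL.
    return f"{base}/multidata?" + urlencode(params, safe="[]")


print(multidata_url("http://localhost:22001", "hist_imis_zer_3", 50000))
```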

The task then prints a report such as the following:

    [java] ...
    [java] ------ GSN Queries Result --------
    [java] | URL: http://montblanc.slf.ch:22001
    [java] | Eval duration: 22.239 [s]
    [java] | Nb Queries   : 20
    [java] | Tuples       : sum:98637.000, min:3637.000, max:5000.000, mean:4931.850, var:92888.450 [no unit]
    [java] | Fields       : sum:335.000, min:8.000, max:31.000, mean:16.750, var:44.197 [no unit]
    [java] | Raw Data     : sum:93.686, min:2.190, max:8.290, mean:4.684, var:2.939 [MB]
    [java] | Download time: sum:186.678, min:2.372, max:19.134, mean:9.334, var:19.111 [s]
    [java] | Tuple Rate   : sum:14125.635, min:261.315, max:2107.926, mean:706.282, var:221517.436 [tuple/s]
    [java] | Field Rate   : sum:42.686, min:.748, max:6.745, mean:2.134, var:1.659 [field/s]
    [java] | Data Rate    : sum:12.299, min:.221, max:2.198, mean:.615, var:.179 [MB/s]
    [java] -----------------------------------
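
To compare several runs, it can help to turn the statistics lines of this report into numbers. The sketch below assumes the line layout shown above (`label : sum:…, min:…, … [unit]`). The names `STAT_LINE` and `parse_stats` are hypothetical and not part of GSN.

```python
import re

# Matches lines such as "| Tuples : sum:1.0, min:1.0, ... [no unit]".
STAT_LINE = re.compile(
    r"\|\s*(?P<label>[^:]+?)\s*:\s*(?P<stats>sum:.*?)\s*\[(?P<unit>[^\]]+)\]"
)


def parse_stats(report):
    """Return {label: {"sum": ..., "min": ..., ..., "unit": ...}}."""
    result = {}
    for line in report.splitlines():
        match = STAT_LINE.search(line)
        if not match:
            continue  # header, URL, duration, and separator lines are skipped
        stats = {}
        for item in match.group("stats").split(","):
            key, value = item.strip().split(":")
            stats[key] = float(value)  # float() also accepts ".748"
        stats["unit"] = match.group("unit")
        result[match.group("label")] = stats
    return result


REPORT = """\
[java] | Nb Queries   : 20
[java] | Tuples       : sum:98637.000, min:3637.000, max:5000.000, mean:4931.850, var:92888.450 [no unit]
[java] | Field Rate   : sum:42.686, min:.748, max:6.745, mean:2.134, var:1.659 [field/s]
"""

stats = parse_stats(REPORT)
print(stats["Tuples"]["mean"], stats["Field Rate"]["unit"])
```

As a sanity check on the sample report, each mean equals the sum divided by the 20 queries, for example 98637 / 20 = 4931.85 tuples.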

The parameters of the experiment can be set in the build.xml file:

  • nbQueries: The number of queries to be executed.
  • nbThreads: The maximum number of queries executed in parallel.
  • maxQuerySize: The maximum number of StreamElements to be retrieved per query.
  • gsnUrl: The URL (host and port) of the GSN instance to be tested.

###Example

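| Requests | Nb tuples | Rate [tuple/s] |
|---|---|---|
| 1 | 3125 | 53879 |
| 1 | 6250 | 67934 |
| 1 | 12500 | 66844 |
| 1 | 25000 | 29585 |
| 1 | 50000 | 20169 |
| 1 | 100000 | 17206 |
| 1 | 200000 | 16433 |
| 2 | 3125 | 47712 |
| 2 | 6250 | 59300 |
| 2 | 12500 | 47984 |
| 2 | 25000 | 23505 |
| 2 | 50000 | 16835 |
| 2 | 100000 | 16236 |
| 2 | 200000 | 16208 |
| 4 | 3125 | 33341 |
| 4 | 6250 | 16277 |
| 4 | 12500 | 12686 |
| 4 | 25000 | 10265 |
| 4 | 50000 | 10054 |
| 4 | 100000 | 9548 |
| 4 | 200000 | 9368 |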

##Data Insertion Performance

###Example: Distributed setup

In this setup, the main storage database runs on a remote host. Using a local Storage Manager for processing should then reduce network access and thus improve performance.

TBD

###Example: Local Setup

  • 178 Virtual Sensors
  • One Input Stream (csv wrapper), ~10 Output Fields, storage-size="1" sampling-rate="1"
  • Simple select queries
  • Server Characteristics: Intel(R) Xeon(R) CPU E5430 @2.66GHz / 4GB RAM
  • MySQL Server version: 5.0.51a-3ubuntu5.4 (Ubuntu), url1: jdbc:mysql://localhost/timothee, url2: jdbc:mysql://localhost/timotheesliding
  • H2 v1.1.116, url: jdbc:h2:mem:s;DB_CLOSE_DELAY=-1

Once all the virtual sensors are loaded, we wait 1 minute and count the number of elements in the data storage DB. We repeat the count 5 minutes later and compute the difference between the two counts.

We use the following query to compute the total number of elements in the storage db:

    SELECT NOW(), table_schema, sum(table_rows) FROM information_schema.TABLES WHERE table_schema = 'timothee' AND table_name LIKE 'hist_imis_%' GROUP BY table_schema;
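
The insertion rate then follows from the two counts and the time between them. The sketch below shows the arithmetic. The function name `insertion_rate` and the two counts are made up for illustration and are not measured values. Note also that `table_rows` is only an estimate for some MySQL storage engines (InnoDB in particular), so the resulting rate is approximate.

```python
def insertion_rate(count_start, count_end, elapsed_seconds=300):
    """Elements inserted per second between two row counts."""
    return (count_end - count_start) / elapsed_seconds


# Hypothetical counts taken 5 minutes (300 s) apart.
print(insertion_rate(1_000_000, 1_198_000))  # 660.0 elt/s
```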

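| Max Memory | Pool Size | Storage DB | Sliding DB | Insertion rate [elt/s] |
|---|---|---|---|---|
| 128MB | 8 | MySQL | - | 660 |
| 128MB | 8 | MySQL | MySQL | 635 |
| 128MB | 8 | MySQL | H2 | 390 |
| 128MB | 16 | MySQL | - | 760 |
| 128MB | 16 | MySQL | MySQL | 730 |
| 128MB | 16 | MySQL | H2 | 410 |
| 256MB | 8 | MySQL | - | 675 |
| 256MB | 8 | MySQL | MySQL | 640 |
| 256MB | 8 | MySQL | H2 | 605 |
| 512MB | 8 | MySQL | - | 720 |
| 512MB | 8 | MySQL | MySQL | 690 |
| 512MB | 8 | MySQL | H2 | 725 |
| 512MB | 16 | MySQL | - | 770 |
| 512MB | 16 | MySQL | MySQL | 730 |
| 512MB | 16 | MySQL | H2 | 755 |
| 2048MB | 8 | MySQL | - | 710 |
| 2048MB | 8 | MySQL | MySQL | 640 |
| 2048MB | 8 | MySQL | H2 | 830 |
| 2048MB | 16 | MySQL | - | 770 |
| 2048MB | 16 | MySQL | MySQL | 685 |
| 2048MB | 16 | MySQL | H2 | 840 |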

Observations

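  • In-memory processing based on H2 needs a large amount of memory to outperform MySQL: it is slower below 512MB, roughly on par at 512MB, and clearly faster at 2048MB.
  • The overhead of splitting storage across two MySQL Storage Managers, compared with a single one, is about 10% or less (between roughly 4% and 11% in the table above).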
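
These observations can be checked against the table. The sketch below copies the measured insertion rates and computes, for each memory and pool configuration, the overhead of the split MySQL/MySQL setup relative to MySQL alone, and whether H2 beats MySQL alone. The names `RESULTS`, `split_overhead_percent`, and `h2_wins` are hypothetical and used only for this illustration.

```python
# (max memory [MB], pool size): (MySQL alone, MySQL + MySQL, MySQL + H2) in elt/s
RESULTS = {
    (128, 8): (660, 635, 390),
    (128, 16): (760, 730, 410),
    (256, 8): (675, 640, 605),
    (512, 8): (720, 690, 725),
    (512, 16): (770, 730, 755),
    (2048, 8): (710, 640, 830),
    (2048, 16): (770, 685, 840),
}


def split_overhead_percent(single, split):
    """Relative insertion-rate loss of the split setup, in percent."""
    return 100.0 * (single - split) / single


overheads = {
    config: round(split_overhead_percent(single, split), 1)
    for config, (single, split, _h2) in RESULTS.items()
}
h2_wins = [config for config, (single, _split, h2) in RESULTS.items() if h2 > single]

for config, overhead in overheads.items():
    print(config, f"{overhead}%")
print("H2 faster than MySQL alone:", h2_wins)
```

The overhead stays between about 4% and 11%, and H2 only beats MySQL alone in the 512MB / pool 8 configuration and in both 2048MB configurations.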