Configure GSN for performance
The GSN architecture is designed with performance in mind (see Publications). However, the following settings can significantly improve performance, and we invite you to tweak them according to the characteristics of your deployment. We focus on the DBMS and the Web server, as they are central components of the architecture and deserve special attention.
## DBMS
The database is a central component of GSN: it is used both to process data (window processing, user queries) and to store the processed data. In GSN, the software component that handles communication with the database is called a Storage Manager. The latest version of GSN can handle multiple Storage Manager instances connected to different databases. This improves flexibility (GSN can define one Storage Manager per virtual sensor) as well as performance: for instance, a fast in-memory DBMS such as H2 can be used for window processing and a standard MySQL database for storing the data.
Each Storage Manager uses a pool of database connections, and this pool can hold a fixed number of concurrently active connections. Increasing the size of this pool can improve performance, but it typically consumes more memory.
These parameters, as well as the maximum memory used by GSN (maxMemoryUsage), are defined in the build.xml file:
```xml
<property name="max-db-connections" value="8"/>
<property name="max-sliding-db-connections" value="8"/>
...
<property name="maxMemoryUsage" value="128m"/>
```
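To illustrate why the pool size caps concurrency and costs memory, here is a minimal Python sketch of a fixed-size connection pool. It is not GSN's Storage Manager code: `FixedSizePool` and the `connect` callable are hypothetical stand-ins, and every connection is opened up front, so a larger pool keeps more open connections in memory.

```python
import queue


class FixedSizePool:
    """Hands out at most `max_connections` connections at a time.

    Hypothetical illustration, not GSN code: `connect` stands in for
    whatever opens a real database connection.
    """

    def __init__(self, connect, max_connections=8):
        self._idle = queue.Queue()
        # All connections are opened up front, so memory use grows with the pool size.
        for _ in range(max_connections):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        """Take an idle connection; blocks while all of them are in use.

        Raises queue.Empty if none becomes free within `timeout` seconds.
        """
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Return a connection to the pool so a waiting caller can use it."""
        self._idle.put(conn)


if __name__ == "__main__":
    pool = FixedSizePool(connect=object, max_connections=8)
    in_use = [pool.acquire() for _ in range(8)]
    try:
        pool.acquire(timeout=0.1)  # a ninth concurrent user has to wait
    except queue.Empty:
        print("all 8 connections busy: request timed out")
    for conn in in_use:
        pool.release(conn)
```

With `max_connections=8`, a ninth caller blocks until a connection is released (or gives up after its timeout), which is the situation a larger pool is meant to avoid.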
Keep in mind that DBMS-specific settings can also be added to your JDBC URLs. The example below sets the cache size of an H2 database. Refer to your database vendor's documentation for the available settings.
```xml
<storage ... url="jdbc:h2:h2db/mydb;CACHE_SIZE=131072" />
```
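H2 reads settings appended to the database URL as `;KEY=VALUE` pairs, so several settings can be chained. A small Python sketch of a hypothetical helper that builds such URLs; it is H2-specific (MySQL Connector/J, for instance, expects `?key=value&key=value` instead).

```python
def h2_url(base_url, **settings):
    """Append H2 settings to a JDBC URL as ';KEY=VALUE' pairs (hypothetical helper)."""
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return ";".join([base_url] + pairs)


print(h2_url("jdbc:h2:h2db/mydb", CACHE_SIZE=131072))
print(h2_url("jdbc:h2:mem:s", DB_CLOSE_DELAY=-1))
```

The two calls rebuild the H2 URLs used on this page.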
## Web server
Another central component of GSN is the Web server, which handles requests to render the UI, to download data, and from remote wrappers. GSN currently uses the Jetty Web server, configured with a thread pool. Increasing the maximum size of this pool can improve performance when GSN serves a large number of users, where each rendered UI and each remote wrapper counts as one user. This setting is defined in the build.xml file:
```xml
<property name="max-servlets" value="20"/>
```
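A rough, idealized model shows why the pool size matters under load: if N users send requests at the same time and each request keeps one thread busy for t seconds, requests beyond the pool size wait, and the burst finishes after ceil(N / pool size) * t seconds. The sketch assumes identical request durations and ignores CPU, database and network limits, so real gains are usually smaller; the function name and the numbers are hypothetical.

```python
import math


def burst_completion_time(users, pool_size, request_seconds):
    """Seconds until a burst of simultaneous requests is fully served.

    Idealized model: every request holds one pool thread for exactly
    `request_seconds`; requests beyond `pool_size` wait for a free thread.
    """
    waves = math.ceil(users / pool_size)
    return waves * request_seconds


# Hypothetical burst: 50 users, 0.5 s per request.
for pool_size in (20, 40):
    print(pool_size, burst_completion_time(50, pool_size, 0.5))
```

In this model, raising the pool from 20 to 40 threads cuts the burst from 1.5 s to 1.0 s.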
## Query Performance
You can use the following Ant task to evaluate the performance of your setup:
```shell
ant eval-queries
```
This command first retrieves the list of virtual sensors from the GSN instance and then generates queries to the /multidata servlet, such as the following:
```text
http://localhost:22001/multidata?vs[0]=hist_imis_zer_3&field[0]=All&download_mode=inline&download_format=csv&nb=SPECIFIED&nb_value=50000
```
Running the task produces a summary such as the following:
```text
[java] ...
[java] ------ GSN Queries Result --------
[java] | URL: http://montblanc.slf.ch:22001
[java] | Eval duration: 22.239 [s]
[java] | Nb Queries : 20
[java] | Tuples : sum:98637.000, min:3637.000, max:5000.000, mean:4931.850, var:92888.450 [no unit]
[java] | Fields : sum:335.000, min:8.000, max:31.000, mean:16.750, var:44.197 [no unit]
[java] | Raw Data : sum:93.686, min:2.190, max:8.290, mean:4.684, var:2.939 [MB]
[java] | Download time: sum:186.678, min:2.372, max:19.134, mean:9.334, var:19.111 [s]
[java] | Tuple Rate : sum:14125.635, min:261.315, max:2107.926, mean:706.282, var:221517.436 [tuple/s]
[java] | Field Rate : sum:42.686, min:.748, max:6.745, mean:2.134, var:1.659 [field/s]
[java] | Data Rate : sum:12.299, min:.221, max:2.198, mean:.615, var:.179 [MB/s]
[java] -----------------------------------
```
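To compare several runs, the summary can be parsed from the Ant output. A minimal Python sketch, assuming the line format shown above (`name : sum:..., min:..., max:..., mean:..., var:... [unit]`); `parse_eval_output` and the regular expressions are hypothetical helpers, not part of GSN.

```python
import re

# Matches e.g. "[java] | Tuples : sum:98637.000, min:3637.000, ... [no unit]".
# Values may lack a leading zero (".748"), which float() accepts.
STAT = re.compile(
    r"\|\s*(?P<name>[A-Za-z ]+?)\s*:\s*"
    r"sum:(?P<sum>[\d.]+),\s*min:(?P<min>[\d.]+),\s*max:(?P<max>[\d.]+),\s*"
    r"mean:(?P<mean>[\d.]+),\s*var:(?P<var>[\d.]+)\s*\[(?P<unit>[^\]]*)\]"
)
NB_QUERIES = re.compile(r"\|\s*Nb Queries\s*:\s*(\d+)")


def parse_eval_output(text):
    """Return (number of queries, {metric name: statistics}) from the report."""
    nb_queries, stats = None, {}
    for line in text.splitlines():
        count = NB_QUERIES.search(line)
        if count:
            nb_queries = int(count.group(1))
            continue
        stat = STAT.search(line)
        if stat:
            values = {k: float(stat.group(k)) for k in ("sum", "min", "max", "mean", "var")}
            values["unit"] = stat.group("unit")
            stats[stat.group("name")] = values
    return nb_queries, stats
```

For the report above, each mean equals the sum divided by Nb Queries (e.g. 98637 / 20 = 4931.85 tuples), so the statistics are per query. The summed download time (186.678 s) exceeds the eval duration (22.239 s) because queries run in parallel.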
The parameters of the experiment can be set in the build.xml file:
- nbQueries: The number of queries to execute.
- nbThreads: The maximum number of queries executed in parallel.
- maxQuerySize: The maximum number of StreamElements to retrieve per query.
- gsnUrl: The URL (host and port) of the GSN instance to test.
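The experiment controlled by these parameters can be sketched in Python as follows. This is not the code behind `ant eval-queries`: `multidata_url` only reproduces the query shape of the example URL above, `run_eval` and the `fetch` callable are hypothetical, virtual sensors are picked round-robin (an assumption), and only tuple counts are summarized, not download times.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode


def multidata_url(gsn_url, virtual_sensor, max_query_size):
    """Build a /multidata query shaped like the example URL above."""
    params = [
        ("vs[0]", virtual_sensor),
        ("field[0]", "All"),
        ("download_mode", "inline"),
        ("download_format", "csv"),
        ("nb", "SPECIFIED"),
        ("nb_value", max_query_size),
    ]
    # safe="[]" keeps vs[0] and field[0] unescaped, as in the example.
    return f"{gsn_url}/multidata?{urlencode(params, safe='[]')}"


def run_eval(fetch, gsn_url, sensors, nb_queries, nb_threads, max_query_size):
    """Issue nb_queries queries, with at most nb_threads in flight.

    `fetch(url)` is a hypothetical callable that downloads one URL and
    returns the number of tuples received.
    """
    urls = [
        multidata_url(gsn_url, sensors[i % len(sensors)], max_query_size)
        for i in range(nb_queries)
    ]
    with ThreadPoolExecutor(max_workers=nb_threads) as pool:
        tuples = list(pool.map(fetch, urls))
    # Summary in the spirit of the "Tuples" line of the report; using the
    # population variance is an assumption about how "var" is computed.
    return {
        "sum": sum(tuples),
        "min": min(tuples),
        "max": max(tuples),
        "mean": statistics.mean(tuples),
        "var": statistics.pvariance(tuples),
    }
```

Injecting `fetch` keeps the driver testable without a running GSN instance; a real run would pass a function that performs the HTTP download and counts the CSV rows.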
### Example
Requests | Tuples | Tuple rate [tuple/s] |
---|---|---|
1 | 3125 | 53879 |
1 | 6250 | 67934 |
1 | 12500 | 66844 |
1 | 25000 | 29585 |
1 | 50000 | 20169 |
1 | 100000 | 17206 |
1 | 200000 | 16433 |
2 | 3125 | 47712 |
2 | 6250 | 59300 |
2 | 12500 | 47984 |
2 | 25000 | 23505 |
2 | 50000 | 16835 |
2 | 100000 | 16236 |
2 | 200000 | 16208 |
4 | 3125 | 33341 |
4 | 6250 | 16277 |
4 | 12500 | 12686 |
4 | 25000 | 10265 |
4 | 50000 | 10054 |
4 | 100000 | 9548 |
4 | 200000 | 9368 |
## Data Insertion Performance
### Example: Distributed setup
In this setup, the main storage database runs on a remote host. Using a local Storage Manager for processing should then reduce network access and thus improve performance.
TBD
### Example: Local setup
- 178 Virtual Sensors
- One Input Stream (csv wrapper), ~10 Output Fields, storage-size="1" sampling-rate="1"
- Simple select queries
- Server Characteristics: Intel(R) Xeon(R) CPU E5430 @2.66GHz / 4GB RAM
- MySQL Server version: 5.0.51a-3ubuntu5.4 (Ubuntu), storage DB url: jdbc:mysql://localhost/timothee, sliding DB url: jdbc:mysql://localhost/timotheesliding
- H2 v1.1.116, url: jdbc:h2:mem:s;DB_CLOSE_DELAY=-1
Once all the virtual sensors are loaded, we wait 1 minute and count the number of elements in the storage DB. We repeat the count 5 minutes later and divide the difference by the elapsed time to obtain the insertion rate.
We use the following query to count the total number of elements in the storage DB:
```sql
SELECT NOW(), table_schema, sum(table_rows) FROM information_schema.TABLES WHERE table_schema = 'timothee' AND table_name LIKE 'hist_imis_%' GROUP BY table_schema;
```
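The rate computation can be sketched in Python, assuming you record the `NOW()` value and the row sum returned by the query above at each snapshot; `insertion_rate` and the sample values are hypothetical. Note that for InnoDB tables, `table_rows` in `information_schema.TABLES` is only an estimate; running `SELECT COUNT(*)` on each table gives exact counts.

```python
from datetime import datetime


def insertion_rate(count_before, time_before, count_after, time_after):
    """Elements inserted per second between two row-count snapshots."""
    elapsed = (time_after - time_before).total_seconds()
    return (count_after - count_before) / elapsed


# Hypothetical snapshots taken 5 minutes apart.
rate = insertion_rate(1000, datetime(2009, 6, 1, 12, 0), 4000, datetime(2009, 6, 1, 12, 5))
print(f"{rate:.0f} elt/s")
```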
Max Memory | Pool Size | Storage DB | Sliding DB | Insertion rate [elt/s] |
---|---|---|---|---|
128MB | 8 | MySQL | - | 660 |
128MB | 8 | MySQL | MySQL | 635 |
128MB | 8 | MySQL | H2 | 390 |
128MB | 16 | MySQL | - | 760 |
128MB | 16 | MySQL | MySQL | 730 |
128MB | 16 | MySQL | H2 | 410 |
256MB | 8 | MySQL | - | 675 |
256MB | 8 | MySQL | MySQL | 640 |
256MB | 8 | MySQL | H2 | 605 |
512MB | 8 | MySQL | - | 720 |
512MB | 8 | MySQL | MySQL | 690 |
512MB | 8 | MySQL | H2 | 725 |
512MB | 16 | MySQL | - | 770 |
512MB | 16 | MySQL | MySQL | 730 |
512MB | 16 | MySQL | H2 | 755 |
2048MB | 8 | MySQL | - | 710 |
2048MB | 8 | MySQL | MySQL | 640 |
2048MB | 8 | MySQL | H2 | 830 |
2048MB | 16 | MySQL | - | 770 |
2048MB | 16 | MySQL | MySQL | 685 |
2048MB | 16 | MySQL | H2 | 840 |
Observations
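- In-memory processing based on H2 needs a large amount of memory (> 512MB) to outperform MySQL.
- Splitting the Storage Manager (separate MySQL storage and sliding DBs) costs roughly 10% or less of the insertion rate (between about 4% and 11% in the table above).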
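The split overhead can be recomputed from the table by comparing each MySQL-only row with the corresponding MySQL + MySQL row. A minimal Python sketch; the rates are copied from the table and the helper name is hypothetical.

```python
# Insertion rates [elt/s] from the table above:
# (max memory, pool size) -> (MySQL storage only, MySQL storage + MySQL sliding DB)
RATES = {
    ("128MB", 8): (660, 635),
    ("128MB", 16): (760, 730),
    ("256MB", 8): (675, 640),
    ("512MB", 8): (720, 690),
    ("512MB", 16): (770, 730),
    ("2048MB", 8): (710, 640),
    ("2048MB", 16): (770, 685),
}


def split_overhead(single, split):
    """Relative loss of insertion rate when a separate sliding DB is used."""
    return (single - split) / single


for (memory, pool_size), (single, split) in RATES.items():
    print(memory, pool_size, f"{split_overhead(single, split):.1%}")
```

With these values the overhead ranges from about 3.8% (128MB, pool size 8) to 11.0% (2048MB, pool size 16).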