Siddharth Karandikar edited this page Jun 7, 2021 · 9 revisions

Draft design

I am building a tool called psb to benchmark PromQL query performance across multiple workers/clients against a Promscale instance. The tool will be available as a CLI command and, for now, can only be run from a single machine.

CLI

  • psb will need the following input

    • Query file path
    • Number of workers to spawn
    • Number of iterations to run over the given set of queries, OR the duration for which the tool should keep running that set of queries. Duration is not a high priority, as anyone can hit Ctrl-C to stop the benchmark.
    • Maximum time to wait for a query to execute. It will be used as the HTTP client timeout.
    • Server details
  • The actual CLI will look like this

psb
  -u Promscale server URL
  -f query file path
  -i number of iterations
  -d duration for which the benchmark should run. If this is provided, -i is ignored.
  -c number of concurrent workers/clients to spawn
  -t timeout, i.e. how long a client will wait for a query result
  -cpus number of CPUs on the local machine to use
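
The flags above could be parsed with Go's standard flag package. The following is only a rough sketch: the Config struct, the parseFlags helper, the default values, and the rule that -u and -f are required are all assumptions made for illustration, not decided parts of the design.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
	"time"
)

// Config holds the parsed psb options. The struct and its field names are
// hypothetical; only the flag names come from the design above.
type Config struct {
	URL        string
	QueryFile  string
	Iterations int
	Duration   time.Duration
	Workers    int
	Timeout    time.Duration
	CPUs       int
}

// parseFlags parses the psb flags from args. All defaults are assumptions.
func parseFlags(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("psb", flag.ContinueOnError)
	fs.StringVar(&c.URL, "u", "", "Promscale server URL")
	fs.StringVar(&c.QueryFile, "f", "", "query file path")
	fs.IntVar(&c.Iterations, "i", 1, "number of iterations")
	fs.DurationVar(&c.Duration, "d", 0, "benchmark duration; overrides -i when set")
	fs.IntVar(&c.Workers, "c", 1, "number of concurrent workers/clients")
	fs.DurationVar(&c.Timeout, "t", 30*time.Second, "time to wait for a query result")
	fs.IntVar(&c.CPUs, "cpus", runtime.NumCPU(), "number of local CPUs to use")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.URL == "" || c.QueryFile == "" {
		return c, fmt.Errorf("both -u and -f are required")
	}
	if c.Duration > 0 {
		c.Iterations = 0 // -d takes precedence, so -i is ignored
	}
	return c, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	runtime.GOMAXPROCS(cfg.CPUs) // limit the CPUs used on the local machine
	fmt.Printf("%+v\n", cfg)
}
```

Using flag.DurationVar means -d and -t would accept values such as 30s or 5m, which may or may not be the format we finally settle on.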

Design details

When benchmarking something like an HTTP server, the biggest bottleneck could be the network bandwidth available at the client end as well as at the server end. If the server has more capacity than the client, care needs to be taken to make sure that we are not bottlenecked at the client end. If we are, we might need to spread the traffic over multiple client machines.

For benchmarking Promscale, I expect that most of the queries will need some processing on the server side. Unless queries are pulling very large amounts of data, network I/O is not going to be a bottleneck. Considering this, and to keep the scope limited for the 1st version, I am not planning to do anything to spread clients across multiple machines.

This is how psb will work overall

  • Parse input flags
  • Read queries from input and keep them ready for distribution to workers
  • Before putting any load on the given server, we will call Promscale's health check API. If that works well, we will go ahead and start the workers and the benchmark.
  • For every query in input
    • Start workers based on the input. Each worker is one goroutine.
    • Each worker will run that same query for N iterations or for duration D.
    • It is the worker's responsibility to keep track of query performance numbers and record those details.
    • Once done, a report will be generated and printed. Then we move to the next query.
  • The main goroutine will wait until all workers have finished their job with all queries, or until Ctrl-C.
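
The per-query worker loop described above could look roughly like the sketch below. It only covers the N iterations mode, and it uses a placeholder queryFunc instead of a real HTTP call. All names (queryFunc, result, summary, runWorkers, summarize) and the choice of min/median/max as the reported numbers are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sort"
	"sync"
	"time"
)

// queryFunc runs one query and reports whether it failed. In psb this would
// wrap the HTTP request; here it is a hypothetical placeholder.
type queryFunc func() error

type result struct {
	latency time.Duration
	err     error
}

type summary struct {
	count, errors    int
	min, median, max time.Duration
}

// runWorkers starts one goroutine per worker. Each worker runs q for the given
// number of iterations and stops early once done is closed (e.g. on Ctrl-C).
func runWorkers(done <-chan struct{}, workers, iterations int, q queryFunc) []result {
	// Buffer sized so workers never block while recording a result.
	results := make(chan result, workers*iterations)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				select {
				case <-done:
					return
				default:
				}
				start := time.Now()
				err := q()
				results <- result{latency: time.Since(start), err: err}
			}
		}()
	}
	wg.Wait() // wait for every worker to finish
	close(results)
	out := make([]result, 0, len(results))
	for r := range results {
		out = append(out, r)
	}
	return out
}

// summarize reduces the recorded results to a small per-query report.
func summarize(rs []result) summary {
	s := summary{count: len(rs)}
	if len(rs) == 0 {
		return s
	}
	lat := make([]time.Duration, 0, len(rs))
	for _, r := range rs {
		if r.err != nil {
			s.errors++
		}
		lat = append(lat, r.latency)
	}
	sort.Slice(lat, func(i, j int) bool { return lat[i] < lat[j] })
	s.min, s.median, s.max = lat[0], lat[len(lat)/2], lat[len(lat)-1]
	return s
}

func main() {
	// ctx is cancelled on Ctrl-C, which closes ctx.Done().
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()
	fake := func() error { time.Sleep(2 * time.Millisecond); return nil }
	for _, name := range []string{"query-1", "query-2"} {
		s := summarize(runWorkers(ctx.Done(), 4, 5, fake))
		fmt.Printf("%s: %+v\n", name, s)
	}
}
```

Collecting results over a buffered channel keeps the workers free of shared mutable state. Whether workers should instead aggregate locally and report once at the end is still open.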

Handling timestamps and support of query types

  • The input obs-queries.csv uses 13-digit timestamps, i.e. it provides them in milliseconds. I am thinking of assuming that input timestamps will always be in milliseconds and fixing my parsing to that.
  • All the queries in this file are range queries. Changing from range queries to instant queries is not a big change, but again, for the 1st iteration, I am thinking of fixing the query type to range query.