
Measure Crawler Performance and Scalability #309

Open

canturkisci opened this issue Jun 5, 2017 · 4 comments

@canturkisci (Member)

Description

We will measure crawler performance and scalability. This includes scan rates (Hz) per feature and per combination of features, the number of containers/sec we can achieve on a given host/VM, and any observed bottlenecks.

We will use this issue to report our findings.
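To make the two metrics above concrete, here is a minimal sketch (a hypothetical helper, not part of the crawler codebase) of how a measured per-container crawl latency translates into the effective per-container scan rate (Hz) and the containers/sec throughput:

```python
def effective_scan_rate(per_container_latency_s, num_containers):
    """Given the measured crawl latency per container (seconds) and the
    number of containers crawled each iteration, return a tuple of
    (per-container scan rate in Hz, containers crawled per second).

    Assumes a single crawler process visiting containers sequentially,
    so one full iteration takes latency * num_containers seconds.
    """
    total_iteration_time_s = per_container_latency_s * num_containers
    per_container_hz = 1.0 / total_iteration_time_s
    containers_per_sec = 1.0 / per_container_latency_s
    return per_container_hz, containers_per_sec

# Example: a 0.18s per-container latency across 200 containers.
hz, cps = effective_scan_rate(0.18, 200)
```

With 200 containers at 0.18s each, one iteration takes 36s, so each container is revisited at roughly 0.028 Hz while throughput is about 5.6 containers/sec.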

@sahilsuneja1 (Contributor)

[Figure: crawl-latency] Above: time to crawl different features for 200 containers.

[Figure: crawl-freq] Above: effective crawl frequency per container.

Notes:

  1. The left bar represents the case where the crawler synchronizes with the docker daemon on every monitoring iteration to get metadata for each container, whereas the right bar represents the optimization where the crawler caches the container metadata after the first iteration and subscribes to docker events to update its cached metadata asynchronously based upon container creation or deletion events.

  2. This is with a single crawler process consuming a single CPU core (for many feature plugins, actually only 70% of a core, with time spent waiting either for (i) crawling the containers' rootfs from disk, or (ii) the kernel while reading cgroups stats and/or during namespace jumping).

  3. Crawl times for feature combinations can be calculated by adding the individual components together. For example, simultaneously enabling the CPU, memory and package crawler plugins yields a combined base crawl latency of ~6s (∼0.18 + 1 + 4.8s for the respective plugins).
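The metadata-caching optimization in note 1 can be sketched roughly as follows. This is a simplified stand-in, not the crawler's actual implementation: the `fetch_metadata` callback and the event dicts are placeholders for what, in practice, would be backed by docker-py calls such as `client.containers.get(cid).attrs` and the stream from `client.events(decode=True)`:

```python
class ContainerMetadataCache:
    """Cache container metadata after the first lookup, and keep the
    cache in sync from the daemon's event stream instead of re-querying
    the daemon on every monitoring iteration."""

    def __init__(self, fetch_metadata):
        # fetch_metadata: callable taking a container id and returning
        # its metadata dict (e.g. client.containers.get(cid).attrs).
        self._fetch = fetch_metadata
        self._cache = {}

    def get(self, container_id):
        # Hit the daemon only on a cache miss (the first iteration).
        if container_id not in self._cache:
            self._cache[container_id] = self._fetch(container_id)
        return self._cache[container_id]

    def handle_event(self, event):
        # Update the cache asynchronously on container create/destroy
        # events; other event types are ignored.
        if event.get("Type") != "container":
            return
        cid = event.get("id")
        if event.get("Action") == "create":
            self._cache[cid] = self._fetch(cid)
        elif event.get("Action") == "destroy":
            self._cache.pop(cid, None)
```

In the measured runs this avoids one daemon round-trip per container per iteration, which is where the left-vs-right bar difference above comes from.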

@sahilsuneja1 (Contributor) commented Jul 17, 2017

Regarding changing the watson crawler version from 0.1 to master, we (prabhakar, sastry, and I) performed the following tests on the master-branch crawler:

  1. Verified crawler-emitted graphite-formatted data shows up on logmet.
  2. Verified log linking works: saw docker.log for containers.
  3. Measured memory and CPU usage over time while crawling metrics on a host running 100 httpd containers (os, disk, cpu, memory, interface, load for the host; cpu, memory, interface, disk for the containers).
  4. TODO: verify container-specific log files (mentioned in some /etc/* file) are being extracted/linked as well (@sastryduri).

The experiment ran for almost 3 days continuously. All looks stable: memory and CPU usage remain within limits, and http response rates and times don't suffer (the response rate in the figures below should be multiplied across the 100 containers, i.e. 70x100 = 7000 replies/s).
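The measurement loop behind the figures below can be sketched as a simple periodic sampler. The `read_cpu`/`read_mem` callables here are hypothetical placeholders; in an actual run they could be backed by, e.g., `psutil.Process(pid).cpu_percent()` and `psutil.Process(pid).memory_info().rss`:

```python
import time

def sample_usage(read_cpu, read_mem, samples, interval_s=0.0):
    """Collect `samples` readings of (timestamp, cpu, mem), sleeping
    `interval_s` seconds between readings. Returns the time series as a
    list of tuples, suitable for plotting usage-vs-time."""
    series = []
    for _ in range(samples):
        series.append((time.time(), read_cpu(), read_mem()))
        if interval_s:
            time.sleep(interval_s)
    return series
```

For a multi-day run like the one above, the same loop would simply use a larger interval and append each sample to a log file instead of an in-memory list.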

[Figure: mem_vs_time]

[Figure: cpu_vs_time]

[Figure: httperf_vs_time]


@kudvanpk
Thanks @sahilsuneja1, great work and it looks good. I will make two pull requests to you today that repeat the changes I made for v0.1 on master. Then I will prepare another release of the cloudsight-crawler.
cc: @canturkisci @sastry
