
Measure Crawler Performance and Scalability #309

Open

canturkisci opened this issue Jun 5, 2017 · 4 comments

@canturkisci (Member)

Description

We will measure crawler performance and scalability. This includes scan rates (Hz) per feature and per combination of features, the number of containers/sec we can achieve on a given host/VM, and any observed bottlenecks.

We will use this issue to report our findings.
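To make the two metrics above concrete, here is a minimal sketch (a hypothetical helper, not part of the crawler codebase) of how a measured per-container crawl latency translates into the effective per-container scan rate (Hz) and the containers/sec throughput:

```python
def effective_scan_rate(per_container_latency_s, num_containers):
    """Given the measured crawl latency per container (seconds) and the
    number of containers crawled each iteration, return a tuple of
    (per-container scan rate in Hz, containers crawled per second).

    Assumes a single crawler process visiting containers sequentially,
    so one full iteration takes latency * num_containers seconds.
    """
    total_iteration_time_s = per_container_latency_s * num_containers
    per_container_hz = 1.0 / total_iteration_time_s
    containers_per_sec = 1.0 / per_container_latency_s
    return per_container_hz, containers_per_sec

# Example: a 0.18s per-container latency across 200 containers.
hz, cps = effective_scan_rate(0.18, 200)
```

With 200 containers at 0.18s each, one iteration takes 36s, so each container is revisited at roughly 0.028 Hz while throughput is about 5.6 containers/sec.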

@sahilsuneja1 (Contributor)

[Figure: crawl-latency] Above: time to crawl different features for 200 containers.

[Figure: crawl-freq] Above: effective crawl frequency per container.

Notes:

  1. The left bar represents the case where the crawler synchronizes with the docker daemon on every monitoring iteration to get metadata for each container, whereas the right bar represents the optimization where the crawler caches the container metadata after the first iteration and subscribes to docker events to update its cached metadata asynchronously based upon container creation or deletion events.

  2. This is with a single crawler process consuming a single CPU core (for many feature plugins, actually only 70% of a core, with time spent waiting either for (i) crawling the containers' rootfs from disk, or (ii) the kernel while reading cgroups stats and/or during namespace jumping).

  3. Crawl times for feature combinations can be calculated by adding the individual components together. For example, simultaneously enabling the CPU, memory and package crawler plugins yields a combined base crawl latency of ~6s (∼0.18 + 1 + 4.8s for the respective plugins).
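The metadata-caching optimization in note 1 can be sketched roughly as follows. This is a simplified stand-in, not the crawler's actual implementation: the `fetch_metadata` callback and the event dicts are placeholders for what, in practice, would be backed by docker-py calls such as `client.containers.get(cid).attrs` and the stream from `client.events(decode=True)`:

```python
class ContainerMetadataCache:
    """Cache container metadata after the first lookup, and keep the
    cache in sync from the daemon's event stream instead of re-querying
    the daemon on every monitoring iteration."""

    def __init__(self, fetch_metadata):
        # fetch_metadata: callable taking a container id and returning
        # its metadata dict (e.g. client.containers.get(cid).attrs).
        self._fetch = fetch_metadata
        self._cache = {}

    def get(self, container_id):
        # Hit the daemon only on a cache miss (the first iteration).
        if container_id not in self._cache:
            self._cache[container_id] = self._fetch(container_id)
        return self._cache[container_id]

    def handle_event(self, event):
        # Update the cache asynchronously on container create/destroy
        # events; other event types are ignored.
        if event.get("Type") != "container":
            return
        cid = event.get("id")
        if event.get("Action") == "create":
            self._cache[cid] = self._fetch(cid)
        elif event.get("Action") == "destroy":
            self._cache.pop(cid, None)
```

In the measured runs this avoids one daemon round-trip per container per iteration, which is where the left-vs-right bar difference above comes from.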

@sahilsuneja1 (Contributor) commented Jul 17, 2017

Regarding changing the watson crawler version from 0.1 to master, we (prabhakar, sastry, and I) performed the following tests on the master-branch crawler:

  1. Verified crawler-emitted graphite-formatted data shows up on logmet.
  2. Verified log linking works: saw docker.log for containers.
  3. Measured memory and CPU usage over time while crawling metrics on a host running 100 httpd containers (os, disk, cpu, memory, interface, load for the host; cpu, memory, interface, disk for the containers).
  4. TODO: verify container-specific log files (mentioned in some /etc/* file) are being extracted/linked as well (@sastryduri).

The experiment ran for almost 3 days continuously. All looks stable: memory and CPU usage remain within limits, and http response rates and times don't suffer (the response rate in the figures below should be multiplied across the 100 containers, i.e. 70x100 = 7000 replies/s).
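The measurement loop behind the figures below can be sketched as a simple periodic sampler. The `read_cpu`/`read_mem` callables here are hypothetical placeholders; in an actual run they could be backed by, e.g., `psutil.Process(pid).cpu_percent()` and `psutil.Process(pid).memory_info().rss`:

```python
import time

def sample_usage(read_cpu, read_mem, samples, interval_s=0.0):
    """Collect `samples` readings of (timestamp, cpu, mem), sleeping
    `interval_s` seconds between readings. Returns the time series as a
    list of tuples, suitable for plotting usage-vs-time."""
    series = []
    for _ in range(samples):
        series.append((time.time(), read_cpu(), read_mem()))
        if interval_s:
            time.sleep(interval_s)
    return series
```

For a multi-day run like the one above, the same loop would simply use a larger interval and append each sample to a log file instead of an in-memory list.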

[Figure: mem_vs_time]

[Figure: cpu_vs_time]

[Figure: httperf_vs_time]


@kudvanpk
Thanks @sahilsuneja1, great work and it looks good. I will make two pull requests to you today that repeat the changes I made for v0.1 on master. Then I will prepare another release of the cloudsight-crawler.
cc: @canturkisci @sastry
