
explicit types for everything #31

Open
Dieterbe opened this issue Oct 21, 2015 · 10 comments

Comments

@Dieterbe
Contributor

We should choose the best-fitting type for every metric and make it explicit.
Some are best served as a float64, others as an int32, and some are merely a bool.

benefits:

  • less resource usage in the data pipeline and in storage (after optimizations)
  • explicit data ranges and formats (e.g. number of decimal digits); currently it's a bit ambiguous what kind of values we can expect
@Dieterbe
Contributor Author

(not urgent)

@Dieterbe
Contributor Author

@woodsaj did you see dgryski/go-tsz#7? It looks like storage efficiency goes up quite a bit if we can avoid decimals, though this could use some further experimentation.

@woodsaj
Contributor

woodsaj commented Nov 30, 2015

It is completely impractical to not be able to handle floats.

@Dieterbe
Contributor Author

Right, we would still be using and supporting floats.
But there are a bunch of cases where decimals are simply not needed, and integer values compress better. For example, response times in ms don't need decimals; that level of accuracy is pointless. We could simply submit those as uints and get better compression in storage, and save space on the wire as well. Other values have much narrower ranges and could use more suitable types too: our ok_state metrics, for instance, are basically booleans.
All I'm saying is we should:

  • strive to use optimized data types on a case-by-case basis.
  • avoid decimals when we don't need them.

Our storage already handles floats without decimals much more efficiently, and later we can add optimizations for ints and bools. But I think it's important to start sending data properly sooner rather than later.
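
To make this concrete, here's a rough sketch of shaping a check result into the narrower types proposed above (the keys and helper name are made up, purely for illustration):

def shape_check_result(latency_ms, ok):
    # illustration only: round timings to whole ms and send states as 0/1
    return {
        "http.total": int(round(latency_ms)),   # uint-style value, no decimals
        "http.ok_state": 1 if ok else 0,        # effectively a bool
    }

# shape_check_result(123.456, True) -> {'http.total': 123, 'http.ok_state': 1}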

@Dieterbe
Contributor Author

Dieterbe commented Jan 1, 2016

I think the sooner we can anticipate the data format changes that will be required to optimize storage, and do a one-time format change (breaking for current alpha users), the better. We're onboarding more and more people and our platform is supposedly gearing up to become production ready, so if we're going to make changes like storing latency at ms or 0.1 ms granularity instead of 1 s, which will break people's graphs, I'd rather get it over with soon.

@woodsaj and @nopzor1200 I would like to discuss and decide what's next, either through a hangout soon, or possibly irl in NYC.

Basically it's all about figuring out the best way to store the litmus metrics (and more than just litmus, but litmus is where to start). Here's the current list:

health.fake_org_$org_endpoint_$endp.dns.error_state
health.fake_org_$org_endpoint_$endp.dns.ok_state
health.fake_org_$org_endpoint_$endp.dns.warn_state
health.fake_org_$org_endpoint_$endp.http.error_state
health.fake_org_$org_endpoint_$endp.http.ok_state
health.fake_org_$org_endpoint_$endp.http.warn_state
health.fake_org_$org_endpoint_$endp.ping.error_state
health.fake_org_$org_endpoint_$endp.ping.ok_state
health.fake_org_$org_endpoint_$endp.ping.warn_state
litmus.fake_org_$org_endpoint_$endp.dev1.dns.answers
litmus.fake_org_$org_endpoint_$endp.dev1.dns.default
litmus.fake_org_$org_endpoint_$endp.dev1.dns.error_state
litmus.fake_org_$org_endpoint_$endp.dev1.dns.ok_state
litmus.fake_org_$org_endpoint_$endp.dev1.dns.time
litmus.fake_org_$org_endpoint_$endp.dev1.dns.ttl
litmus.fake_org_$org_endpoint_$endp.dev1.dns.warn_state
litmus.fake_org_$org_endpoint_$endp.dev1.http.connect
litmus.fake_org_$org_endpoint_$endp.dev1.http.dataLength
litmus.fake_org_$org_endpoint_$endp.dev1.http.default
litmus.fake_org_$org_endpoint_$endp.dev1.http.dns
litmus.fake_org_$org_endpoint_$endp.dev1.http.error_state
litmus.fake_org_$org_endpoint_$endp.dev1.http.ok_state
litmus.fake_org_$org_endpoint_$endp.dev1.http.recv
litmus.fake_org_$org_endpoint_$endp.dev1.http.send
litmus.fake_org_$org_endpoint_$endp.dev1.http.statusCode
litmus.fake_org_$org_endpoint_$endp.dev1.http.throughput
litmus.fake_org_$org_endpoint_$endp.dev1.http.total
litmus.fake_org_$org_endpoint_$endp.dev1.http.wait
litmus.fake_org_$org_endpoint_$endp.dev1.http.warn_state
litmus.fake_org_$org_endpoint_$endp.dev1.ping.avg
litmus.fake_org_$org_endpoint_$endp.dev1.ping.default
litmus.fake_org_$org_endpoint_$endp.dev1.ping.error_state
litmus.fake_org_$org_endpoint_$endp.dev1.ping.loss
litmus.fake_org_$org_endpoint_$endp.dev1.ping.max
litmus.fake_org_$org_endpoint_$endp.dev1.ping.mdev
litmus.fake_org_$org_endpoint_$endp.dev1.ping.mean
litmus.fake_org_$org_endpoint_$endp.dev1.ping.min
litmus.fake_org_$org_endpoint_$endp.dev1.ping.ok_state
litmus.fake_org_$org_endpoint_$endp.dev1.ping.warn_state
  • health.*.state and litmus*.state -> keep bool (only 1/0) series for ok/warn/crit, or 2-bit numbers?
  • latency measurements: e.g. if we stick with uint16, 2^16 - 1 = 65535. At ms scale we can cover a range from 0 to 65 s, and everything > 60 s can be considered "way too long" (i.e. a timeout), so there's no need to support values higher than that. At 0.1 ms scale we would be limited to ~6.5 s, which seems too strict.
    however, there's a middle ground (see the sketch after this list):
    0 - 60k: could be the range of points in ms
    60k - 65535: could be a range for data in 0.1 ms steps from 0 to 5535 (i.e. 0 to 553.5 ms), so we support a decimal up until we hit roughly 0.5 s, at which point you really don't care about decimals anymore.
  • if we want higher precision or more range: note that floats store worse, and uint32 covers way more than we need.
  • ping loss is a percentage, right? so let's commit to only using values 0-100.
  • ping max/mean/min/mdev etc. can use the same type as the latency measurements. mdev could potentially be a smaller type.
  • however, if we do singular ping checks and store the results of those, we can do away with storing min/max/mdev/mean etc. (see also "1 ping per tick", raintank-probe#12). Is loss still a % in this case, or could it be just a bool because there's only 1 packet?
  • http status code: seems to be in the range 100-599 (however, there seem to be fewer than 256 codes in total, so we may end up storing them as a uint8 at some point if we have type awareness in MT)
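
A quick sketch of that middle-ground uint16 packing, just to illustrate the idea (function names made up, ranges as described in the latency bullet above):

def pack_latency(ms):
    # 60000 - 65535 -> 0.1 ms steps, for values up to 553.5 ms
    # 0     - 59999 -> whole ms, for values up to just under 60 s
    if ms < 0 or ms >= 60000:
        return None                     # > 60 s is effectively a timeout
    if ms <= 553.5:
        return 60000 + int(ms * 10)     # keep one decimal for small values
    return int(ms)                      # whole ms is plenty above ~0.5 s

def unpack_latency(packed):
    if packed >= 60000:
        return (packed - 60000) / 10.0
    return float(packed)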

I think I want MT to become a metrics2.0/unit-aware store so it can do its own optimizations.
for context:

  • graphite just casts everything to a float64 and calls it a day
  • influxdb has bool/float/integer types

If MT becomes unit aware, it means:

  1. it can automatically convert data to the expected format it should be in, lowering transition issues and paving the way for behind-the-scenes optimizations that are abstracted away, such as the http-status-as-uint8 or latency-in-ms-with-decimals-in-the-lower-range approaches described above.
    Though this might make the transition even more complicated, I think making the store aware of what the data is allows it to better optimize, and even to combine different strategies.
  2. it can automatically convert data to the desired format (see the sketch below). I'd love to bring some of the graph-explorer magic into the store at some point, i.e. query for latency in s and it will make sure the data is right, no matter the unit/format it was originally stored in. If we know something is a 64-bit counter, we can return it as a rate properly, etc. I think this could be a great help in abstracting some low-level stuff that people typically don't want to deal with.
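
As a minimal sketch of point 2, assuming series carry a metrics2.0-style unit tag (names and the scale table here are made up for illustration):

SCALE_TO_SECONDS = {"s": 1.0, "ms": 1e-3, "0.1ms": 1e-4}

def convert_series(values, stored_unit, wanted_unit="s"):
    # scale stored values into the unit the query asked for
    factor = SCALE_TO_SECONDS[stored_unit] / SCALE_TO_SECONDS[wanted_unit]
    return [v * factor for v in values]

# e.g. latency stored in ms, queried in seconds:
# convert_series([120, 98, 250], "ms") ≈ [0.12, 0.098, 0.25]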

@DanCech

DanCech commented Jan 1, 2016

@Dieterbe I absolutely agree that metrics2.0 support in MT is the correct way forward on this, presumably combined with an ingestion service that can handle old-school unitless readings and decide the type for them based on a configured set of rules.
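
A hypothetical sketch of such a rule set (the patterns, types and units are illustrative only, not a proposal for the actual config format):

import fnmatch

TYPE_RULES = [
    ("*.ping.loss",  {"type": "float32", "unit": "%"}),
    ("*.http.total", {"type": "uint16",  "unit": "ms"}),
    ("*_state",      {"type": "bool",    "unit": "state"}),
]

def classify(metric_name):
    # first matching pattern wins; unmatched series stay plain float64
    for pattern, spec in TYPE_RULES:
        if fnmatch.fnmatch(metric_name, pattern):
            return spec
    return {"type": "float64", "unit": "unknown"}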

As far as data storage goes, I don't really have a frame of reference yet for how much optimization is too much. The split-range approach for ping timings seems interesting, if we can assume that the maximum supported value will be 60 s. If we need more range we could take that approach further and define 4 sub-ranges, which would give us a gradual falloff in precision and support for values up to just over 30 minutes; that would also cover a broader range of use cases that might include values over 60 s as we expand the things we monitor.

Time Range      Unit   Formula               Raw Range
0 - 0.9999s     0.1ms  X ms * 10             0     - 9999
1s - 29.999s    1ms    X ms + 9000           10000 - 38999
30s - 119.99s   10ms   (X ms / 10) + 36000   39000 - 47999
120s - 1873.5s  100ms  (X ms / 100) + 46800  48000 - 65535

Here's some quick and dirty python to pack and unpack a float representing time in ms according to this scheme:

def pack_uint16(ms):
    if ms < 0:
        return None
    if ms < 1000:                       # 0 - 0.9999s: 0.1 ms resolution
        return int(ms * 10)
    if ms < 30000:                      # 1s - 29.999s: 1 ms resolution
        return int(ms) + 9000
    if ms < 120000:                     # 30s - 119.99s: 10 ms resolution
        return int(ms / 10) + 36000
    if ms <= 1873500:                   # 120s - 1873.5s: 100 ms resolution
        return int(ms / 100) + 46800
    return None                         # out of range

def unpack_uint16(packed):
    if packed < 10000:
        return packed / 10.0
    if packed < 39000:
        return packed - 9000
    if packed < 48000:
        return (packed - 36000) * 10
    return (packed - 46800) * 100
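
A quick round trip shows the gradual precision falloff across the bands:

for ms in (0.25, 12.3, 1500.7, 45000, 300000):
    packed = pack_uint16(ms)
    print(ms, packed, unpack_uint16(packed))

# 0.25   ->     2 -> 0.2     (0.1 ms resolution)
# 12.3   ->   123 -> 12.3
# 1500.7 -> 10500 -> 1500    (1 ms resolution)
# 45000  -> 40500 -> 45000   (10 ms resolution)
# 300000 -> 49800 -> 300000  (100 ms resolution)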

As far as ping loss goes, yes, 0-100 is probably fine from a technical perspective. I'm not sure whether users expect to see a decimal place or two when the result doesn't come out to a round percentage, e.g. 3 of 7 pings failing (~42.86%).

@woodsaj
Contributor

woodsaj commented Jan 2, 2016

I think it is a very bad idea to store different dataTypes inside MetricTank. It will significantly increase the complexity of the code base, and I don't think the gain is worth it. I would only consider allowing int64 in addition to float64. InfluxDB had an issue about dropping support for int64s and after long discussion decided to keep them. Prometheus, on the other hand, only supports float64. The need for int64 arises when values exceed 2^53, as you lose precision when storing them as float64. This typically only happens when you are using a 64-bit counter, such as on a network switch that does a lot of traffic. At full line rate, a byte counter on a 10Gb switch port could reach 2^53 bytes (9 petabytes) in ~90 days. Though if the counter is incrementing quickly enough to fill 64 bits in a year or less, the precision loss would be much less than 1%.
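
A quick way to see that 2^53 cutoff in practice:

big = 2 ** 53
print(float(big) == float(big + 1))   # True: 2^53 + 1 rounds back to 2^53
print(float(big - 1) == float(big))   # False: below 2^53 every integer is exact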

Because we use delta encoding of the data, and all of the bool and small integer values like statusCode, error_state, packetLoss, etc. change infrequently, they already achieve excellent compression. I am very doubtful that the effort required to support the range of datatypes would be worth the gain.
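
As a rough intuition for why those series are already cheap: with delta-style encoding, a value that rarely changes produces mostly zero deltas, which cost very few bits. For example:

ok_state = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
deltas = [b - a for a, b in zip(ok_state, ok_state[1:])]
print(deltas)   # [0, 0, 0, -1, 0, 1, 0, 0, 0]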

if we do singular ping checks and store the results of that, we can do away with storing min/max/mdev/mean etc. see also raintank/raintank-probe#12

This is just not possible. The ping check is a check that reports an up/down state. It can only do that based on the percentage of ping failures, so each check run needs to send many pings. I also fundamentally disagree with the premise that sending 1 ping every 2 seconds gives better insight into network issues than sending 5 pings once every 10 seconds. If you send 1 ping every 2 seconds and it is lost, you only know that the link is experiencing some packet loss. If the packet is not lost, you only know that there is less than 100% packet loss. However, if you send 5 pings at once and 1 of the five is lost, you know the link is experiencing roughly 20% packet loss.

@DanCech

DanCech commented Jan 2, 2016

Very interesting comment; (I think) I misread @Dieterbe's comment to imply we were using uint16 to store these values today. Based on what @woodsaj just said, it sounds like there is very little reason to change the internal storage format (and go through the kind of acrobatics I described above). That said, I think the storage format is a separate discussion from the metrics2.0 concept of tagging series with their type to support intelligence in dealing with the stored data, which is definitely valuable. Having a unified internal storage format would also sidestep the kind of issues ES has seen with the 2.0 upgrade as it got stricter about the data types of index fields.

@woodsaj
Contributor

woodsaj commented Jan 2, 2016

100% agree that metrics2.0 is the way to go. And I think this whole issue needs to be moved to https://github.com/raintank/raintank-metric

There are two separate decisions to be made: our transport protocol (metrics2.0 format) and our storage format, which currently uses float64 for everything and delta-encodes 10 or 30 minutes of values before storing them in Cassandra.

For the transport protocol, I feel that we need to support JSON for ease of use, and additionally support msgPack or protoBuf for performance. Our own collectors/agents would use msgPack/protoBuf, while tools written by the community would likely prefer JSON for simplicity. For the transport protocol we are not going to see any noticeable performance gain from supporting many different dataTypes, and will likely see reduced performance, as we would need to treat all values as just []byte until we can read the dataType from the metricDefinition and can then correctly decode the data. For protoBuf this means we lose the benefit of some of the encoding techniques used for the various datatypes.

@Dieterbe
Contributor Author

This is just not possible. The ping check is a check that reports an up/down state. It can only do that based on the percentage of ping failures, so each check run needs to send many pings. I also fundamentally disagree with the premise that sending 1 ping every 2 seconds gives better insight into network issues than sending 5 pings once every 10 seconds. If you send 1 ping every 2 seconds and it is lost, you only know that the link is experiencing some packet loss. If the packet is not lost, you only know that there is less than 100% packet loss. However, if you send 5 pings at once and 1 of the five is lost, you know the link is experiencing roughly 20% packet loss.

This is why we have alerting rules on top of the data stream, to make informed decisions about whether a ping check is in a problematic state or not. The alerting rules are where we have access to all the data (including from other probes), and they are the best place to provide a reliable indicator to the user.
The output from the probe check is not meant for any particular use other than helping the centralized alerting rule come to a conclusion (correct me if I'm wrong).
We could easily extend our current logic that counts x failures in a row to an "alert if at least x failures out of y" rule (sketched below), which mimics the behavior you're describing but with the pinging style I described, except it's more flexible and powerful. And in case of downtime you can see at higher resolution when the failures happened (e.g. at 2-second precision instead of 10 s).
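
A minimal sketch of that "x failures out of y" rule (names made up; the existing x-in-a-row logic is just the special case y == x):

def should_alert(results, x, y):
    # results: newest-last booleans, True meaning the check/ping failed
    window = results[-y:]
    return len(window) == y and sum(window) >= x

# with 1 ping every 2 seconds, alert on >= 2 failures in the last 5 ticks:
# should_alert([False, True, False, True, False], x=2, y=5) -> True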

So we got derailed a bit into talking about specific optimisations for datatypes, but keep in mind that the original reason I opened this ticket is that we should think about, and document, what acceptable precisions are for our data. Even if we stick to just float64 for everything, it makes a lot of sense to avoid decimals (see dgryski/go-tsz#9), and that brings up questions such as whether we want timings at ms or 0.1 ms precision, which may force a change in the data and hence a breaking change in graphs, unless we add awareness and conversion to the store.
