[BUG] Using`log_metric()` with numpy types or other invalid values crashes Redis 😰 #127

mohammedri · 2020-04-22T18:49:18Z

We have faced this issue before, but for some reason it was ignored. Recently it was resurfaced by https://github.com/SebastianGergen again in the Slack community channels.

Doing log_metric() with anything other than ints or floats essentially break the entire system and requires you to re-install Atlas because Atlas goes into a non-recoverable state.

This happens because our Flask API is expecting a certain contract with Redis which is not being met.

We should fail more gracefully, or support numpy types.

Related: #96

Other details
This was his message:

Hi Atlas Team, one more issue that I faced today in my "on premise" atlas installation: when I make "foundations.log_metric('accuracy', accuracy)" and add a value which is somehow "invalid" (like accuracy = np.random.rand(1)[0]) it seems that the whole database crashes or something. I tried several simple "valid" values like accuracy = 0.5, and it works fine.  The result after the crash is problematic: The project overview page does not show up any more (actually it only shows up for a brief short milisecond and then disappears), and I am left with a blank GUI page in the browser. Only way to recover for me until now is re-installing :disappointed: Sorry. this description is not very clear. I hope you can get my problem :slightly_smiling_face:

when I use accuracy = float(np.random.rand(1)[0]) and foundations.log_metric('accuracy', accuracy) it works fine. So i guess that numpy makes a problem, anyways it is pretty unconvenient, that at least from my perspective I can't recover atlas after once appending a numpy based value.

The text was updated successfully, but these errors were encountered:

mohammedri · 2020-05-05T13:44:01Z

Does this PR #130 fix this issue as well @pippinlee ?

pippinlee · 2020-05-05T13:59:10Z

Don't believe the colon issue was the same root cause, but @nicknagi correct me if I'm wrong.

mohammedri · 2020-05-05T14:56:08Z

Sounds good! Did you have an expected timeline for this?

pippinlee · 2020-05-11T18:17:52Z

Tested out log_metric() using np.float64 types specifically, trying to reproduce previous errors using .log_metric('key', np.random.rand(1)[0]).

I was able to successfully log without error. This may have been thanks to Nick’s fix. I'll add a test to prove np.float64 support. I think we should be clearer with what types we do allow logging for though. Here’s a list of values to help us understand current state WRT numpy:

Valid
np.float64(29)

Invalid
type(np.random.rand(1)[0])
np.array([1,2,3,4])
np.ndarray([1,2,3])

Note: all of the above invalid values gracefully fail with an error like the following: TypeError: Invalid metric with key="np.ndarray([1,2,3])" of value=[[[ 2.33233293e-316 -4.3347313 ... with type <class 'numpy.ndarray'>. Value should be of type string or number, or a list of strings / numbers.

Ongoing questions:

Is that error message helpful?
Can we better define types we wish to support? That'll help us with these one offs "does log_metric() support type X", and we can provide this in docs to make this clearer to users.

mohammedri · 2020-05-11T18:29:51Z

Initial thoughts:

Is that error message helpful?

I think the error message is helpful as is. The last message clearly specifies it all.
TypeError: Invalid metric with key="np.ndarray([1,2,3])" of value=[[[ 2.33233293e-316 -4.3347313 ... with type <class 'numpy.ndarray'>. Value should be of type string or number, or a list of strings / numbers.

Can we better define types we wish to support?

The intention behind log metric is to support recording metrics such as precision, recall, loss, Area Under Curve, f1-score et.al. NOT to support the outputs of a task e.g. the output of a BERT model, or the output of a generative NLP model - as such it seems weird to me that we support strings and a list of strings / numbers.

On the contrary I can see - why someone would wish to view the output on the GUI.

There are two routes to go from here:

We be more flexible: support strings, lists of string / numbers, np.array et. al.
We stick to what log_metric is supposed to be there for and strip off support for strings / lists.

Can anyone recall why we decided to support strings?

mohammedri · 2020-05-11T18:34:38Z

My vote is to strip support for strings / lists and use log_metrics for what it is meant to be used for

ekhl · 2020-05-11T18:46:53Z

The decision of whether to change the behaviour of this feature (e.g. modify the types supported by log_metric) looks to be out of scope of this issue. I suggest that if this bug has been fixed we close it out and open a new issue to address that separate concern.

pippinlee · 2020-05-11T18:56:14Z

Works for me. Going to submit a PR to cover us for np.float64 and then we can close.

kyledef · 2020-05-11T18:56:30Z

I think this issue also came up when Damien and others in the DS team were exploring Atlas. I suspect their use case is a valid concern for us. We should understand better what it is before we mark this as closed.

… with log_metric(), this test case ensures that is no longer the case #127

shazraz · 2020-05-15T16:57:19Z

@mohammedri & @pippinlee ,

@cindyzczhao and I were able to reproduce this error on 0.1.1 using the following code snippet

import foundations
import numpy as np
x = np.float64(5)
fooundations.log_metric('metric', x)

It broke the GUI and we had to flush redis to get things back up.

pippinlee · 2020-05-18T16:25:48Z

It broke the GUI and we had to flush redis to get things back up.

Thanks for the heads up. Have confirmed this is because we need to do a new release––using current release 0.1.1 still has log_metric bug.

Latest master works without error with logging np.float64(5). Have added a cypress test to confirm this.

mohammedri · 2020-05-21T14:56:56Z

Hey @pippinlee okay to close this issue as this is fixed?

mohammedri added the bug Something isn't working label Apr 22, 2020

pippinlee self-assigned this Apr 28, 2020

pippinlee added a commit that referenced this issue May 15, 2020

[PL] Previous version of atlas crashed when using numpy float64 types…

a241137

… with log_metric(), this test case ensures that is no longer the case #127

pippinlee mentioned this issue May 15, 2020

[PL] Previous version of atlas crashed when using numpy float64 types… #156

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Using`log_metric()` with numpy types or other invalid values crashes Redis 😰 #127

[BUG] Using`log_metric()` with numpy types or other invalid values crashes Redis 😰 #127

mohammedri commented Apr 22, 2020

mohammedri commented May 5, 2020

pippinlee commented May 5, 2020

mohammedri commented May 5, 2020

pippinlee commented May 11, 2020

mohammedri commented May 11, 2020 •

edited

Loading

mohammedri commented May 11, 2020

ekhl commented May 11, 2020

pippinlee commented May 11, 2020

kyledef commented May 11, 2020

shazraz commented May 15, 2020

pippinlee commented May 18, 2020

mohammedri commented May 21, 2020

[BUG] Usinglog_metric() with numpy types or other invalid values crashes Redis 😰 #127

[BUG] Usinglog_metric() with numpy types or other invalid values crashes Redis 😰 #127

Comments

mohammedri commented Apr 22, 2020

mohammedri commented May 5, 2020

pippinlee commented May 5, 2020

mohammedri commented May 5, 2020

pippinlee commented May 11, 2020

mohammedri commented May 11, 2020 • edited Loading

mohammedri commented May 11, 2020

ekhl commented May 11, 2020

pippinlee commented May 11, 2020

kyledef commented May 11, 2020

shazraz commented May 15, 2020

pippinlee commented May 18, 2020

mohammedri commented May 21, 2020

[BUG] Using`log_metric()` with numpy types or other invalid values crashes Redis 😰 #127

[BUG] Using`log_metric()` with numpy types or other invalid values crashes Redis 😰 #127

mohammedri commented May 11, 2020 •

edited

Loading