R image processing #107
Conversation
Got it, will review when I get a chance.
Looks cool!
I left some generic R style and best-practice comments; I'd be happy to take another look at what the code is actually doing later on.
Nice! @jameslamb knows R better than I do, so I'm going to defer to him on the details.
One limitation of the approach taken here is that the model probabilities for the different categories are independent, so that e.g. an image containing both a human and a dog should get high probabilities for both. As a result, choosing the highest-probability label will not be the best approach for some applications. (For instance, to evaluate the model's performance for the `human` label, you may want to look at the probability for `human` on every image, regardless of whether that probability is the highest or not.)
At some point it might be worthwhile to create a library or otherwise make this code more modular and reusable, but I'm fine with having it all in one script as a first pass.
@@ -0,0 +1,230 @@
#Examples of how to make requests against the image classification endpoints
Is the double underscore in the filename intentional?
Nope, that is definitely a typo
Regarding the model probabilities: that is something I was unaware of until you brought it up today (that they are independent). I'm thinking that if we are trying to select the 'best' one, it may make the most sense to divide each probability by the sum of all the probabilities.
For example, if we have

```
{
  "raccoon": 0.80,
  "rabbit": 0.20,
  "coyote": 0.75
}
```

then we would divide each of those elements by 0.80 + 0.20 + 0.75 = 1.75. That would at least ensure that the relative probabilities sum to 1. On top of this, an image with multiple 'high' probabilities for different classifications (as in the above example) would get down-weighted a bit, while those with a single 'high' probability would be penalized less.
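The division scheme above can be sketched in a few lines (a Python illustration of the proposal; the `normalize` name and the example scores are just the numbers from this thread, not code from the PR):

```python
def normalize(probs):
    """Divide each independent class probability by the sum of all of them,
    so the rescaled values sum to 1 (the scheme proposed above)."""
    total = sum(probs.values())
    return {label: p / total for label, p in probs.items()}

scores = {"raccoon": 0.80, "rabbit": 0.20, "coyote": 0.75}
normed = normalize(scores)
# raccoon drops from 0.80 to 0.80 / 1.75 ≈ 0.457, because the competing
# high coyote score pulls it down; the ranking of labels is unchanged.
```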
Finally, I do agree that this could be made MUCH more modular and a library would be a great way to do that.
> Then we would divide each of those elements by 0.80+0.20+0.75. That would at least ensure that the relative probabilities sum to 1.

Why would we want that? The idea behind letting them be independent is that the categories are not actually mutually exclusive, so we should not force them to sum to 1. E.g. `{'human': 1, 'dog': 1, ...}` is just the right result for an image that contains both humans and dogs.
While that is true in this specific case, what do you do with an image that is `{'raccoon': 0.99, 'coyote': 0.99}`? In our decade of camera trapping we've only gotten one image of a coyote and a raccoon together, so assuming that both are in the image is a little suspect. Aside from human and dog, the likelihood of getting two unique species is quite low. However, if we allow the probabilities to sum to one, you could just add together the human and dog probabilities (making a human-AND-dog classifier). At the end of the day, though, these types of summaries can be done after the fact (post-autofocus), and we can then make some comparisons about which way performs better.
> While that is true in this specific case, what do you do with an image that is `{'raccoon': 0.99, 'coyote': 0.99}`?

If the model gives that result and the image doesn't contain both a raccoon and a coyote, then the model got it wrong. Cases where it is that badly wrong should be extremely rare. At this point, if I saw `{'raccoon': 0.99, 'coyote': 0.99}` from the model, I would be inclined to believe that the image does contain both a raccoon and a coyote, although my prior probability for that scenario is low.
> However, if we allow the probabilities to sum to one you could just add together the human and dog probabilities (making a human AND dog classifier).

If the app returns `{"human": 1, "dog": 1}` (with zeros for other categories), that means that the model is confident that the image contains both a human and a dog. If it returns `{"human": .5, "dog": .5}` (with zeros for other categories), that means that the model is maximally uncertain about whether the image contains a human and about whether it contains a dog. This is a very important distinction, and we would lose it if we made the numbers sum to one after the fact.
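The information loss described above is easy to demonstrate (a Python sketch of the argument, not code from the PR; two-category dicts stand in for the full label set):

```python
def normalize(probs):
    """Rescale independent per-label probabilities so they sum to 1."""
    total = sum(probs.values())
    return {label: p / total for label, p in probs.items()}

confident = {"human": 1.0, "dog": 1.0}   # model is sure both are present
uncertain = {"human": 0.5, "dog": 0.5}   # model maximally unsure about each

# After normalization both collapse to {"human": 0.5, "dog": 0.5}:
# the distinction between "confident in both" and "unsure about each" is gone.
print(normalize(confident) == normalize(uncertain))  # True
```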
If we trained a multiclass rather than multilabel model so that `{"human": 1, "dog": 1}` was impossible, then it is hard to say what the model would do on an image that contains both humans and dogs. For instance, in that case `{"human": .5, "dog": .5}` could mean that the image contains both humans and dogs but the model doesn't have the means to represent that fact, or it could mean that the model is confident that the image contains something but not whether that thing is a human or a dog.
The categories are not mutually exclusive, so treating them as separate labels is the right approach in principle. We can revisit the approach if we find that it doesn't work well in practice, but so far I don't see any reason to think that it wouldn't.
Let's meet up or jump on a call if what I'm saying isn't clear or if it is clear but you disagree.
This PR is a quality-of-life improvement for R users of autofocus. The old example assumed that images had already been preprocessed and/or zipped together. This is not the case when someone has a whole bunch of camera trap images in hand.
This new example, `process_predict_example.R`, contains a suite of functions that preprocess images the same way as `process_raw.py`: we remove the bottom 198 pixels and then reduce to 760x512 pixels. In R, you then end up with a svelte data.frame that contains the original file name and the most likely species in that photo.
Finally, the images (and associated zip files) that get processed are treated as temporary files, so you don't have to create a secondary batch of images. This is most useful for the prediction side of autofocus (for training we'd probably want to retain the processed images).
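The crop-then-resize geometry described above can be sketched as follows (a Python analogue of the R preprocessing; the function name and the 2048x1536 example frame size are illustrative assumptions, while the 198-pixel bottom strip and the 760x512 target come from the description):

```python
def preprocess_geometry(width, height, strip_px=198, target=(760, 512)):
    """Return the (left, upper, right, lower) crop box that removes the
    bottom strip_px rows (the camera-trap info bar), plus the final resize
    target. With an imaging library such as Pillow, the actual work would
    then be img.crop(box).resize(target)."""
    box = (0, 0, width, height - strip_px)
    return box, target

# Hypothetical 2048x1536 camera-trap frame:
box, target = preprocess_geometry(2048, 1536)
# box == (0, 0, 2048, 1338); target == (760, 512)
```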