I added MLP probability #206
Sorry about this .idea/misc.xml |
I also wrote some tests |
But I think I picked up an old version of the PR; some files I put there yesterday were missing |
Hey Luca, there is still some style stuff that needs to be taken care of. Also I was wondering about keeping training and predicting separate, but maybe I misunderstood some logic? And MLP was now done in sklearn and not keras apparently? Thanks for delivering this one fast!
elif cross_validation_type == "SKFOLD": | ||
return StratifiedKFold(n_splits=number_of_split, shuffle=True) | ||
else: | ||
raise InvalidCrossValidationSelected |
This too could be InvalidParameterValueException
OK, so I can decouple training, scoring and prediction into three separate
phases. Now I'm thinking of writing a class here; what do you think?
I can take care of all the styling stuff!
…On Thu, 26 Oct 2023, 12:17 Niko Aarnio, ***@***.***> wrote:
***@***.**** requested changes on this pull request.
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> + random_state: random state for repeatability of results.
+ Return:
+ a numpy array with prediction (class if is_class_probability is set to false otherwise it return probability).
+ Raises:
+ InvalidDatasetException: When the dataset is None.
+ InvalidArgumentTypeException when the function try to make probability and the threshold is None.
+ """
⬇️ Suggested change
- random_state: random state for repeatability of results.
- Return:
- a numpy array with prediction (class if is_class_probability is set to false otherwise it return probability).
- Raises:
- InvalidDatasetException: When the dataset is None.
- InvalidArgumentTypeException when the function try to make probability and the threshold is None.
- """
+ random_state: random state for repeatability of results.
+
+ Return:
+ A Numpy array with prediction (class if is_class_probability is set to false otherwise it return probability).
+
+ Raises:
+ InvalidDatasetException: When the dataset is None.
+ InvalidArgumentTypeException when the function try to make probability and the threshold is None.
+ """
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> @@ -0,0 +1,95 @@
+import numpy as np
+from sklearn.neural_network import MLPClassifier
+
+from eis_toolkit.exceptions import InvalidArgumentTypeException, InvalidDatasetException
+from eis_toolkit.model_performance_estimation.model_performance_estimation import performance_model_estimation
+
+
+def train_evaluate_predict_with_mlp(
+ dataset: np.ndarray,
+ labels: np.ndarray,
+ cross_validation_type: str,
This could be Literal["LOOCV", "KFOLD", "SKFOLD"]
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> @@ -0,0 +1,95 @@
+import numpy as np
+from sklearn.neural_network import MLPClassifier
+
+from eis_toolkit.exceptions import InvalidArgumentTypeException, InvalidDatasetException
+from eis_toolkit.model_performance_estimation.model_performance_estimation import performance_model_estimation
+
+
remember @beartype <https://github.com/beartype>
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> +import numpy as np
+from sklearn.neural_network import MLPClassifier
+
+from eis_toolkit.exceptions import InvalidArgumentTypeException, InvalidDatasetException
+from eis_toolkit.model_performance_estimation.model_performance_estimation import performance_model_estimation
+
+
+def train_evaluate_predict_with_mlp(
+ dataset: np.ndarray,
+ labels: np.ndarray,
+ cross_validation_type: str,
+ number_of_split: int,
+ is_class_probability: bool = False,
+ threshold_probability: float = None,
+ is_predict_full_map: bool = False,
+ solver: str = "adam",
This could be Literal listing the valid options too
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> +from eis_toolkit.exceptions import InvalidArgumentTypeException, InvalidDatasetException
+from eis_toolkit.model_performance_estimation.model_performance_estimation import performance_model_estimation
+
+
+def train_evaluate_predict_with_mlp(
+ dataset: np.ndarray,
+ labels: np.ndarray,
+ cross_validation_type: str,
+ number_of_split: int,
+ is_class_probability: bool = False,
+ threshold_probability: float = None,
+ is_predict_full_map: bool = False,
+ solver: str = "adam",
+ alpha: float = 0.001,
+ hidden_layer_sizes: tuple[int, int] = (16, 2),
+ random_state=0,
type missing here (int?)
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> + cross_validation_type: selected cross validation method.
+ number_of_split: number of split to divide the dataset.
+ is_class_probability: if True the code return probability, otherwise it return class.
+ is_predict_full_map: if True the function will predict the full dataset otherwise predict only the te4st fold.
+ threshold_probability: works only if is_class_probability is True, is thresholds of probability.
+ solver: this is what in keras is called optimizer.
+ alpha: floating point represent regularization.
+ hidden_layer_sizes: It represents the number of neurons in the ith hidden layer.
+ random_state: random state for repeatability of results.
Remember to start docstring sentences with uppercase. There is also one
typo
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> + if is_class_probability is not False and threshold_probability is None:
+ raise InvalidArgumentTypeException
I'd use here the InvalidParameterValueException and include a comment in
the exception
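A minimal sketch of what this suggestion might look like. The exception class below is only a local stand-in that reuses the name `InvalidParameterValueException` (the toolkit's own definition is not shown in this thread), `check_threshold` is a hypothetical helper, and the range check on the threshold is an added assumption rather than something requested in the review.

```python
from typing import Optional


class InvalidParameterValueException(Exception):
    """Local stand-in for the toolkit exception of the same name."""


def check_threshold(is_class_probability: bool, threshold_probability: Optional[float]) -> None:
    """Validate the probability threshold (hypothetical helper)."""
    # A threshold is required as soon as probabilities are requested.
    if is_class_probability and threshold_probability is None:
        raise InvalidParameterValueException(
            "threshold_probability must be given when is_class_probability is True."
        )
    # Assumption: a probability threshold only makes sense inside [0, 1].
    if threshold_probability is not None and not 0.0 <= threshold_probability <= 1.0:
        raise InvalidParameterValueException(
            f"threshold_probability must be in [0, 1], got {threshold_probability}."
        )
```

Carrying the reason in the message means one general exception type can cover several invalid-parameter cases.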
------------------------------
In eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
> + # assign to classifier and data a vars I do not like see to much indexing
+ classifier = best_handler_list[0]
+
+ if not is_predict_full_map:
+ data = best_handler_list[1]
+ else:
+ data = dataset
+
+ if not is_class_probability:
+ # predict class
+ prediction = classifier.predict(data)
+ else:
+ # predict proba
+ prediction = classifier.predict_proba(data)
+ # assign proba to threshold
+ prediction[prediction >= threshold_probability] = 1
+
+ return prediction
I am wondering whether we should do the predicting in another function and
simply train a model here. We could return a dictionary with the best model,
the best score and a list of fold performances or something like that.
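The dictionary idea above could be sketched roughly as follows. This is not the code from the PR: to stay dependency-free it uses a hand-written fold splitter and a toy `MajorityClassifier`, and all names (`k_fold_indices`, `train_and_evaluate`, `make_model`) are hypothetical. In the toolkit, the splitter would presumably be one of the sklearn cross-validation objects and `make_model` something that builds a fresh `MLPClassifier`.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and the number of samples.")
    folds, start = [], 0
    for fold in range(n_splits):
        # The first (n_samples % n_splits) folds get one extra sample.
        size = n_samples // n_splits + (1 if fold < n_samples % n_splits else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        folds.append((train, test))
        start += size
    return folds


def train_and_evaluate(
    make_model: Callable[[], Any], data: Sequence, labels: Sequence, n_splits: int = 5
) -> Dict[str, Any]:
    """Train one fresh model per fold and report the best one plus all fold scores."""
    fold_scores: List[float] = []
    best_model, best_score = None, float("-inf")
    for train_idx, test_idx in k_fold_indices(len(data), n_splits):
        model = make_model()  # a new, untrained instance for every fold
        model.fit([data[i] for i in train_idx], [labels[i] for i in train_idx])
        score = model.score([data[i] for i in test_idx], [labels[i] for i in test_idx])
        fold_scores.append(score)
        if score > best_score:
            best_model, best_score = model, score
    return {"best_model": best_model, "best_score": best_score, "fold_scores": fold_scores}


class MajorityClassifier:
    """Toy model with a fit/predict/score interface, used only for the demo."""

    def fit(self, data, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, data):
        return [self.label_ for _ in data]

    def score(self, data, labels):
        hits = sum(p == t for p, t in zip(self.predict(data), labels))
        return hits / len(labels)


result = train_and_evaluate(
    MajorityClassifier, [[0], [1], [2], [3], [4], [5]], [0, 0, 0, 0, 0, 1], n_splits=3
)
print(result["best_score"], result["fold_scores"])
```

Prediction is deliberately absent here; the caller would take `result["best_model"]` and pass it to a separate predict function.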
------------------------------
In eis_toolkit/exceptions.py
<#206 (comment)>
:
> +class InvalidCrossValidationSelected(Exception):
+ """Exception thrown when a not valid cv is selected."""
+
+
+class InvalidNumberOfSplit(Exception):
+ """Exception throws when number of split is incompatible."""
I would not add these exceptions but use the
InvalidParameterValueException with a comment.
------------------------------
In
eis_toolkit/model_performance_estimation/model_performance_estimation.py
<#206 (comment)>
:
> @@ -0,0 +1,35 @@
+import sklearn
+from sklearn.model_selection import KFold, LeaveOneOut, StratifiedKFold
+
+from eis_toolkit.exceptions import InvalidCrossValidationSelected, InvalidNumberOfSplit
+
+
+def performance_model_estimation(
+ cross_validation_type: str = "LOOCV", number_of_split: int = 5
+) -> sklearn.model_selection:
+ """
+ Evaluate the feature importance of a sklearn classifier or linear model.
+
+ Parameters:
+ cross_validation_type: Select cross validation (LOOCV, SKFOLD, KFOLD).
+ number_of_split: number used to split the dataset.
Could detail here what is required for the number_of_split so that
exceptions are not raised.
------------------------------
In
eis_toolkit/model_performance_estimation/model_performance_estimation.py
<#206 (comment)>
:
> +
+ Parameters:
+ cross_validation_type: Select cross validation (LOOCV, SKFOLD, KFOLD).
+ number_of_split: number used to split the dataset.
+ Return:
+ Selected cross validation method
+ Raises:
+ InvalidCrossValidationSelected: When the cross validation method selected is not implemented.
+ InvalidNumberOfSplit: When the number of split is incompatible with the selected cross validation
+ """
+
+ if cross_validation_type is None:
+ raise InvalidCrossValidationSelected
+
+ if cross_validation_type != "LOOCV" and number_of_split <= 1:
+ raise InvalidNumberOfSplit
This could be InvalidParameterValueException
------------------------------
In
eis_toolkit/model_performance_estimation/model_performance_estimation.py
<#206 (comment)>
:
> + InvalidNumberOfSplit: When the number of split is incompatible with the selected cross validation
+ """
+
+ if cross_validation_type is None:
+ raise InvalidCrossValidationSelected
+
+ if cross_validation_type != "LOOCV" and number_of_split <= 1:
+ raise InvalidNumberOfSplit
+ if cross_validation_type == "LOOCV":
+ return LeaveOneOut()
+ elif cross_validation_type == "KFOLD":
+ return KFold(n_splits=number_of_split, shuffle=True)
+ elif cross_validation_type == "SKFOLD":
+ return StratifiedKFold(n_splits=number_of_split, shuffle=True)
+ else:
+ raise InvalidCrossValidationSelected
This too could be InvalidParameterValueException
------------------------------
In
eis_toolkit/model_performance_estimation/model_performance_estimation.py
<#206 (comment)>
:
> @@ -0,0 +1,35 @@
+import sklearn
+from sklearn.model_selection import KFold, LeaveOneOut, StratifiedKFold
+
+from eis_toolkit.exceptions import InvalidCrossValidationSelected, InvalidNumberOfSplit
+
+
beartype
------------------------------
In
eis_toolkit/model_performance_estimation/model_performance_estimation.py
<#206 (comment)>
:
> @@ -0,0 +1,35 @@
+import sklearn
+from sklearn.model_selection import KFold, LeaveOneOut, StratifiedKFold
+
+from eis_toolkit.exceptions import InvalidCrossValidationSelected, InvalidNumberOfSplit
+
+
+def performance_model_estimation(
+ cross_validation_type: str = "LOOCV", number_of_split: int = 5
Use Literal to list all the valid cross_validation_types. Then beartype will
also validate the input str
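As a rough illustration of the `Literal` suggestion, the sketch below declares the allowed values once and checks them by hand with the standard library. With `@beartype` on the function, the first check would presumably happen automatically at call time, as the comment above says. The alias `CrossValidationType`, the function name and the use of `ValueError` (in place of the toolkit's own exception) are assumptions made for this example.

```python
from typing import Literal, get_args

# Hypothetical alias listing every accepted cross-validation method.
CrossValidationType = Literal["LOOCV", "KFOLD", "SKFOLD"]


def validate_cross_validation_args(
    cross_validation_type: CrossValidationType = "LOOCV", number_of_split: int = 5
) -> None:
    """Check both arguments manually; ValueError stands in for the toolkit exception."""
    allowed = get_args(CrossValidationType)  # ("LOOCV", "KFOLD", "SKFOLD")
    if cross_validation_type not in allowed:
        raise ValueError(
            f"cross_validation_type must be one of {allowed}, got {cross_validation_type!r}."
        )
    # LOOCV ignores the split count; the fold-based methods need at least two folds.
    if cross_validation_type != "LOOCV" and number_of_split <= 1:
        raise ValueError("number_of_split must be at least 2 for KFOLD and SKFOLD.")
```

Keeping the allowed values in one alias also lets the docstring, the type hint and the runtime check stay in sync.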
------------------------------
On
eis_toolkit/model_performance_estimation/model_performance_estimation.py
<#206 (comment)>
:
Move this file to prediction too (at least for now). Let's not create new
high-level folders, at least not yet.
------------------------------
On eis_toolkit/deep_learning/mlp_function.py
<#206 (comment)>
:
You could rename this file just "mlp" I think. Also, please move it under
prediction and remove the deep_learning folder.
—
Reply to this email directly, view it on GitHub
<#206 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF3KUFPHHLF5JRRUYMVUONDYBITA3AVCNFSM6AAAAAA57E3YFWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTMOJZGA3TCMBTGQ>
.
You are receiving this because you were assigned.Message ID:
***@***.***>
|
I can't comment much about how many phases there should be, but I'd prefer if there are no custom classes. It's easier for the plugin to just call some functions that produce relevant output, and it also keeps the toolkit consistent since we haven't been creating our own classes so far |
If we put three functions there in mlp, is that OK? |
Should be fine I guess. If they are public, so that the user is supposed to call them directly, just make sure they represent logical steps. If they are private helper functions then there's no problem and there could be even more if you see fit |
498a8b4 to e4c7bc5
e4c7bc5 to b3544f6
I'll try to finish it by this evening |
b3544f6 to b6252aa
I added all the functions separately, but I also left the all-in-one function, just in case |
Let's check the functions first so we don't do work for nothing |
Would it make sense to have just two functions, one for training and one for predicting? The training would take care of evaluation too automatically and have more parameters. What do you think, if you take a look at what I did in PR #210? I'm just thinking about what's the best way to pack these steps. And did you leave the cross-validation loop out on purpose in the new functions? |
Let's see if I use functions like these: train, predict and evaluate. I like
to have cross-validation as a separate entity.
For me it is the same; if you prefer, we can do it like you say. When you do
cross-validation you have to create a new instance of the model each time,
with new training and new predictions. That's why I put everything in one
function! How do you prefer it? We can have both: one as a single entity and
one like this.
Let's try to finish this PR before the weekend so I can prepare the
fusion!
Now I feel very well, and my girlfriend has corona
|
Sorry, but I'm out of office tomorrow and gotta stop working soon, so I don't think I have time to review this again this week. Maybe you can make some style preparations for the other functions in the meantime. But regarding how to separate these functions, I think something in between these 3 functions and the 1 original function could work. Did you look at the PR I linked? I think the big function you put first was pretty good, but I was just thinking of maybe leaving the predictions at the end out, if that makes sense? To create, train and validate a model as step 1, and then, when the user is happy with the model they created, they call a different function to predict with unseen data |
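The second step described above (predicting with an already trained model) might look something like this sketch. It assumes a binary classifier whose `predict_proba` returns one `[p_negative, p_positive]` row per sample, as sklearn classifiers do for two classes. `predict_with_threshold` and `ToyModel` are hypothetical names. Unlike the snippet quoted in the review, which only sets values at or above the threshold to 1, this version maps both sides of the threshold to a class label.

```python
from typing import List, Protocol, Sequence


class ProbabilisticModel(Protocol):
    """Anything that exposes predict_proba with one probability row per sample."""

    def predict_proba(self, data: Sequence[Sequence[float]]) -> Sequence[Sequence[float]]:
        ...


def predict_with_threshold(
    model: ProbabilisticModel, data: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return 1 where the positive-class probability reaches the threshold, else 0."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}.")
    probabilities = model.predict_proba(data)
    # Column 1 is assumed to hold the probability of the positive class.
    return [1 if row[1] >= threshold else 0 for row in probabilities]


class ToyModel:
    """Stand-in for a trained classifier: the single feature is the probability."""

    def predict_proba(self, data):
        return [[1.0 - sample[0], sample[0]] for sample in data]


labels = predict_with_threshold(ToyModel(), [[0.2], [0.7], [0.5]], threshold=0.5)
print(labels)
```

Because the function only needs `predict_proba`, it would not matter which training function produced the model.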
Ahhh, sounds good to me! So I'll prepare everything as best I can and maybe
on Monday we merge ❤️❤️❤️❤️❤️❤️ Have a nice weekend, man!!! It is nice working
with you!
|
Niko, I added two more functions; let's see which ones are better! |
Hey Luca! Thanks for the update, I took a quick look now and it looks pretty good. I'm still thinking about the function interfaces a little bit. Is your CNN code somewhat similar to this? I am wondering whether it would make sense to include the CNN code in this PR and look at all of this at the same time 🤔. Also @msmiyels, if you have time at the beginning of this week, you could take a look at these functions too, in case you have some ideas! |
This could be a very good idea: we can reduce a CNN to an MLP by just removing the Conv2D layer! Is it OK if the CNN is written with TensorFlow?
|
Yea it can use tensorflow (so keras?). Do you think MLP should still use sklearn or keras too? But if this idea is ok to you, go ahead and include some CNN code in this PR too and maybe it will be easier to pack all of them in a clever way! |
If we use TF and Keras, we have to remove the probability map, as there
isn't one in Keras. Maybe for now we can keep both? Anyway, I'll add some CNN
code to the same PR and let's see what it looks like; we keep the best
|
Ah, I forgot: I'm reformatting my work PC, so the code will be ready by this evening or tomorrow morning 😛
|
Yeah no hurry! Tomorrow or even later is ok |
I wrote all the necessary functions for the CNN / MLP. I still need to implement the exceptions and tests, and then I'm ready for the PR! |
Ahh, I also need to run it a couple of times as well :-) |
Hey Niko, I got this beartype error and I have no idea how to fix it: beartype.roar.BeartypeDecorHintNonpepException: Function main.convolutional_body_of_the_cnn() parameter "input_layer" type hint <function Input at 0x7f9d23009f30> either PEP-noncompliant or currently unsupported by @beartype. |
Without beartype it is working now |
OK man, I'm ready to check the code with you! I removed some beartype decorators because they were a little bit bad |
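The error message quoted above reports the hint as `<function Input at 0x...>`, which suggests that a function object, not a class, was used as the annotation for `input_layer`. One possible fix, sketched here with toy stand-ins rather than real TensorFlow objects (`Tensor`, `Input`, `convolutional_body` and `is_class_hint` are all hypothetical), is to annotate the parameter with the type that the function returns.

```python
import inspect


class Tensor:
    """Toy stand-in for the tensor class a deep learning framework would provide."""


def Input(shape: tuple) -> Tensor:
    """Toy stand-in for a factory function that builds an input tensor."""
    return Tensor()


def is_class_hint(hint: object) -> bool:
    """Rough check: a class can serve as a type hint, a plain function cannot."""
    return inspect.isclass(hint)


# Problematic pattern (kept as a comment): the factory function used as the hint.
# def convolutional_body(input_layer: Input) -> Tensor: ...


def convolutional_body(input_layer: Tensor) -> Tensor:
    """Annotated with the class of the object that Input() returns."""
    return input_layer


print(is_class_hint(Input), is_class_hint(Tensor))
```

Which concrete tensor class to use in the real code depends on the TensorFlow/Keras version, so that part is left open here.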
Hi Luca, I've been a bit busy but maybe I can get you some comments today, sorry for the wait. |
Hey man, how are you?
No problem! Me too, I'm making corrections to my thesis! A very boring
and time-consuming task 😛!
No rush, we can do this on Monday if you do not have time today! We have
also almost finished the PR for Rami's code!
|
What do we do here with this big boy???? :-) |
Hey, I was just about to say that Mika and I finished creating the new CNN and MLP functions. They don't yet include the multimodal approach and some other stuff I wrote as comments at the beginning of the file. So there's a new file; I didn't modify any of your work, even though our functions are based on it. You can already take a look if you want and comment! I'm still thinking about how best to include hyperparameter optimization and CV (although how often is CV used for these methods?) |
Hi there!!!! Thank you, man, I will take a look for sure! Yes, yes, I always
use CV
|
Closing as not relevant anymore |