Implemented ingestion #9
Small requests, looks good otherwise! Nice use of pandas for validation!
try:
    df = df.applymap(lambda x: int(x) if pd.notna(x) else x)
except ValueError as e:
    raise HTTPException(
        status_code=status.HTTP_400_BAD_REQUEST,
        detail=f"Data type error: Non-integer value found: {e}",
    )
We can let the parent try/except handle this exception, since we already handle ValueError there.
They handle different cases of ValueError. The applymap call raises a ValueError if a value in the frame has the wrong data type, while the outer handler catches a ValueError raised while loading the df, which could be some other problem in the CSV, such as mixed data types in a column.
If this is the case but non-obvious, put a comment in the code explaining it exactly as you did here.
In general, if something in a PR is non-obvious and someone comments on it, it's good to add a code comment. The same goes for the code below.
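As a rough illustration of the comments being asked for, here is a minimal sketch of the two handlers with each case spelled out. The function name `load_counts`, the `BadRequest` class (a hypothetical stand-in for the 400 HTTPException in the PR), and the outer error message are assumptions for this sketch, not code from the PR. It also uses `apply` with `Series.map` in place of `applymap` so it runs unchanged across pandas versions.

```python
import io

import pandas as pd


class BadRequest(Exception):
    """Hypothetical stand-in for a 400 HTTPException, to keep the sketch small."""


def load_counts(csv_text: str) -> pd.DataFrame:
    """Load a gene-by-sample CSV and coerce every non-missing cell to int."""
    try:
        # Outer handler: covers ValueError raised while *loading* the CSV,
        # e.g. pandas.errors.ParserError, which subclasses ValueError.
        df = pd.read_csv(io.StringIO(csv_text), index_col=0)

        try:
            # Inner handler: covers ValueError raised by int() on a single
            # cell, i.e. the file parsed but a value is not an integer.
            df = df.apply(
                lambda col: col.map(lambda x: int(x) if pd.notna(x) else x)
            )
        except ValueError as e:
            raise BadRequest(
                f"Data type error: Non-integer value found: {e}"
            ) from e
    except ValueError as e:
        # BadRequest is not a ValueError, so the inner error passes through
        # this handler untouched.
        raise BadRequest(f"Could not load CSV: {e}") from e
    return df
```

With the comments in place, a reader can see at a glance why the inner try/except is not redundant with the outer one.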
if df.index.duplicated().any():
    raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Duplicate Gene IDs detected.")
if df.columns.duplicated().any():
    raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Duplicate Sample IDs detected.")
These HTTP exceptions are re-caught by the handler below and turned into 500s, which is incorrect; they are raised as 400s and should stay 400s.
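One common way to avoid this is to re-raise HTTP exceptions before the catch-all handler sees them. The sketch below is only an illustration under assumptions: `HTTPException` here is a small stand-in that mirrors the `status_code`/`detail` shape of FastAPI's class so the example runs without FastAPI installed, and `validate_ids` and `ingest` are hypothetical names, not functions from the PR.

```python
from __future__ import annotations


class HTTPException(Exception):
    """Stand-in mirroring the (status_code, detail) shape of FastAPI's class."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def validate_ids(gene_ids: list[str], sample_ids: list[str]) -> None:
    """Raise a 400 if either ID list contains duplicates."""
    if len(set(gene_ids)) != len(gene_ids):
        raise HTTPException(400, "Duplicate Gene IDs detected.")
    if len(set(sample_ids)) != len(sample_ids):
        raise HTTPException(400, "Duplicate Sample IDs detected.")


def ingest(gene_ids: list[str], sample_ids: list[str]) -> str:
    try:
        validate_ids(gene_ids, sample_ids)
        return "ok"
    except HTTPException:
        # Deliberate HTTP errors pass through unchanged, so a 400 stays a 400.
        raise
    except Exception as e:
        # Only genuinely unexpected failures become a 500.
        raise HTTPException(500, f"Unexpected error: {e}") from e
```

The order of the `except` clauses matters: the `HTTPException` clause has to come before the generic `Exception` clause, otherwise the generic one catches it first.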
No description provided.