Downloading data from Zooniverse; classification_export.status_code == 403 error #38

beckynevin · 2023-04-27T18:50:55Z

Describe the bug
The last cell of the citizen science notebook (the one that grabs the classifications from Zooniverse using panoptes_client) fails roughly one time in ten when it runs.

To Reproduce
Steps to reproduce the behavior, written in imperative mood:

Restart the kernel
Scroll down to the last cell in citizen science notebook
Run the cell
Follow the directions to log in with your Zooniverse credentials.
Sometimes it works; keep restarting the kernel and rerunning the cell until you see the error.

Expected behavior
The classifications download without error; in other words, classification_export.status_code == 200 and classification_export.ok == True.

Actual behavior
Sometimes (again, roughly one run in ten), classification_export.status_code == 403.

Screenshots

EDC Output

INPUT
# This cell is set up to run independently from all of the above cells
import panoptes_client, utils
panoptes_client.Panoptes.connect(login="interactive")
# This project_id is found on Zooniverse by selecting 'build a project' and then selecting the project
# You don't need to be the project owner.
project_id = 19539
classification_export = panoptes_client.Project(project_id).get_export('classifications')
list_rows = []
counter = 0
# If the following line throws an error, restart the kernel and rerun the cell.
for row in classification_export.csv_reader():
    if counter == 0:
        header = row
    else:
        list_rows.append(row)
    counter += 1
df = utils.pandas.DataFrame(list_rows, columns = header)
df

SAMPLE OUTPUT
Enter your Zooniverse credentials...
Username:  rebecca.nevin
 ········
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
Input In [1], in <cell line: 14>()
     10 counter = 0
     11 # I get a weird error if I run the rest of this notebook first and don't rerun the import and call
     12 # to panoptes_client above: 
     13 # Error: iterator should return strings, not bytes (the file should be opened in text mode)
---> 14 for row in classification_export.csv_reader():
     16     if counter == 0:
     17         header = row

Error: iterator should return strings, not bytes (the file should be opened in text mode)
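The `Error` above is the message Python's `csv` module raises when the iterator passed to `csv.reader` yields `bytes` instead of `str`. A minimal reproduction, assuming (this is not confirmed in the thread) that the lines reaching `csv.reader` were undecoded bytes, for instance because the response body was an error page rather than the CSV export. The sample rows are made up:

```python
import csv

# Hypothetical sample lines as raw bytes, as an undecoded HTTP body would yield them.
byte_lines = [b"classification_id,user_name", b"1,alice"]

try:
    list(csv.reader(byte_lines))
except csv.Error as err:
    message = str(err)
    print(message)  # the same "iterator should return strings, not bytes ..." error

# Decoding each line to str first gives csv.reader the input it expects.
text_rows = list(csv.reader(line.decode("utf-8") for line in byte_lines))
print(text_rows)  # [['classification_id', 'user_name'], ['1', 'alice']]
```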

Additional context
Here is the code we wrote to work around this issue: it checks the status code before parsing the export and asks the user to try again if the request returned a 403. We are not including this in the alpha version of the code release, but we'd like to include it down the road. Currently, we just have one comment that recommends re-running the cell if it fails.

# I currently have this cell set up to run independently from all of the above cells
#from panoptes_client import Panoptes, Project
import panoptes_client, utils
panoptes_client.Panoptes.connect(login="interactive")
# This project_id is found on Zooniverse by selecting 'build a project' and then selecting the project
# I also don't think you need to be the project owner, but I'm not sure
project_id = 19539
classification_export = panoptes_client.Project(project_id).get_export('classifications')
list_rows = []
counter = 0
# I get a weird error if I run the rest of this notebook first and don't rerun the import and call
# to panoptes_client above: 
# Error: iterator should return strings, not bytes (the file should be opened in text mode)
# A 200 status already implies classification_export.ok is True.
if classification_export.status_code == 200:
    for row in classification_export.csv_reader():
        if counter == 0:
            header = row
        else:
            list_rows.append(row)
        counter += 1

    df = utils.pandas.DataFrame(list_rows, columns=header)
    print(df)
elif classification_export.status_code == 403:
    print("There was an issue with the request, please try again in a minute.")
else:
    print(classification_export.status_code)
    print(classification_export.text)
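The `counter` bookkeeping above can be replaced by taking the first row with `next()`. A minimal sketch, assuming the export's first CSV row is the header (as the original loop also assumes); `split_header` is a hypothetical helper and the sample rows are made up:

```python
import csv
import io

def split_header(rows):
    """Return (header, data_rows) from an iterable of CSV rows; the header is the first row."""
    rows = iter(rows)
    header = next(rows, None)  # None if there are no rows at all
    return header, list(rows)

sample = io.StringIO("classification_id,user_name\n1,alice\n2,bob\n")
header, list_rows = split_header(csv.reader(sample))
print(header)     # ['classification_id', 'user_name']
print(list_rows)  # [['1', 'alice'], ['2', 'bob']]

# In the notebook cell this would become, for example:
# header, list_rows = split_header(classification_export.csv_reader())
# df = utils.pandas.DataFrame(list_rows, columns=header)
```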


clareh · 2023-05-16T01:03:54Z

I had a discussion with someone who is keen to use our pipeline down the road, and they raised a concern about the delay in getting results. They think the ~24-hour wait to get classifications will impact their ability to do science. Worth discussing in this context, perhaps?

bnord · 2023-05-27T00:30:07Z

@clareh What specific concerns did they have about the delay? Why does a 24-hour delay affect their science capacity?

beckynevin · 2023-05-30T14:40:11Z

Maybe the above two comments should be moved to a separate discussion? They seem unrelated to this bug but relevant to the general discussion of how to fetch data.

bnord · 2023-05-30T20:17:01Z

@clareh Could you start an issue or a new discussion on this?

eatyourgreens · 2023-06-06T11:59:17Z

Hi! I've added myself to this as the Zooniverse contact.

My first thought is that perhaps the failed requests are using expired Authorization headers, but I will investigate.

ericdrosas87 · 2023-06-07T18:14:23Z

Thank you @eatyourgreens !

eatyourgreens · 2023-06-08T09:15:59Z

Hi again,

Do you know if the classification export is being requested after its signed URL has expired? Here's an example of an expired link:
https://panoptesuploads.blob.core.windows.net/private/project_classifications_export/2659a7c3-043d-45c7-8cef-c0fbae185cc5.csv?sp=r&sv=2018-11-09&se=2023-06-07T22%3A08%3A14Z&sr=b&sig=rnOa82WJhSROjG61If1qZ0QLIGcHT3KADJptlQB%2BoAE%3D

The URLs expire 3 minutes after they're generated, so maybe that's the cause of the problem?
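The expiry time is visible in the link itself: in the example above, the `se` query parameter (`2023-06-07T22%3A08%3A14Z`) appears to be the signed-expiry timestamp of the Azure Blob Storage URL. A minimal sketch, assuming that reading of `se` and that it is in UTC, which checks whether a link has already expired before downloading it. `signed_url_expiry` and `signed_url_expired` are hypothetical helpers, not part of panoptes_client:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

def signed_url_expiry(url):
    """Return the expiry in the URL's `se` parameter (assumed UTC, e.g. 2023-06-07T22:08:14Z)."""
    query = parse_qs(urlparse(url).query)
    expiry = query["se"][0]  # parse_qs percent-decodes, so %3A becomes ':'
    return datetime.strptime(expiry, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def signed_url_expired(url, now=None):
    """True if the URL's expiry is at or before `now` (defaults to the current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return signed_url_expiry(url) <= now

example = (
    "https://panoptesuploads.blob.core.windows.net/private/project_classifications_export/"
    "2659a7c3-043d-45c7-8cef-c0fbae185cc5.csv?sp=r&sv=2018-11-09&se=2023-06-07T22%3A08%3A14Z"
    "&sr=b&sig=rnOa82WJhSROjG61If1qZ0QLIGcHT3KADJptlQB%2BoAE%3D"
)
print(signed_url_expiry(example))   # 2023-06-07 22:08:14+00:00
print(signed_url_expired(example))  # True, since that date is in the past
```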

If the signed URL has expired, I think that you need to retry and generate a new URL.
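A minimal retry sketch, assuming that calling `get_export('classifications')` again will eventually return a fresh, unexpired link. `fetch_with_retry` is a hypothetical helper; it only relies on the returned object having a `status_code` attribute, as the response in the notebook code does. The defaults (five attempts, 30 seconds apart) are arbitrary choices meant to outlast a stale link, not values recommended by Zooniverse:

```python
import time

def fetch_with_retry(fetch, max_attempts=5, delay_seconds=30.0, retry_statuses=(403,)):
    """Call `fetch()` until its response status is not in `retry_statuses`.

    `fetch` is any zero-argument callable returning an object with a `status_code`
    attribute. The last response is returned even if every attempt failed, so the
    caller can still inspect `status_code` and `text`.
    """
    response = None
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        if response.status_code not in retry_statuses:
            return response
        if attempt < max_attempts:
            # Wait before requesting the export again, hoping for a new signed URL.
            time.sleep(delay_seconds)
    return response

# Hypothetical usage with the notebook's variables:
# classification_export = fetch_with_retry(
#     lambda: panoptes_client.Project(project_id).get_export('classifications'))
```

Passing a callable rather than a response keeps the helper independent of panoptes_client, so the retry policy can be tested with stand-in objects.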

eatyourgreens · 2023-06-08T21:42:18Z

zooniverse/panoptes#4209 might fix this, once it’s deployed to Panoptes production.

Credit to @yuenmichelle1 for figuring out the caching problem: those classification links are good for 3 minutes, but Panoptes caches them for 5 minutes, so there's a 2-minute window where Panoptes can give you an expired link.
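The timing can be checked with a small model, assuming (as described above) that a link is valid for 3 minutes after it is generated and that Panoptes keeps serving the same cached link for up to 5 minutes, after which a request gets a freshly generated link. `cached_link_is_expired` is a hypothetical helper written only to illustrate the window; it is not how Panoptes is implemented:

```python
LINK_TTL_MINUTES = 3   # signed URL lifetime, per the comment above
CACHE_TTL_MINUTES = 5  # how long the export link is cached, per the comment above

def cached_link_is_expired(minutes_since_generated,
                           link_ttl=LINK_TTL_MINUTES, cache_ttl=CACHE_TTL_MINUTES):
    """True if a request this long after the link was generated receives an expired link."""
    if minutes_since_generated >= cache_ttl:
        return False  # cache entry has lapsed, so the caller gets a new, valid link
    return minutes_since_generated >= link_ttl

# Whole minutes after generation at which the cached link is already dead:
print([m for m in range(CACHE_TTL_MINUTES) if cached_link_is_expired(m)])  # [3, 4]

# If the cache lifetime were no longer than the link lifetime, the window would vanish:
print(any(cached_link_is_expired(t / 10, cache_ttl=3) for t in range(50)))  # False
```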

ericdrosas87 · 2023-06-09T21:15:53Z

Thank you for the update, @eatyourgreens; we'll retest soon.

beckynevin added the bug ("Something isn't working") label Apr 27, 2023

beckynevin assigned clareh and ericdrosas87 Apr 27, 2023

beckynevin mentioned this issue in Fetch classifications #39 (merged) Apr 27, 2023

eatyourgreens self-assigned this Jun 6, 2023
