Engaging Crowds: Subject set completeness #2016
@beckyrother Should be fixed now. This screenshot's from Firefox, but should be the same in other browsers.
Looks great! Thanks Jim :)
PR Review
This PR adds a "subject set completeness" reading to the Subject Set Selection menu (where applicable).
- The "subject set completeness" readout is a basic new UI element
- The code connects to Cellect to get the completeness stats (see Dev Notes below) ❕
Testing looks good on HMS NHS, but I ran into WTFs in other scenarios.
Dev Notes
The question of Cellect
Observation:
- this PR adds a fetch to the Cellect service (cellect.zooniverse.org) to fetch stats for Subject Sets.
- This fetch sub-action occurs on the main "classify" action logic route. i.e. it happens as soon as the user fetches the page, and isn't optional.
Thoughts:
- Concern: cellect.zooniverse.org seems to be a supplementary external service (i.e. not core to the classifier's main purpose) but it's tied to the main logic route.
- Opinion: If Cellect has issues for any reason, we shouldn't allow it to either crash or slow down the main classifier experience for the user.
Recommendations(?):
- Pop a try-catch on the fetch-from-Cellect action...
- ...and drop the fetch timeout to, like, 5 seconds? If possible?
Level of concern, from 1 to 10 (somebody just ran rm -fr * on the server root): 6? 7?
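The try-catch recommendation above could look something like the sketch below. It is not the PR's code: `fetchWorkflowCellectStatusSafely`, the injected `fetchImpl` and the `fallback` value are all hypothetical, added so the example runs without a network. The idea is only that any Cellect failure resolves to a harmless default instead of throwing inside the main classify route.

```javascript
// Sketch only: the function name, `fetchImpl` and `fallback` are hypothetical.
// `fetchImpl` stands in for whichever fetch implementation the app uses.
async function fetchWorkflowCellectStatusSafely(workflow, fetchImpl, fallback = {}) {
  const workflowURL = `https://cellect.zooniverse.org/workflows/${workflow.id}/status`
  try {
    const response = await fetchImpl(workflowURL)
    if (!response.ok) {
      throw new Error(`Cellect responded with HTTP ${response.status}`)
    }
    const body = await response.json()
    // A missing body or missing `groups` key falls through to the fallback.
    return body?.groups ?? fallback
  } catch (error) {
    // Log and carry on: a Cellect failure should not take the page down.
    console.error(`Cellect status unavailable for workflow ${workflow.id}:`, error.message)
    return fallback
  }
}
```

With this shape the caller always gets an object back, so the rest of the page build can treat "no counts" and "Cellect is down" the same way.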
Testing
Tested with macOS10+Chrome88
Scenario 1: HMS NHS (production)
This is the primary target project, and has 2 workflows with 2 subject sets each.
URLs:
- http://localhost:3000/projects/msalmon/hms-nhs-the-nautical-health-service?env=production
- http://localhost:3000/projects/msalmon/hms-nhs-the-nautical-health-service/classify?env=production
Testing Steps:
- Go to the project page and click Classify, OR go to the root Classify URL directly.
- You'll see the Workflow Selection screen. Pick any workflow.
- You'll see the Subject Set Selection screen.
- Observe that each Subject Set "card" has an additional "50% complete" (etc.) readout
This part LGTM! 👍
However...
Scenario 2: a bunch of other projects (staging and production)
URLs:
- http://localhost:3000/projects/darkeshard/transformers/classify?env=staging
- http://localhost:3000/projects/brooke/i-fancy-cats?env=staging
- http://localhost:3000/projects/darkeshard/prototype-2015?env=production
- http://localhost:3000/projects/mschwamb/planet-four?env=production
Results:
- Unable to view any of those pages!
- Website fully crashes with the error message:
Server Error Error: Error serializing `.workflows[0]` returned from `getServerSideProps` in "/projects/[owner]/[project]". Reason: `undefined` cannot be serialized as JSON. Please use `null` or omit this value all together.
🤷♂️ ❓ ❗
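For context on that error message: Next.js refuses to serialize `undefined` anywhere in the props returned from `getServerSideProps`, so a single missing workflow is enough to crash the page. One possible guard is sketched below, assuming the props are plain JSON-like data. `toSerializable` is a hypothetical helper, and it only hides the symptom; it does not explain why `.workflows[0]` is undefined for these projects.

```javascript
// Sketch only: `toSerializable` is a hypothetical helper, not part of Next.js.
// It recursively swaps `undefined` for `null` in plain objects and arrays.
function toSerializable(value) {
  if (value === undefined) return null
  if (Array.isArray(value)) return value.map(toSerializable)
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, child]) => [key, toSerializable(child)])
    )
  }
  // Strings, numbers, booleans and null pass through unchanged.
  return value
}

// Made-up sample props that would trigger the error above.
const props = toSerializable({ workflows: [undefined, { id: '1', displayName: undefined }] })
console.log(JSON.stringify(props))
```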
Scenario 3: more other projects
URLs:
This part... looks good??
Status
I'm popping a CR for this review:
- The question of Cellect is something I'd like answered. If you think Cellect ain't something big to be worried about, I won't be too hard to convince.
- The bugs in testing scenario 2 are something that I'm still trying to figure out. Why are only specific projects borking?? I chose a mix of projects with multiple WFs and single WFs.
Current status is 🤷 ❓ ❗
@@ -50,6 +50,7 @@
"morgan": "^1.10.0",
"newrelic": "~7.1.0",
"next": "~9.5.5",
"node-fetch": "~2.6.1",
Note to self: this dependency adds the equivalent of "window.fetch" to Node, which doesn't have fetch functionality by default (methinks).
async function fetchWorkflowCellectStatus(workflow) {
  const workflowURL = `https://cellect.zooniverse.org/workflows/${workflow.id}/status`
  const response = await fetch(workflowURL)
  const body = await response.json()
  const { groups } = body ?? {}
  return groups
}
I'm wondering what'd happen if cellect.zooniverse.org takes an unexpected nap - not sure if the fetchWorkflowCellectStatus() would return groups = {}, groups = undefined, or just throw new Error(). But in any case, I wouldn't want to see the Classifier completely bork as a result of a secondary system failure.
Additional note to self: expanding on error handling, if Cellect wakes up drunk one day and has a response time of like 30 seconds, would that delay also be felt by users on the front end?
Bonus: figure out if fetch()/node-fetch can be configured with an explicit timeout
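One library-agnostic way to cap the wait is to race the request against a timer, as sketched below. `withTimeout` is a hypothetical helper, not something node-fetch provides. The node-fetch 2.x README also describes a non-standard `timeout` option and support for an abort `signal`, which would be worth checking against the version pinned here.

```javascript
// Sketch only: `withTimeout` is a hypothetical helper, not part of node-fetch.
// It rejects if `promise` has not settled within `timeoutMs` milliseconds.
function withTimeout(promise, timeoutMs) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs}ms`)), timeoutMs)
  })
  // Whichever settles first wins; always clear the timer so Node can exit cleanly.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```

Usage would be along the lines of `const response = await withTimeout(fetch(workflowURL), 5000)`. Note that the race only stops waiting: the underlying request keeps running in the background unless it is also aborted.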
Good point. If getServerSideProps or getStaticProps errors, then the server will send back a 500 response. The same thing will happen if any of the Panoptes requests fail.
Ideally, I think we'd like to use getStaticProps and use static caching to reduce our dependency on API requests. I believe getServerSideProps runs on every page transition so we are hitting the API a lot at the moment.
const subjectSet = subjectSets.find(subjectSet => subjectSet.id === subjectSetID)
subjectSet.availableSubjects = subjectSetCounts[subjectSetID]
[minor] Is there any real possibility that subjectSet might end up as undefined here? I mean, logically, you really wouldn't call buildWorkflow() without a workflow and a matching subjectSets. The only way this could bork is if the Subject Sets retrieved from fetchWorkflowData() contain more than what's listed in the workflow's links, and if that happens that's usually a back end bug that we'd need to know about.
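If a guard did turn out to be worthwhile, it might look like the sketch below. `addAvailableSubjects` is a hypothetical name, and the data shapes (an array of subject sets with string IDs, plus a map of subject set ID to count) are assumptions based on the two lines under review. The warning keeps a back end mismatch visible instead of silently swallowing it.

```javascript
// Sketch only: the function name and data shapes are assumptions.
// `subjectSetCounts` is assumed to map subject set IDs to available subject counts.
function addAvailableSubjects(subjectSets, subjectSetCounts) {
  Object.entries(subjectSetCounts).forEach(([subjectSetID, count]) => {
    const subjectSet = subjectSets.find(set => set.id === subjectSetID)
    if (!subjectSet) {
      // Surface the mismatch rather than throwing a TypeError on `undefined`.
      console.warn(`No subject set found for counted ID ${subjectSetID}`)
      return
    }
    subjectSet.availableSubjects = count
  })
  return subjectSets
}
```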
Subject sets sent back from fetchWorkflowData are all subject sets from all workflows, so should include every linked set for each individual workflow.
It would be useful to test this for a realistic data set. The test project only has two workflows and two sets. What happens when you want to link 100 volumes of transcription material to your workflow?
It turns out this is undefined for AnnoTate (but then, so are a bunch of other values that we would expect to exist, like workflow display name.)
http://local.zooniverse.org:3000/projects/drrogg/annotate?env=production
I suspect what's going on for AnnoTate is that completed sets aren't present in the Cellect response.
I'm now playing with the code to see what needs to be hacked to make AnnoTate work.
const [subject] = subjects
const { publicRuntimeConfig = {} } = getConfig() || {}
const assetPrefix = publicRuntimeConfig.assetPrefix || ''
const placeholderUrl = `${assetPrefix}/subject-placeholder.png`
const subjectURLs = subject ? subject.locations.map(location => Object.values(location)[0]) : []
const alt = subject ? `Subject ${subject.id}` : 'Loading'
const completeness = 1 - (availableSubjects / set_member_subjects_count)
[minor] Will set_member_subjects_count ever be 0? I mean, it's no biggie, it won't crash, it'll just set completeness to negative infinity, which admittedly may make the workflow seem daunting to all but the most optimistic volunteer.
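To make the arithmetic concrete: with a zero denominator the expression gives `-Infinity` when subjects are still available and `NaN` when none are. A guarded version is sketched below. `getCompleteness` is a hypothetical helper, and treating an empty or unknown set as 0% complete is an assumption, not a decision made in this PR.

```javascript
// The unguarded expression with a zero subject count:
console.log(1 - (5 / 0)) // -Infinity
console.log(1 - (0 / 0)) // NaN

// Sketch only: `getCompleteness` is a hypothetical helper.
// Assumption: empty sets and missing counts are reported as 0% complete.
function getCompleteness(availableSubjects, totalSubjects) {
  if (!Number.isFinite(availableSubjects) || !(totalSubjects > 0)) return 0
  const completeness = 1 - (availableSubjects / totalSubjects)
  // Clamp to [0, 1] in case the two counts are briefly out of sync.
  return Math.min(1, Math.max(0, completeness))
}
```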
Good point. Does Panoptes let you create sets with no subjects in them?
It's definitely possible to create a Workflow with an empty Subject Set. I made a quick test, and the result is that the Classifier just shrugs and says "I guess there's no subject to load" (with a bunch of errors in the dev console) and displays a blank Task Area and blank Subject Viewer.
If you're curious: http://localhost:3000/projects/darkeshard/transformers/classify/workflow/3457
This is still in the "minor worries" category though.
The errors from standard projects should be fixed now. I'd forgotten to check for
If Cellect is down, I guess we'll lose sequential subject selection for Engaging Crowds. I think the fallback then might be the default Panoptes selector. Random selection maybe? So Cellect being unavailable might not just break the API requests here, it might break the subject queue too.
PR Review (Update)
First off, a big 👍 for making sure there's some branching logic that checks when
Second, I ran test through the URLs and...
So far I've traced the issue of Prototype 2015 to the following:
I'm filing this as a Monday problem. Yoinks!
Here's what I found out: Pinging
BUT compare that to
So Workflow 7792 has translations, but WF 900 doesn't? (They're both WFs in the same Project 1292) I think the key takeaway here is that we can't 100% rely on /translations to always return resources??
When was workflow 900 last updated? Anything updated in the last two years or so should have translations, I think.
I've added
The tests expect
EDIT: should the page error or render if subject sets aren't available? I've got the subject sets query set up to catch API errors and render with an empty subject sets array, but that doesn't make sense if subject sets are required in order to classify.
This is just a conceptual question out of scope of this PR (which I'm happy to make a discussion issue for), but it seems like there's a lot of async code going on in this helper file, much more than its name fetchWorkflowsHelper would imply, and I'm just not clear why there isn't a workflow store for all this.
I think that's historical, going back to #1077 when
There's an ongoing discussion to add static props to
I've updated this to include the latest changes to our Sentry setup. I've also split the Node data-fetching code into
Fetch subject set statuses from Cellect, after fetching workflows from Panoptes. Add `subjectSet.availableSubjects` to each workflow subject set.
Only use the Cellect API for grouped workflows. Return a default set of workflow data otherwise. Remove included subject sets from the workflow request and request them separately, based on the Cellect response. Update tests to test for grouped and non-grouped workflows. Update mocks to represent Cellect responses without the grouped attribute.
Wrap workflow API calls in try/catch and log Node errors to Sentry.
Split out the subject set data fetching into `helpers/fetchSubjectSets`.
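Read together, those commit messages describe a flow roughly like the sketch below. It is an illustration, not the code in this PR: `buildWorkflowSketch`, the injected `fetchCellectCounts` and `fetchSubjectSetsByID` functions, the `workflow.grouped` flag and the returned shape are all assumptions, and `console.error` stands in for the Sentry logging.

```javascript
// Sketch only: every name and data shape here is an assumption for illustration.
async function buildWorkflowSketch(workflow, { fetchCellectCounts, fetchSubjectSetsByID }) {
  // Default workflow data, returned as-is for non-grouped workflows.
  const defaults = { id: workflow.id, displayName: workflow.display_name, subjectSets: [] }
  if (!workflow.grouped) return defaults
  try {
    // Assumed Cellect shape: { [subjectSetID]: availableSubjects }
    const counts = await fetchCellectCounts(workflow.id)
    // Subject sets are requested separately, by ID, based on the Cellect response.
    const subjectSets = await fetchSubjectSetsByID(Object.keys(counts))
    return {
      ...defaults,
      subjectSets: subjectSets.map(set => ({ ...set, availableSubjects: counts[set.id] ?? null }))
    }
  } catch (error) {
    // Stand-in for logging the error to Sentry.
    console.error(error.message)
    return defaults
  }
}
```

Injecting the two fetchers keeps the sketch testable with stubs, which is similar in spirit to the grouped and non-grouped mocks mentioned in the commits.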
Following up from my previous PR review, the functionality checks now all look good! I'm going to give the code a quick read to see what updates have happened since I last went through this. brb!
PR Review (Update)
Follows earlier PR review
Functionality tests look good, and code read looks good!
Testing
Same tests as before, and everything's passing OK now.
Scenario 1: HMS NHS (production)
- http://localhost:3000/projects/msalmon/hms-nhs-the-nautical-health-service?env=production
- http://localhost:3000/projects/msalmon/hms-nhs-the-nautical-health-service/classify?env=production
Results: 🆗 Completeness value for each WF and SS are displaying properly on their selection screens.
Scenario 2: a bunch of other projects (staging and production)
- http://localhost:3000/projects/darkeshard/transformers/classify?env=staging
- http://localhost:3000/projects/brooke/i-fancy-cats?env=staging
- http://localhost:3000/projects/darkeshard/prototype-2015?env=production
- http://localhost:3000/projects/mschwamb/planet-four?env=production
Results: 🆗 These workflows load normally. I take it the "can't find translations" issue has been resolved, by using the workflow.display_name as a fallback? (i.e.: const displayName = displayNames[workflow.id] || workflow.display_name) Lookin' good!
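A minimal sketch of that fallback is below. Only the `displayNames[workflow.id] || workflow.display_name` expression comes from this thread. The helper names, the translation shape (`translated_id` plus `strings.display_name`) and the sample data are assumptions for illustration.

```javascript
// Sketch only: helper names, translation shape and sample data are assumptions.
function getDisplayNames(translations = []) {
  const displayNames = {}
  translations.forEach(translation => {
    const name = translation.strings?.display_name
    if (name) displayNames[translation.translated_id] = name
  })
  return displayNames
}

function resolveDisplayName(workflow, displayNames) {
  // Fall back to the workflow's own name when no translation resource exists.
  return displayNames[workflow.id] || workflow.display_name
}

// Made-up data: one workflow has a translation resource, the other does not.
const displayNames = getDisplayNames([
  { translated_id: '7792', strings: { display_name: 'Translated name' } }
])
console.log(resolveDisplayName({ id: '7792', display_name: 'Original name' }, displayNames))
console.log(resolveDisplayName({ id: '900', display_name: 'Untranslated name' }, displayNames))
```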
Scenario 3: more other projects
Results: 🆗
Status
LGTM 👍 I genuinely can't find anything of issue with this PR in its current state. Let's gooo!
Adds a query which gets available subject numbers, for a given workflow, from the Cellect API.
Removes included subject sets from the workflows API query (see #678.) Subject sets are fetched separately, by ID, from the Panoptes API. This increases the number of API requests required to build each page, but allows us to limit subject set requests to grouped workflows.
Updates the API mocks, in data-fetching tests, to include grouped/non-grouped cases and to add the /subject_sets endpoint.
Wraps the data-fetching requests in try/catch, so that we can log API errors to Sentry.
Package: app-project
Closes #1956.
Review Checklist
General
Components
Apps
yarn panic && yarn bootstrap or docker-compose up --build and app works as expected?
Publishing
Post-merging