-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated UN ILO topic generation script, added variable groupings #4242
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool!!
undata_ilo_ft: | ||
store: MEMORY | ||
embeddings: gs://datcom-nl-models/embeddings_undata_ilo_2024_05_15_11_18_05.ft_final_v20230717230459.all-MiniLM-L6-v2.csv | ||
model: ft-final-v20230717230459-all-MiniLM-L6-v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: when you sync, you might also need to add this to
enabledIndexes: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@@ -49,6 +49,7 @@ class DCNames(str, Enum): | |||
SDG_DC = 'sdg' | |||
SDG_MINI_DC = 'sdgmini' | |||
UNDATA_DC = 'undata' | |||
UNDATA_ILO_DC = 'undata_ilo' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also add this to SPECIAL_DC_LIST
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
nodes = generate([_UNDATA_ILO_ROOT], | ||
vars, | ||
REMOVE_SVG_PREFIX, | ||
skip_assert=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment for why we skip_assert? I suppose we want to resolve that (in the next iteration of variable groupings?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment & TODO to remove the skip_assert
_UNDATA_ILO_NL_DESCRIPTIONS_FILE = os.path.join( | ||
_TMP_DIR, 'undata_ilo_nl_descriptions.csv') | ||
|
||
REMOVE_SVG_PREFIX = "Custom_" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could consider making this a flag like "--use_custom_dc_groups" or something that can then set "Custom_". So when we run UNData post launch, we don't need it.
Deferring this to later is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good callout- added a comment here for now, and can address this when moving this data to production
|
||
REMOVE_SVG_PREFIX = "Custom_" | ||
|
||
API_ROOT = "https://staging.unsdg.datacommons.org" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this too should not be hardcoded, but passed via flag? Deferring to later is OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 added a comment.
def download_svg_recursive(svgs: List[str], | ||
nodes: Dict[str, Dict], | ||
keep_vars: Set[str], | ||
remove_svg_prefix=""): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I regret not wrapping this whole thing in a class :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After working with this script , maybe we can refactor it to a class and to take in a yaml config file for the various sdg/who/ilo configurations
@@ -27,6 +27,10 @@ indexes: | |||
store: MEMORY | |||
embeddings: gs://datcom-nl-models/embeddings_undata_2024_03_20_11_01_12.ft_final_v20230717230459.all-MiniLM-L6-v2.csv | |||
model: ft-final-v20230717230459-all-MiniLM-L6-v2 | |||
undata_ilo_ft: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As validation, should we have an e2e test like
website/server/integration_tests/explore_test.py
Lines 278 to 281 in f7433df
def test_detection_basic_undata(self): | |
self.run_detection('detection_api_undata_idx', ['Health in USA'], | |
test='unittest', | |
idx='undata_ft') |
(And later when the data has been added to main DC, we can do fulfill as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
I see your comments, but not the commit? |
…environments. Added detect tests for undata_ilo index. pr feedback
Commented a little too early- just pushed |
dc/g/Custom_UN
variable grouping root to un staging environment