Use cached features in /rows #1573

Merged: 10 commits, Jul 28, 2023

Member

@lhoestq lhoestq commented Jul 27, 2023

/rows needs the cached features because they're not always available in the parquet metadata.
Without them, some Image columns were read as a struct of binary data, which the viewer does not support (it shows them as "null").

I'm therefore now passing the features from config-parquet-and-info to config-parquet and then to config-parquet-metadata. I kept it backward compatible, so a cached value that doesn't have this field yet is still handled.

As a result, there's no need for a mongo migration. We can just re-run all the config-parquet and config-parquet-metadata jobs; I incremented their versions.

close #1421
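
A minimal sketch of the backward-compatible read described above, assuming the cached content is a dict-like object with an optional `features` key. The helper name `pick_features` and the fallback argument are hypothetical and are not taken from this PR.

```python
from typing import Any, Dict, Mapping, Optional

Features = Dict[str, Any]


def pick_features(
    cached_content: Mapping[str, Any],
    metadata_features: Optional[Features],
) -> Optional[Features]:
    """Prefer the cached features; fall back to what the parquet metadata gave.

    Hypothetical helper: older cache entries were written before the
    `features` field existed, so the key may be missing or None.
    """
    cached = cached_content.get("features")
    if cached is not None:
        # Copy into a plain dict so callers never mutate the cached entry.
        return dict(cached)
    return metadata_features
```

With this shape, entries written before the field existed keep working unchanged, and they start using the cached features once the jobs are re-run.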

codecov-commenter commented Jul 27, 2023

Codecov Report

Patch coverage: 92.50% and project coverage change: -1.73% ⚠️

Comparison is base (f0b5992) 92.17% compared to head (d68e8be) 90.44%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1573      +/-   ##
==========================================
- Coverage   92.17%   90.44%   -1.73%     
==========================================
  Files          77      193     +116     
  Lines        5443    12027    +6584     
==========================================
+ Hits         5017    10878    +5861     
- Misses        426     1149     +723     
| Flag | Coverage Δ |
|---|---|
| jobs_cache_maintenance | 99.08% <ø> (?) |
| jobs_mongodb_migration | 85.07% <ø> (?) |
| libs_libcommon | 91.55% <82.35%> (?) |
| services_admin | 85.87% <ø> (?) |
| services_api | 88.06% <ø> (?) |
| services_rows | 83.45% <100.00%> (?) |
| services_worker | 92.18% <100.00%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
|---|---|
| ...s/worker/src/worker/job_runners/dataset/parquet.py | 82.35% <ø> (ø) |
| ...orker/job_runners/split/first_rows_from_parquet.py | 95.08% <ø> (ø) |
| ...s/worker/tests/job_runners/dataset/test_parquet.py | 100.00% <ø> (ø) |
| libs/libcommon/src/libcommon/simple_cache.py | 91.80% <80.00%> (ø) |
| libs/libcommon/src/libcommon/constants.py | 100.00% <100.00%> (ø) |
| services/rows/src/rows/routes/rows.py | 61.70% <100.00%> (ø) |
| services/rows/tests/routes/test_rows.py | 92.85% <100.00%> (ø) |
| services/worker/src/worker/dtos.py | 99.23% <100.00%> (+0.01%) ⬆️ |
| ...es/worker/src/worker/job_runners/config/parquet.py | 100.00% <100.00%> (ø) |
| .../src/worker/job_runners/config/parquet_metadata.py | 94.54% <100.00%> (+0.42%) ⬆️ |

... and 2 more

... and 112 files with indirect coverage changes


@lhoestq lhoestq marked this pull request as ready for review July 27, 2023 17:19
@lhoestq lhoestq requested a review from severo July 27, 2023 17:19
- parquet_files=response["content"]["parquet_files"], partial=response["content"]["partial"]
+ parquet_files=response["content"]["parquet_files"],
+ partial=response["content"]["partial"],
+ features=None,
Collaborator

what is it used for? better not to provide the field if it's always None

Member Author

I believe config_parquet_content is only instantiated for mypy typing; we don't return the features in DatasetParquetResponse anyway, so there's no need to specify it.

I'll add a comment
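
To illustrate the typing point, here is a hedged sketch with two `TypedDict`s: a config-level one that carries an optional `features` field and a dataset-level one that does not. The class and function names are hypothetical stand-ins, not the DTOs from `worker/dtos.py`.

```python
from typing import Any, Dict, List, Optional, TypedDict


class ConfigLevelContent(TypedDict):
    # Hypothetical stand-in for the config-level parquet content.
    parquet_files: List[Dict[str, Any]]
    partial: bool
    features: Optional[Dict[str, Any]]


class DatasetLevelContent(TypedDict):
    # Hypothetical stand-in for the dataset-level response: no features.
    parquet_files: List[Dict[str, Any]]
    partial: bool


def to_dataset_level(content: ConfigLevelContent) -> DatasetLevelContent:
    """Copy only the fields the dataset-level response exposes."""
    return DatasetLevelContent(
        parquet_files=list(content["parquet_files"]),
        partial=content["partial"],
    )
```

Because the dataset-level type has no `features` key, the value set on the config-level object never reaches the response, which is why `features=None` is enough to satisfy the type checker in this sketch.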

Collaborator

maybe we could test the other branch of the if/else

services/rows/src/rows/routes/rows.py
@@ -30,6 +31,18 @@ class FileSystemError(Exception):
    pass


def _clean_mongo_objects(obj: Any) -> Any:
Collaborator

maybe move this to the root function in simple_cache.py that gets the content out of the MongoDB?
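
Only the signature of `_clean_mongo_objects` is visible in this excerpt. As a rough sketch of what such a helper could do, assuming its goal is to turn the dict-like and list-like wrappers returned by the MongoDB layer into plain `dict` and `list` values, something like the following would work. The name `clean_mongo_objects_sketch` and the use of `collections.abc` checks are assumptions, not the PR's implementation.

```python
from collections.abc import Mapping
from typing import Any


def clean_mongo_objects_sketch(obj: Any) -> Any:
    """Recursively rebuild nested containers as plain dicts and lists.

    Hypothetical sketch: any mapping becomes a dict, any list or tuple
    becomes a list, and every other value is returned unchanged.
    """
    if isinstance(obj, Mapping):
        return {key: clean_mongo_objects_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [clean_mongo_objects_sketch(item) for item in obj]
    return obj
```

Applying the conversion once, at the point where content leaves the cache, would mean downstream code only ever sees plain containers, which matches the suggestion above.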

libs/libcommon/src/libcommon/parquet_utils.py
Collaborator

severo commented Jul 27, 2023

also: I think we have to update the openapi spec

github-actions bot commented Jul 28, 2023

ArgoCD Diff for commit 616c618

Updated at 7/28/2023, 2:21:37 PM CEST

App: datasets-server-prod
YAML generation: Success 🟢
App sync status: Out of Sync ⚠️

===== /ConfigMap datasets-server/prod-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff3500159394/prod-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:35.992197591 +0000
+++ /tmp/argocd-diff3500159394/prod-datasets-server-reverse-proxy	2023-07-28 12:21:35.968197239 +0000
@@ -1580,14 +1580,21 @@
     },\n            \"examples\": {\n              \"glue\": { \"summary\": \"a canonical
     dataset\", \"value\": \"glue\" },\n              \"Helsinki-NLP/tatoeba_mt\":
     {\n                \"summary\": \"a namespaced dataset\",\n                \"value\":
-    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          }\n        ],\n
-    \       \"responses\": {\n          \"200\": {\n            \"description\": \"A
-    list of parquet files.</br>Beware: the response is not paginated.\",\n            \"headers\":
-    {\n              \"Cache-Control\": { \"$ref\": \"#/components/headers/Cache-Control\"
-    },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
-    \"#/components/headers/Access-Control-Allow-Origin\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
+    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          },\n          {\n
+    \           \"name\": \"config\",\n            \"in\": \"query\",\n            \"description\":
+    \"The dataset configuration (or subset).\",\n            \"required\": false,\n
+    \           \"schema\": { \"type\": \"string\" },\n            \"examples\": {\n
+    \             \"cola\": {\n                \"summary\": \"a subset of the glue
+    dataset\",\n                \"value\": \"cola\"\n              },\n              \"yangdong/ecqa\":
+    {\n                \"summary\": \"the default configuration given by the \U0001F917
+    Datasets library\",\n                \"value\": \"yangdong--ecqa\"\n              }\n
+    \           }\n          }\n        ],\n        \"responses\": {\n          \"200\":
+    {\n            \"description\": \"A list of parquet files.</br>Beware: the response
+    is not paginated.\",\n            \"headers\": {\n              \"Cache-Control\":
+    { \"$ref\": \"#/components/headers/Cache-Control\" },\n              \"Access-Control-Allow-Origin\":
+    {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
     \               },\n                \"examples\": {\n                  \"duorc\":
     {\n                    \"summary\": \"duorc: six parquet files, one per split\",\n
     \                   \"value\": {\n                      \"parquet_files\": [\n
@@ -1615,77 +1622,152 @@
     \"duorc\",\n                          \"config\": \"SelfRC\",\n                          \"split\":
     \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/SelfRC/duorc-validation.parquet\",\n
     \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
-    3114389\n                        }\n                      ]\n                    }\n
-    \                 },\n                  \"sharded\": {\n                    \"summary\":
-    \"alexandrainst/danish-wit: the parquet file for the train split is partitioned
-    into 9 shards\",\n                    \"value\": {\n                      \"parquet_files\":
-    [\n                        {\n                          \"dataset\": \"alexandrainst/danish-wit\",\n
-    \                         \"config\": \"alexandrainst--danish-wit\",\n                          \"split\":
-    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
+    3114389\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  },\n                  \"duorc
+    with ParaphraseRC config\": {\n                    \"summary\": \"duorc: three
+    parquet files for ParaphraseRC, one per split\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-test.parquet\",\n
+    \                         \"filename\": \"duorc-test.parquet\",\n                          \"size\":
+    6136590\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"train\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-train.parquet\",\n
+    \                         \"filename\": \"duorc-train.parquet\",\n                          \"size\":
+    26005667\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-validation.parquet\",\n
+    \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
+    5566867\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false,\n                      \"features\": {\n                        \"plot_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"plot\": {\n                          \"dtype\":
+    \"string\",\n                          \"_type\": \"Value\"\n                        },\n
+    \                       \"title\": {\n                          \"dtype\": \"string\",\n
+    \                         \"_type\": \"Value\"\n                        },\n                        \"question_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"question\": {\n
+    \                         \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"answers\": {\n
+    \                         \"feature\": {\n                            \"dtype\":
+    \"string\",\n                            \"_type\": \"Value\"\n                          },\n
+    \                         \"_type\": \"Sequence\"\n                        },\n
+    \                       \"no_answer\": {\n                          \"dtype\":
+    \"bool\",\n                          \"_type\": \"Value\"\n                        }\n
+    \                     }\n                    }\n                  },\n                  \"sharded\":
+    {\n                    \"summary\": \"alexandrainst/da-wit: the parquet file for
+    the train split is partitioned into 9 shards\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    \                         \"split\": \"test\",\n                          \"url\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
     \                         \"filename\": \"parquet-test.parquet\",\n                          \"size\":
-    48781933\n                        },\n                        {\n                          \"dataset\":
-    \"alexandrainst/danish-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    48684227\n                        },\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
     \                         \"split\": \"train\",\n                          \"url\":
-    \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00000-of-00009.parquet\",\n
-    \                         \"size\": 937127291\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00001-of-00009.parquet\",\n
-    \                         \"size\": 925920565\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00002-of-00009.parquet\",\n
-    \                         \"size\": 940390661\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00003-of-00009.parquet\",\n
-    \                         \"size\": 934549621\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00004-of-00009.parquet\",\n
-    \                         \"size\": 493004154\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00005-of-00009.parquet\",\n
-    \                         \"size\": 942848888\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00006-of-00009.parquet\",\n
-    \                         \"size\": 933373843\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00007-of-00009.parquet\",\n
-    \                         \"size\": 936939176\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00008-of-00009.parquet\",\n
-    \                         \"size\": 946933048\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00000-of-00017.parquet\",\n
+    \                         \"size\": 465549291\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00001-of-00017.parquet\",\n
+    \                         \"size\": 465701535\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00002-of-00017.parquet\",\n
+    \                         \"size\": 463857123\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00003-of-00017.parquet\",\n
+    \                         \"size\": 456197486\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00004-of-00017.parquet\",\n
+    \                         \"size\": 465412051\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00005-of-00017.parquet\",\n
+    \                         \"size\": 469114305\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00006-of-00017.parquet\",\n
+    \                         \"size\": 460338645\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00007-of-00017.parquet\",\n
+    \                         \"size\": 468309376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00008-of-00017.parquet\",\n
+    \                         \"size\": 490063121\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00009-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00009-of-00017.parquet\",\n
+    \                         \"size\": 460462764\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00010-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00010-of-00017.parquet\",\n
+    \                         \"size\": 476525998\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00011-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00011-of-00017.parquet\",\n
+    \                         \"size\": 470327354\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00012-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00012-of-00017.parquet\",\n
+    \                         \"size\": 457138334\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00013-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00013-of-00017.parquet\",\n
+    \                         \"size\": 464485292\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00014-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00014-of-00017.parquet\",\n
+    \                         \"size\": 466549376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00015-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00015-of-00017.parquet\",\n
+    \                         \"size\": 460452174\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00016-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00016-of-00017.parquet\",\n
+    \                         \"size\": 480583533\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
     \"alexandrainst--danish-wit\",\n                          \"split\": \"val\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
     \                         \"filename\": \"parquet-val.parquet\",\n                          \"size\":
-    11437355\n                        }\n                      ]\n                    }\n
-    \                 }\n                }\n              }\n            }\n          },\n
-    \         \"401\": {\n            \"description\": \"If the external authentication
-    step on the Hugging Face Hub failed, and no authentication mechanism has been
-    provided. Retry with authentication.\",\n            \"headers\": {\n              \"Cache-Control\":
-    {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n              },\n
-    \             \"Access-Control-Allow-Origin\": {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
-    \             },\n              \"X-Error-Code\": {\n                \"$ref\":
-    \"#/components/headers/X-Error-Code-splits-401\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n                },\n
-    \               \"examples\": {\n                  \"inexistent-dataset\": {\n
-    \                   \"summary\": \"The dataset does not exist.\",\n                    \"value\":
+    11434278\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  }\n                }\n              }\n
+    \           }\n          },\n          \"401\": {\n            \"description\":
+    \"If the external authentication step on the Hugging Face Hub failed, and no authentication
+    mechanism has been provided. Retry with authentication.\",\n            \"headers\":
+    {\n              \"Cache-Control\": {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n
+    \             },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
+    \"#/components/headers/Access-Control-Allow-Origin\"\n              },\n              \"X-Error-Code\":
+    {\n                \"$ref\": \"#/components/headers/X-Error-Code-splits-401\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n
+    \               },\n                \"examples\": {\n                  \"inexistent-dataset\":
+    {\n                    \"summary\": \"The dataset does not exist.\",\n                    \"value\":
     {\n                      \"error\": \"The dataset does not exist, or is not accessible
     without authentication (private or gated). Please check the spelling of the dataset
     name or retry with authentication.\"\n                    }\n                  },\n

===== apps/Deployment datasets-server/prod-datasets-server-admin ======
--- /tmp/argocd-diff814272897/prod-datasets-server-admin-live.yaml	2023-07-28 12:21:36.064198647 +0000
+++ /tmp/argocd-diff814272897/prod-datasets-server-admin	2023-07-28 12:21:36.060198588 +0000
@@ -480,7 +480,7 @@
           value: "9"
         - name: ADMIN_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-admin:sha-fa7c7e8
+        image: huggingface/datasets-server-services-admin:sha-f0b5992
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-api ======
--- /tmp/argocd-diff174236318/prod-datasets-server-api-live.yaml	2023-07-28 12:21:36.080198881 +0000
+++ /tmp/argocd-diff174236318/prod-datasets-server-api	2023-07-28 12:21:36.076198823 +0000
@@ -379,7 +379,7 @@
           value: "9"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-api:sha-fa7c7e8
+        image: huggingface/datasets-server-services-api:sha-f0b5992
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff1766863281/prod-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:36.100199175 +0000
+++ /tmp/argocd-diff1766863281/prod-datasets-server-reverse-proxy	2023-07-28 12:21:36.096199116 +0000
@@ -337,7 +337,7 @@
   template:
     metadata:
       annotations:
-        checksum/config: 0a378473267f8c4dbdd9adee6e764a61b456827dc895071b155ed313ba7980f9
+        checksum/config: 21715e69f619c39084d57b84024ee0d7313f486b1a4603809f242be50d431b31
         co.elastic.logs/json.expand_keys: "true"
       creationTimestamp: null
       labels:

===== apps/Deployment datasets-server/prod-datasets-server-rows ======
--- /tmp/argocd-diff4213267921/prod-datasets-server-rows-live.yaml	2023-07-28 12:21:36.124199527 +0000
+++ /tmp/argocd-diff4213267921/prod-datasets-server-rows	2023-07-28 12:21:36.120199468 +0000
@@ -444,7 +444,7 @@
           value: "9"
         - name: API_UVICORN_PORT
           value: "8082"
-        image: huggingface/datasets-server-services-rows:sha-fa7c7e8
+        image: huggingface/datasets-server-services-rows:sha-f0b5992
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-worker-all ======
--- /tmp/argocd-diff2729663882/prod-datasets-server-worker-all-live.yaml	2023-07-28 12:21:36.168200172 +0000
+++ /tmp/argocd-diff2729663882/prod-datasets-server-worker-all	2023-07-28 12:21:36.164200114 +0000
@@ -711,7 +711,7 @@
           value: "0"
         - name: WORKER_JOB_TYPES_BLOCKED
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-fa7c7e8
+        image: huggingface/datasets-server-services-worker:sha-f0b5992
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== apps/Deployment datasets-server/prod-datasets-server-worker-light ======
--- /tmp/argocd-diff1340572032/prod-datasets-server-worker-light-live.yaml	2023-07-28 12:21:36.200200642 +0000
+++ /tmp/argocd-diff1340572032/prod-datasets-server-worker-light	2023-07-28 12:21:36.196200583 +0000
@@ -711,7 +711,7 @@
           value: "0"
         - name: WORKER_JOB_TYPES_BLOCKED
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-fa7c7e8
+        image: huggingface/datasets-server-services-worker:sha-f0b5992
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-backfill ======
--- /tmp/argocd-diff3480783511/prod-datasets-server-job-backfill-live.yaml	2023-07-28 12:21:36.216200876 +0000
+++ /tmp/argocd-diff3480783511/prod-datasets-server-job-backfill	2023-07-28 12:21:36.216200876 +0000
@@ -203,7 +203,7 @@
               value: CreateCommitError,LockedDatasetTimeoutError
             - name: LOG_LEVEL
               value: debug
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fa7c7e8
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-f0b5992
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-backfill
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-metrics-collector ======
--- /tmp/argocd-diff791710071/prod-datasets-server-job-metrics-collector-live.yaml	2023-07-28 12:21:36.228201053 +0000
+++ /tmp/argocd-diff791710071/prod-datasets-server-job-metrics-collector	2023-07-28 12:21:36.228201053 +0000
@@ -197,7 +197,7 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fa7c7e8
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-f0b5992
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-metrics-collector
             resources:
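
The ConfigMap diff above adds `pending`, `failed` and `partial` to the parquet response examples, plus an optional `features` mapping that older cached entries may not have yet. As a minimal sketch of how a consumer could stay backward compatible with both shapes (the helper names `get_cached_features` and `column_types` are hypothetical and are not taken from the datasets-server code base):

```python
from typing import Any, Dict, Optional


def get_cached_features(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the 'features' mapping of a cached response, or None if absent.

    Hypothetical helper: older cache entries may predate the 'features' field,
    so a missing or malformed value is reported as None instead of raising.
    """
    features = response.get("features")
    return features if isinstance(features, dict) else None


def column_types(features: Dict[str, Any]) -> Dict[str, str]:
    """Map each column name to its declared '_type' (e.g. 'Value', 'Sequence')."""
    return {name: spec.get("_type", "unknown") for name, spec in features.items()}


# Shape taken from the "duorc with ParaphraseRC config" example in the diff (abridged).
new_entry = {
    "parquet_files": [],
    "pending": [],
    "failed": [],
    "partial": False,
    "features": {
        "plot_id": {"dtype": "string", "_type": "Value"},
        "answers": {
            "feature": {"dtype": "string", "_type": "Value"},
            "_type": "Sequence",
        },
        "no_answer": {"dtype": "bool", "_type": "Value"},
    },
}

# An entry cached before the change: no 'features' key at all.
old_entry = {"parquet_files": [], "pending": [], "failed": [], "partial": False}

new_features = get_cached_features(new_entry)
old_features = get_cached_features(old_entry)
```

With an old entry the caller gets `None` and can fall back to whatever it did before (for example, reading the schema from the parquet metadata), which is why no data migration is needed for this kind of optional field.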

App: datasets-server-staging
YAML generation: Success 🟢
App sync status: Synced ✅

===== /ConfigMap datasets-server/staging-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff2263708852/staging-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:37.156214665 +0000
+++ /tmp/argocd-diff2263708852/staging-datasets-server-reverse-proxy	2023-07-28 12:21:37.132214313 +0000
@@ -1580,14 +1580,21 @@
     },\n            \"examples\": {\n              \"glue\": { \"summary\": \"a canonical
     dataset\", \"value\": \"glue\" },\n              \"Helsinki-NLP/tatoeba_mt\":
     {\n                \"summary\": \"a namespaced dataset\",\n                \"value\":
-    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          }\n        ],\n
-    \       \"responses\": {\n          \"200\": {\n            \"description\": \"A
-    list of parquet files.</br>Beware: the response is not paginated.\",\n            \"headers\":
-    {\n              \"Cache-Control\": { \"$ref\": \"#/components/headers/Cache-Control\"
-    },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
-    \"#/components/headers/Access-Control-Allow-Origin\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
+    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          },\n          {\n
+    \           \"name\": \"config\",\n            \"in\": \"query\",\n            \"description\":
+    \"The dataset configuration (or subset).\",\n            \"required\": false,\n
+    \           \"schema\": { \"type\": \"string\" },\n            \"examples\": {\n
+    \             \"cola\": {\n                \"summary\": \"a subset of the glue
+    dataset\",\n                \"value\": \"cola\"\n              },\n              \"yangdong/ecqa\":
+    {\n                \"summary\": \"the default configuration given by the \U0001F917
+    Datasets library\",\n                \"value\": \"yangdong--ecqa\"\n              }\n
+    \           }\n          }\n        ],\n        \"responses\": {\n          \"200\":
+    {\n            \"description\": \"A list of parquet files.</br>Beware: the response
+    is not paginated.\",\n            \"headers\": {\n              \"Cache-Control\":
+    { \"$ref\": \"#/components/headers/Cache-Control\" },\n              \"Access-Control-Allow-Origin\":
+    {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
     \               },\n                \"examples\": {\n                  \"duorc\":
     {\n                    \"summary\": \"duorc: six parquet files, one per split\",\n
     \                   \"value\": {\n                      \"parquet_files\": [\n
@@ -1615,77 +1622,152 @@
     \"duorc\",\n                          \"config\": \"SelfRC\",\n                          \"split\":
     \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/SelfRC/duorc-validation.parquet\",\n
     \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
-    3114389\n                        }\n                      ]\n                    }\n
-    \                 },\n                  \"sharded\": {\n                    \"summary\":
-    \"alexandrainst/danish-wit: the parquet file for the train split is partitioned
-    into 9 shards\",\n                    \"value\": {\n                      \"parquet_files\":
-    [\n                        {\n                          \"dataset\": \"alexandrainst/danish-wit\",\n
-    \                         \"config\": \"alexandrainst--danish-wit\",\n                          \"split\":
-    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
+    3114389\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  },\n                  \"duorc
+    with ParaphraseRC config\": {\n                    \"summary\": \"duorc: three
+    parquet files for ParaphraseRC, one per split\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-test.parquet\",\n
+    \                         \"filename\": \"duorc-test.parquet\",\n                          \"size\":
+    6136590\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"train\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-train.parquet\",\n
+    \                         \"filename\": \"duorc-train.parquet\",\n                          \"size\":
+    26005667\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-validation.parquet\",\n
+    \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
+    5566867\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false,\n                      \"features\": {\n                        \"plot_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"plot\": {\n                          \"dtype\":
+    \"string\",\n                          \"_type\": \"Value\"\n                        },\n
+    \                       \"title\": {\n                          \"dtype\": \"string\",\n
+    \                         \"_type\": \"Value\"\n                        },\n                        \"question_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"question\": {\n
+    \                         \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"answers\": {\n
+    \                         \"feature\": {\n                            \"dtype\":
+    \"string\",\n                            \"_type\": \"Value\"\n                          },\n
+    \                         \"_type\": \"Sequence\"\n                        },\n
+    \                       \"no_answer\": {\n                          \"dtype\":
+    \"bool\",\n                          \"_type\": \"Value\"\n                        }\n
+    \                     }\n                    }\n                  },\n                  \"sharded\":
+    {\n                    \"summary\": \"alexandrainst/da-wit: the parquet file for
+    the train split is partitioned into 9 shards\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    \                         \"split\": \"test\",\n                          \"url\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
     \                         \"filename\": \"parquet-test.parquet\",\n                          \"size\":
-    48781933\n                        },\n                        {\n                          \"dataset\":
-    \"alexandrainst/danish-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    48684227\n                        },\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
     \                         \"split\": \"train\",\n                          \"url\":
-    \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00000-of-00009.parquet\",\n
-    \                         \"size\": 937127291\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00001-of-00009.parquet\",\n
-    \                         \"size\": 925920565\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00002-of-00009.parquet\",\n
-    \                         \"size\": 940390661\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00003-of-00009.parquet\",\n
-    \                         \"size\": 934549621\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00004-of-00009.parquet\",\n
-    \                         \"size\": 493004154\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00005-of-00009.parquet\",\n
-    \                         \"size\": 942848888\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00006-of-00009.parquet\",\n
-    \                         \"size\": 933373843\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00007-of-00009.parquet\",\n
-    \                         \"size\": 936939176\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00008-of-00009.parquet\",\n
-    \                         \"size\": 946933048\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00000-of-00017.parquet\",\n
+    \                         \"size\": 465549291\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00001-of-00017.parquet\",\n
+    \                         \"size\": 465701535\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00002-of-00017.parquet\",\n
+    \                         \"size\": 463857123\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00003-of-00017.parquet\",\n
+    \                         \"size\": 456197486\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00004-of-00017.parquet\",\n
+    \                         \"size\": 465412051\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00005-of-00017.parquet\",\n
+    \                         \"size\": 469114305\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00006-of-00017.parquet\",\n
+    \                         \"size\": 460338645\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00007-of-00017.parquet\",\n
+    \                         \"size\": 468309376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00008-of-00017.parquet\",\n
+    \                         \"size\": 490063121\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00009-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00009-of-00017.parquet\",\n
+    \                         \"size\": 460462764\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00010-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00010-of-00017.parquet\",\n
+    \                         \"size\": 476525998\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00011-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00011-of-00017.parquet\",\n
+    \                         \"size\": 470327354\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00012-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00012-of-00017.parquet\",\n
+    \                         \"size\": 457138334\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00013-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00013-of-00017.parquet\",\n
+    \                         \"size\": 464485292\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00014-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00014-of-00017.parquet\",\n
+    \                         \"size\": 466549376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00015-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00015-of-00017.parquet\",\n
+    \                         \"size\": 460452174\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00016-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00016-of-00017.parquet\",\n
+    \                         \"size\": 480583533\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
     \"alexandrainst--danish-wit\",\n                          \"split\": \"val\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
     \                         \"filename\": \"parquet-val.parquet\",\n                          \"size\":
-    11437355\n                        }\n                      ]\n                    }\n
-    \                 }\n                }\n              }\n            }\n          },\n
-    \         \"401\": {\n            \"description\": \"If the external authentication
-    step on the Hugging Face Hub failed, and no authentication mechanism has been
-    provided. Retry with authentication.\",\n            \"headers\": {\n              \"Cache-Control\":
-    {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n              },\n
-    \             \"Access-Control-Allow-Origin\": {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
-    \             },\n              \"X-Error-Code\": {\n                \"$ref\":
-    \"#/components/headers/X-Error-Code-splits-401\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n                },\n
-    \               \"examples\": {\n                  \"inexistent-dataset\": {\n
-    \                   \"summary\": \"The dataset does not exist.\",\n                    \"value\":
+    11434278\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  }\n                }\n              }\n
+    \           }\n          },\n          \"401\": {\n            \"description\":
+    \"If the external authentication step on the Hugging Face Hub failed, and no authentication
+    mechanism has been provided. Retry with authentication.\",\n            \"headers\":
+    {\n              \"Cache-Control\": {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n
+    \             },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
+    \"#/components/headers/Access-Control-Allow-Origin\"\n              },\n              \"X-Error-Code\":
+    {\n                \"$ref\": \"#/components/headers/X-Error-Code-splits-401\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n
+    \               },\n                \"examples\": {\n                  \"inexistent-dataset\":
+    {\n                    \"summary\": \"The dataset does not exist.\",\n                    \"value\":
     {\n                      \"error\": \"The dataset does not exist, or is not accessible
     without authentication (private or gated). Please check the spelling of the dataset
     name or retry with authentication.\"\n                    }\n                  },\n

===== apps/Deployment datasets-server/staging-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff2621206676/staging-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:37.220215604 +0000
+++ /tmp/argocd-diff2621206676/staging-datasets-server-reverse-proxy	2023-07-28 12:21:37.216215546 +0000
@@ -310,7 +310,7 @@
   template:
     metadata:
       annotations:
-        checksum/config: d7789621e8199bcb372a075df080cd1b28a575398fe9d4d7f24bf15fe4deae88
+        checksum/config: 8df48e1daa0f62d91288fa498f3782734c27b00a62ad2c912418032c20ee0df7
         co.elastic.logs/json.expand_keys: "true"
       creationTimestamp: null
       labels:

Legend Status
✅ The app is synced in ArgoCD, and diffs you see are solely from this PR.
⚠️ The app is out-of-sync in ArgoCD, and the diffs you see include those changes plus any from this PR.
🛑 There was an error generating the ArgoCD diffs due to changes in this PR.

@lhoestq
Member Author

lhoestq commented Jul 28, 2023

  • added comments
  • moved the clean_mongo_object function to the root function
    • it is now applied to all nested mongo objects (dicts and lists)
  • added tests to cover the case when features are present
  • updated the OpenAPI spec
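
For readers following along, here is a rough sketch of what "applied to all nested mongo objects (dicts and lists)" could look like. It is not the code from this PR: the name `clean_nested`, the `MONGO_INTERNAL_KEYS` set, and the idea that cleaning means dropping keys such as `_id` are assumptions made for illustration only.

```python
from typing import Any

# Assumption: keys that mongo may add to stored objects. The actual
# clean_mongo_object function in this PR may handle different keys.
MONGO_INTERNAL_KEYS = {"_id", "_cls"}


def clean_nested(obj: Any) -> Any:
    """Recursively drop mongo-internal keys from dicts, including dicts
    nested inside other dicts or inside lists. Other values pass through."""
    if isinstance(obj, dict):
        return {
            key: clean_nested(value)
            for key, value in obj.items()
            if key not in MONGO_INTERNAL_KEYS
        }
    if isinstance(obj, list):
        return [clean_nested(item) for item in obj]
    return obj


# Hypothetical cached value with mongo keys at several nesting levels.
cached = {
    "_id": "abc",
    "image": {"_type": "Image", "_cls": "X"},
    "tags": [{"_id": 1, "name": "a"}],
}
cleaned = clean_nested(cached)
```

Calling the cleaner once at the root, as sketched here, means nested dicts and lists do not each need their own cleaning call.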

@lhoestq lhoestq merged commit e792862 into main Jul 28, 2023
18 checks passed
@lhoestq lhoestq deleted the use-cached-features-in-rows-endpoint branch July 28, 2023 12:42

Successfully merging this pull request may close these issues.

/rows returns null images for some datasets
3 participants