Use cached features in /rows #1573

lhoestq · 2023-07-27T16:09:28Z

/rows needs the cached features, since they are not always available in the parquet metadata.
Without them, some Image columns were seen as a struct of binary data, which is not supported in the viewer (shown as "null").

I'm now passing the features from config-parquet-and-info to config-parquet, and then to config-parquet-metadata. I kept it backward compatible in case a cached value doesn't have this field yet.

So there's no need for a mongo migration. We can just re-run all the config-parquet and config-parquet-metadata jobs. I incremented their versions.
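
A minimal sketch of the backward-compatible read described above, assuming the cached content is a plain dict and that older cache entries simply lack the `features` key. The helper name `get_cached_features` and the sample payloads are hypothetical illustrations, not the code in this PR:

```python
from typing import Any, Dict, Optional


def get_cached_features(content: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the cached features, or None for entries written before the field existed."""
    features = content.get("features")
    # A missing key and an explicit None are treated the same way: the caller
    # then has to fall back to whatever the parquet metadata provides.
    return features if isinstance(features, dict) else None


# Hypothetical cache entries: one written before the field was added, one after.
old_entry = {"parquet_files": [], "partial": False}
new_entry = {
    "parquet_files": [],
    "partial": False,
    "features": {"image": {"_type": "Image"}},
}

print(get_cached_features(old_entry))
print(get_cached_features(new_entry))
```

Because the old shape is still accepted, entries can be refreshed gradually by re-running the jobs instead of being rewritten by a migration.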

close #1421

codecov-commenter · 2023-07-27T16:13:16Z

Codecov Report

Patch coverage: 92.50% and project coverage change: -1.73% ⚠️

Comparison is base (f0b5992) 92.17% compared to head (d68e8be) 90.44%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1573      +/-   ##
==========================================
- Coverage   92.17%   90.44%   -1.73%     
==========================================
  Files          77      193     +116     
  Lines        5443    12027    +6584     
==========================================
+ Hits         5017    10878    +5861     
- Misses        426     1149     +723

Flag	Coverage Δ
jobs_cache_maintenance	`99.08% <ø> (?)`
jobs_mongodb_migration	`85.07% <ø> (?)`
libs_libcommon	`91.55% <82.35%> (?)`
services_admin	`85.87% <ø> (?)`
services_api	`88.06% <ø> (?)`
services_rows	`83.45% <100.00%> (?)`
services_worker	`92.18% <100.00%> (+0.01%)`	⬆️

Files Changed	Coverage Δ
...s/worker/src/worker/job_runners/dataset/parquet.py	`82.35% <ø> (ø)`
...orker/job_runners/split/first_rows_from_parquet.py	`95.08% <ø> (ø)`
...s/worker/tests/job_runners/dataset/test_parquet.py	`100.00% <ø> (ø)`
libs/libcommon/src/libcommon/simple_cache.py	`91.80% <80.00%> (ø)`
libs/libcommon/src/libcommon/constants.py	`100.00% <100.00%> (ø)`
services/rows/src/rows/routes/rows.py	`61.70% <100.00%> (ø)`
services/rows/tests/routes/test_rows.py	`92.85% <100.00%> (ø)`
services/worker/src/worker/dtos.py	`99.23% <100.00%> (+0.01%)`	⬆️
...es/worker/src/worker/job_runners/config/parquet.py	`100.00% <100.00%> (ø)`
.../src/worker/job_runners/config/parquet_metadata.py	`94.54% <100.00%> (+0.42%)`	⬆️
... and 2 more

... and 112 files with indirect coverage changes

severo · 2023-07-27T17:23:48Z

services/worker/src/worker/job_runners/dataset/parquet.py

-                parquet_files=response["content"]["parquet_files"], partial=response["content"]["partial"]
+                parquet_files=response["content"]["parquet_files"],
+                partial=response["content"]["partial"],
+                features=None,


What is it used for? Better not to provide the field if it's always None.

config_parquet_content is instantiated for mypy typing, I believe, but we don't return the features in DatasetParquetResponse anyway, so there is no need to specify it.

I'll add a comment.
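
To illustrate the typing point in this thread, here is a small sketch under the assumption that the config-level content is modelled as a `TypedDict` with a required `features` key. The class `ConfigParquetContent` and the function `build_content` are simplified, hypothetical stand-ins, not the definitions in `dtos.py`:

```python
from typing import Any, Dict, List, Optional, TypedDict


class ConfigParquetContent(TypedDict):
    # Simplified stand-in for the config-level response type.
    parquet_files: List[Dict[str, Any]]
    partial: bool
    features: Optional[Dict[str, Any]]


def build_content(response_content: Dict[str, Any]) -> ConfigParquetContent:
    # Every key of the TypedDict is required, so a type checker such as mypy
    # reports an error if `features` is left out of this construction.
    return ConfigParquetContent(
        parquet_files=response_content["parquet_files"],
        partial=response_content["partial"],
        features=None,  # unused here: the dataset-level response does not expose features
    )


print(build_content({"parquet_files": [], "partial": False}))
```

At runtime a `TypedDict` is an ordinary dict, so the explicit `features=None` only matters to the type checker.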

severo · 2023-07-27T17:24:25Z

services/worker/tests/job_runners/config/test_parquet.py

Maybe we could test the other branch of the if/else.

severo · 2023-07-27T17:33:58Z

libs/libcommon/src/libcommon/parquet_utils.py

@@ -30,6 +31,18 @@ class FileSystemError(Exception):
    pass


+def _clean_mongo_objects(obj: Any) -> Any:


Maybe move this to the root function in simple_cache.py that gets the content out of MongoDB?
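
For context, a helper of this kind could look like the sketch below. It assumes the goal is to turn the dict- and list-like wrappers returned by the MongoDB object mapper into plain Python containers; the body of `_clean_mongo_objects` is not shown in this hunk, so the function `clean_nested` and the wrapper classes are hypothetical stand-ins:

```python
from collections.abc import Mapping
from typing import Any


def clean_nested(obj: Any) -> Any:
    """Recursively copy mapping-like and list-like containers into plain dicts and lists."""
    if isinstance(obj, Mapping):
        return {key: clean_nested(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples are converted to lists as well, to keep the output JSON-like.
        return [clean_nested(value) for value in obj]
    return obj


# Stand-ins for the wrapper types an object mapper may return.
class WrappedDict(dict):
    pass


class WrappedList(list):
    pass


nested = WrappedDict(a=WrappedList([WrappedDict(b=1)]))
cleaned = clean_nested(nested)
print(type(cleaned).__name__, cleaned)
```

Doing this once at the point where content leaves the cache layer would mean callers such as /rows never see the wrapper types.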

severo · 2023-07-27T17:37:51Z

Also: I think we have to update the OpenAPI spec.

github-actions · 2023-07-28T10:50:23Z

ArgoCD Diff for commit `616c618`

Updated at 7/28/2023, 2:21:37 PM CEST

App: datasets-server-prod
YAML generation: Success 🟢
App sync status: Out of Sync ⚠️

===== /ConfigMap datasets-server/prod-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff3500159394/prod-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:35.992197591 +0000
+++ /tmp/argocd-diff3500159394/prod-datasets-server-reverse-proxy	2023-07-28 12:21:35.968197239 +0000
@@ -1580,14 +1580,21 @@
     },\n            \"examples\": {\n              \"glue\": { \"summary\": \"a canonical
     dataset\", \"value\": \"glue\" },\n              \"Helsinki-NLP/tatoeba_mt\":
     {\n                \"summary\": \"a namespaced dataset\",\n                \"value\":
-    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          }\n        ],\n
-    \       \"responses\": {\n          \"200\": {\n            \"description\": \"A
-    list of parquet files.</br>Beware: the response is not paginated.\",\n            \"headers\":
-    {\n              \"Cache-Control\": { \"$ref\": \"#/components/headers/Cache-Control\"
-    },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
-    \"#/components/headers/Access-Control-Allow-Origin\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
+    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          },\n          {\n
+    \           \"name\": \"config\",\n            \"in\": \"query\",\n            \"description\":
+    \"The dataset configuration (or subset).\",\n            \"required\": false,\n
+    \           \"schema\": { \"type\": \"string\" },\n            \"examples\": {\n
+    \             \"cola\": {\n                \"summary\": \"a subset of the glue
+    dataset\",\n                \"value\": \"cola\"\n              },\n              \"yangdong/ecqa\":
+    {\n                \"summary\": \"the default configuration given by the \U0001F917
+    Datasets library\",\n                \"value\": \"yangdong--ecqa\"\n              }\n
+    \           }\n          }\n        ],\n        \"responses\": {\n          \"200\":
+    {\n            \"description\": \"A list of parquet files.</br>Beware: the response
+    is not paginated.\",\n            \"headers\": {\n              \"Cache-Control\":
+    { \"$ref\": \"#/components/headers/Cache-Control\" },\n              \"Access-Control-Allow-Origin\":
+    {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
     \               },\n                \"examples\": {\n                  \"duorc\":
     {\n                    \"summary\": \"duorc: six parquet files, one per split\",\n
     \                   \"value\": {\n                      \"parquet_files\": [\n
@@ -1615,77 +1622,152 @@
     \"duorc\",\n                          \"config\": \"SelfRC\",\n                          \"split\":
     \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/SelfRC/duorc-validation.parquet\",\n
     \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
-    3114389\n                        }\n                      ]\n                    }\n
-    \                 },\n                  \"sharded\": {\n                    \"summary\":
-    \"alexandrainst/danish-wit: the parquet file for the train split is partitioned
-    into 9 shards\",\n                    \"value\": {\n                      \"parquet_files\":
-    [\n                        {\n                          \"dataset\": \"alexandrainst/danish-wit\",\n
-    \                         \"config\": \"alexandrainst--danish-wit\",\n                          \"split\":
-    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
+    3114389\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  },\n                  \"duorc
+    with ParaphraseRC config\": {\n                    \"summary\": \"duorc: three
+    parquet files for ParaphraseRC, one per split\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-test.parquet\",\n
+    \                         \"filename\": \"duorc-test.parquet\",\n                          \"size\":
+    6136590\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"train\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-train.parquet\",\n
+    \                         \"filename\": \"duorc-train.parquet\",\n                          \"size\":
+    26005667\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-validation.parquet\",\n
+    \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
+    5566867\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false,\n                      \"features\": {\n                        \"plot_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"plot\": {\n                          \"dtype\":
+    \"string\",\n                          \"_type\": \"Value\"\n                        },\n
+    \                       \"title\": {\n                          \"dtype\": \"string\",\n
+    \                         \"_type\": \"Value\"\n                        },\n                        \"question_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"question\": {\n
+    \                         \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"answers\": {\n
+    \                         \"feature\": {\n                            \"dtype\":
+    \"string\",\n                            \"_type\": \"Value\"\n                          },\n
+    \                         \"_type\": \"Sequence\"\n                        },\n
+    \                       \"no_answer\": {\n                          \"dtype\":
+    \"bool\",\n                          \"_type\": \"Value\"\n                        }\n
+    \                     }\n                    }\n                  },\n                  \"sharded\":
+    {\n                    \"summary\": \"alexandrainst/da-wit: the parquet file for
+    the train split is partitioned into 9 shards\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    \                         \"split\": \"test\",\n                          \"url\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
     \                         \"filename\": \"parquet-test.parquet\",\n                          \"size\":
-    48781933\n                        },\n                        {\n                          \"dataset\":
-    \"alexandrainst/danish-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    48684227\n                        },\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
     \                         \"split\": \"train\",\n                          \"url\":
-    \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00000-of-00009.parquet\",\n
-    \                         \"size\": 937127291\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00001-of-00009.parquet\",\n
-    \                         \"size\": 925920565\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00002-of-00009.parquet\",\n
-    \                         \"size\": 940390661\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00003-of-00009.parquet\",\n
-    \                         \"size\": 934549621\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00004-of-00009.parquet\",\n
-    \                         \"size\": 493004154\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00005-of-00009.parquet\",\n
-    \                         \"size\": 942848888\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00006-of-00009.parquet\",\n
-    \                         \"size\": 933373843\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00007-of-00009.parquet\",\n
-    \                         \"size\": 936939176\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00008-of-00009.parquet\",\n
-    \                         \"size\": 946933048\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00000-of-00017.parquet\",\n
+    \                         \"size\": 465549291\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00001-of-00017.parquet\",\n
+    \                         \"size\": 465701535\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00002-of-00017.parquet\",\n
+    \                         \"size\": 463857123\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00003-of-00017.parquet\",\n
+    \                         \"size\": 456197486\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00004-of-00017.parquet\",\n
+    \                         \"size\": 465412051\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00005-of-00017.parquet\",\n
+    \                         \"size\": 469114305\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00006-of-00017.parquet\",\n
+    \                         \"size\": 460338645\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00007-of-00017.parquet\",\n
+    \                         \"size\": 468309376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00008-of-00017.parquet\",\n
+    \                         \"size\": 490063121\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00009-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00009-of-00017.parquet\",\n
+    \                         \"size\": 460462764\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00010-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00010-of-00017.parquet\",\n
+    \                         \"size\": 476525998\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00011-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00011-of-00017.parquet\",\n
+    \                         \"size\": 470327354\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00012-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00012-of-00017.parquet\",\n
+    \                         \"size\": 457138334\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00013-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00013-of-00017.parquet\",\n
+    \                         \"size\": 464485292\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00014-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00014-of-00017.parquet\",\n
+    \                         \"size\": 466549376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00015-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00015-of-00017.parquet\",\n
+    \                         \"size\": 460452174\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00016-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00016-of-00017.parquet\",\n
+    \                         \"size\": 480583533\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
     \"alexandrainst--danish-wit\",\n                          \"split\": \"val\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
     \                         \"filename\": \"parquet-val.parquet\",\n                          \"size\":
-    11437355\n                        }\n                      ]\n                    }\n
-    \                 }\n                }\n              }\n            }\n          },\n
-    \         \"401\": {\n            \"description\": \"If the external authentication
-    step on the Hugging Face Hub failed, and no authentication mechanism has been
-    provided. Retry with authentication.\",\n            \"headers\": {\n              \"Cache-Control\":
-    {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n              },\n
-    \             \"Access-Control-Allow-Origin\": {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
-    \             },\n              \"X-Error-Code\": {\n                \"$ref\":
-    \"#/components/headers/X-Error-Code-splits-401\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n                },\n
-    \               \"examples\": {\n                  \"inexistent-dataset\": {\n
-    \                   \"summary\": \"The dataset does not exist.\",\n                    \"value\":
+    11434278\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  }\n                }\n              }\n
+    \           }\n          },\n          \"401\": {\n            \"description\":
+    \"If the external authentication step on the Hugging Face Hub failed, and no authentication
+    mechanism has been provided. Retry with authentication.\",\n            \"headers\":
+    {\n              \"Cache-Control\": {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n
+    \             },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
+    \"#/components/headers/Access-Control-Allow-Origin\"\n              },\n              \"X-Error-Code\":
+    {\n                \"$ref\": \"#/components/headers/X-Error-Code-splits-401\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n
+    \               },\n                \"examples\": {\n                  \"inexistent-dataset\":
+    {\n                    \"summary\": \"The dataset does not exist.\",\n                    \"value\":
     {\n                      \"error\": \"The dataset does not exist, or is not accessible
     without authentication (private or gated). Please check the spelling of the dataset
     name or retry with authentication.\"\n                    }\n                  },\n

===== apps/Deployment datasets-server/prod-datasets-server-admin ======
--- /tmp/argocd-diff814272897/prod-datasets-server-admin-live.yaml	2023-07-28 12:21:36.064198647 +0000
+++ /tmp/argocd-diff814272897/prod-datasets-server-admin	2023-07-28 12:21:36.060198588 +0000
@@ -480,7 +480,7 @@
           value: "9"
         - name: ADMIN_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-admin:sha-fa7c7e8
+        image: huggingface/datasets-server-services-admin:sha-f0b5992
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-api ======
--- /tmp/argocd-diff174236318/prod-datasets-server-api-live.yaml	2023-07-28 12:21:36.080198881 +0000
+++ /tmp/argocd-diff174236318/prod-datasets-server-api	2023-07-28 12:21:36.076198823 +0000
@@ -379,7 +379,7 @@
           value: "9"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-api:sha-fa7c7e8
+        image: huggingface/datasets-server-services-api:sha-f0b5992
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff1766863281/prod-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:36.100199175 +0000
+++ /tmp/argocd-diff1766863281/prod-datasets-server-reverse-proxy	2023-07-28 12:21:36.096199116 +0000
@@ -337,7 +337,7 @@
   template:
     metadata:
       annotations:
-        checksum/config: 0a378473267f8c4dbdd9adee6e764a61b456827dc895071b155ed313ba7980f9
+        checksum/config: 21715e69f619c39084d57b84024ee0d7313f486b1a4603809f242be50d431b31
         co.elastic.logs/json.expand_keys: "true"
       creationTimestamp: null
       labels:

===== apps/Deployment datasets-server/prod-datasets-server-rows ======
--- /tmp/argocd-diff4213267921/prod-datasets-server-rows-live.yaml	2023-07-28 12:21:36.124199527 +0000
+++ /tmp/argocd-diff4213267921/prod-datasets-server-rows	2023-07-28 12:21:36.120199468 +0000
@@ -444,7 +444,7 @@
           value: "9"
         - name: API_UVICORN_PORT
           value: "8082"
-        image: huggingface/datasets-server-services-rows:sha-fa7c7e8
+        image: huggingface/datasets-server-services-rows:sha-f0b5992
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-worker-all ======
--- /tmp/argocd-diff2729663882/prod-datasets-server-worker-all-live.yaml	2023-07-28 12:21:36.168200172 +0000
+++ /tmp/argocd-diff2729663882/prod-datasets-server-worker-all	2023-07-28 12:21:36.164200114 +0000
@@ -711,7 +711,7 @@
           value: "0"
         - name: WORKER_JOB_TYPES_BLOCKED
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-fa7c7e8
+        image: huggingface/datasets-server-services-worker:sha-f0b5992
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== apps/Deployment datasets-server/prod-datasets-server-worker-light ======
--- /tmp/argocd-diff1340572032/prod-datasets-server-worker-light-live.yaml	2023-07-28 12:21:36.200200642 +0000
+++ /tmp/argocd-diff1340572032/prod-datasets-server-worker-light	2023-07-28 12:21:36.196200583 +0000
@@ -711,7 +711,7 @@
           value: "0"
         - name: WORKER_JOB_TYPES_BLOCKED
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-fa7c7e8
+        image: huggingface/datasets-server-services-worker:sha-f0b5992
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-backfill ======
--- /tmp/argocd-diff3480783511/prod-datasets-server-job-backfill-live.yaml	2023-07-28 12:21:36.216200876 +0000
+++ /tmp/argocd-diff3480783511/prod-datasets-server-job-backfill	2023-07-28 12:21:36.216200876 +0000
@@ -203,7 +203,7 @@
               value: CreateCommitError,LockedDatasetTimeoutError
             - name: LOG_LEVEL
               value: debug
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fa7c7e8
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-f0b5992
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-backfill
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-metrics-collector ======
--- /tmp/argocd-diff791710071/prod-datasets-server-job-metrics-collector-live.yaml	2023-07-28 12:21:36.228201053 +0000
+++ /tmp/argocd-diff791710071/prod-datasets-server-job-metrics-collector	2023-07-28 12:21:36.228201053 +0000
@@ -197,7 +197,7 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fa7c7e8
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-f0b5992
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-metrics-collector
             resources:

App: datasets-server-staging
YAML generation: Success 🟢
App sync status: Synced ✅

===== /ConfigMap datasets-server/staging-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff2263708852/staging-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:37.156214665 +0000
+++ /tmp/argocd-diff2263708852/staging-datasets-server-reverse-proxy	2023-07-28 12:21:37.132214313 +0000
@@ -1580,14 +1580,21 @@
     },\n            \"examples\": {\n              \"glue\": { \"summary\": \"a canonical
     dataset\", \"value\": \"glue\" },\n              \"Helsinki-NLP/tatoeba_mt\":
     {\n                \"summary\": \"a namespaced dataset\",\n                \"value\":
-    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          }\n        ],\n
-    \       \"responses\": {\n          \"200\": {\n            \"description\": \"A
-    list of parquet files.</br>Beware: the response is not paginated.\",\n            \"headers\":
-    {\n              \"Cache-Control\": { \"$ref\": \"#/components/headers/Cache-Control\"
-    },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
-    \"#/components/headers/Access-Control-Allow-Origin\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
+    \"Helsinki-NLP/tatoeba_mt\"\n              }\n            }\n          },\n          {\n
+    \           \"name\": \"config\",\n            \"in\": \"query\",\n            \"description\":
+    \"The dataset configuration (or subset).\",\n            \"required\": false,\n
+    \           \"schema\": { \"type\": \"string\" },\n            \"examples\": {\n
+    \             \"cola\": {\n                \"summary\": \"a subset of the glue
+    dataset\",\n                \"value\": \"cola\"\n              },\n              \"yangdong/ecqa\":
+    {\n                \"summary\": \"the default configuration given by the \U0001F917
+    Datasets library\",\n                \"value\": \"yangdong--ecqa\"\n              }\n
+    \           }\n          }\n        ],\n        \"responses\": {\n          \"200\":
+    {\n            \"description\": \"A list of parquet files.</br>Beware: the response
+    is not paginated.\",\n            \"headers\": {\n              \"Cache-Control\":
+    { \"$ref\": \"#/components/headers/Cache-Control\" },\n              \"Access-Control-Allow-Origin\":
+    {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/ParquetFilesResponse\"\n
     \               },\n                \"examples\": {\n                  \"duorc\":
     {\n                    \"summary\": \"duorc: six parquet files, one per split\",\n
     \                   \"value\": {\n                      \"parquet_files\": [\n
@@ -1615,77 +1622,152 @@
     \"duorc\",\n                          \"config\": \"SelfRC\",\n                          \"split\":
     \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/SelfRC/duorc-validation.parquet\",\n
     \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
-    3114389\n                        }\n                      ]\n                    }\n
-    \                 },\n                  \"sharded\": {\n                    \"summary\":
-    \"alexandrainst/danish-wit: the parquet file for the train split is partitioned
-    into 9 shards\",\n                    \"value\": {\n                      \"parquet_files\":
-    [\n                        {\n                          \"dataset\": \"alexandrainst/danish-wit\",\n
-    \                         \"config\": \"alexandrainst--danish-wit\",\n                          \"split\":
-    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
+    3114389\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  },\n                  \"duorc
+    with ParaphraseRC config\": {\n                    \"summary\": \"duorc: three
+    parquet files for ParaphraseRC, one per split\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"test\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-test.parquet\",\n
+    \                         \"filename\": \"duorc-test.parquet\",\n                          \"size\":
+    6136590\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"train\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-train.parquet\",\n
+    \                         \"filename\": \"duorc-train.parquet\",\n                          \"size\":
+    26005667\n                        },\n                        {\n                          \"dataset\":
+    \"duorc\",\n                          \"config\": \"ParaphraseRC\",\n                          \"split\":
+    \"validation\",\n                          \"url\": \"https://huggingface.co/datasets/duorc/resolve/refs%2Fconvert%2Fparquet/ParaphraseRC/duorc-validation.parquet\",\n
+    \                         \"filename\": \"duorc-validation.parquet\",\n                          \"size\":
+    5566867\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false,\n                      \"features\": {\n                        \"plot_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"plot\": {\n                          \"dtype\":
+    \"string\",\n                          \"_type\": \"Value\"\n                        },\n
+    \                       \"title\": {\n                          \"dtype\": \"string\",\n
+    \                         \"_type\": \"Value\"\n                        },\n                        \"question_id\":
+    {\n                          \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"question\": {\n
+    \                         \"dtype\": \"string\",\n                          \"_type\":
+    \"Value\"\n                        },\n                        \"answers\": {\n
+    \                         \"feature\": {\n                            \"dtype\":
+    \"string\",\n                            \"_type\": \"Value\"\n                          },\n
+    \                         \"_type\": \"Sequence\"\n                        },\n
+    \                       \"no_answer\": {\n                          \"dtype\":
+    \"bool\",\n                          \"_type\": \"Value\"\n                        }\n
+    \                     }\n                    }\n                  },\n                  \"sharded\":
+    {\n                    \"summary\": \"alexandrainst/da-wit: the parquet file for
+    the train split is partitioned into 9 shards\",\n                    \"value\":
+    {\n                      \"parquet_files\": [\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    \                         \"split\": \"test\",\n                          \"url\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-test.parquet\",\n
     \                         \"filename\": \"parquet-test.parquet\",\n                          \"size\":
-    48781933\n                        },\n                        {\n                          \"dataset\":
-    \"alexandrainst/danish-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
+    48684227\n                        },\n                        {\n                          \"dataset\":
+    \"alexandrainst/da-wit\",\n                          \"config\": \"alexandrainst--danish-wit\",\n
     \                         \"split\": \"train\",\n                          \"url\":
-    \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00000-of-00009.parquet\",\n
-    \                         \"size\": 937127291\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00001-of-00009.parquet\",\n
-    \                         \"size\": 925920565\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00002-of-00009.parquet\",\n
-    \                         \"size\": 940390661\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00003-of-00009.parquet\",\n
-    \                         \"size\": 934549621\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00004-of-00009.parquet\",\n
-    \                         \"size\": 493004154\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00005-of-00009.parquet\",\n
-    \                         \"size\": 942848888\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00006-of-00009.parquet\",\n
-    \                         \"size\": 933373843\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00007-of-00009.parquet\",\n
-    \                         \"size\": 936939176\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
-    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00009.parquet\",\n
-    \                         \"filename\": \"parquet-train-00008-of-00009.parquet\",\n
-    \                         \"size\": 946933048\n                        },\n                        {\n
-    \                         \"dataset\": \"alexandrainst/danish-wit\",\n                          \"config\":
+    \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00000-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00000-of-00017.parquet\",\n
+    \                         \"size\": 465549291\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00001-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00001-of-00017.parquet\",\n
+    \                         \"size\": 465701535\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00002-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00002-of-00017.parquet\",\n
+    \                         \"size\": 463857123\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00003-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00003-of-00017.parquet\",\n
+    \                         \"size\": 456197486\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00004-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00004-of-00017.parquet\",\n
+    \                         \"size\": 465412051\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00005-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00005-of-00017.parquet\",\n
+    \                         \"size\": 469114305\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00006-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00006-of-00017.parquet\",\n
+    \                         \"size\": 460338645\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00007-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00007-of-00017.parquet\",\n
+    \                         \"size\": 468309376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00008-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00008-of-00017.parquet\",\n
+    \                         \"size\": 490063121\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00009-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00009-of-00017.parquet\",\n
+    \                         \"size\": 460462764\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00010-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00010-of-00017.parquet\",\n
+    \                         \"size\": 476525998\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00011-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00011-of-00017.parquet\",\n
+    \                         \"size\": 470327354\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00012-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00012-of-00017.parquet\",\n
+    \                         \"size\": 457138334\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00013-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00013-of-00017.parquet\",\n
+    \                         \"size\": 464485292\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00014-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00014-of-00017.parquet\",\n
+    \                         \"size\": 466549376\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00015-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00015-of-00017.parquet\",\n
+    \                         \"size\": 460452174\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
+    \"alexandrainst--danish-wit\",\n                          \"split\": \"train\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-train-00016-of-00017.parquet\",\n
+    \                         \"filename\": \"parquet-train-00016-of-00017.parquet\",\n
+    \                         \"size\": 480583533\n                        },\n                        {\n
+    \                         \"dataset\": \"alexandrainst/da-wit\",\n                          \"config\":
     \"alexandrainst--danish-wit\",\n                          \"split\": \"val\",\n
-    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/danish-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
+    \                         \"url\": \"https://huggingface.co/datasets/alexandrainst/da-wit/resolve/refs%2Fconvert%2Fparquet/alexandrainst--danish-wit/parquet-val.parquet\",\n
     \                         \"filename\": \"parquet-val.parquet\",\n                          \"size\":
-    11437355\n                        }\n                      ]\n                    }\n
-    \                 }\n                }\n              }\n            }\n          },\n
-    \         \"401\": {\n            \"description\": \"If the external authentication
-    step on the Hugging Face Hub failed, and no authentication mechanism has been
-    provided. Retry with authentication.\",\n            \"headers\": {\n              \"Cache-Control\":
-    {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n              },\n
-    \             \"Access-Control-Allow-Origin\": {\n                \"$ref\": \"#/components/headers/Access-Control-Allow-Origin\"\n
-    \             },\n              \"X-Error-Code\": {\n                \"$ref\":
-    \"#/components/headers/X-Error-Code-splits-401\"\n              }\n            },\n
-    \           \"content\": {\n              \"application/json\": {\n                \"schema\":
-    {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n                },\n
-    \               \"examples\": {\n                  \"inexistent-dataset\": {\n
-    \                   \"summary\": \"The dataset does not exist.\",\n                    \"value\":
+    11434278\n                        }\n                      ],\n                      \"pending\":
+    [],\n                      \"failed\": [],\n                      \"partial\":
+    false\n                    }\n                  }\n                }\n              }\n
+    \           }\n          },\n          \"401\": {\n            \"description\":
+    \"If the external authentication step on the Hugging Face Hub failed, and no authentication
+    mechanism has been provided. Retry with authentication.\",\n            \"headers\":
+    {\n              \"Cache-Control\": {\n                \"$ref\": \"#/components/headers/Cache-Control\"\n
+    \             },\n              \"Access-Control-Allow-Origin\": {\n                \"$ref\":
+    \"#/components/headers/Access-Control-Allow-Origin\"\n              },\n              \"X-Error-Code\":
+    {\n                \"$ref\": \"#/components/headers/X-Error-Code-splits-401\"\n
+    \             }\n            },\n            \"content\": {\n              \"application/json\":
+    {\n                \"schema\": {\n                  \"$ref\": \"#/components/schemas/CustomError\"\n
+    \               },\n                \"examples\": {\n                  \"inexistent-dataset\":
+    {\n                    \"summary\": \"The dataset does not exist.\",\n                    \"value\":
     {\n                      \"error\": \"The dataset does not exist, or is not accessible
     without authentication (private or gated). Please check the spelling of the dataset
     name or retry with authentication.\"\n                    }\n                  },\n

===== apps/Deployment datasets-server/staging-datasets-server-reverse-proxy ======
--- /tmp/argocd-diff2621206676/staging-datasets-server-reverse-proxy-live.yaml	2023-07-28 12:21:37.220215604 +0000
+++ /tmp/argocd-diff2621206676/staging-datasets-server-reverse-proxy	2023-07-28 12:21:37.216215546 +0000
@@ -310,7 +310,7 @@
   template:
     metadata:
       annotations:
-        checksum/config: d7789621e8199bcb372a075df080cd1b28a575398fe9d4d7f24bf15fe4deae88
+        checksum/config: 8df48e1daa0f62d91288fa498f3782734c27b00a62ad2c912418032c20ee0df7
         co.elastic.logs/json.expand_keys: "true"
       creationTimestamp: null
       labels:

Legend	Status
✅	The app is synced in ArgoCD, and diffs you see are solely from this PR.
⚠️	The app is out-of-sync in ArgoCD, and the diffs you see include those changes plus any from this PR.
🛑	There was an error generating the ArgoCD diffs due to changes in this PR.

lhoestq · 2023-07-28T12:24:42Z

- added comments
- moved the clean_mongo_object function to the root function
  - it's applied on all the nested mongo objects (dicts and lists)
- added tests to cover the case when features are present
- update openapi
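
For readers unfamiliar with the pattern, here is a minimal sketch of applying a cleaning step to every nested dict and list, as the comment above describes. The PR's actual helper body is not shown in this thread, so everything here is an assumption: the name `clean_nested`, the `WrappedDict`/`WrappedList` stand-ins for database-specific container types, and the guess that "cleaning" means copying such containers into plain `dict` and `list` objects.

```python
from collections.abc import Mapping
from typing import Any


def clean_nested(obj: Any) -> Any:
    """Hypothetical helper: recursively copy mapping-like and list-like
    containers into plain dict and list objects, leaving scalars untouched."""
    if isinstance(obj, Mapping):
        # Rebuild the mapping as a plain dict, cleaning each value in turn.
        return {key: clean_nested(value) for key, value in obj.items()}
    if isinstance(obj, list):
        # Rebuild the list as a plain list, cleaning each item in turn.
        return [clean_nested(item) for item in obj]
    return obj


class WrappedDict(dict):
    """Stand-in for a database-specific dict subclass (assumption)."""


class WrappedList(list):
    """Stand-in for a database-specific list subclass (assumption)."""


# A cached value whose "features" field is nested inside wrapper containers.
cached = WrappedDict(
    features=WrappedList(
        [WrappedDict(name="image", type=WrappedDict(_type="Image"))]
    )
)
cleaned = clean_nested(cached)
```

Calling the helper once at the root is enough, since the recursion reaches every nested level. That is presumably the point of moving the call to the root function rather than cleaning individual fields.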

lhoestq added 2 commits July 27, 2023 18:08

use cached features in /rows

e58dc2d

style

6131ac4

mypy

f8ef075

lhoestq commented Jul 27, 2023

libs/libcommon/src/libcommon/parquet_utils.py

fix

81fbbfa

lhoestq marked this pull request as ready for review July 27, 2023 17:19

lhoestq requested a review from severo July 27, 2023 17:19

severo approved these changes Jul 27, 2023

lhoestq added 5 commits July 28, 2023 12:04

comments

4d3466b

move _clean_nested_mongo_object

4f9e616

Merge branch 'main' into use-cached-features-in-rows-endpoint

2a3a769

update tests

8dcbd71

update openapi

d68e8be

minor

616c618

lhoestq merged commit e792862 into main Jul 28, 2023
18 checks passed

lhoestq deleted the use-cached-features-in-rows-endpoint branch July 28, 2023 12:42

		@@ -30,6 +31,18 @@ class FileSystemError(Exception):
		pass


		def _clean_mongo_objects(obj: Any) -> Any:
