Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should Curator.validate() work with only read permissions? #2182

Open
Zethson opened this issue Nov 15, 2024 · 2 comments
Open

Should Curator.validate() work with only read permissions? #2182

Zethson opened this issue Nov 15, 2024 · 2 comments

Comments

@Zethson
Copy link
Member

Zethson commented Nov 15, 2024

Description of feature

Currently, we still default to read permission when collaborators are added to an instance.

I was running

adata = ad.AnnData(
    X=np.ones((2, 1000)),
    var=pd.DataFrame(index=range(1000)),
    obs=pd.DataFrame({"cell_type_author": ["Plasmablast", "cDC"]})
)

categoricals = {
    "cell_type_author": bt.CellType.name,
}
curator = ln.Curator.from_anndata(
    adata,
    var_index=bt.Gene.symbol,
    categoricals=categoricals, 
    organism="human",
)
curator.validate()

in Felix instance and was met with:

{
	"name": "ProgrammingError",
	"message": "permission denied for table bionty_source
",
	"stack": "---------------------------------------------------------------------------
InsufficientPrivilege                     Traceback (most recent call last)
File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/backends/utils.py:105, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
    104 else:
--> 105     return self.cursor.execute(sql, params)

InsufficientPrivilege: permission denied for table bionty_source


The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
Cell In[3], line 16
      7 categoricals = {
      8     \"cell_type_author\": bt.CellType.name,
      9 }
     10 curator = ln.Curator.from_anndata(
     11     adata,
     12     var_index=bt.Gene.symbol,
     13     categoricals=categoricals, 
     14     organism=\"human\",
     15 )
---> 16 curator.validate()
     17 curator.add_new_from(\"cell_type_author\")

File ~/PycharmProjects/lamindb/lamindb/_curate.py:553, in AnnDataCurator.validate(self, organism)
    548     logger.important(
    549         f\"validating metadata using registries of instance {colors.italic(self._using_key)}\"
    550     )
    552 # add all validated records to the current instance
--> 553 self._update_registry_all()
    555 validated_var, non_validated_var = validate_categories(
    556     self._adata.var.index,
    557     field=self._var_field,
   (...)
    563     **self._kwargs,  # type: ignore
    564 )
    565 validated_obs, non_validated_obs = validate_categories_in_df(
    566     self._adata.obs,
    567     fields=self.categoricals,
   (...)
    571     **self._kwargs,
    572 )

File ~/PycharmProjects/lamindb/lamindb/_curate.py:523, in AnnDataCurator._update_registry_all(self, validated_only, **kwargs)
    521 def _update_registry_all(self, validated_only: bool = True, **kwargs):
    522     \"\"\"Save labels for all features.\"\"\"
--> 523     self._save_from_var_index(validated_only=validated_only, **self._kwargs)
    524     for name in self._obs_fields.keys():
    525         self._update_registry(name, validated_only=validated_only, **self._kwargs)

File ~/PycharmProjects/lamindb/lamindb/_curate.py:509, in AnnDataCurator._save_from_var_index(self, validated_only, organism)
    505 def _save_from_var_index(
    506     self, validated_only: bool = True, organism: str | None = None
    507 ):
    508     \"\"\"Save variable records.\"\"\"
--> 509     update_registry(
    510         values=list(self._adata.var.index),
    511         field=self.var_index,
    512         key=\"var_index\",
    513         save_function=\".add_new_from_var_index()\",
    514         using_key=self._using_key,
    515         validated_only=validated_only,
    516         organism=organism,
    517         source=self._sources.get(\"var_index\"),
    518         exclude=self._exclude.get(\"var_index\"),
    519     )

File ~/PycharmProjects/lamindb/lamindb/_curate.py:1501, in update_registry(values, field, key, save_function, using_key, validated_only, df, organism, dtype, source, standardize, warning, exclude, **kwargs)
   1498 # save from public
   1499 filter_kwargs_current = get_current_filter_kwargs(registry, filter_kwargs)
   1500 existing_and_public_records = (
-> 1501     registry.from_values(
   1502         list(values),
   1503         field=field,
   1504         **filter_kwargs_current,
   1505     )
   1506     if values
   1507     else []
   1508 )
   1510 labels_saved: dict = {\"from public\": [], \"without reference\": []}
   1512 public_records = [r for r in existing_and_public_records if r._state.adding]

File ~/PycharmProjects/lamindb/lamindb/_can_validate.py:40, in from_values(cls, values, field, create, organism, source, mute)
     37 from_source = True if cls.__module__.startswith(\"bionty.\") else False
     39 field_str = get_name_field(cls, field=field)
---> 40 return get_or_create_records(
     41     iterable=values,
     42     field=getattr(cls, field_str),
     43     create=create,
     44     from_source=from_source,
     45     organism=organism,
     46     source=source,
     47     mute=mute,
     48 )

File ~/PycharmProjects/lamindb/lamindb/_from_values.py:67, in get_or_create_records(iterable, field, create, from_source, organism, source, mute)
     64 if source_record:
     65     from bionty.core._add_ontology import check_source_in_db
---> 67     check_source_in_db(registry=registry, source=source_record)
     69     from_source = not source_record.in_db
     70 elif hasattr(registry, \"source_id\"):

File ~/PycharmProjects/bionty/bionty/core/_add_ontology.py:127, in check_source_in_db(registry, source, n_all, n_in_db)
    125 else:
    126     source.in_db = False
--> 127     source.save()

File ~/PycharmProjects/lamindb/lamindb/_record.py:659, in save(self, *args, **kwargs)
    656     # save unversioned record
    657     else:
    658         check_name_change(self)
--> 659         super(Record, self).save(*args, **kwargs)
    660         _store_record_old_name(self)
    661 # perform transfer of many-to-many fields
    662 # only supported for Artifact and Collection records

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/models/base.py:891, in Model.save(self, force_insert, force_update, using, update_fields, *args)
    888     if loaded_fields:
    889         update_fields = frozenset(loaded_fields)
--> 891 self.save_base(
    892     using=using,
    893     force_insert=force_insert,
    894     force_update=force_update,
    895     update_fields=update_fields,
    896 )

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/models/base.py:997, in Model.save_base(self, raw, force_insert, force_update, using, update_fields)
    993         force_insert = self._validate_force_insert(force_insert)
    994         parent_inserted = self._save_parents(
    995             cls, using, update_fields, force_insert
    996         )
--> 997     updated = self._save_table(
    998         raw,
    999         cls,
   1000         force_insert or parent_inserted,
   1001         force_update,
   1002         using,
   1003         update_fields,
   1004     )
   1005 # Store the database on which the object was saved
   1006 self._state.db = using

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/models/base.py:1129, in Model._save_table(self, raw, cls, force_insert, force_update, using, update_fields)
   1120 values = [
   1121     (
   1122         f,
   (...)
   1126     for f in non_pks_non_generated
   1127 ]
   1128 forced_update = update_fields or force_update
-> 1129 updated = self._do_update(
   1130     base_qs, using, pk_val, values, update_fields, forced_update
   1131 )
   1132 if force_update and not updated:
   1133     raise DatabaseError(\"Forced update did not affect any rows.\")

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/models/base.py:1194, in Model._do_update(self, base_qs, using, pk_val, values, update_fields, forced_update)
   1181 if self._meta.select_on_save and not forced_update:
   1182     return (
   1183         filtered.exists()
   1184         and
   (...)
   1192         (filtered._update(values) > 0 or filtered.exists())
   1193     )
-> 1194 return filtered._update(values) > 0

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/models/query.py:1278, in QuerySet._update(self, values)
   1276 query.annotations = {}
   1277 self._result_cache = None
-> 1278 return query.get_compiler(self.db).execute_sql(CURSOR)

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/models/sql/compiler.py:2003, in SQLUpdateCompiler.execute_sql(self, result_type)
   1996 def execute_sql(self, result_type):
   1997     \"\"\"
   1998     Execute the specified update. Return the number of rows affected by
   1999     the primary update query. The \"primary update query\" is the first
   2000     non-empty query that is executed. Row counts for any subsequent,
   2001     related queries are not available.
   2002     \"\"\"
-> 2003     cursor = super().execute_sql(result_type)
   2004     try:
   2005         rows = cursor.rowcount if cursor else 0

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/models/sql/compiler.py:1574, in SQLCompiler.execute_sql(self, result_type, chunked_fetch, chunk_size)
   1572     cursor = self.connection.cursor()
   1573 try:
-> 1574     cursor.execute(sql, params)
   1575 except Exception:
   1576     # Might fail for server-side cursors (e.g. connection closed)
   1577     cursor.close()

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/backends/utils.py:79, in CursorWrapper.execute(self, sql, params)
     78 def execute(self, sql, params=None):
---> 79     return self._execute_with_wrappers(
     80         sql, params, many=False, executor=self._execute
     81     )

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/backends/utils.py:92, in CursorWrapper._execute_with_wrappers(self, sql, params, many, executor)
     90 for wrapper in reversed(self.db.execute_wrappers):
     91     executor = functools.partial(wrapper, executor)
---> 92 return executor(sql, params, many, context)

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/backends/utils.py:100, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
     98     warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning)
     99 self.db.validate_no_broken_transaction()
--> 100 with self.db.wrap_database_errors:
    101     if params is None:
    102         # params default might be backend specific.
    103         return self.cursor.execute(sql)

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/utils.py:91, in DatabaseErrorWrapper.__exit__(self, exc_type, exc_value, traceback)
     89 if dj_exc_type not in (DataError, IntegrityError):
     90     self.wrapper.errors_occurred = True
---> 91 raise dj_exc_value.with_traceback(traceback) from exc_value

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/django/db/backends/utils.py:105, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
    103     return self.cursor.execute(sql)
    104 else:
--> 105     return self.cursor.execute(sql, params)

ProgrammingError: permission denied for table bionty_source
"
}

The validate() of Curator currently attempts to write a few things to the instance if necessary (missing values, [...]). This makes sense and simplifies things but it does now always require write permission. The name validate() does not have that away.

@sunnyosun
Copy link
Member

sunnyosun commented Nov 18, 2024

Hmm, it's a good question! What use case would it be that a user wanted to curate a dataset but not save it to any instance? I think it's unlikely, isn't it? If they want to save the artifact to an instance, they can just use using_key for curating with metadata from another instance, which was the main reason we added using_key in the CxG curator.

@Zethson
Copy link
Member Author

Zethson commented Nov 18, 2024

Yeah, you are right. It's rather unlikely. I'll improve the error message though because it's not clear enough.

@Zethson Zethson self-assigned this Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants