Creating custom collection #1
Hi @tino097, sorry for the delay. Your error is caused by:
{
    'my-users': lambda n, p, **kwargs: cu.ApiListCollection(
        n,
        p,
        data_factory=cu.ApiListData,
        data_settings={"action": 'user_list'})
}
I'll rewrite the README and add examples of collection creation at the beginning, before diving into the internals. If you can give me a couple of use cases, that would be good material for the documentation.
And regarding UserCollection at the end of your code snippet: most likely, you want to register a collection that uses a custom serializer, and assigning data_settings is accidental. In such a situation, where you only want to replace a factory, you can omit the constructor and assign the factory to the corresponding attribute.
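To illustrate the difference, here is a toy sketch in plain Python. ToyData, ToySerializer, ToyCollection and UserSerializer are hypothetical stand-ins, not the ckanext-collection classes; they only mimic the idea that a collection builds its parts from factory class attributes. Overriding one attribute in a subclass is enough to swap a factory, while extra settings such as data_settings travel through the constructor's **kwargs.

```python
# Toy stand-ins, NOT the ckanext-collection classes. They only mimic the idea
# that a collection builds its data/serializer from factory class attributes.
class ToyData:
    def __init__(self, collection, **settings):
        self.attached = collection  # back-reference to the owning collection
        self.settings = settings    # e.g. the content of data_settings


class ToySerializer:
    def __init__(self, collection):
        self.attached = collection

    def serialize(self):
        return repr(self.attached.data.settings)


class UserSerializer(ToySerializer):
    # hypothetical custom serializer
    def serialize(self):
        return "users: " + super().serialize()


class ToyCollection:
    DataFactory = ToyData
    SerializerFactory = ToySerializer

    def __init__(self, name="", params=None, **kwargs):
        self.name = name
        self.params = params or {}
        # factories come from class attributes; their settings come via kwargs
        self.data = self.DataFactory(self, **kwargs.get("data_settings", {}))
        self.serializer = self.SerializerFactory(self)


# Swapping the serializer: no constructor override, just a class attribute.
class UserCollection(ToyCollection):
    SerializerFactory = UserSerializer


users = UserCollection("my-users", {}, data_settings={"action": "user_list"})
print(users.serializer.serialize())  # users: {'action': 'user_list'}
```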
The signature of the collection constructor is (name, params, **kwargs). For example, when you build a collection with a custom constructor, extra keyword arguments such as data_settings arrive in kwargs:
class MyCollection(cu.ApiCollection):
    def __init__(self, name: str, params: dict, **kwargs):
        super().__init__(name, params, **kwargs)
        print("THIS IS DATA SETTINGS ->", kwargs.get("data_settings"))
With these comments in mind, let's try building your collection. If you just want a filtered user list, it can look like this:
# ICollection
def get_collection_factories(self) -> dict[str, CollectionFactory]:
    return {
        'my-users': MyUserCollection,
    }
##### your implementation of UserSerializer is left unchanged ####
# ApiList and Api collections just override the data factory. We are going to do it
# ourselves, so there is no difference if we just use a simple collection as a base class
class MyUserCollection(cu.Collection):
    # Data.with_attributes defines an anonymous subclass with the specified
    # attributes overridden. If you are not going to use your custom data factory
    # elsewhere, this is the shortest possible syntax
    DataFactory = cu.ApiListData.with_attributes(action="user_list")
    SerializerFactory = UserSerializer
BTW, in your initial implementation, instead of
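A rough model of what with_attributes does, assuming it simply builds a subclass with the given class attributes (this matches the remark later in the thread that it is the same as defining the subclass by hand). ToyData is hypothetical and the real implementation may differ.

```python
class ToyData:
    action = None  # default value, overridden in the generated subclass

    @classmethod
    def with_attributes(cls, **attrs):
        # Build an anonymous subclass whose class attributes are overridden.
        return type(cls.__name__, (cls,), attrs)


# Roughly equivalent to:
#     class UserListData(ToyData):
#         action = "user_list"
UserListData = ToyData.with_attributes(action="user_list")

print(UserListData.action)                # user_list
print(issubclass(UserListData, ToyData))  # True
print(ToyData.action)                     # None (the base class is untouched)
```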
Thanks @smotornyuk
Hey @smotornyuk
I've pulled the latest from master
Thanks. I forgot to commit internal.py. Now it's added to the repo, so the issue should be fixed in the latest commit. BTW, I'm rewriting the documentation. At the moment, I've finished the pages above the red line. Mainly, I'm trying to explain things gradually, with more examples. And there is one change: instead of importing everything as before, use the modules from ckanext.collection.shared:
from ckanext.collection.shared import collection, data, serialize
# and use them like below
collection.Collection
data.ApiSearchData
serialize.CsvSerializer
To confirm: if I want custom data, would I need to create my own action that returns the desired information? Or could I use the ModelCollection?
Using ModelCollection is more efficient, but there are certain disadvantages. If you use ModelCollection with a specific model from CKAN, you'll get all the records from the DB. Imagine that you create a ModelCollection for model.Package: you'll get public, private, deleted, and draft datasets at once. If you are showing this collection only to admins, that's OK. If you filter the results of the collection before showing them to anonymous users, that's also OK. But it's your responsibility to protect private data and show the collection only to people with the required access level. If you use an API action instead of the model, all restrictions are handled inside the action. If you use ApiSearchCollection that takes data from
So, the answer is:
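A simplified sketch of the filtering responsibility described above, for the case where records come straight from the model. The records are plain dicts with CKAN-like state and private fields, and the visibility rule (sysadmins see everything, everyone else only active public datasets) is an assumption that ignores organization membership and CKAN's other permission checks.

```python
# Toy records shaped loosely like CKAN datasets; only "state" and "private"
# matter for this simplified visibility rule.
DATASETS = [
    {"name": "public-ok", "state": "active", "private": False},
    {"name": "secret", "state": "active", "private": True},
    {"name": "removed", "state": "deleted", "private": False},
    {"name": "wip", "state": "draft", "private": False},
]


def visible_datasets(records, is_sysadmin):
    """Return only the records the current user may see (simplified rule)."""
    if is_sysadmin:
        return list(records)
    # anonymous/regular users: active and public only
    return [r for r in records if r["state"] == "active" and not r["private"]]


print([r["name"] for r in visible_datasets(DATASETS, is_sysadmin=False)])  # ['public-ok']
print(len(visible_datasets(DATASETS, is_sysadmin=True)))  # 4
```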
My use cases are to get reports within CKAN, for example:
So there would be filtering and restrictions over some of the data, but I'm trying to figure out what the right path would be. Thanks again
Cool, another example for when I continue updating the documentation. Here you can use models directly. It doesn't sound like you'll reuse the API actions that collect this data anywhere else, so creating them has little value. Here's the code that creates a collection of every user. The collection contains each user's ID, name, full name, and all the groups and organizations of the user.
from __future__ import annotations
import sqlalchemy as sa
from ckan import model
from ckanext.collection.shared import collection, data, serialize
# aliases are required to select data from the same model twice: for the `groups`
# column and for the `organizations` column.
stmt_groups = sa.alias(model.Group, "groups")
stmt_orgs = sa.alias(model.Group, "organizations")
# Data factory that executes an SQLAlchemy statement to compute data
# records. StatementSaData accepts a `statement` attribute (sqlalchemy.sql.Select
# instance) and uses this statement to fetch data from the DB. This is a low-level
# data factory that can be used when you need a Collection over an arbitrary SQL
# query. I do not recommend using ModelData here, because ModelData is optimized
# to work with a single model, while here we have to combine data from the User,
# Member and Group models.
#
# I'm using CLS.with_attributes(...) here, but if you read the documentation, you
# already know that it's the same as if I defined a class:
#
# >>> class UserData(data.StatementSaData):
# >>>     statement = sa.select(...)  # and here goes the whole value of the statement attribute.
#
UserData = data.StatementSaData.with_attributes(
    statement=sa.select(
        model.User.id,
        model.User.name,
        model.User.fullname,
        sa.func.string_agg(stmt_groups.c.name, ",").label("groups"),
        sa.func.string_agg(stmt_orgs.c.name, ",").label("organizations"),
    )
    .outerjoin(
        model.Member,
        sa.and_(
            model.User.id == model.Member.table_id,
            model.Member.table_name == "user",
        ),
    )
    .outerjoin(
        stmt_groups,
        sa.and_(
            stmt_groups.c.id == model.Member.group_id, stmt_groups.c.type == "group"
        ),
    )
    .outerjoin(
        stmt_orgs,
        sa.and_(
            stmt_orgs.c.id == model.Member.group_id, stmt_orgs.c.type == "organization"
        ),
    )
    .group_by(model.User)
)
# the collection itself. As you can see, the heavy work is done by the data factory.
class UserCollection(collection.Collection):
    DataFactory = UserData
    # I don't know what report format you are going to use, so let's choose CSV
    SerializerFactory = serialize.CsvSerializer
# initialize a collection
users = UserCollection()
# transform it into CSV
print(users.serializer.serialize())
To add filters to the collection, we need to modify the data factory. It will be converted into a standard class (instead of using with_attributes):
class UserData(data.StatementSaData):
    # statement is not changed: use the same sa.select(...) as above
    statement = ...
    # this method is responsible for filtering. It's called automatically,
    # accepts the `statement` of the data factory and must return the statement
    # with filters applied
    def statement_with_filters(self, stmt: sa.sql.Select) -> sa.sql.Select:
        # `self.attached` is a reference to the collection that holds the data
        # factory. Its `params` attribute contains data from the second argument
        # passed to the collection constructor
        params = self.attached.params
        # let's filter by exact match when using name
        if "name" in params:
            stmt = stmt.where(stmt.selected_columns["name"] == params["name"])
        # fullname uses a case-insensitive substring match
        if "fullname" in params:
            fullname = params["fullname"]
            stmt = stmt.where(stmt.selected_columns["fullname"].ilike(f"%{fullname}%"))
        # groups/organizations are filtered by substring too (case-sensitive,
        # and via HAVING because they are aggregates). But you'll probably
        # use something more sophisticated
        for group_type in ["groups", "organizations"]:
            if group_type not in params:
                continue
            value = params[group_type]
            stmt = stmt.having(stmt.selected_columns[group_type].contains(value))
        return stmt
# this class remains unchanged
class UserCollection(collection.Collection):
    ...
# `params` used by `statement_with_filters` is the dictionary
# passed as the second argument to the collection constructor. You can build an HTML form,
# submit it and extract data from `ckan.plugins.toolkit.request.args`. This value
# is a good candidate for `params`
users = UserCollection("", {"name": "default"})
# transform it into CSV
print(users.serializer.serialize())
And here's the distribution of datasets created by users in different organizations/groups, defined in the same manner.
from __future__ import annotations
import sqlalchemy as sa
from ckan import model
from ckanext.collection.shared import collection, data, serialize
# the Member model is joined twice, once for user membership and once for
# package membership, so we need two aliases of it.
package_membership = sa.alias(model.Member)
user_membership = sa.alias(model.Member)
class GroupStatsData(data.StatementSaData):
    # number of active datasets per group, broken down by the user who created them
    statement = (
        sa.select(
            model.Group.name.label("group_name"),
            model.Group.title,
            model.Group.type,
            sa.func.count(model.Package.id).label("number of datasets"),
            model.User.name.label("user_name"),
        )
        .join(user_membership, model.Group.id == user_membership.c.group_id)
        .join(model.User, model.User.id == user_membership.c.table_id)
        .join(package_membership, model.Group.id == package_membership.c.group_id)
        .join(model.Package, model.Package.id == package_membership.c.table_id)
        .where(
            model.User.state == "active",
            model.Package.state == "active",
            model.Group.state == "active",
            # count only datasets created by this user; without this condition
            # every member gets the total number of datasets in the group
            model.Package.creator_user_id == model.User.id,
        )
        .group_by(model.User, model.Group)
    )
class GroupStatsCollection(collection.Collection):
    DataFactory = GroupStatsData
    SerializerFactory = serialize.CsvSerializer
stats = GroupStatsCollection()
print(stats.serializer.serialize())
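To experiment with this kind of report query outside CKAN, here is a self-contained sqlite3 sketch. The schema is hypothetical and much simpler than CKAN's (each dataset has a single group_id and a creator_user_id instead of Member rows), and group_stats adds optional conditions from a params dict in the spirit of statement_with_filters, using plain SQL strings with bound parameters rather than SQLAlchemy.

```python
import sqlite3

# Hypothetical, simplified schema (not CKAN's real tables).
SCHEMA = """
CREATE TABLE app_user (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE app_group (id TEXT PRIMARY KEY, name TEXT, group_type TEXT);
CREATE TABLE app_package (
    id TEXT PRIMARY KEY, group_id TEXT, creator_user_id TEXT, state TEXT
);
"""


def group_stats(conn, params):
    """Count active datasets per (group, creator), with optional filters."""
    sql = (
        "SELECT g.name, g.group_type, u.name, COUNT(p.id) "
        "FROM app_package p "
        "JOIN app_group g ON g.id = p.group_id "
        "JOIN app_user u ON u.id = p.creator_user_id "
        "WHERE p.state = 'active'"
    )
    args = []
    # Only add a condition when the corresponding param is present.
    if "user_name" in params:
        sql += " AND u.name = ?"
        args.append(params["user_name"])
    if "group_type" in params:
        sql += " AND g.group_type = ?"
        args.append(params["group_type"])
    sql += " GROUP BY g.id, u.id ORDER BY g.name, u.name"
    return conn.execute(sql, args).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO app_user VALUES (?, ?)", [("u1", "alice"), ("u2", "bob")])
conn.executemany(
    "INSERT INTO app_group VALUES (?, ?, ?)",
    [("g1", "science", "group"), ("o1", "acme", "organization")],
)
conn.executemany(
    "INSERT INTO app_package VALUES (?, ?, ?, ?)",
    [
        ("p1", "o1", "u1", "active"),
        ("p2", "o1", "u1", "active"),
        ("p3", "o1", "u2", "active"),
        ("p4", "g1", "u2", "active"),
        ("p5", "g1", "u2", "deleted"),  # excluded: not active
    ],
)
print(group_stats(conn, {}))
# [('acme', 'organization', 'alice', 2), ('acme', 'organization', 'bob', 1), ('science', 'group', 'bob', 1)]
print(group_stats(conn, {"group_type": "organization", "user_name": "alice"}))
# [('acme', 'organization', 'alice', 2)]
```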
Here's an implementation of the first collection using an API action, just for reference. In this case, all the logic goes into the action and the collection becomes slim. You may find this style more readable, as you are more used to API actions.
from __future__ import annotations
from typing import Any
import sqlalchemy as sa
import ckan.plugins.toolkit as tk
from ckan import model
from ckan.types import Context
from ckanext.collection.shared import collection, data, serialize
# action definition. Register it (and an auth function with the same name)
# through the IActions and IAuthFunctions plugin interfaces
@tk.side_effect_free
def my_user_listing(context: Context, data_dict: dict[str, Any]) -> dict[str, Any]:
    tk.check_access("my_user_listing", context, data_dict)
    # ApiSearchData uses package_search-style parameter names: rows ->
    # limit, start -> offset.
    rows = tk.asint(data_dict.get("rows", 10))
    start = tk.asint(data_dict.get("start", 0))
    stmt = sa.select(model.User)
    total = model.Session.scalar(sa.select(sa.func.count()).select_from(stmt.subquery()))
    stmt = stmt.limit(rows).offset(start)
    # ApiSearchData expects a package_search-like result, with `results` and
    # `count` keys
    return {
        "results": [
            {
                "id": user.id,
                "name": user.name,
                "fullname": user.fullname,
                "groups": user.get_group_ids("group"),
                "organizations": user.get_group_ids("organization"),
            }
            for user in model.Session.scalars(stmt)
        ],
        "count": total,
    }
UserData = data.ApiSearchData.with_attributes(action="my_user_listing")
class UserCollection(collection.Collection):
    DataFactory = UserData
    SerializerFactory = serialize.CsvSerializer
users = UserCollection()
print(users.serializer.serialize())
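A minimal sketch of the paging contract described in the action's comments: the action takes rows/start and returns results plus count. Here my_user_listing is an in-memory toy, not the CKAN action above, and fetch_all is a hypothetical consumer; how ApiSearchData actually pages through results is an assumption.

```python
from __future__ import annotations

from typing import Any

# Hypothetical in-memory stand-in for the users table.
USERS = [{"id": str(i), "name": f"user-{i}"} for i in range(25)]


def my_user_listing(data_dict: dict[str, Any]) -> dict[str, Any]:
    """Toy action with package_search-style paging: rows -> limit, start -> offset."""
    rows = int(data_dict.get("rows", 10))
    start = int(data_dict.get("start", 0))
    return {"results": USERS[start:start + rows], "count": len(USERS)}


def fetch_all(action, page_size: int = 10) -> list[dict[str, Any]]:
    """Call the action page by page until `count` records are collected."""
    records: list[dict[str, Any]] = []
    while True:
        page = action({"rows": page_size, "start": len(records)})
        records.extend(page["results"])
        # stop on an empty page (safety net) or once everything is fetched
        if not page["results"] or len(records) >= page["count"]:
            return records


print(len(fetch_all(my_user_listing)))  # 25
```

Keeping count separate from results lets a consumer know when to stop without requesting an extra empty page.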
I'm trying to implement the ICollection interface with the following code. This collection gets registered and I'm able to select it in the explorer component, but when I select it I get the following error
What am I missing to initialize the ApiListCollection?
Thanks