Feedback on our Documentation #23031
Replies: 24 comments 28 replies
-
Most example snippets are not runnable as is: they are missing imports or variable definitions. They require us to figure out the rest of the code, which is not always straightforward. I would like to be able to copy and paste. Love the ability to time-travel to old versions. Love your detailed changelogs on GitHub.
-
Dagster.yaml, Dagster.yaml, Dagster.yaml..... Side note: CLI-based config should be a last resort, not the first example in documentation. I outright refuse to use any config that can't be versioned in code or at least in some .env file (this includes the cloud).
-
Role: Lead Data Architect
I haven't had too many people ping me asking how to build basic assets (incl. resources and all of that), so my anecdotal impression is that the high-level explanation of concepts to get up and running is fairly straightforward for most developers (even those without data experience, which has been the majority of people slotting into the system), so props on that side. On the other hand, I myself don't frequent the documentation quite as often anymore since I've been working with the system for so long, but one thing that I've been noticing more and more is that there's a lot of "hidden" functionality that isn't in the docs and that I had to dig through the codebase to find. Some examples are specific system tags (e.g. Overall it seems like there's a lot of functionality added more or less recently (
-
Well, either I just understand Dagster better, or the docs have gotten MUCH better in the last year. When I started you were transitioning from ops/graphs to assets, and I struggled a lot with understanding WHY you'd want to use certain abstractions or methods (e.g., what problem does it solve, with concrete examples). I still get that feeling a bit when I read through new concepts, so maybe that's still an issue. I haven't actually gone through the course, but Dagster University helped a colleague get up to speed and start contributing to our repo very quickly. One thing I often run into is some random missing piece of documentation, e.g., the link from a class to the API docs/code is broken, or a class doc is missing a method or return value. Unfortunately it doesn't usually feel worthwhile to go through the process of submitting a change request on GitHub. It might be nice if there was some quick way to provide in-line feedback for the docs.
-
Role: Experience with Dagster (Low/Med/High): Your feedback: Why not just show the example fetching something publicly available, and use SQLite or DuckDB? The code would be copy-paste and would work without any extra effort. I ended up deviating from the example; I was successful but it took some extra
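A minimal sketch of the kind of example this comment asks for: it needs no credentials or external services and should run as pasted. The inline CSV stands in for a publicly available dataset, and the table and function names (`load_populations`, `total_by_continent`) are made up for illustration. The functions are plain Python so the block depends only on the standard library; in a Dagster project each one would presumably be wrapped with `@asset`.

```python
import csv
import io
import sqlite3

# Inline sample data standing in for a public dataset (values are illustrative).
SAMPLE_CSV = """country,continent,population
Kenya,Africa,50
Ghana,Africa,30
Peru,South America,33
"""


def load_populations(conn: sqlite3.Connection) -> int:
    """Parse the sample CSV and store it in a SQLite table; return the row count."""
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS populations "
        "(country TEXT, continent TEXT, population INTEGER)"
    )
    # Named placeholders are filled from each row dictionary.
    conn.executemany(
        "INSERT INTO populations VALUES (:country, :continent, :population)",
        rows,
    )
    conn.commit()
    return len(rows)


def total_by_continent(conn: sqlite3.Connection) -> dict:
    """Aggregate the stored rows: total population per continent."""
    cursor = conn.execute(
        "SELECT continent, SUM(population) FROM populations "
        "GROUP BY continent ORDER BY continent"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # no files or servers needed
    load_populations(connection)
    print(total_by_continent(connection))
```

Using an in-memory database keeps the example free of setup and cleanup steps; swapping `":memory:"` for a file path would persist the table between runs.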
-
Role: Data Platform team lead
I'll add more as it comes to mind
-
I would appreciate some implementation examples for the infrastructure for Dagster+ Hybrid on ECS Fargate that go more in depth about optimal/possible configurations. Go over some common, successful deployments of both the Self-Service and the Pro/Enterprise plans you see in the field, weigh the pros and cons of the approaches, and explore the features you get with Pro/Enterprise that might make an implementation better. It's pretty hard to tell whether the use case I'm trying to serve can be handled with the Self-Service plan or whether I need the Pro/Enterprise plan to handle some of our requirements. What I'm trying to do is implement Dagster in the usual setup of one AWS account per development environment, with GitHub Actions for CI/CD in our data engineering repos, to test changes with branch deployments in lower environments and run scheduled prod jobs in our prod environment. Example scenarios I'd like to see explored
The docs currently present a lot of possible options, but they'd benefit from more prescriptive advice for those with little Dagster experience who are trying to implement it in a fairly mature cloud environment, advice that goes beyond deploying the CloudFormation templates. Some docs focused on DevOps/Infra/Platform Engineers would let me get things running and get my data engineers in and using the product, to make the case for us to use it for more projects.
-
Role: SWE, Data
-
Role: Software Engineer
I want to stop seeing all these AI-generated answers in Google search results, please. I want the docs, not some unverifiable spewing of an LLM. Thankz!
-
feedback
-
Role: Senior Software Engineer
Experience with Dagster: Low
Feedback: I like that the Dagster University tutorial goes over how to use assets, but I would have liked it to also go over ops, when to use assets vs ops, and how to tie things together when you have a multistep pipeline with multiple interchangeable parts (e.g., I could download pre-run inferences from S3 or submit a job to rerun a set of fresh inferences on a given model version; I could use validation code to compare post-processed output to ground truth, or provide a baseline set of post-processed output and compare against both baseline and ground truth). In this case I'm trying to use Dagster to build a multistep customizable pipeline where I can either run all steps or use pre-generated outputs for some steps. Dagster seems to be quite powerful in terms of data lineage and asset materialization, but I'm still trying to figure out the best way to apply these concepts to my use case, and I think the getting-started tutorials could go into more detail in these areas or link me to other tutorials which do. I feel like the detailed documentation is good, but I'm looking for a higher-level intro that covers all the main concepts and how they fit together, as well as how to build a multistep, customizable, flexible pipeline like the above.
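One way to picture the pipeline described in this comment, as a sketch in plain Python rather than Dagster's API: each step is a function, and any step can be skipped by supplying a pre-generated output for it. The step names (`inferences`, `postprocessed`, `accuracy`), the sample values, and the `run_pipeline` helper are all hypothetical, chosen only to mirror the scenario above.

```python
from typing import Any, Callable, Dict, Mapping, Optional

# Each step receives the outputs computed so far and returns its own output.
Step = Callable[[Mapping[str, Any]], Any]


def run_pipeline(
    steps: Mapping[str, Step],
    pregenerated: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Run steps in order, reusing a pre-generated output wherever one is given."""
    outputs: Dict[str, Any] = {}
    supplied = pregenerated or {}
    for name, step in steps.items():
        if name in supplied:
            outputs[name] = supplied[name]  # skip the computation for this step
        else:
            outputs[name] = step(outputs)
    return outputs


def run_inference(outputs: Mapping[str, Any]) -> list:
    # Stand-in for submitting a fresh inference job (scores are made up).
    return [0.9, 0.2, 0.7]


def postprocess(outputs: Mapping[str, Any]) -> list:
    # Threshold the raw scores into boolean predictions.
    return [score >= 0.5 for score in outputs["inferences"]]


def compare_to_ground_truth(outputs: Mapping[str, Any]) -> float:
    # Fraction of predictions that match an illustrative ground truth.
    ground_truth = [True, False, False]
    predicted = outputs["postprocessed"]
    matches = sum(p == t for p, t in zip(predicted, ground_truth))
    return matches / len(ground_truth)


# Dictionary order defines execution order.
STEPS = {
    "inferences": run_inference,
    "postprocessed": postprocess,
    "accuracy": compare_to_ground_truth,
}

if __name__ == "__main__":
    # Run every step from scratch.
    print(run_pipeline(STEPS))
    # Reuse pre-run inferences and recompute only the downstream steps.
    print(run_pipeline(STEPS, pregenerated={"inferences": [0.9, 0.1, 0.1]}))
```

Keying both the steps and the overrides by name keeps the swap explicit: the caller decides per run which outputs to recompute and which to take as given.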
-
Role: Jr Quant & Data Engineer
I've been trying to learn this for the last two months and am struggling... the documentation is really poor and incomplete, so I'm glad that it's a priority for this quarter!
-
Hi, Role: Data Engineer
-
i'm trying to bring dagster into the org because i'm frustrated with the status quo (run a script in a vm, maybe have cron entry to do that for you)
docs feedback
this is a bit more than docs, think summarised friction log
organization
please look into using the 'write the docs' framework
this is the format larger projects use in the python ecosystem, it's what people expect as quality documentation. see: django, pandas, keras
concepts
please explain concepts for new users, think users that have never used these kinds of tools. lots of folks using just files, http apis, maybe a file server, or an embedded database. explain it more than once too, with different approaches of course.
tutorials
there are like three different tutorials.
examples
these are either incomplete or too complicated - consider the basic use and how it connects with other parts of the system, slowly introduce variations appropriate to each example.
roleplay
have your people build projects from scratch. then improve documentation, api, and overall experience using their friction logs. get them to do this for every minor version but change the requirements each time. this should include anyone technical - devrel, engineering, devops, etc. the founder/ceo too.
test the docs
the thursday after a new minor version a pair of devrels live streams a fresh deployment, debugs any eventual hiccups, and consults the docs as they go.
astroturfing... as documentation
get devrel folks to setup a personal deployment they run full time and where they can experiment with unique use cases. have them write blog posts and publish new integrations from their adventures. remove the training wheels by using the open source mode on a lightly supported cloud provider.
invite weirdness
-
Role: Software Engineer
I didn't know where to start learning, so I read page by page from the top of the documentation tree and put it into practice. However, the order of the documents is not organised, and I often have to read later documents to understand and implement earlier ones. I would like to see a structure that takes into account the order of the documents and how the pages relate to each other. Also, as others have said, many of the sample programs cannot be run as they are, so I would like you to use samples that can be run.
-
Role: Data Engineer
-
Role: Platform Engineer
-
This topic should get more documentation: #12251. I came in with the assumption that Dagster would have some mechanism to manage concurrency. It took waaayy too long to figure out that technically yes, it does, but not in a way that is expected or useful.
-
Role: Engineering Team Lead
The source code shown in the API documentation should support syntax highlighting, which would make it much more readable. Example: https://docs.dagster.io/_modules/dagster/_core/definitions/decorators/asset_decorator#asset Currently this is a blob of white text on a black background, which makes it hard to digest.
-
Role: Mainframer
Looking forward to 1.8. I hope this is the release that shows external assets the same as dagster-materialized assets in the UI, along with the excellent partition & checks sections in the "pill/card".
-
Role: Sr Staff Data Engineer
The docs often fail to help us understand why to use a certain aspect of dagster. Here's one I just ran into today while I was trying to learn more about asset checks:
Can you elaborate, or provide examples of when using this method might be beneficial over other methods?
-
Role: Software tech-lead
For example, the starter code on dagster.io

    from dagster import asset
    from pandas import DataFrame, get_dummies, read_html
    from sklearn.linear_model import LinearRegression


    @asset
    def country_populations() -> DataFrame:
        # Read the first HTML table on the page and name its columns.
        df = read_html("https://tinyurl.com/mry64ebh")[0]
        df.columns = ["country", "pop2022", "pop2023", "change", "continent", "region"]
        # Convert strings like "−1.2%" (Unicode minus) into floats.
        df["change"] = df["change"].str.rstrip("%").str.replace("−", "-").astype("float")
        return df


    @asset
    def continent_change_model(country_populations: DataFrame) -> LinearRegression:
        # Fit population change against one-hot encoded continents.
        data = country_populations.dropna(subset=["change"])
        return LinearRegression().fit(get_dummies(data[["continent"]]), data["change"])


    @asset
    def continent_stats(country_populations: DataFrame, continent_change_model: LinearRegression) -> DataFrame:
        # Attach the fitted per-continent coefficients to the continent totals.
        result = country_populations.groupby("continent").sum()
        result["pop_change_factor"] = continent_change_model.coef_
        return result

but I was confused that most of the docs'
-
To add, documentation on code locations is also sparse:
-
Thank you all for your input! Closing this discussion in favor of feedback on our new docs site: #23031
-
Hi all!
This quarter, our team is focusing on improving our documentation. We know that Dagster can sometimes be complex to understand, and we're hoping to improve the overall experience from your first Hello World all the way to guides, detailed explanations, and API docs.
I'd love to hear what you love and hate about our docs.
Please use the template below and provide as much detail as you can: