-
Hi,

While the Basics of Solids tutorial makes a fairly clear distinction between Input vs Config, I am still unclear when it comes to actual implementation, particularly when the initial parameter passed into the first solid is more complex than a primitive type (e.g. a dict). I've noticed in a few examples (this and this,

Secondly, in a long pipeline made up of many solids in series, there are use cases where the input parameters have to be carried along as part of the output to downstream solids repeatedly, because some of their attributes are required there. Does that mean it would be better to decorate the solid with my input as

As an example, let's say I have a list of 10 different APIs with different schemas to be ingested on an hourly basis. Some of the API metadata, such as name or id, needs to be used in a few solids downstream. The main use case of this pipeline is that downstream tasks might sometimes fail, but as long as the raw JSON is in S3 in the first place, I can always backfill later.

So my question is: how can I best describe my input for this kind of pipeline? As an input, config or resource?

Thank you!
-
@szeleeteo this is a good question. For your situation, it sounds to me like config or resource makes the most sense.

Loading via "input" corresponds to defining a dagster_type_loader. This is useful when you have multiple solid definitions that do different things but operate on inputs with the same logical type, and you want a common way of loading those inputs. If I'm understanding correctly, this is not your situation, because you have at most a single solid definition operating on each input source.

Resources are useful when you want to be able to configure a set of solids all at once. If that's the case for you, and it sounds like it might be, then a resource might make sense.

Does that answer your question? Happy to go into more detail if helpful.
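To make the "configure a set of solids all at once" idea concrete, here is a minimal framework-free sketch in plain Python. It deliberately does not use Dagster's API: `ApiSource`, `Context`, `fetch_raw`, `tag_with_source` and `run` are all hypothetical names invented for illustration, and the fetch step is a placeholder rather than a real HTTP call or S3 write. It only shows the shape of the pattern, where per-run metadata is set up once and read by every step, so that earlier steps do not have to re-emit it in their outputs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ApiSource:
    """Hypothetical per-run API metadata, playing the role of a resource."""
    name: str
    id: int


@dataclass
class Context:
    """Hypothetical stand-in for the context object handed to each step."""
    source: ApiSource


def fetch_raw(context: Context) -> str:
    # Placeholder for the real API call and S3 write (assumption).
    return '{"from": "%s"}' % context.source.name


def tag_with_source(context: Context, raw: str) -> Dict[str, object]:
    # Reads the metadata from the shared context, so fetch_raw
    # did not need to pass name/id along in its own output.
    return {"source_id": context.source.id, "raw": raw}


def run(source: ApiSource) -> Dict[str, object]:
    # The metadata is configured once per run and shared by all steps.
    context = Context(source=source)
    return tag_with_source(context, fetch_raw(context))


result = run(ApiSource(name="orders", id=7))
```

In this shape, only the data a step actually produces flows between steps; anything that describes the run as a whole lives in the shared context. In Dagster that shared piece would be a resource, configured through the run config rather than constructed by hand.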
-
Thanks @sryza for the clarification on input, it makes more sense now you've mentioned So the gist is, "input" is to solid as "resource" is to pipeline. To clarify:
Thank you