docs: adds DBT concept documentation #111

Merged
merged 5 commits into from
Jan 25, 2024
39 changes: 39 additions & 0 deletions docs/concepts/dbt.rst
@@ -0,0 +1,39 @@
.. _dbt:

data build tool (dbt)
*********************

``dbt`` is an open-source, command-line tool managed by `dbtlabs`_ for generating and maintaining data transformations.

``dbt`` lets engineers transform data by writing ``SELECT`` statements that express business logic; ``dbt`` then
materializes those statements as tables and views that can be queried efficiently.
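To illustrate what "materializes" means, the minimal sketch below uses Python's built-in ``sqlite3`` module as a stand-in for Clickhouse. The ``materialize`` helper, the ``raw_events`` table, and its rows are hypothetical; real ``dbt`` models are SQL files (with Jinja templating), and ``dbt`` generates the adapter-specific DDL itself. The idea is the same, though: the model is only a ``SELECT``, and the chosen materialization wraps it in ``CREATE VIEW ... AS`` or ``CREATE TABLE ... AS``.

```python
import sqlite3


def materialize(conn: sqlite3.Connection, name: str, select_sql: str, kind: str = "view") -> None:
    """Rebuild `name` from a model's SELECT, loosely mimicking dbt's view/table materializations."""
    if kind not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {kind!r}")
    ddl_kind = kind.upper()
    # Drop the previous relation of the same kind, then recreate it from the SELECT.
    conn.execute(f'DROP {ddl_kind} IF EXISTS "{name}"')
    conn.execute(f'CREATE {ddl_kind} "{name}" AS {select_sql}')


# Hypothetical raw event data standing in for an Aspects event table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (course_key TEXT, verb TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [
        ("course-v1:A+1+2024", "registered"),
        ("course-v1:A+1+2024", "registered"),
        ("course-v1:B+2+2024", "registered"),
        ("course-v1:B+2+2024", "unregistered"),
    ],
)

# The "model": a SELECT statement that expresses the business logic.
ENROLLMENTS_MODEL = """
SELECT course_key, COUNT(*) AS enrollments
FROM raw_events
WHERE verb = 'registered'
GROUP BY course_key
"""

materialize(conn, "course_enrollments", ENROLLMENTS_MODEL, kind="view")
rows = conn.execute("SELECT * FROM course_enrollments ORDER BY course_key").fetchall()
print(rows)  # [('course-v1:A+1+2024', 2), ('course-v1:B+2+2024', 1)]
```

A view re-runs its ``SELECT`` whenever it is queried, while a table stores the result when it is built; in ``dbt``, each model can choose its materialization.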

``dbt`` also allows engineers to modularize and re-use their transformation code using "packages" that can be shared
across projects or organizations.

dbt in Aspects
##############

Aspects uses the `aspects-dbt`_ package to define the transforms used by the Aspects project. This package manages
materialized views for data tables stored in `Clickhouse`_.

Operators may create and install their own ``dbt`` packages; see `dbt extensions`_ for details.

`tutor-contrib-aspects`_ also provides a ``do`` command that runs `dbt commands`_ against your deployment; run
``tutor [dev|local] do dbt --help`` for details.

References
##########

* `dbtlabs`_: ``dbt`` documentation
* `dbt-core`_: core ``dbt`` package
* `aspects-dbt`_: Aspects dbt transforms
* `tutor-contrib-aspects`_: Aspects Tutor plugin

.. _aspects-dbt: https://github.com/openedx/aspects-dbt/#aspects-dbt
.. _clickhouse: clickhouse.html
.. _dbtlabs: https://docs.getdbt.com/
.. _dbt-core: https://github.com/dbt-labs/dbt-core
.. _dbt commands: https://docs.getdbt.com/reference/dbt-commands
.. _dbt extensions: ../how-tos/dbt_extensions.html
.. _tutor-contrib-aspects: https://github.com/openedx/tutor-contrib-aspects
1 change: 1 addition & 0 deletions docs/concepts/index.rst
@@ -9,6 +9,7 @@ Concepts
xAPI <xapi_concepts>
Tracking Logs <tracking_logs>
Clickhouse <clickhouse>
dbt <dbt>
Ralph <ralph>
Vector <vector>
Pipelines <pipelines>
30 changes: 18 additions & 12 deletions docs/how-tos/dbt_extensions.rst
@@ -1,14 +1,20 @@
.. _dbt-extensions:

-DBT extensions
-**************
-
-To extend the DBT project, you can use the following Tutor variables:
-
-- **DBT_REPOSITORY**: A git repository URL to clone and use as the DBT project.
-- **DBT_BRANCH**: The branch to use when cloning the DBT project.
-- **DBT_PROJECT_DIR**: The directory to use as the DBT project.
-- **EXTRA_DBT_PACKAGES**: A list of python packages for the DBT project to install.
-- **DBT_ENABLE_OVERRIDE**: This variable determines whether the DBT project override feature
-should be enabled or not. When enabled, it allows you to make changes to the **dbt_project.yml**
-and **packages.yml** files using the tutor patches: `dbt-packages` and `dbt-project`.
Extending dbt
*************

As noted in `dbt concepts`_, you can install your own dbt packages to apply custom transforms to the event data
in Aspects.

To change which dbt packages are installed, use the following Tutor variables:

- **EXTRA_DBT_PACKAGES**: A list of pip-installable dbt packages for Aspects to install. Add your custom dbt packages here.

Contributor:

nit: ddt -> dbt

Contributor:

I think the preferred way to customize dbt is to change the DBT_REPOSITORY / DBT_BRANCH / DBT_REPOSITORY_PATH to your dbt project, and have that project make our dbt package a requirement. Is that how you did it, @Ian2012?

Contributor:

Yes. You can point it to the main branch to always stay up to date, but it's better to pin a version so that breaking changes aren't introduced.

pomegranited (Contributor Author), Jan 24, 2024:

Oh interesting, I thought that EXTRA_DBT_PACKAGES was for adding custom dbt packages, rather than encouraging people to fork aspects-dbt or run with a different version?

Let me know which way it should read, and I'll update the recommendations here.

Contributor:

No, EXTRA_DBT_PACKAGES is a list of Python requirements that are installed before your custom version of aspects-dbt is run.

Instead, you can change your DBT_REPOSITORY and create a small packages.yml that lists aspects-dbt as a package. This gives you all the base functionality of Aspects while letting you create your own custom models without forking. See this commit for an example

Contributor Author:

Oh cool, thank you for these details @Ian2012 ! I'll incorporate them here.

Ian2012 (Contributor), Jan 24, 2024:

Great, there are just two pieces I forgot to mention: you can create the project with dbt init, and you must update the dbt_project.yml file to use the aspects profile. Sorry for the back-and-forth; once this is added, I think it's good to go.

- **DBT_REPOSITORY**: A git repository URL to clone and use as the main Aspects dbt project.
- **DBT_BRANCH**: The branch to use when cloning ``DBT_REPOSITORY``.
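
One way to use ``DBT_REPOSITORY`` without forking ``aspects-dbt`` is to point it at your own ``dbt`` project (for example, one created with ``dbt init``) that lists ``aspects-dbt`` as a package and uses the ``aspects`` profile. The sketch below is a hypothetical helper, not part of Aspects or Tutor: it rewrites the ``profile`` line that ``dbt init`` generates in ``dbt_project.yml`` and writes a ``packages.yml`` that pins ``aspects-dbt`` to a fixed revision. The revision value is an assumption; use a real tag or commit, since pinning avoids picking up breaking changes from the main branch.

```python
import re
import tempfile
from pathlib import Path

# Assumption: aspects-dbt is installed from its GitHub repository as a dbt git package.
ASPECTS_DBT_GIT_URL = "https://github.com/openedx/aspects-dbt.git"


def configure_custom_project(project_dir: Path, aspects_revision: str) -> None:
    """Point a `dbt init`-generated project at the aspects profile and add aspects-dbt as a package."""
    project_file = project_dir / "dbt_project.yml"
    text = project_file.read_text()
    # Replace the profile generated by `dbt init` with the Aspects profile.
    text, count = re.subn(r"(?m)^profile:.*$", "profile: 'aspects'", text)
    if count == 0:
        # No profile line found: append one.
        text = text.rstrip("\n") + "\nprofile: 'aspects'\n"
    project_file.write_text(text)

    # Declare aspects-dbt as a pinned git package so its models are available without forking it.
    (project_dir / "packages.yml").write_text(
        "packages:\n"
        f'  - git: "{ASPECTS_DBT_GIT_URL}"\n'
        f'    revision: "{aspects_revision}"\n'
    )


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    # Stand-in for the dbt_project.yml that `dbt init my_aspects_project` would generate.
    (project / "dbt_project.yml").write_text(
        "name: 'my_aspects_project'\nversion: '1.0.0'\nconfig-version: 2\nprofile: 'my_aspects_project'\n"
    )
    configure_custom_project(project, aspects_revision="v1.2.3")  # hypothetical pinned tag
    print((project / "dbt_project.yml").read_text())
    print((project / "packages.yml").read_text())
```

After committing the resulting project to a git repository, set ``DBT_REPOSITORY`` and ``DBT_BRANCH`` to point at it.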

To change how the ``dbt`` packages are configured, use these Tutor variables:

- **DBT_PROFILE_\***: Variables used in the ``dbt/profiles.yml`` file, including several Clickhouse connection settings.
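
For orientation, ``dbt/profiles.yml`` uses the standard ``dbt`` profile format. The sketch below renders a profile named ``aspects`` with a single Clickhouse output, assuming the connection fields of the dbt-clickhouse adapter (``host``, ``port``, ``user``, ``password``, ``schema``, ``secure``). The ``render_aspects_profile`` helper and every value shown are hypothetical, and the exact ``DBT_PROFILE_*`` variable names that feed this file are not listed here.

```python
from typing import Mapping


def render_aspects_profile(settings: Mapping[str, object], target: str = "prod") -> str:
    """Render a profiles.yml with one Clickhouse output under the hypothetical-valued `aspects` profile."""
    lines = [
        "aspects:",
        f"  target: {target}",
        "  outputs:",
        f"    {target}:",
        "      type: clickhouse",
    ]
    # Emit connection settings in a stable order; strings are double-quoted YAML scalars.
    for key in sorted(settings):
        value = settings[key]
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = str(value)
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"      {key}: {rendered}")
    return "\n".join(lines) + "\n"


print(render_aspects_profile({
    "host": "clickhouse",
    "port": 8123,
    "user": "ch_admin",
    "password": "change-me",
    "schema": "reporting",
    "secure": False,
}))
```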


.. _dbt concepts: ../concepts/dbt.html