dbt

Omnata Syncs can be fully embedded within a dbt pipeline via our dbt package. To install the package, follow the instructions in the GitHub repository readme.

Omnata is native to the Snowflake platform, so instead of behaving like a downstream system, our dbt package directly accesses the product via database queries.

We integrate into the dbt workflow in the following ways:

Early detection of breaking changes

For outbound syncs, our custom dbt tests can feed source model data through the record validators specific to the App, without triggering a Sync operation. This flags potential issues to dbt users who may not even be aware of the Syncs downstream of the model they changed.

Testing of Sync/model changes

Omnata Sync Branches can be activated and run using dbt targets. This means that changes to source data or sync parameters can be automatically routed to a non-production environment for verification. Once verification is complete, the change is applied to production via the dbt project source, along with any model changes.

Replacement for sources

When using inbound syncs, you don't need to use a separate ELT tool and then create matching sources. Our dbt package uses a custom materialization, and inbound data can simply be ref'd downstream.

Automated data activation

When using outbound syncs, dbt is responsible for creating and updating the tables that feed into them. Therefore, it makes sense to run a Sync whenever its data changes, rather than on a timed schedule. Our dbt package will run a Sync after its source model has completed.

If you'd like Sync issues to be reported as part of your dbt test run, you can use our custom dbt test to check the results of the sync for any problems.

dbt and branching

dbt is a tool that is normally used in conjunction with version control, but it's important to remember that it's not fully governed by it.

Typically you'll have dbt targets like 'prod' or 'ci' or 'qa' which are more tightly controlled; a developer shouldn't normally be running dbt directly using those targets. However, targets like 'dev' are often "pre version control", running some proposed changes in a development environment like someone's laptop.

We wanted to accommodate that flow of permissive development in isolated uncontrolled areas, but also provide guardrails for more important operations where the stakes are higher.

Something we need to consider is that multiple developers may be running the 'dev' target with different project configurations. This isn't a problem when doing data modelling, as their target schemas will be different, so their tables and views are built in a unique location in the warehouse.

But sadly, app environments aren't so simple to clone. So with data syncing, we want the data to converge back into a limited, pre-determined set of App environments. And we don't want developers using the same dbt target to be automatically overwriting each other's changes.

So here's the flow (there's a diagram here):

1) Start in the Omnata UI

Sync changes are authored initially in the Omnata UI, and then activated by dbt via a copy-paste of the new model. The advantages here are two-fold:

  • We get a best-of-both-worlds configuration experience. Sync configuration fields can be complex and interdependent. It's much more productive to make a guided series of choices than try to hand-craft a conforming YAML/JSON document. But beyond this initial step, you get all the benefits of version control and deployment automation.

  • By creating a draft of the branch in advance in a central location, we're effectively choosing which dbt developer "owns" the dev target for that branch. If your model config doesn't match this draft, we ignore your run.

2) Activate the branch with a dbt run

After pasting the new model config into your dbt project, you can do a dbt run --target dev (or whatever your development target is), to activate the branch.

To control this behaviour, we use the match_required_targets config parameter to nominate one or more dbt targets that only have an effect when they match what's in the branch.

In the meantime, other developers doing the same dbt run with the outdated model configuration don't have any effect.

Adding --vars 'expect_omnata_match: True' to your dbt run will raise a compiler error if the model configuration can't be applied. This can be useful if you're making changes to an Omnata sync and you want to be certain that they've been applied.
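The matching rule from this step can be modelled roughly as follows. This is an illustrative sketch only, not the package's implementation: the function name, the error class, and the config keys (sync_slug, strategy) are hypothetical, and we assume "match" means the pasted model config equals the branch draft.

```python
class OmnataMatchError(Exception):
    """Hypothetical stand-in for the error surfaced by the dbt run."""


def run_takes_effect(target, model_config, branch_draft,
                     match_required_targets, expect_omnata_match=False):
    """Decide whether a dbt run applies its sync config to the branch.

    Assumption: a "match" is plain equality between the model config
    in the dbt project and the draft authored in the Omnata UI.
    """
    if target not in match_required_targets:
        # The matching rule only constrains the nominated targets.
        return True
    if model_config == branch_draft:
        # This developer's config is the one drafted in the UI.
        return True
    if expect_omnata_match:
        # Mirrors --vars 'expect_omnata_match: True': fail loudly.
        raise OmnataMatchError(
            f"model config for target '{target}' does not match the branch draft"
        )
    # Outdated config on a gated target: the run is ignored.
    return False


draft = {"sync_slug": "crm_contacts", "strategy": "upsert"}  # hypothetical keys
stale = {"sync_slug": "crm_contacts", "strategy": "insert"}

print(run_takes_effect("dev", draft, draft, ["dev"]))  # True
print(run_takes_effect("dev", stale, draft, ["dev"]))  # False
```

With expect_omnata_match set, the second call would raise instead of returning False, which is the behaviour you want when you need certainty that your change was applied.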

3) Merge the changes

Once everything's ready to go, use your regular Pull Request process to get the change into the branch you use for your production dbt runs.

We use the main_target config parameter to nominate a single dbt target to correspond to the 'main' (production) branch of the sync. This is named 'prod' in many dbt projects; in other words, if you're doing a dbt run --target prod, then we're applying settings and running the sync for real in production.

If there are any dbt runs in-between (say a pre-production ci run+test), these will work as long as the target name isn't in match_required_targets.
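To summarise how the two config parameters divide up your dbt targets, here is a small sketch. The function and the category labels are hypothetical and only restate the rules described above; they are not part of the package.

```python
def classify_target(target, main_target, match_required_targets):
    """Return which rule a dbt target falls under (illustrative labels)."""
    if target == main_target:
        # Settings are applied and the sync runs for real in production.
        return "main"
    if target in match_required_targets:
        # Only has an effect when the model config matches the branch.
        return "match_required"
    # For example a pre-production ci run: no matching requirement applies.
    return "unrestricted"


for name in ("prod", "dev", "ci"):
    print(name, classify_target(name, "prod", ["dev"]))
# prod main
# dev match_required
# ci unrestricted
```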

What about connections?

You'll notice that the Connection name (the "slug") is included in the Sync's model config, but not the actual connection details.

Creating connections is another thing we leave to the Omnata UI, rather than trying to manage it in your dbt project. This is driven by a mix of:

  • Necessity - many Apps use interactive OAuth for authorization

  • Security - it's tricky to do secret management well over a SQL API

  • Practicality - connections are expected to change infrequently, and can also be reused across different Syncs

What about permissions?

You might have noticed that everything described above is simply a convention, not a restriction. dbt developers could choose any target name they like when running locally, including the production target name.

So if you want to truly protect your main Sync from accidental/unauthorized changes, you'll need to use Snowflake role separation to limit who can actually modify or run it.
