Databricks
Omnata's integration service replicates data from Databricks and loads it into your destination (Snowflake) at regular intervals, as well as providing direct query capabilities into your Databricks instance.
Supported Services
Databricks on AWS
Databricks on Azure
Databricks on GCP
Connectivity Options
Omnata's Snowflake Native App can connect to your Databricks instance in three ways: directly (if publicly accessible), via Privatelink, or via Omnata's secure ngrok tunnel.
Direct
Connect to your Databricks instance using its fully qualified hostname and port (default 443). The instance must be network-accessible from Snowflake.
Privatelink
Privatelink provides a private network path between your Snowflake account and your Databricks workspace without traffic traversing the public internet.
Requirements:
Snowflake Business Critical Edition
Snowflake account and Databricks workspace on the same cloud provider and region
To configure, provision a Privatelink endpoint in your Snowflake account pointing to your Databricks workspace, then choose the Privatelink option when creating the connection in Omnata and provide the endpoint domain.
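As an illustrative sketch only (the service name and hostname below are placeholders — use the values for your own Databricks workspace and cloud provider), provisioning the endpoint from Snowflake uses the `SYSTEM$PROVISION_PRIVATELINK_ENDPOINT` function:

```sql
-- Hypothetical values: substitute your Databricks workspace's
-- private connectivity service name and hostname.
USE ROLE ACCOUNTADMIN;

SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  'com.amazonaws.vpce.us-east-1.vpce-svc-0000000000000000',  -- service name (placeholder)
  'my-workspace.cloud.databricks.com'                        -- workspace hostname (placeholder)
);
```

The hostname returned for the endpoint is the value to supply as the endpoint domain when creating the connection in Omnata.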
A prerequisite for creating a successful connection is setting up a network rule for authentication. This can be done by calling the setup procedure, which creates a new network rule that includes the authentication endpoints and your storage endpoints. Keep a note of the rule name, as you will need it when creating new connections.
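If you prefer to see what such a rule contains, the equivalent manual DDL in Snowflake looks like the following (a sketch only — the rule name and hostnames are placeholders for your workspace's authentication and storage endpoints):

```sql
-- Placeholder names: substitute your workspace hostname and storage endpoints.
CREATE NETWORK RULE databricks_egress_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = (
    'my-workspace.cloud.databricks.com:443',        -- workspace / authentication endpoint
    'mystorageaccount.dfs.core.windows.net:443'     -- external storage endpoint
  );
```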
Authentication
Personal Access Token
Provide your Databricks personal access token. Tokens can be generated in your Databricks workspace under Settings > Developer > Access tokens.
Service Principal (OAuth M2M)
Omnata supports OAuth machine-to-machine (M2M) authentication via a Service Principal. Provide:
Client ID — the Service Principal's application (client) ID
Client Secret — the Service Principal's client secret
and for Azure Entra ID based Service Principals, also provide:
Tenant ID — the Azure tenant associated with the Service Principal
Connection Configuration
When configuring a new connection, you will be asked to create an external access integration. If you are using Privatelink, add the network rule you created earlier to the list of allowed rules.
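As a sketch using standard Snowflake DDL (the integration and rule names are placeholders; in practice the app's setup screen walks you through this), an external access integration that allows an existing network rule looks like:

```sql
-- Placeholder names: use your own integration name and the network rule
-- created earlier for the Databricks authentication and storage endpoints.
CREATE EXTERNAL ACCESS INTEGRATION databricks_access_integration
  ALLOWED_NETWORK_RULES = (databricks_egress_rule)
  ENABLED = TRUE;
```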
When creating a connection you will need to provide:
Server Fully Qualified Name
The hostname of your Databricks workspace (e.g. adb-1234567890.1.azuredatabricks.net)
HTTP Path
The HTTP path of the SQL Warehouse or cluster to connect to. Found in your Databricks workspace under SQL Warehouses > [Your Warehouse] > Connection Details > HTTP Path (e.g. /sql/1.0/warehouses/abc123)
Cloud Provider
The cloud provider hosting your Databricks workspace: AWS, Azure, or GCP
External Storage Accounts
A comma-separated list of storage account names used for unloading data from Databricks, found in your Databricks workspace under Catalog > Connect > External Data. The storage account name is extracted from the URL field: for Azure, it is the part after the @ and before .dfs.core.windows.net; for AWS, it is the part before .amazonaws.com (i.e. bucket.s3.region). Not required for GCP.
Required Permissions
The Databricks user or service principal used for the connection requires:
USE CATALOG on the target catalog
USE SCHEMA on the target schema(s)
SELECT on the tables and views to be synced
Example grants in Databricks SQL:
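For instance, granting the minimum permissions to a Service Principal (catalog, schema, table, and principal names below are placeholders):

```sql
-- Placeholder names throughout; substitute your own objects and principal.
GRANT USE CATALOG ON CATALOG my_catalog TO `my-service-principal`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `my-service-principal`;
GRANT SELECT ON TABLE my_catalog.my_schema.my_table TO `my-service-principal`;
```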
For broader access, grant at the catalog or schema level:
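For example (again with placeholder names), SELECT granted at a higher level is inherited by all objects within it:

```sql
-- SELECT at the schema level covers every table and view in the schema.
GRANT SELECT ON SCHEMA my_catalog.my_schema TO `my-service-principal`;

-- Or, for everything in the catalog:
GRANT SELECT ON CATALOG my_catalog TO `my-service-principal`;
```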
Inbound Syncs
Omnata supports the following sync strategies for Databricks.
Full Refresh
Retrieves all records from the source table or view on every sync run. Use this for tables or views that do not have a suitable cursor column.
Incremental (High Watermark)
Retrieves only records that have changed since the last sync run, using a cursor column to track progress. Omnata automatically detects suitable cursor columns based on their data type.
Supported cursor column types: date, timestamp, timestamp_ntz, timestamp_ltz, bigint, long, int, integer.
If no suitable cursor column is present, only Full Refresh will be available for that stream.
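Conceptually — this is not the literal query Omnata issues, and the table and column names are placeholders — an incremental run retrieves only rows past the watermark recorded at the end of the previous run:

```sql
-- Illustrative only: 'updated_at' stands in for the detected cursor column,
-- and the literal timestamp for the stored high watermark.
SELECT *
FROM my_catalog.my_schema.orders
WHERE updated_at > '2024-06-01T00:00:00'
```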
Outbound Syncs
Coming soon
Managing Performance
Tuning Parameters
Three parameters control the concurrency and batch size of sync operations:
Maximum Concurrent Streams (MCS)
The number of tables/views processed in parallel. Default: 8, range: 1–64.
Maximum Concurrent Queries (MCQ)
The number of concurrent queries issued per stream when fetching data. Default: 4, range: 1–64.
Target Records Per Query (MRQ)
The target row count per individual query. Omnata splits large tables into chunks based on index distribution. Default: 1,000,000, range: 100,000–25,000,000.
Example
Streams: 16 tables × 5M rows each = 80M rows total
MCS: 8
MCQ: 4
MRQ: 1,000,000
This results in 80 queries of ~1M rows each, with a maximum of 32 (MCS × MCQ) concurrent queries at any time.
Snowflake Warehouse
Omnata is designed to run on a single-cluster XS warehouse. During an initial historical backfill, consider temporarily enabling multi-cluster scaling with Standard scaling policy (not Economy) to reduce query queuing. Return to single-cluster once the initial sync completes.
Functions
Omnata exposes a SQL table function that lets you query your Databricks instance directly from Snowflake.
SQL_QUERY
Executes a SQL SELECT statement against the Databricks source and returns results as a table.
Parameters:
CONNECTION_SLUG
STRING
The slug associated with the Omnata connection
QUERY
STRING
The SQL query to execute
Example:
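A sketch of calling the function from Snowflake — the application and schema qualifier, connection slug, and queried table are placeholders; check the app's installed name in your account:

```sql
-- Placeholder qualifiers and slug; the inner query runs on Databricks.
SELECT *
FROM TABLE(
  OMNATA_APP.UDFS.SQL_QUERY(
    'my-databricks-connection',
    'SELECT id, status FROM my_catalog.my_schema.orders LIMIT 10'
  )
);
```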