# Databricks

Omnata's integration service replicates data from Databricks and loads it into your destination (Snowflake) at regular intervals, as well as providing direct query capabilities into your Databricks instance.

***

### Supported Services

* Databricks on AWS
* Databricks on Azure
* Databricks on GCP

***

### Connectivity Options

Omnata's Snowflake Native App can connect to your Databricks instance in three ways: directly (if publicly accessible), via Privatelink, or via Omnata's secure ngrok tunnel.

#### Direct

Connect to your Databricks instance using its fully qualified hostname and port (default 443). The instance must be network-accessible from Snowflake.

#### Privatelink

Privatelink provides a private network path between your Snowflake account and your Databricks workspace without traffic traversing the public internet.

Requirements:

* Snowflake Business Critical Edition
* Snowflake account and Databricks workspace on the same cloud provider and region

To configure, provision a Privatelink endpoint in your Snowflake account pointing to your Databricks workspace, then choose the Privatelink option when creating the connection in Omnata and provide the endpoint domain.
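
As a rough sketch, assuming Snowflake's outbound private connectivity feature is enabled on your account, provisioning an endpoint on AWS might look like the following. Both arguments are placeholders (your VPC endpoint service name and workspace hostname); consult Snowflake's private connectivity documentation for the exact procedure on your cloud.

```
SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
    'com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx',  -- placeholder: Databricks VPC endpoint service
    'adb-1234567890.1.cloud.databricks.com'                     -- placeholder: workspace hostname
);
```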

A prerequisite for a successful connection is a network rule covering the authentication endpoints. This can be created by calling:

```
call OMNATA_DATABRICKS_PLUGIN.PLUGIN_API.CREATE_PUBLIC_ENDPOINT_NETWORK_RULE('<<cloud provider>>', '<<network rule name>>', ['<<list of storage accounts>>'])
```

This creates a new network rule that includes the authentication endpoints and your storage endpoints. Keep a note of the network rule name, as you will need it when creating new connections.
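
For example, for an Azure workspace with two external storage accounts, the call might look like this (the rule name and storage account names are illustrative):

```
call OMNATA_DATABRICKS_PLUGIN.PLUGIN_API.CREATE_PUBLIC_ENDPOINT_NETWORK_RULE(
    'Azure',                                  -- cloud provider
    'DATABRICKS_AUTH_NETWORK_RULE',           -- illustrative network rule name
    ['mystorageacct1', 'mystorageacct2']      -- illustrative storage account names
);
```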

### Authentication

#### Personal Access Token

Provide your Databricks personal access token. Tokens can be generated in your Databricks workspace under **Settings > Developer > Access tokens**.

#### Service Principal (OAuth M2M)

Omnata supports OAuth machine-to-machine authentication via a Service Principal. Provide:

* **Client ID** — the Service Principal's application (client) ID
* **Client Secret** — the Service Principal's client secret

For Azure Entra ID-based Service Principals, also provide:

* **Tenant ID** — The Azure tenant associated with the Service Principal

***

### Connection Configuration

When configuring a new connection, you will be asked to create an external access integration. If you are using Privatelink, add the network rule you created earlier to the list of allowed rules.

When creating a connection you will need to provide:

| Field                           | Description                                                                                                                                                                                                                                                                                                                                                                                                    |
| ------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Server Fully Qualified Name** | The hostname of your Databricks workspace (e.g. `adb-1234567890.1.azuredatabricks.net`)                                                                                                                                                                                                                                                                                                                        |
| **HTTP Path**                   | The HTTP path of the SQL Warehouse or cluster to connect to. Found in your Databricks workspace under **SQL Warehouses > \[Your Warehouse] > Connection Details > HTTP Path** (e.g. `/sql/1.0/warehouses/abc123`)                                                                                                                                                                                              |
| **Cloud Provider**              | The cloud provider hosting your Databricks workspace: `AWS`, `Azure`, or `GCP`                                                                                                                                                                                                                                                                                                                                 |
| **External Storage Accounts**   | A comma-separated list of storage account names used for unloading data from Databricks. Found in your Databricks workspace under **Catalog > Connect > External Data**. The storage account name is extracted from the URL field: for Azure, it is the part after `@` and before `.dfs.core.windows.net`; for AWS, it is the part before `.amazonaws.com` (i.e. `bucket.s3.region`). For GCP, this is not required. |
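
As an illustration of the Azure extraction rule, Snowflake's `SPLIT_PART` can pull the storage account name out of an External Data URL (the URL below is made up):

```
-- Azure: take the part after '@' and before '.dfs.core.windows.net'
SELECT SPLIT_PART(
    SPLIT_PART('abfss://mycontainer@mystorageacct.dfs.core.windows.net/path', '@', 2),
    '.', 1
) AS storage_account_name;   -- returns 'mystorageacct'
```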

### Required Permissions

The Databricks user or service principal used for the connection requires:

* `USE CATALOG` on the target catalog
* `USE SCHEMA` on the target schema(s)
* `SELECT` on the tables and views to be synced

Example grants in Databricks SQL:

```
GRANT USE CATALOG ON CATALOG my_catalog TO `omnata_user`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `omnata_user`;
GRANT SELECT ON TABLE my_catalog.my_schema.my_table TO `omnata_user`;
```

For broader access, grant at the catalog or schema level:

```
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG my_catalog TO `omnata_user`;
```

***

### Inbound Syncs

Omnata supports the following sync strategies for Databricks.

#### Full Refresh

Retrieves all records from the source table or view on every sync run. Use this for tables or views that do not have a suitable cursor column.

#### Incremental (High Watermark)

Retrieves only records that have changed since the last sync run, using a cursor column to track progress. Omnata automatically detects suitable cursor columns based on their data type.

Supported cursor column types: `date`, `timestamp`, `timestamp_ntz`, `timestamp_ltz`, `bigint`, `long`, `int`, `integer`.

If no suitable cursor column is present, only Full Refresh will be available for that stream.
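
Where a source table lacks a suitable cursor column, one can be added in Databricks SQL, provided your pipelines populate it on every insert and update (the column name below is illustrative):

```
-- Add a timestamp column to serve as a high-watermark cursor (illustrative name).
-- Writers must set this column on each insert/update for incremental syncs to work.
ALTER TABLE my_catalog.my_schema.my_table
  ADD COLUMNS (last_modified TIMESTAMP);
```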

### Outbound Syncs

Coming soon

***

### Managing Performance

#### Tuning Parameters

Three parameters control the concurrency and batch size of sync operations:

**Maximum Concurrent Streams (MCS)**

The number of tables/views processed in parallel. Default: `8`, range: `1–64`.

**Maximum Concurrent Queries (MCQ)**

The number of concurrent queries issued per stream when fetching data. Default: `4`, range: `1–64`.

**Target Records Per Query (MRQ)**

The target row count per individual query. Omnata splits large tables into chunks based on index distribution. Default: `1,000,000`, range: `100,000–25,000,000`.

**Example**

| Setting | Value                                     |
| ------- | ----------------------------------------- |
| Streams | 16 tables × 5M rows each = 80M rows total |
| MCS     | 8                                         |
| MCQ     | 4                                         |
| MRQ     | 1,000,000                                 |

This results in 80 queries of \~1M rows each, with a maximum of 32 (MCS × MCQ) concurrent queries at any time.

#### Snowflake Warehouse

Omnata is designed to run on a single-cluster XS warehouse. During an initial historical backfill, consider temporarily enabling multi-cluster scaling with **Standard** scaling policy (not Economy) to reduce query queuing. Return to single-cluster once the initial sync completes.
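
The scaling change described above can be applied with standard Snowflake warehouse DDL (the warehouse name is illustrative):

```
-- Enable multi-cluster scaling for the initial backfill (illustrative warehouse name)
ALTER WAREHOUSE OMNATA_WH SET
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 4
    SCALING_POLICY = 'STANDARD';

-- Revert to a single cluster once the backfill completes
ALTER WAREHOUSE OMNATA_WH SET MAX_CLUSTER_COUNT = 1;
```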

***

### Functions

Omnata exposes a SQL table function that lets you query your Databricks instance directly from Snowflake.

#### SQL\_QUERY

Executes a SQL `SELECT` statement against the Databricks source and returns results as a table.

**Parameters:**

| Parameter         | Type   | Description                                    |
| ----------------- | ------ | ---------------------------------------------- |
| `CONNECTION_SLUG` | STRING | The slug associated with the Omnata connection |
| `QUERY`           | STRING | The SQL query to execute                       |

**Example:**

```
SELECT *
FROM TABLE(OMNATA_DATABRICKS_PLUGIN.UDFS.SQL_QUERY(
    '<<connection slug>>',
    'SELECT * FROM my_catalog.my_schema.my_table'
));
```
