Microsoft SQL Server
About
Omnata's integration service replicates data from your SQL Server source database and loads it into your destination at regular intervals, and also provides direct query capabilities against your SQL Server database.
Supported Services
Generic SQL Server
Azure SQL Database
Azure SQL Managed Instance
Amazon SQL Server RDS
Compatibility Levels
The SQL Server plugin is supported for SQL Server databases with a compatibility level of 100 (SQL Server 2008) or higher.
Connectivity options
Omnata's Snowflake Native App can connect to your database in three ways: using built-in secure tunnelling, Privatelink (only available on Snowflake Business Critical Edition, on AWS and Azure), or directly if the database network is publicly accessible.
Privatelink
To use Privatelink:
You must have Snowflake Business Critical Edition
Your Snowflake account must be using the same cloud provider as the one you are connecting to. For example, to connect to an Azure SQL Database instance via Privatelink, your Snowflake account must be on Azure.
Omnata-ngrok tunnels
Omnata's tunnelling uses a secure TLS tunnel provided by our OEM partner, ngrok. Omnata establishes a TLS connection from your Snowflake account to the ngrok service and connects to the SQL Server database through that tunnel. This ensures that your database is not exposed to the internet and that your data remains secure.
Authentication
SQL Server Authentication
Omnata supports SQL Server Authentication. When you create a connection, you will need to provide the username and password for the SQL Server user you want to use to connect to the database.
If connecting to Azure SQL Database, you will need to provide username@servername as the username instead of just the username.
Connection Configuration
SQL Server Configuration
No additional configuration is required for SQL Server databases unless you want to use Change Data Capture (CDC).
Azure SQL Database Configuration
Private Access - Create a new private endpoint for the Azure SQL Database
Connectivity - Update the connection policy to Default or Proxy to allow access from the private endpoint
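If you manage your server with the Azure CLI, the connection policy can be updated with a command along these lines (the resource group and server names below are placeholders, not values from this guide):

```shell
# Hypothetical resource group / server names -- substitute your own
az sql server conn-policy update \
  --resource-group my-resource-group \
  --server my-sqlserver \
  --connection-type Proxy
```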
Should I use Privatelink or ngrok?
Privatelink and ngrok are mutually exclusive. If you have Snowflake Business Critical Edition and the required cloud infrastructure, then we recommend using Privatelink and you won't need ngrok.
If you have any other Snowflake edition, mismatched cloud providers, and/or on-premises data, then ngrok is the best solution for private connectivity.
Privatelink
You can use Azure or AWS Privatelink to connect to databases hosted in your cloud VPC; both examples are documented below.
It is also possible to connect via Privatelink to databases hosted on-premises. This uses the same network load balancer approach shown in the AWS example below, together with an existing network bridge between your cloud and on-premises environments.
Azure
To connect via Privatelink to an Azure SQL database, first provision a Privatelink endpoint in your Snowflake account:
Then, simply choose the Privatelink option when creating the connection.
You can also connect to SQL Server running on a VM in Azure; the architecture will be similar to that described below.
AWS
To connect via Privatelink to an AWS RDS service, you may need to deploy some extra infrastructure in your AWS account before provisioning the Privatelink endpoint. The following diagram explains how to connect to RDS via a network load balancer:
ngrok Tunnel Configuration
The Omnata SQL Server plugin uses ngrok to establish a secure tunnel to your SQL Server database. This ensures that your database is not exposed to the internet and that your data is secure.
The connection workflow is as follows:
Omnata Sync establishes an outbound TLS connection from Snowflake to the ngrok service
You install the ngrok agent on your network, where it can access your SQL Server
The agent establishes an outbound TLS connection from your network to the ngrok service; you do not open any inbound ports.
Omnata Sync connects through this tunnel to your SQL Server, again using TLS.
Download & Install the ngrok Agent
For macOS, use Homebrew:
For Linux, use Apt:
For Windows, use Chocolatey (You'll need to run this in an Administrator Command Prompt):
If you don't have one of these package managers installed or prefer to install the ngrok agent yourself, you can download the ngrok agent from the Ngrok website and install it on your network.
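For reference, the package-manager commands from ngrok's install documentation look like the following (verify against ngrok's current docs before running):

```shell
# macOS (Homebrew)
brew install ngrok/ngrok/ngrok

# Windows (Chocolatey, from an elevated prompt)
choco install ngrok

# Debian/Ubuntu (apt) -- ngrok's package repository must be added first
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list
sudo apt update && sudo apt install ngrok
```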
Configure & launch the ngrok agent
You can run the ngrok agent either as a standalone process or as a service. Instructions for each are provided in tabs on the 'launch agent' step in the Omnata UI.
Complete the Omnata UI form to populate values
The Omnata UI will provide you with a code snippet that contains the auth token (--authtoken) and edge label (--label edge) for the connection. These pre-populated values configure the outbound connection from your network to Omnata's ngrok instance.
You will need to provide the [address]:[port] value that points the ngrok agent to the location of your SQL Server inside your network.
Common variations in agent-to-database configuration
If you run the agent on the same server as SQL Server, you can use just the port number [port]. For SQL Server, the default port is 1433.
If you run the agent on a different server, you will need to use the hostname of the server where the SQL Server is running [address], as well as the port [port].
If using a Named Instance, ensure it is accessible via [address]:[port] instead of the [address]\[instance-name] form.
Run as a standalone process
To run the ngrok agent as a standalone process, follow the steps provided in the Omnata UI. Omnata will provide you with a code snippet that looks like the following:
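A representative shape of that snippet is shown below; the token and edge label are generated by the Omnata UI, and only the address and port placeholders come from your own network. This is illustrative only, not the literal command:

```shell
# Illustrative only -- copy the exact snippet from the Omnata UI
ngrok tunnel \
  --authtoken <omnata-provided-token> \
  --label edge=<omnata-provided-edge-id> \
  tcp://<address>:<port>
```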
No modifications to this code are required; the configuration is applied and the agent is launched in a single command.
Run as a service
To configure the ngrok agent to run as a service, follow these steps:
Locate the configuration directory for the ngrok agent.
For macOS, the configuration directory is
~/Library/Application Support/ngrok
For Linux, the configuration directory is
~/.config/ngrok
For Windows, the configuration directory is
%HOMEPATH%\AppData\Local\ngrok
In that directory, create a configuration file (called omnata.yml) which will be used by the ngrok agent. The configuration file should contain the following information:
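As a sketch, an ngrok agent configuration for a labelled tunnel typically looks like the following; the token and edge label come from the Omnata UI, and "omnata" is just an example tunnel name:

```yaml
# omnata.yml -- illustrative values only; use those generated by the Omnata UI
version: "2"
authtoken: <omnata-provided-token>
tunnels:
  omnata:
    labels:
      - edge=<omnata-provided-edge-id>
    addr: <address>:<port>
```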
Install the ngrok agent as a service by running the following command:
Start the ngrok service by running the following command:
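Those two commands typically take the following form (the config path shown is the Linux location; adjust for your platform):

```shell
# Install the agent as a service, pointing it at the omnata.yml created above
ngrok service install --config ~/.config/ngrok/omnata.yml

# Start the service
ngrok service start
```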
Troubleshooting Connectivity Issues
The following are common issues and solutions when connecting.
Cannot open server "xxxxxxx.ngrok.app" requested by the login. The login failed
Please enter your password again in the connection setup. If you are using Azure SQL Database, enter username@servername for the username.
The TCP/IP connection to the host xxxxxxx.ngrok.app, port 443 has failed.
Please ensure that the ngrok agent is running and that it can see the SQL Server.
The TCP/IP connection to the host xxxxxx-a.worker.database.windows.net (redirected from xxxxxxx.ngrok.app), port 11063 has failed.
Azure SQL Database: You need to modify the connection policy to Default or Proxy to allow access from the private endpoint.
Azure SQL Managed Instance: You need to ensure your private endpoint is accessing the Managed Instance via private IP addresses.
Inbound Syncs
Omnata connects to your database using SQL Server's default transaction isolation level, Read Committed. This ensures that the data you are reading is not locked by other transactions. Omnata uses the following methods to replicate data from your SQL Server source database:
Change Data Capture (CDC)
Change Tracking (CT)
Highwater Mark
Change Data Capture (CDC)
If you have a SQL Server database that supports Change Data Capture (CDC), Omnata can use CDC to replicate data from your source database. CDC is a feature that captures insert, update, and delete operations performed on a table and makes the details of those changes available in change tables. Omnata reads the change tables to replicate the changes to your destination database, and then extracts from the source table any records not yet present in your CDC instance.
Using the APPEND strategy will run the get_all_changes function to get all changes from the CDC instance and append them to the destination table. This is the default merge strategy for CDC syncs. Using the MERGE strategy will run the get_net_changes function to get net changes from the CDC instance and merge them into the destination table. This merge strategy is useful when you only need to keep the latest version of a record in the destination table.
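For illustration, this is roughly what those CDC function calls look like in T-SQL for a hypothetical dbo_Orders capture instance (Omnata issues the equivalent queries for you):

```sql
-- Hypothetical capture instance 'dbo_Orders'
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- APPEND strategy: every individual change in the LSN range
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');

-- MERGE strategy: one net row per changed key
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
```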
Change Tracking (CT)
If you have a SQL Server database that supports Change Tracking (CT), Omnata can use CT to replicate data from your source database. Change Tracking is a feature that tracks changes to a table and makes the details of those changes available in change tables. Omnata reads the change tables to replicate the changes to your destination database, and then extracts from the source table any records not yet present in your CT instance.
Full Refresh / Incremental (High Watermark)
If you have tables which are not CDC-enabled, or views you need to sync, Omnata can use high watermarks to replicate your data from your source database. This method is less efficient than CDC and only captures the current state each run; therefore, the only history available is from the last sync.
If a table or view does not have at least one field of the following types available: date, datetime, datetime2, smalldatetime, datetimeoffset, or rowversion, then only the full refresh option will be available.
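Conceptually, an incremental high-watermark extraction is a query of this shape (the table and column names are hypothetical; Omnata generates the actual query for you):

```sql
-- Hypothetical table with a datetime2 watermark column
DECLARE @last_watermark datetime2 = '2024-01-01T00:00:00';  -- value stored from the previous sync

SELECT *
FROM dbo.Orders
WHERE LastModified > @last_watermark;  -- only rows changed since the last sync
```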
Composite keys
Omnata supports composite keys for high watermarking.
Managing Performance
SQL Server workload
We generally want to optimize syncs so that they run in the shortest timeframe possible. However, we need to be mindful of the load these routines place on SQL Server instances. When you run a sync for the first time, the plugin requests all the historical data for the Streams (Tables), and this is when the heaviest load is placed on your SQL Server instance. Subsequent loads are a much lighter touch for the CDC and incremental high watermark strategies. Full refresh strategies place a similar load on the server for each run of the sync, so use the full refresh strategy carefully.
Concurrency settings
There are several levers you can set to optimize performance. Each SQL Server database instance is unique in its setup; CPU cores, memory, disk configuration, network bandwidth, and server configuration all play a part in how much load we place on the server and the performance we can achieve.
Below are the three tuning levers you can use to tune the initial historic backfill of data.
Maximum Concurrent Streams (MCS)
This setting represents how many Streams (Tables) the plugin will query concurrently.
Maximum Concurrent Queries (MCQ)
This setting represents how many queries per Stream (Table) the plugin will run concurrently.
Maximum Records Per Query (MRQ)
This setting represents the maximum number of rows that each query will request.
Example
A sync has 16 Streams (Tables), each with 5 million rows; MCS = 8, MCQ = 8, MRQ = 1,000,000.
In this example, we have 80 million rows in total to acquire from our SQL Server database, which works out to 80 queries of 1,000,000 rows each for the sync. The maximum number of queries the plugin will send to the server concurrently is 64 (MCS x MCQ), and each of those queries asks for 1 million (MRQ) rows. Once a query is processed, another is sent to the server until all 80 queries have been serviced, but 64 is the maximum number of concurrent queries at any one time.
Note: the above is only true if each of those tables has a clustered index. Heap tables (tables with no clustered index) ignore MCQ and MRQ (but not MCS). Take care when acquiring data from heap tables, as there is limited concurrency control.
Snowflake warehouse settings
We have designed Omnata to be able to run on a single cluster XS (Extra Small) Warehouse. However, during these initial sync runs, we are requesting, receiving and processing a lot of data rows. With a single cluster XS Warehouse you will likely see query queueing on the Snowflake Warehouse. In order to reduce this queueing, you can temporarily set the Warehouse to be multi-clustered, this will reduce queueing and the overall duration of these initial sync runs. We recommend you set the Scaling policy to 'Standard', not 'Economy'. In our testing, a Warehouse set to 'Economy' doesn't scale out with this workload. Once the initial sync has run, you can return the Warehouse to a single cluster. We advise you continue to monitor the query queueing events and adjust your Snowflake Warehouse as needed.
Outbound Syncs
Omnata currently does not support outbound syncs for SQL Server databases.
Functions
To enable you to query your SQL Server database directly from your destination database, Omnata provides a set of UDFs. These UDFs are available in the OMNATA_SQL_SERVER_PLUGIN database, in the UDFS schema.
SQL_QUERY
Executes a SQL query on the source database and returns the result as a table.
Parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
DATABASE_NAME (VARCHAR): The name of the database to query
QUERY (VARCHAR): The SQL query
Examples
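As an example, a call might look like the following (the connection slug, database, and table names here are hypothetical):

```sql
SELECT *
FROM TABLE(OMNATA_SQL_SERVER_PLUGIN.UDFS.SQL_QUERY(
    'my-sql-server-connection',          -- CONNECTION_SLUG
    'AdventureWorks',                    -- DATABASE_NAME
    'SELECT TOP 10 * FROM dbo.Orders'    -- QUERY
));
```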
SQL_QUERY_FILTER
Executes a SQL query (with a filter applied) on the source database and returns the result as a table. This has been designed for use with Lambda views.
Parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
DATABASE_NAME (VARCHAR): The name of the database to query
QUERY (VARCHAR): The SQL query
COLUMN_NAME (VARCHAR): The name of the column to use in the where clause
OPERATOR (VARCHAR): The operator for the where clause
COLUMN_VALUE (VARCHAR): The value to use in the where clause (a string must be enclosed in quotes)
Examples
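As an example, a filtered call might look like the following (the connection slug, database, table, and column names here are hypothetical); note the extra quoting around the string value:

```sql
SELECT *
FROM TABLE(OMNATA_SQL_SERVER_PLUGIN.UDFS.SQL_QUERY_FILTER(
    'my-sql-server-connection',      -- CONNECTION_SLUG
    'AdventureWorks',                -- DATABASE_NAME
    'SELECT * FROM dbo.Orders',      -- QUERY
    'Status',                        -- COLUMN_NAME
    '=',                             -- OPERATOR
    '''Shipped'''                    -- COLUMN_VALUE (string values enclosed in quotes)
));
```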
Managing CDC on SQL Server
If you don't already have CDC enabled on the SQL Server instance you want to use with Omnata, then follow the Microsoft documentation.
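Per the Microsoft documentation, enabling CDC is a two-step operation; a minimal example for a hypothetical dbo.Orders table:

```sql
-- 1. Enable CDC at the database level (requires sysadmin)
EXEC sys.sp_cdc_enable_db;

-- 2. Enable CDC for each table to sync (requires db_owner)
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = N'cdc_reader',     -- gating role; NULL to disable gating
    @supports_net_changes = 1;          -- needed for the MERGE strategy; requires a primary key or unique index
```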
Grant read access to CDC tables
You will need a database role in the database you are syncing from; that role will need the following permissions for each of the objects that the sync will read from.
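A sketch of the grants involved, for a hypothetical dbo.Orders source table (the role name cdc_reader is an example):

```sql
-- Read the source table (used to backfill records not present in the CDC instance)
GRANT SELECT ON OBJECT::dbo.Orders TO cdc_reader;

-- Read the change tables and CDC query functions
GRANT SELECT ON SCHEMA::cdc TO cdc_reader;
```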
Create a new CDC reader role for Omnata
Below is a script that will create a role (cdc_reader), add the user (omnata_reader), and add the grant statements for each CDC-enabled table in your SQL Server instance. It will also update the role_name field in the cdc.change_tables table. Use this at your own risk; if you are unsure, contact your DBA.
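A sketch of such a script follows (again, use at your own risk and review with your DBA; the user name omnata_reader is an example):

```sql
-- Create the reader role and add the Omnata user to it
CREATE ROLE cdc_reader;
ALTER ROLE cdc_reader ADD MEMBER omnata_reader;

-- Build and run a GRANT SELECT statement for every CDC change table
DECLARE @sql nvarchar(max) = N'';
SELECT @sql = @sql + N'GRANT SELECT ON '
    + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
    + N' TO cdc_reader;' + CHAR(10)
FROM cdc.change_tables ct
JOIN sys.tables  t ON t.object_id = ct.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id;
EXEC sys.sp_executesql @sql;

-- Set cdc_reader as the gating role on each capture instance
UPDATE cdc.change_tables SET role_name = N'cdc_reader';
```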