Microsoft SQL Server
About
The SQL Server plugin is in public preview on the Snowflake Marketplace with a 7-day free trial. Please contact us to discuss any extra requirements.
Omnata's integration service replicates data from your SQL Server source database and loads it into your destination at regular intervals as well as providing direct query capabilities to your SQL Server database.
Architecture
Omnata's Snowflake Native App can connect to your database in two ways; using built-in secure tunnelling or Privatelink (only available on Snowflake Business Critical Edition).
The Omnata tunnelling uses an SSH Tunnel which is provided by our OEM partner ngrok. Omnata establishes a secure TLS tunnel to the ngrok server and then connects to the SQL Server database through a tunnel established from your . This ensures that your database is not exposed to the internet and that your data is secure.
Supported Services
Generic SQL Server
Azure SQL Database
Azure SQL Managed Instance
Amazon SQL Server RDS
Compatibility Levels
The SQL Server plugin it supported for SQL Server databases that have a compatibility level of 100 or higher.
Authentication
SQL Server Authentication
Omnata supports SQL Server Authentication. When you create a connection, you will need to provide the username and password for the SQL Server user you want to use to connect to the database.
If connecting to Azure SQL Database, you will need to provide the username@servername
for the username instead of just the username
.
Configuration
SQL Server Configuration
No additional configuration is required for SQL Server databases unless you are wanting to use Change Data Capture (CDC).
Azure SQL Database Configuration
Private Access - Create a new private endpoint for the Azure SQL Database
Connectivity - Update the connection policy to
Default
orProxy
to allow access from the private endpoint
Tunnels Configuration
The Omnata SQL Server plugin uses ngrok to establish a secure tunnel to your SQL Server database. This ensures that your database is not exposed to the internet and that your data is secure.
In order to establish a tunnel to your SQL Server database, you will need to deploy the ngrok agent on your network which has access to your SQL Server. The ngrok agent will establish a secure TLS tunnel to the ngrok server and then connect to your SQL Server database through a tunnel established from your SQL Server host machine to the ngrok server.
ngrok Agent Download
For MacOS, use Homebrew:
For Linux, use Apt:
For Windows, use Chocolatey (You'll need to run this in an Administrator Command Prompt):
If you don't have one of these package managers installed or prefer to install the ngrok agent yourself, you can download the ngrok agent from the Ngrok website and install it on your network.
ngrok Agent Configuration
You can run the ngrok agent either as a standalone process or as a service. To run the ngrok agent as a standalone process, follow the steps provided in the connection setup which will look like this:
To configure the ngrok agent to run as a service, follow these steps:
Locate the configuration directory for the ngrok agent.
For MacOS, the configuration directory is
~/.config/ngrok
For Linux, the configuration directory is
~/Library/Application Support/ngrok
For Windows, the configuration directory is
%HOMEPATH%\AppData\Local\ngrok
In the directory, create a configuration file (called
omnata.yml
) which will be used by ngrok agent. The configuration file should contain the following information:
replacing the placeholders with the appropriate values. If you run the agent on the same server as the SQL Server, you can just the port number. If you run the agent on a different server, you will need to use the hostname of the server where the SQL Server is running as well as the port.
Install the ngrok agent as a service by running the following command:
Start the ngrok service by running the following command:
Troubleshooting Connectivity Issues
The following are common issues and solutions when connecting.
Message | Action |
---|---|
Cannot open server "xxxxxxx.ngrok.app" requested by the login. The login failed | Please enter your password again in the connection setup. If you are using SQL Azure Database, please enter the |
The TCP/IP connection to the host xxxxxxx.ngrok.app, port 443 has failed. | Please ensure that the ngrok agent is running and that it can see the SQL Server. |
The TCP/IP connection to the host xxxxxx-a.worker.database.windows.net (redirected from xxxxxxx.ngrok.app), port 11063 has failed. |
|
Privatelink connections
Public docs to follow...
Connections via Privatelink for AWS and Azure are currently in Private Preview (17-Oct-2024). Snowflake Business Critical Edition is required. Please get in touch if you would like to test this feature.
Inbound Syncs
Omnata connects to your database using SQL Server's default transaction isolation level, Read Committed. This ensures that the data you are reading is not locked by other transactions. Omnata uses the following methods to replicate data from your SQL Server source database:
Change Data Capture (CDC)
Change Tracking (CT)
Highwater Mark
Change Data Capture (CDC)
If you have a SQL Server database that supports Change Data Capture (CDC), Omnata can use CDC to replicate data from your source database. CDC is a feature that captures insert, update, and delete operations that are performed on a table and makes the details of the changes available in change tables. Omnata reads the change tables to replicate the changes to your destination database. Omnata uses the CDC instance to capture changes to the source database and replicate them to the destination database and then extracts any records not currently found in your CDC instance from the source table.
Using the APPEND
strategy will run the get_all_changes
function to get all changes from the CDC instance and then append them to the destination table. This is the default merge strategy for CDC syncs. Using the MERGE
strategy will run the get_net_changes
function to get net changes from the CDC instance and then merge them into the destination table. This merge strategy is useful when you only need to keep the latest version of a record in the destination table.
Change Tracking (CT)
If you have a SQL Server database that supports Change Tracking (CT), Omnata can use CT to replicate data from your source database. Change Tracking is a feature that tracks changes to a table and makes the details of the changes available in change tables. Omnata reads the change tables to replicate the changes to your destination database. Omnata uses the CT instance to capture changes to the source database and replicate them to the destination database and then extracts any records not currently found in your CT instance from the source table.
Full Refresh / Incremental (High Watermark)
If you have Tables which are not CDC enabled or Views you need to sync, Omnata can use Highwater Marks to replicate your data from your source database. This method is less efficient than CDC and only captures current state each run, therefore the only history available is from the last sync.
Omnata recommends the use of High Watermark method to sync only tables that do not have a CDC instance available. This method is less efficient than CDC and only captures current state each run, therefore the only history available is from the last sync.
If a table or view don't have only of the following field types available date
, datetime
, datetime2
, smalldatetime
, datetimeoffset
or rowversion
field available, then only a full refresh
option will be available.
Composite keys
Omnata supports composite keys for high watermarking.
Managing Performance
The following should be viewed as general guidance.
SQL Server workload
We generally want to optimize our syncs so that they run in shortest timeframe possible. However, we need to be mindful of the load these routines place on our SQL Server instances. When you run a sync for the first time the Plugin is requesting all the historical data for the Streams (Tables) and at this time the heaviest load will be placed on your SQL Server instance. Subsequent loads will be a much lighter touch for CDC and Increment High Watermark strategies. Full refresh strategies will place a similar load on the Server for each run of the sync, use full refresh strategy carefully.
Concurrency settings
There are several levers the user can set to optimize performance. Each SQL Server database instance is unique in it's setup, CPU cores, Memory, Disk configuration and Network bandwidth and Server configurations all play a part in how much load we place on the server and the performance we can achieve.
Below are the three tuning levers you can use to tune the initial historic backfill of data.
Maximum Concurrent Streams (MCS)
This setting represents the how many Streams (Tables) that the Plugin will query concurrently.
Maximum Concurrent Queries (MCQ)
This setting represents how many queries per Stream (Table) that the plugin will query concurrently.
Maximum Records Per Query (MRQ)
This setting represents the maximum number of rows that each query will request.
Example
Sync has 16 Streams (Tables) Each Table has 5 Million rows MCS = 8 MCQ = 8 MRQ = 1,000,000.
In this example, we have 80 millions rows in total to acquire from our SQL Server Database. In this example that will be 80 queries in total for the sync of 1,000,000 rows per query. The maximum number of queries the Plugin will concurrently send to the Server is 64 (MCS x MRQ), each one of those queries is ask for 1 million (MRQ) rows. Once a query is processed another will be sent to the server until all the queries have been serviced by the server but 64 is the maximum concurrent queries at any one time.
Note: The above is only true if each of those Tables has a Clustered Index, Heap tables (Tables with no Clustered Index) will ignore MCQ and MRQ (but not MCS). Take care when acquiring data from Heaped Tables as there is limited concurrency control.
Snowflake warehouse settings
We have designed Omnata to be able to run on a single cluster XS (Extra Small) Warehouse. However, during these initial sync runs, we are requesting, receiving and processing a lot of data rows. With a single cluster XS Warehouse you will likely see query queueing on the Snowflake Warehouse. In order to reduce this queueing, you can temporarily set the Warehouse to be multi-clustered, this will reduce queueing and the overall duration of these initial sync runs. We recommend you set the Scaling policy to 'Standard', not 'Economy'. In our testing, a Warehouse set to 'Economy' doesn't scale out with this workload. Once the initial sync has run, you can return the Warehouse to a single cluster. We advise you continue to monitor the query queueing events and adjust your Snowflake Warehouse as needed.
Outbound Syncs
Omanta currently does not support outbound syncs for SQL Server databases.
We have a outbound syncs in private preview, please get in touch to try an early version of this feature.
Functions
To enable you to query your SQL Server database directly from your destination database, Omnata provides a set of UDFs that you can use to query your SQL Server database. These UDFs are available in the OMNATA_SQL_SERVER_PLUGIN
database and the UDFS
schema.
SQL_QUERY
Executes a SQL query on the source database and returns the result as a table.
Parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
DATABASE_NAME (VARCHAR): The name of the database to query
QUERY (VARCHAR): The SQL query
Examples
SQL_QUERY_FILTER
Executes a SQL query (with a filter applied) on the source database and returns the result as a table. This has been designed for use with Lambda
views.
Parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
DATABASE_NAME (VARCHAR): The name of the database to query
QUERY (VARCHAR): The SQL query
COLUMN_NAME (VARCHAR): The name of the column to use in the where clause
OPERATOR (VARCHAR): The operator for the where clause
COLUMN_VALUE (VARCHAR): The value to use in the where clause (a string must be enclosed in quotes)
Examples
Managing CDC on SQL Server
If you don't already have CDC enabled on the SQL Server instance you want to use with Omnata, then follow the Microsoft documentation.
Grant read access to CDC tables
You will need a database role in the database you are syncing from, that role will need the following permissions for each of the objects that the sync will read from.
Create a new CDC reader role for Omnata
Below is a script that will create a role (cdc_reader) add the user (omnata_reader) and add the grant statements for each CDC enabled table in your SQL Server instance. It will also update the role_name field in the cdc.change_tables table. Use this at your own risk, if you are unsure contact your DBA.
Last updated