PostgreSQL
PostgreSQL also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance.
Omnata's integration service replicates data from your PostgreSQL source database and loads it into your destination at regular intervals as well as providing direct query capabilities.
The PostgreSQL plugin is available on the Snowflake Marketplace with a 7-day free trial. Please contact us to discuss any extra POC requirements.
Connectivity
Omnata's Snowflake Native App can connect to your database in three ways; using built-in secure tunnelling, Privatelink (only available on Snowflake Business Critical Edition), or directly if the database network is publicly accessible.
Private link
To use AWS PrivateLink or Azure Private link:
You must have Snowflake Business Critical Edition
Your Snowflake account must be using the same cloud provider as the one you are connecting to. For example, to connect to an AWS Database instance via Privatelink, your Snowflake account must be on AWS.
Omnata tunnels
The Omnata tunnelling uses an SSH Tunnel which is provided by our OEM partner ngrok. Omnata establishes a secure TLS tunnel to the ngrok server and then connects to the SQL Server database through a tunnel established from your Snowflake account. This ensures that your database is not exposed to the internet and that your data is secure.
Supported Services
Generic PostgreSQL
Azure Database for PostgreSQL
Amazon RDS for PostgreSQL
Compatibility Levels
The plugin supports database versions 9.4 and above.
Authentication
PostgreSQL Authentication
Omnata supports PostgreSQL Authentication. When you create a connection, you will need to provide the username and password for the PostgreSQL user you want to use to connect to the database.
Microsoft Entra Service Principal (Only valid Azure Database for PostgreSQL)
Omnata supports Microsoft Entra Service Principal authentication. When you create a connection, you will need to provide the tenant_id
, client_id
and client_secret
for the Microsoft Entra Service Principal you want to use to connect to the database.
Configuration
PostgreSQL Configuration
No additional configuration is required for PostgreSQL databases at this time.
Azure Database for PostgreSQL Configuration
Private Access - Create a new private endpoint for the Azure SQL Database
Connectivity - Update the connection policy to
Default
orProxy
to allow access from the private endpoint
Tunnels Configuration
If you choose Omnata-ngrok tunnel as the connectivity method, the Omnata Sync Engine will guide you through the steps to establish a secure tunnel to your PostgreSQL database using ngrok.
You do not need to sign up for an ngrok account. We provide ngrok as an embedded OEM product.
In order to establish a tunnel to your PostgreSQL database, you will need to deploy the ngrok agent on your network which has access to your PostgreSQL. The ngrok agent will establish a secure TLS tunnel to the ngrok server and then connect to your PostgreSQL database through a tunnel established from your PostgreSQL host machine to the ngrok server.
ngrok Agent Download For MacOS, use Homebrew:
For Linux, use Apt:
For Windows, use Chocolatey (You'll need to run this in an Administrator Command Prompt):
If you don't have one of these package managers installed or prefer to install the ngrok agent yourself, you can download the ngrok agent from the Ngrok website and install it on your network.
ngrok Agent Configuration
You can run the ngrok agent either as a standalone process or as a service. To run the ngrok agent as a standalone process, follow the steps provided in the connection setup which will look like this:
To configure the ngrok agent to run as a service, follow these steps:
Locate the configuration directory for the ngrok agent.
For MacOS, the configuration directory is
~/.config/ngrok
For Linux, the configuration directory is
~/Library/Application Support/ngrok
For Windows, the configuration directory is
%HOMEPATH%\AppData\Local\ngrok
In the directory, create a configuration file (called
omnata.yml
) which will be used by ngrok agent. The configuration file should contain the following information:
replacing the placeholders with the appropriate values. If you run the agent on the same server as the PostgreSQL, you can just the port number. If you run the agent on a different server, you will need to use the hostname of the server where the PostgreSQL is running as well as the port.
Install the ngrok agent as a service by running the following command:
Start the ngrok service by running the following command:
Private Link Configuration
If you are using Snowflake Business Critical you can use Private Link to connect to your PostgreSQL database directly from Snowflake. To do this, you will need to ensure you have provisioned a Private Link endpoint (and approved the connection) for your PostgreSQL database.
Connectivity Issues
If you are experiencing connectivity issues such as:
Cannot open server "xxxxxxx.ngrok.app" requested by the login. The login failed
Please enter your password again in the connection setup.
The TCP/IP connection to the host xxxxxxx.ngrok.app, port 443 has failed.
Please ensure that the ngrok agent is running and that it can see the PostgrePostgreSQL.
The TCP/IP connection to the host xxxxxx-a.worker.database.windows.net (redirected from xxxxxxx.ngrok.app), port 11063 has failed.
Azure Database for PostgreSQL
: You need to modify the connection policy to Default
or Proxy
to allow access from the private endpoint.
Inbound Syncs
Omnata connects to your database using PostgreSQL's default transaction isolation level, Read Committed. This ensures that the data you are reading is not locked by other transactions. However you can change the isolation level to Read Uncommitted or Repeatable Read if required.
Omnata uses the following methods to replicate data from your PostgreSQL source database:
High Watermark (using XMIN)
Full Refresh
Managing performance
Setting Concurrency Levels.
There are several levers the user can set to optimize performance. The following should be viewed as general guidance. Each PostgreSQL database instance is unique in it's setup, CPU cores, Memory, Disk configuration and Network bandwidth and Server configurations all play a part in how much load we place on the server and the performance we can achieve.
Controlling the load on both Snowflake and PostgreSQL
We generally want to optimize our syncs so that they run in shortest timeframe possible. However, we also need to be mindful of the load these routines place on our PostgreSQL instances. When you run a sync for the first time the Plugin is requesting all the historical data for the Streams (Tables) and at this time the heaviest load will be placed on your PostgreSQL instance. Subsequent loads will be a much lighter touch for cdc and increment high watermark strategies. Full refresh strategies will place a similar load on the Server for each run of the sync, use full refresh strategy carefully. Below are the 3 tuning levers users can use to tune the initial historic backfill of data.
Maximum Concurrent Streams (MCS)
This setting represents the how many Streams (Tables) that the Plugin will query concurrently.
Maximum Concurrent Queries (MCQ)
This setting represents how many queries per Stream (Table) that the plugin will query concurrently.
Maximum Records Per Query (MRQ)
This setting represents the maximum number of rows that each query will request.
Example.
Sync has 16 Streams (Tables) Each Table has 5 Million rows MCS = 8 MCQ = 8 MRQ = 1,000,000.
In this example we have 80 millions rows in total to acquire from our PostgreSQL Database. In this example that will be 80 queries in total for the sync of 1,000,000 rows per query. The maximum number of queries the Plugin will concurrently send to the Server is 64 (MCS x MRQ), each one of those queries is ask for 1 million (MRQ) rows. Once a query is processed another will be sent to the server until all the queries have been serviced by the server but 64 is the maximum concurrent queries at any one time.
Outbound Syncs
Outbound syncs for PostgreSQL databases are not yet supported. Contact us to discuss a private preview of this feature.
Functions
SQL_QUERY
Executes a SQL query on the source database and returns the result as a table.
Parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
QUERY (VARCHAR): The SQL query
Examples
SQL_QUERY_FILTER
Executes a SQL query on the source database and returns the result as a table.
Parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
QUERY (VARCHAR): The SQL query
COLUMN_NAME (VARCHAR): The name of the column to use in the where clause
OPERATOR (VARCHAR): The operator for the where clause
COLUMN_VALUE (VARCHAR): The value to use in the where clause (a string must be enclosed in quotes)
Examples
Last updated