Infor Data Lake
The Infor Data Lake is a cloud-based data repository provided by Infor, designed to store and manage large volumes of structured and unstructured data. It is part of the Infor OS (Operating Service) platform, which offers a suite of tools and services for data integration, analytics, and business intelligence.
The Omnata Infor Datalake Plugin provides an alternative approach to extracting data from the Infor Data Lake using the Infor Compass JDBC Driver (bring your own, or BYO). Data is only loaded into the Infor Data Lake every 10 minutes, so this plugin is well suited to near-real-time extraction. Infor's streaming from the data lake may be more suitable for true real-time extraction, but it has limitations: for example, it does not handle schema drift, which this plugin does.
Ensure the following files have been published to the data lake for each environment you are connecting to:
CFIHEA (File header information)
CFIFFD (File field information)
CFIACP (Access path information)
Under the customer's licensing arrangement with Infor, customers are required to supply their own copy of the infor-compass-jdbc-2023.10.jar for use within the plugin, by uploading the file into a stage within Snowflake.
Log in to Infor OS and start the ION application.
Expand the ION menu and select Configuration > Downloads
Select the Compass JDBC driver download.
Upload the infor-compass-jdbc-2023.10.jar to an internal stage within your Snowflake account.
Run the following command (ensuring you specify the stage you have uploaded the jar file into):
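The upload step above can be performed with Snowflake's PUT command from a client such as SnowSQL. The following is an illustrative sketch only; the stage name and Omnata application name are assumptions you should replace with your own:

```sql
-- Create a stage to hold the driver jar (name is an example)
CREATE STAGE IF NOT EXISTS INFOR_JDBC_STAGE;

-- From SnowSQL (client-side), upload the driver jar without compression:
-- PUT file:///path/to/infor-compass-jdbc-2023.10.jar @INFOR_JDBC_STAGE AUTO_COMPRESS=FALSE;

-- Allow the Omnata application (name is an assumption) to read the stage
GRANT READ ON STAGE INFOR_JDBC_STAGE TO APPLICATION OMNATA_SYNC_ENGINE;
```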
The Infor Compass JDBC Driver requires the use of ION API Security Credentials to authenticate with the Infor Data Lake. The credentials are downloaded through the Infor OS Portal or Infor Ming.le. The IONAPI-Administrator role is required to set up security credentials. For more information, see the Infor OS documentation.
Generating ION API Security Credentials
Follow these steps to download the ION API Security Credentials:
Log on to Infor OS Portal or Infor Ming.le.
From the application tray, select the API Gateway application.
Select Authorized Apps.
Add a new non-Infor authorized application.
Specify this information:
Name : Infor Compass JDBC Driver
(The name must be exactly as shown)
Description: Infor Compass JDBC Driver
(The description must be exactly as shown)
Select the backend service type.
Save the app.
From the Authorized app menu, navigate to the new Infor Compass JDBC driver application.
Click Download.
Select Create Service Account.
In the User Name field, specify the first few characters of your user name. Then select your username from the type-ahead list.
Click Download.
Open the downloaded file and copy the contents into the Omnata Connection Wizard.
The Omnata Infor Datalake Plugin provides a set of inbound syncs that can be used to load data from the data lake into Snowflake. These syncs can be scheduled to run at regular intervals to keep the data up-to-date.
It is recommended that all syncs are Incremental syncs to reduce load on the Infor Data Lake. After the initial run, the plugin uses the infor.lastModified() field to determine which records to load, and will only load records that have been modified since the last sync run.
Using the APPEND strategy will run the Variations to Retrieve specified in the sync. Using the MERGE strategy will enforce the HighestVariation strategy and only keep the latest version of a record in the destination table.
In addition to the fields of the table, the following fields are also retrieved:
infor.lastModified() - The timestamp at which the data object was added to the Data Lake.
infor.Dataobjectid() - The unique identifier of the Data Lake data object in which the record is stored.
infor.DataObjectSeqId() - The record or line number associated with the record's location in the data object.
infor.Partition() - The partition in which the data object is stored.
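To illustrate, the incremental queries the plugin issues take roughly the following shape in Compass SQL. The table name and timestamp literal are examples only, and the exact literal syntax accepted by Compass may differ:

```sql
-- Illustrative Compass SQL: fetch records changed since the last sync run,
-- along with the Data Lake metadata fields described above
SELECT t.*,
       infor.lastModified(),
       infor.DataObjectSeqId()
FROM CFIHEA t
WHERE infor.lastModified() > '2024-01-01T00:00:00Z'
```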
There are several levers the user can set to optimize performance. The following should be viewed as general guidance. Depending on the size of the data and the performance of the Infor Data Lake, these settings may need to be adjusted.
The plugin has the ability to adjust the size and concurrency of queries being sent to the source database. The following are available settings in the Omnata UI:
Maximum Concurrent Streams (MCS): how many Streams (Tables) the plugin will query concurrently.
Maximum Concurrent Queries (MCQ): how many queries per Stream (Table) the plugin will run concurrently.
Maximum Queries Per Stream Per Sync (MQS): how many queries per Stream (Table) the plugin will send to the server during a sync.
Maximum Records Per Query (MRQ): the maximum number of rows each query will request.
Usage Example
A sync has 16 Streams (Tables), each Table has 5 million rows, and MCS = 8, MCQ = 8, MRQ = 1,000,000.
In this example there are 80 million rows in total to acquire from the Data Lake, which works out to 80 queries for the sync at 1,000,000 rows per query. The maximum number of queries the plugin will send to the Data Lake concurrently is 64 (MCS x MCQ), and each of those queries asks for up to 1 million (MRQ) rows. As each query completes, another is sent to the server until all queries have been serviced, with at most 64 queries in flight at any one time.
The initial data load behaves as in the example above, but the plugin will not stop until all of the data has been loaded; it continues to send queries to the server until the load is complete. When determining the number of queries to send to the server, the plugin uses the following approach:
Determine the number of records (based on timestamp) that are available for a given table grouped by year
Determine the number of records (based on timestamp) that are available for a given table grouped by year and month for the years where the number of records is greater than the specified MRQ
Determine the number of records (based on timestamp) that are available for a given table grouped by year, month and day for the years and months where the number of records is greater than the specified MRQ
Determine the number of records (based on timestamp) that are available for a given table grouped by year, month, day and hour for the years, months and days where the number of records is greater than the specified MRQ
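The drill-down above can be pictured as a series of counting probes against the source. The following is an illustrative sketch only; the table name is an example and the date functions shown may differ from what Compass SQL actually accepts:

```sql
-- Step 1: count available records per year for the table
SELECT YEAR(infor.lastModified()) AS yr,
       COUNT(*)                   AS record_count
FROM CFIHEA
GROUP BY YEAR(infor.lastModified());

-- Step 2 (only for years whose count exceeds MRQ): drill into months;
-- the same pattern repeats for day and hour where buckets still exceed MRQ
SELECT YEAR(infor.lastModified())  AS yr,
       MONTH(infor.lastModified()) AS mth,
       COUNT(*)                    AS record_count
FROM CFIHEA
WHERE YEAR(infor.lastModified()) = 2024
GROUP BY YEAR(infor.lastModified()), MONTH(infor.lastModified());
```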
Omnata typically runs sync workloads on a single cluster XS warehouse. For this endpoint, large initial sync runs can often lead to query queueing. If you see queuing, you will be able to reduce sync times by adding multi-clustering. We recommend setting the Scaling policy to 'Standard', not 'Economy'. After the initial sync, you can return the Warehouse to a single cluster for the incremental syncs. We advise you to monitor the query queueing events and adjust your Snowflake Warehouse as needed.
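As a sketch, the multi-cluster adjustment described above can be made with ALTER WAREHOUSE; the warehouse name here is an example:

```sql
-- Allow the sync warehouse to scale out during a large initial sync,
-- using the Standard (not Economy) scaling policy
ALTER WAREHOUSE OMNATA_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';

-- After the initial sync, return to a single cluster for incrementals
ALTER WAREHOUSE OMNATA_WH SET MAX_CLUSTER_COUNT = 1;
```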
Alternatively, you can opt for a Serverless Warehouse; this will scale out as needed and you will not see queueing events. However, keep in mind that you have less control over costs when using serverless compute.
Omnata provides a set of UDFs that you can use to query the Infor Data Lake. These UDFs are available in the OMNATA_INFOR_DATA_LAKE_PLUGIN database, in the UDFS schema.
The INFOR_DIRECT_QUERY function allows you to query the Infor Data Lake directly from your destination database. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
QUERY (VARCHAR): The SQL query
Examples
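A minimal sketch of a call, assuming the UDF is invoked as a table function; the connection slug, table name, and timestamp are placeholders to replace with your own:

```sql
-- Run an ad-hoc Compass SQL query against the Data Lake via the plugin
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_DIRECT_QUERY(
    'my-infor-connection',  -- CONNECTION_SLUG (placeholder)
    'SELECT * FROM CFIHEA WHERE infor.lastModified() > ''2024-01-01T00:00:00Z'''
  )
);
```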
The INFOR_TABLE_STATS function allows you to get statistics out of the Infor Data Lake for a given table. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The timestamp from which to gather statistics, based on each record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
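A sketch of a call, assuming the UDF is invoked as a table function; every argument value below (connection slug, table name, company codes, timestamp) is a placeholder:

```sql
-- Gather statistics for CFIHEA for company 100, from the start of 2024
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_STATS(
    'my-infor-connection',    -- CONNECTION_SLUG (placeholder)
    'CFIHEA',                 -- TABLE_NAME (placeholder)
    ARRAY_CONSTRUCT('100'),   -- COMPANIES (placeholder)
    '2024-01-01T00:00:00Z',   -- FROM_TIMESTAMP (placeholder)
    ''                        -- ADDITIONAL_WHERE_CLAUSE (none)
  )
);
```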
The INFOR_TABLE_DISTINCT_STATS function allows you to get statistics out of the Infor Data Lake for a given table, reporting the number of distinct records, since duplicates can exist within the Data Lake.
The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
PRIMARY_KEYS (ARRAY): The List of Primary Keys to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The timestamp from which to gather statistics, based on each record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
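A sketch of a call, assuming the UDF is invoked as a table function; the connection slug, table name, key columns, and company codes are placeholders:

```sql
-- Distinct-record statistics for CFIHEA, keyed on example columns
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_DISTINCT_STATS(
    'my-infor-connection',            -- CONNECTION_SLUG (placeholder)
    'CFIHEA',                         -- TABLE_NAME (placeholder)
    ARRAY_CONSTRUCT('CONO', 'DIVI'),  -- PRIMARY_KEYS (placeholder columns)
    ARRAY_CONSTRUCT('100'),           -- COMPANIES (placeholder)
    '2024-01-01T00:00:00Z',           -- FROM_TIMESTAMP (placeholder)
    ''                                -- ADDITIONAL_WHERE_CLAUSE (none)
  )
);
```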
The INFOR_TABLE_ROW_STATS function allows you to get statistics out of the Infor Data Lake for a given table, grouped by the columns passed in. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
PRIMARY_KEYS (ARRAY): The List of Primary Keys to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The timestamp from which to gather statistics, based on each record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
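A sketch of a call, assuming the UDF is invoked as a table function; as with the other UDFs, every argument value is a placeholder to substitute with your own:

```sql
-- Row statistics for CFIHEA, grouped by the example key columns
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_ROW_STATS(
    'my-infor-connection',            -- CONNECTION_SLUG (placeholder)
    'CFIHEA',                         -- TABLE_NAME (placeholder)
    ARRAY_CONSTRUCT('CONO', 'DIVI'),  -- PRIMARY_KEYS (placeholder columns)
    ARRAY_CONSTRUCT('100'),           -- COMPANIES (placeholder)
    '2024-01-01T00:00:00Z',           -- FROM_TIMESTAMP (placeholder)
    ''                                -- ADDITIONAL_WHERE_CLAUSE (none)
  )
);
```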
It is recommended that you load your data into a database that you create yourself, so that you can retain ownership of the database and schemas. The following SQL outlines how to grant permissions to the Omnata application.
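A minimal sketch of such grants, assuming a destination database named INFOR_DEST and an installed Omnata application named OMNATA_SYNC_ENGINE; substitute your own names:

```sql
-- Create a destination database and schema that you own
CREATE DATABASE IF NOT EXISTS INFOR_DEST;
CREATE SCHEMA IF NOT EXISTS INFOR_DEST.RAW;

-- Grant the Omnata application (name is an assumption) the access it
-- needs to create and write destination tables
GRANT USAGE ON DATABASE INFOR_DEST TO APPLICATION OMNATA_SYNC_ENGINE;
GRANT USAGE, CREATE TABLE ON SCHEMA INFOR_DEST.RAW TO APPLICATION OMNATA_SYNC_ENGINE;
```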
Initial syncs can often run for longer than the default timeout of 4 hours, and may exceed your Snowflake account- or warehouse-level statement timeout(s). We recommend setting these timeouts to a value greater than the expected duration of the sync. You will need to adjust this value based on the size of the data being loaded. As a rough benchmark, a table with 1 billion records may take more than 6 hours to sync.
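For example, the warehouse-level statement timeout can be raised with ALTER WAREHOUSE; the warehouse name and timeout value below are examples:

```sql
-- Raise the statement timeout on the sync warehouse to 12 hours so that
-- long-running initial sync queries are not cancelled
ALTER WAREHOUSE OMNATA_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 43200;
```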
Due to the volume of data being transferred through this plugin from the Infor Data Lake, we recommend setting the record retention period to 0 days. This will reduce overall sync times; however, you will no longer be able to view a record-level sync activity log.