Infor Data Lake
The Infor Data Lake is a cloud-based data repository provided by Infor, designed to store and manage large volumes of structured and unstructured data. It is part of the Infor OS (Operating Service) platform, which offers a suite of tools and services for data integration, analytics, and business intelligence.
The Omnata Infor Datalake Plugin provides an alternative approach to extracting data from the Infor Data Lake using the Infor Compass JDBC Driver (bring your own, or BYO). Data is only loaded into the Infor Data Lake every 10 minutes, so this plugin is well suited to near-real-time extraction. Infor's streaming from the data lake may be more suitable for true real-time extraction, but it has limitations: for example, it does not handle schema drift, which this plugin does.
Ensure the following files have been published to the data lake for each environment you are connecting to:
CFIHEA (File header information)
CFIFFD (File field information)
CFIACP (Access path information)
Under the customer's licensing arrangement with Infor, customers are required to supply their own copy of the infor-compass-jdbc-2023.10.jar for use within the plugin, by uploading the file into a stage within Snowflake.
Log in to Infor OS and start the ION application.
Expand the ION menu and select Configuration > Downloads
Select the Compass JDBC driver download.
Upload the infor-compass-jdbc-2023.10.jar to an internal stage within your Snowflake account.
Run the following command (ensuring you specify the stage you have uploaded the jar file into):
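The upload step above can be performed with Snowflake's PUT command from a client such as SnowSQL. The following is an illustrative sketch only; the stage name and Omnata application name are assumptions you should replace with your own:

```sql
-- Create a stage to hold the driver jar (name is an example)
CREATE STAGE IF NOT EXISTS INFOR_JDBC_STAGE;

-- From SnowSQL (client-side), upload the driver jar without compression:
-- PUT file:///path/to/infor-compass-jdbc-2023.10.jar @INFOR_JDBC_STAGE AUTO_COMPRESS=FALSE;

-- Allow the Omnata application (name is an assumption) to read the stage
GRANT READ ON STAGE INFOR_JDBC_STAGE TO APPLICATION OMNATA_SYNC_ENGINE;
```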
The Infor Compass JDBC Driver requires the use of ION API Security Credentials to authenticate with the Infor Data Lake. The credentials are downloaded through the Infor OS Portal or Infor Ming.le. The IONAPI-Administrator role is required to set up security credentials. For more information, see the Infor OS documentation.
Generating ION API Security Credentials
Follow these steps to download the ION API Security Credentials:
Log on to Infor OS Portal or Infor Ming.le.
From the application tray, select the API Gateway application.
Select Authorized Apps.
Add a new non-Infor authorized application.
Specify this information:
Name : Infor Compass JDBC Driver
(The name must be exactly as shown)
Description: Infor Compass JDBC Driver
(The description must be exactly as shown)
Select the backend service type.
Save the app.
From the Authorized app menu, navigate to the new Infor Compass JDBC driver application.
Click Download.
Select Create Service Account.
In the User Name field, specify the first few characters of your user name. Then select your username from the type-ahead list.
Click Download.
Open the downloaded file and copy the contents into the Omnata Connection Wizard.
The Omnata Infor Datalake Plugin provides a set of inbound syncs that can be used to load data from the data lake into Snowflake. These syncs can be scheduled to run at regular intervals to keep the data up-to-date.
It is recommended that all syncs are Incremental syncs to reduce load on the Infor Data Lake. After the initial run, the plugin uses the infor.lastModified() field to determine which records to load, and will only load records that have been modified since the last sync run.
Using the APPEND strategy will run the Variations to Retrieve specified in the sync. Using the MERGE strategy will enforce the HighestVariation strategy and only keep the latest version of a record in the destination table.
In addition to the fields of the table, the following fields are also retrieved:
infor.lastModified() - The timestamp at which the data object was added to the Data Lake.
infor.Dataobjectid() - The unique identifier of the Data Lake data object in which the record is stored.
infor.DataObjectSeqId() - The record or line number associated with the record's location in the data object.
infor.Partition() - The partition in which the data object is stored.
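To illustrate, the incremental queries the plugin issues take roughly the following shape in Compass SQL. The table name and timestamp literal are examples only, and the exact literal syntax accepted by Compass may differ:

```sql
-- Illustrative Compass SQL: fetch records changed since the last sync run,
-- along with the Data Lake metadata fields described above
SELECT t.*,
       infor.lastModified(),
       infor.DataObjectSeqId()
FROM CFIHEA t
WHERE infor.lastModified() > '2024-01-01T00:00:00Z'
```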
There are several levers the user can set to optimize performance. The following should be viewed as general guidance. Depending on the size of the data and the performance of the Infor Data Lake, these settings may need to be adjusted.
The plugin has the ability to adjust the size and concurrency of queries being sent to the source database. The following are available settings in the Omnata UI:
Maximum Concurrent Streams (MCS): how many Streams (Tables) the plugin will query concurrently.
Maximum Concurrent Queries (MCQ): how many queries per Stream (Table) the plugin will run concurrently.
Maximum Queries Per Stream Per Sync (MQS): how many queries per Stream (Table) the plugin will send to the server during a sync.
Maximum Records Per Query (MRQ): the maximum number of rows each query will request.
Usage Example
A sync has 16 Streams (Tables), each Table has 5 million rows, and MCS = 8, MCQ = 8, MRQ = 1,000,000.
In this example there are 80 million rows in total to acquire from the Data Lake, which works out to 80 queries for the sync at 1,000,000 rows per query. The maximum number of queries the plugin will send to the Data Lake concurrently is 64 (MCS x MCQ), and each of those queries asks for up to 1 million (MRQ) rows. As each query completes, another is sent to the server until all queries have been serviced, with at most 64 queries in flight at any one time.
The initial data load behaves as in the example above, but the plugin will not stop until all of the data has been loaded; it continues to send queries to the server until the load is complete. When determining the number of queries to send to the server, the plugin uses the following approach:
Determine the number of records (based on timestamp) that are available for a given table grouped by year
Determine the number of records (based on timestamp) that are available for a given table grouped by year and month for the years where the number of records is greater than the specified MRQ
Determine the number of records (based on timestamp) that are available for a given table grouped by year, month and day for the years and months where the number of records is greater than the specified MRQ
Determine the number of records (based on timestamp) that are available for a given table grouped by year, month, day and hour for the years, months and days where the number of records is greater than the specified MRQ
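The drill-down above can be pictured as a series of counting probes against the source. The following is an illustrative sketch only; the table name is an example and the date functions shown may differ from what Compass SQL actually accepts:

```sql
-- Step 1: count available records per year for the table
SELECT YEAR(infor.lastModified()) AS yr,
       COUNT(*)                   AS record_count
FROM CFIHEA
GROUP BY YEAR(infor.lastModified());

-- Step 2 (only for years whose count exceeds MRQ): drill into months;
-- the same pattern repeats for day and hour where buckets still exceed MRQ
SELECT YEAR(infor.lastModified())  AS yr,
       MONTH(infor.lastModified()) AS mth,
       COUNT(*)                    AS record_count
FROM CFIHEA
WHERE YEAR(infor.lastModified()) = 2024
GROUP BY YEAR(infor.lastModified()), MONTH(infor.lastModified());
```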
Omnata typically runs sync workloads on a single cluster XS warehouse. For this endpoint, large initial sync runs can often lead to query queueing. If you see queuing, you will be able to reduce sync times by adding multi-clustering. We recommend setting the Scaling policy to 'Standard', not 'Economy'. After the initial sync, you can return the Warehouse to a single cluster for the incremental syncs. We advise you to monitor the query queueing events and adjust your Snowflake Warehouse as needed.
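As a sketch, the multi-cluster adjustment described above can be made with ALTER WAREHOUSE; the warehouse name here is an example:

```sql
-- Allow the sync warehouse to scale out during a large initial sync,
-- using the Standard (not Economy) scaling policy
ALTER WAREHOUSE OMNATA_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';

-- After the initial sync, return to a single cluster for incrementals
ALTER WAREHOUSE OMNATA_WH SET MAX_CLUSTER_COUNT = 1;
```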
Alternatively, you can opt for a Serverless Warehouse; this will scale out as needed and you will not see queueing events. However, keep in mind that you have less control over costs when using serverless compute.
Omnata provides a set of UDFs that you can use to query the Infor Data Lake. These UDFs are available in the OMNATA_INFOR_DATA_LAKE_PLUGIN database, in the UDFS schema.
The INFOR_DIRECT_QUERY function allows you to query the Infor Data Lake directly from your destination database. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
QUERY (VARCHAR): The SQL query
Examples
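A minimal sketch of a call, assuming the UDF is invoked as a table function; the connection slug, table name, and timestamp are placeholders to replace with your own:

```sql
-- Run an ad-hoc Compass SQL query against the Data Lake via the plugin
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_DIRECT_QUERY(
    'my-infor-connection',  -- CONNECTION_SLUG (placeholder)
    'SELECT * FROM CFIHEA WHERE infor.lastModified() > ''2024-01-01T00:00:00Z'''
  )
);
```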
The INFOR_TABLE_STATS function allows you to get statistics out of the Infor Data Lake for a given table. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The timestamp from which to gather statistics, based on each record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
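A sketch of a call, assuming the UDF is invoked as a table function; every argument value below (connection slug, table name, company codes, timestamp) is a placeholder:

```sql
-- Gather statistics for CFIHEA for company 100, from the start of 2024
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_STATS(
    'my-infor-connection',    -- CONNECTION_SLUG (placeholder)
    'CFIHEA',                 -- TABLE_NAME (placeholder)
    ARRAY_CONSTRUCT('100'),   -- COMPANIES (placeholder)
    '2024-01-01T00:00:00Z',   -- FROM_TIMESTAMP (placeholder)
    ''                        -- ADDITIONAL_WHERE_CLAUSE (none)
  )
);
```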
The INFOR_TABLE_DISTINCT_STATS function allows you to get statistics out of the Infor Data Lake for a given table, reporting the number of distinct records, since duplicates can exist within the Data Lake.
The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
PRIMARY_KEYS (ARRAY): The List of Primary Keys to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The timestamp from which to gather statistics, based on each record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
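A sketch of a call, assuming the UDF is invoked as a table function; the connection slug, table name, key columns, and company codes are placeholders:

```sql
-- Distinct-record statistics for CFIHEA, keyed on example columns
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_DISTINCT_STATS(
    'my-infor-connection',            -- CONNECTION_SLUG (placeholder)
    'CFIHEA',                         -- TABLE_NAME (placeholder)
    ARRAY_CONSTRUCT('CONO', 'DIVI'),  -- PRIMARY_KEYS (placeholder columns)
    ARRAY_CONSTRUCT('100'),           -- COMPANIES (placeholder)
    '2024-01-01T00:00:00Z',           -- FROM_TIMESTAMP (placeholder)
    ''                                -- ADDITIONAL_WHERE_CLAUSE (none)
  )
);
```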
The INFOR_TABLE_ROW_STATS function allows you to get statistics out of the Infor Data Lake for a given table, grouped by the columns passed in. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
PRIMARY_KEYS (ARRAY): The List of Primary Keys to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The timestamp from which to gather statistics, based on each record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
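A sketch of a call, assuming the UDF is invoked as a table function; as with the other UDFs, every argument value is a placeholder to substitute with your own:

```sql
-- Row statistics for CFIHEA, grouped by the example key columns
SELECT *
FROM TABLE(
  OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_ROW_STATS(
    'my-infor-connection',            -- CONNECTION_SLUG (placeholder)
    'CFIHEA',                         -- TABLE_NAME (placeholder)
    ARRAY_CONSTRUCT('CONO', 'DIVI'),  -- PRIMARY_KEYS (placeholder columns)
    ARRAY_CONSTRUCT('100'),           -- COMPANIES (placeholder)
    '2024-01-01T00:00:00Z',           -- FROM_TIMESTAMP (placeholder)
    ''                                -- ADDITIONAL_WHERE_CLAUSE (none)
  )
);
```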
It is recommended that you load your data into a database that you create yourself, so that you can retain ownership of the database and schemas. The following SQL outlines how to grant permissions to the Omnata application.
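A minimal sketch of such grants, assuming a destination database named INFOR_DEST and an installed Omnata application named OMNATA_SYNC_ENGINE; substitute your own names:

```sql
-- Create a destination database and schema that you own
CREATE DATABASE IF NOT EXISTS INFOR_DEST;
CREATE SCHEMA IF NOT EXISTS INFOR_DEST.RAW;

-- Grant the Omnata application (name is an assumption) the access it
-- needs to create and write destination tables
GRANT USAGE ON DATABASE INFOR_DEST TO APPLICATION OMNATA_SYNC_ENGINE;
GRANT USAGE, CREATE TABLE ON SCHEMA INFOR_DEST.RAW TO APPLICATION OMNATA_SYNC_ENGINE;
```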
Initial syncs can often run for longer than the default timeout of 4 hours, and may exceed your Snowflake account- or warehouse-level statement timeout(s). We recommend setting these timeouts to a value greater than the expected duration of the sync. You will need to adjust this value based on the size of the data being loaded. As a rough benchmark, a table with 1 billion records may take more than 6 hours to sync.
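For example, the warehouse-level statement timeout can be raised with ALTER WAREHOUSE; the warehouse name and timeout value below are examples:

```sql
-- Raise the statement timeout on the sync warehouse to 12 hours so that
-- long-running initial sync queries are not cancelled
ALTER WAREHOUSE OMNATA_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 43200;
```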
Due to the volume of data being transferred through this plugin from the Infor Data Lake, we recommend setting the record retention period to 0 days. This will reduce overall sync times; however, you will no longer be able to view a record-level sync activity log.