Infor Data Lake
About this plugin
The Infor Data Lake is a cloud-based data repository provided by Infor, designed to store and manage large volumes of structured and unstructured data. It is part of the Infor OS (Operating Service) platform, which offers a suite of tools and services for data integration, analytics, and business intelligence.
The Omnata Infor Datalake Plugin provides an alternative approach to extracting data from the Infor Data Lake using the Infor Compass JDBC Driver (BYO). Data is only loaded into the Infor Data Lake every 10 minutes, so this plugin is well suited to near real-time data extraction. Infor's streaming from the Data Lake may be more suitable for true real-time extraction, but it has limitations; for example, it does not handle schema drift, which this plugin does.
Prerequisites
Data Catalog API
The Infor Data Lake provides a Data Catalog API that allows you to query the metadata of the data stored in the Infor Data Lake. This API is used by the Omnata Infor Datalake Plugin to retrieve the metadata of the tables and fields in the Infor Data Lake.
Compass JDBC Driver
Before you create a connection in Omnata, you are required to upload your own version of the infor-compass-jdbc-2025.08.jar for use by the Omnata plugin app.
Follow these steps to upload the driver:
ION download
Login to Infor OS and start the ION application.
Expand the ION menu and select Configuration > Downloads.
Select the Compass JDBC driver download. (For details, see the Infor ION User Guide)
Installing the driver into Snowflake
Upload the infor-compass-jdbc-2025.08.jar to an internal stage within your Snowflake instance.
Run the following command (ensuring you specify the stage you have uploaded the jar file into):
COPY FILES INTO @OMNATA_INFOR_DATA_LAKE_PLUGIN.CUSTOM_PACKAGES.PACKAGES
FROM @<<DATABASE_NAME>>.<<SCHEMA_NAME>>.<<STAGE_NAME>>;
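For example, assuming the jar has been uploaded via SnowSQL to a stage named JDBC_STAGE in MY_DB.STAGING (all of these names are illustrative, not prescribed by the plugin):
-- Upload the driver jar to your own internal stage (AUTO_COMPRESS must be off for jar files)
PUT file:///tmp/infor-compass-jdbc-2025.08.jar @MY_DB.STAGING.JDBC_STAGE AUTO_COMPRESS = FALSE;
-- Copy it into the plugin's packages stage
COPY FILES INTO @OMNATA_INFOR_DATA_LAKE_PLUGIN.CUSTOM_PACKAGES.PACKAGES
FROM @MY_DB.STAGING.JDBC_STAGE;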
Authentication
ION API Security Credentials
The Infor Compass JDBC Driver requires the use of ION API Security Credentials to authenticate with the Infor Data Lake. The credentials are downloaded through the Infor OS Portal or Infor Ming.le. The IONAPI-Administrator role is required to set up security credentials. For more information, see the Infor OS documentation.
Generating ION API Security Credentials
Follow these steps to download the ION API Security Credentials:
Log on to Infor OS Portal or Infor Ming.le.
From the application tray, select the API Gateway application.
Select Authorized Apps.
Add a new non-Infor authorized application.
Specify this information:
Name: Infor Compass JDBC Driver (The name must be exactly as shown)
Description: Infor Compass JDBC Driver (The description must be exactly as shown)
Select the backend service type.
Save the app.
From the Authorized app menu, navigate to the new Infor Compass JDBC driver application.
Click Download.
Select Create Service Account.
In the User Name field, specify the first few characters of your user name. Then select your username from the type-ahead list.
Click Download.
Open the downloaded file and copy the contents into the Omnata Connection Wizard.
Inbound Syncs
The Omnata Infor Datalake Plugin provides a set of inbound syncs that can be used to load data from the Data Lake into Snowflake. These syncs can be scheduled to run at regular intervals to keep the data up to date.
It is recommended that all syncs are Incremental syncs to reduce load on the Infor Data Lake. After the initial run, the plugin will use the infor.lastModified() field to determine which records to load. The plugin will only load records that have been modified since the last sync run.
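To illustrate the filter this implies, the following is a minimal sketch of an equivalent Compass SQL predicate; the exact query the plugin issues is internal to the plugin, and the literal timestamp is a hypothetical watermark from the previous run:
-- Hypothetical incremental filter: only records modified since the last sync watermark
select *
from MITMAS
where infor.lastModified() > timestamp '2025-01-05 14:00:00'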
Storage Strategy
Using the APPEND strategy will run the Variations to Retrieve specified in the sync. Using the MERGE strategy will enforce the HighestVariation strategy and only keep the latest version of a record in the destination table.
Additional Fields Retrieved
In addition to the fields of the table, the following fields are also retrieved:
infor.lastModified() - Representing the timestamp at which the data object was added to the Data Lake.
infor.Dataobjectid() - Representing the unique identifier of the Data Lake data object in which the record is stored.
infor.DataObjectSeqId() - Representing the record or line number associated with the record location in the data object.
infor.Partition() - Representing the partition in which the data object is stored.
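As an illustration of how these fields can be used downstream, here is a minimal Snowflake sketch that keeps only the latest version of each record, similar to what the MERGE strategy enforces. The destination table, key columns, and the destination column name for infor.lastModified() are all assumptions; your actual column naming may differ:
-- Hypothetical destination table and business keys; keep the most recent record per key
select *
from MY_DB.MY_SCHEMA.MITMAS
qualify row_number() over (
    partition by CONO, ITNO
    order by "infor.lastModified()" desc
) = 1;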
Customizing the Omnata storage location
It is recommended that you configure a custom storage location in Omnata.
Managing performance
There are several levers the user can set to optimize performance. The following should be viewed as general guidance. Depending on the size of the data and the performance of the Infor Data Lake, these settings may need to be adjusted.
Concurrency Level Settings
The plugin has the ability to adjust the size and concurrency of queries being sent to the source database. The following are available settings in the Omnata UI:
Maximum Concurrent Streams (MCS) This setting controls how many Streams (Tables) the plugin will query concurrently.
Maximum Concurrent Queries (MCQ) This setting controls how many queries per Stream (Table) the plugin will run concurrently.
Maximum Queries Per Stream Per Sync (MQS) This setting controls how many queries per Stream (Table) the plugin will send to the server during a sync.
Maximum Records Per Query (MRQ) This setting represents the maximum number of rows that each query will request.
Usage Example
Sync has 16 Streams (Tables); each table has 5 million rows; MCS = 8; MCQ = 8; MRQ = 1,000,000.
In this example there are 80 million rows in total to acquire from the Data Lake, which works out to 80 queries of 1,000,000 rows each for the sync. The maximum number of queries the plugin will send to the Data Lake concurrently is 64 (MCS x MCQ), and each of those queries asks for up to 1 million (MRQ) rows. Once a query is processed, another is sent to the server until all queries have been serviced, but 64 is the maximum number of concurrent queries at any one time.
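The arithmetic can be sanity-checked with a plain Snowflake query (nothing plugin-specific):
-- Worked example of the concurrency arithmetic above
select 16 * 5000000           as total_rows,      -- 80,000,000 rows across all streams
       16 * 5000000 / 1000000 as total_queries,   -- 80 queries of MRQ rows each
       8 * 8                  as max_concurrent;  -- MCS x MCQ = 64 concurrent queries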
Initial Data load
The initial data load behaves as in the example above, except the plugin will not stop until all the data has been loaded; it continues to send queries to the server until everything has been acquired. When determining the number of queries to send to the server, the plugin uses the following approach:
Determine the number of records (based on timestamp) that are available for a given table, grouped by year.
Determine the number of records (based on timestamp) that are available for a given table, grouped by year and month, for the years where the number of records is greater than the specified MRQ.
Determine the number of records (based on timestamp) that are available for a given table, grouped by year, month and day, for the years and months where the number of records is greater than the specified MRQ.
Determine the number of records (based on timestamp) that are available for a given table, grouped by year, month, day and hour, for the years, months and days where the number of records is greater than the specified MRQ.
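For intuition, a year-level count of this kind could be approximated manually through the INFOR_DIRECT_QUERY function described below. This is a hedged sketch only: it assumes Compass SQL supports a year() function over infor.lastModified(), and it is not necessarily the query the plugin runs internally:
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_DIRECT_QUERY(
    'my-infor-connection-slug',
    'select year(infor.lastModified()) as yr, count(*) as cnt from MITMAS group by year(infor.lastModified())'
))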
Snowflake warehouse size
Omnata typically runs sync workloads on a single-cluster XS warehouse. For this endpoint, large initial sync runs can often lead to query queueing. If you see queueing, you can reduce sync times by adding multi-clustering, as shown below. We recommend setting the Scaling policy to 'Standard', not 'Economy'. After the initial sync, you can return the warehouse to a single cluster for the incremental syncs. We advise you to monitor query queueing events and adjust your Snowflake warehouse as needed.
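For example, to enable multi-clustering on the warehouse that runs the syncs (the warehouse name and cluster counts are illustrative):
ALTER WAREHOUSE MY_SYNC_WH SET
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 4
    SCALING_POLICY = 'STANDARD';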
Alternatively, you can opt for a Serverless Warehouse, which will scale out as needed so you will not see queueing events. However, keep in mind that you have less control over costs when using Serverless compute.
Task timeout
Initial syncs can often run for longer than the default Omnata timeout of 4 hours, and longer than your Snowflake account- or warehouse-level query timeout(s). We recommend setting these timeouts to a value greater than the expected duration of the sync, and adjusting them based on the size of the data being loaded. As a rough benchmark, a table with 1 billion records may take more than 6 hours to sync.
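For example, to raise the warehouse-level query timeout (the warehouse name and the 8-hour value are illustrative; the Omnata task timeout itself is configured in the Omnata UI):
ALTER WAREHOUSE MY_SYNC_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 28800;  -- 8 hours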
Record History Retention
Due to the volume of data passing through this plugin from the Infor Data Lake, we recommend setting the retention period to 0 days. This reduces overall sync times; however, you will no longer be able to view a record-level sync activity log.
Functions
Omnata provides a set of UDFs that you can use to query the Infor Data Lake. These UDFs are available in the UDFS schema of the OMNATA_INFOR_DATA_LAKE_PLUGIN database.
INFOR_DIRECT_QUERY
The INFOR_DIRECT_QUERY function allows you to query the Infor Datalake directly from your destination database. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
QUERY (VARCHAR): The SQL query
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_DIRECT_QUERY(
'my-infor-connection-slug',
'select * from infor.include(''MITMAS'', ''ACTIVE, DELETED, ARCHIVED'', ''AllVariations'')'
))
INFOR_TABLE_STATS
The INFOR_TABLE_STATS function allows you to get stats out of the Infor Datalake for a given table. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The date from which to get statistics, based on a record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_STATS(
'my-infor-connection-slug',
'CINACC',
[100],
'',
''
))
INFOR_TABLE_DISTINCT_STATS
The INFOR_TABLE_DISTINCT_STATS function allows you to get stats out of the Infor Data Lake for a given table and reports the number of distinct records, since duplicates can exist within the Data Lake.
The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
PRIMARY_KEYS (ARRAY): The List of Primary Keys to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The date from which to get statistics, based on a record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_DISTINCT_STATS(
'my-infor-connection-slug',
'CINACC',
['CONO', 'DIVI', 'ANBR', 'SENO'],
[100],
'',
''
))
INFOR_TABLE_ROW_STATS
The INFOR_TABLE_ROW_STATS function allows you to get stats out of the Infor Data Lake for a given table, grouped by the columns passed in. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The name of the table to get statistics for
PRIMARY_KEYS (ARRAY): The List of Primary Keys to get statistics for
COMPANIES (ARRAY): The List of Companies to get statistics for
FROM_TIMESTAMP (STRING): The date from which to get statistics, based on a record's timestamp
ADDITIONAL_WHERE_CLAUSE (STRING): Additional where clause to filter the statistics
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.INFOR_TABLE_ROW_STATS(
'my-infor-connection-slug',
'CINACC',
['CONO', 'DIVI', 'ANBR', 'SENO'],
[100],
'',
''
))
GET_OBJECT_CATEGORIES
The GET_OBJECT_CATEGORIES function allows you to retrieve the available categories and their associated tables from the Data Lake. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.GET_OBJECT_CATEGORIES('my-infor-connection-slug'))
GET_INBOUND_STREAM_METADATA
The GET_INBOUND_STREAM_METADATA function allows you to retrieve the metadata required to add a stream to a sync for a given set of object names. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
OBJECT_NAMES (ARRAY): The List of objects to retrieve metadata for
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.GET_INBOUND_STREAM_METADATA('my-infor-connection-slug', ['MITMAS', 'CINACC']))
GET_M3_TABLE_COLUMN_CHECKS
The GET_M3_TABLE_COLUMN_CHECKS function allows you to validate whether the given columns exist for a given M3 table. The M3 prefix is added to the front of the column names in the actual queries. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The table to retrieve the columns for
COLUMN_CHECKS (ARRAY): The List of columns to check for existence
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.GET_M3_TABLE_COLUMN_CHECKS('my-infor-connection-slug', 'FGLEDG', ['RGTM', 'RGDT', 'LMDT', 'LMTS']))
GET_M3_TABLE_COUNTS
The GET_M3_TABLE_COUNTS function allows you to retrieve the counts of the specified M3 Table. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The table to retrieve the counts for
PARAMETERS (OBJECT): The object containing the field names and date fields to use for the table counts. The object should contain the following fields:
date_field_name (str): The field to use for date filtering (should be in YYYYMMDD format).
time_field_name (str): The field to use for time filtering (should be in HHMMSS format).
year (int): The year for which to get data.
month (int): The month for which to get data.
day (int): The day for which to get data.
hour (int): The hour for which to get data.
minute_group_size (int): The size of the minute groupings (e.g., 15 for 15-minute intervals).
count_level (str): The level of counts to retrieve. Can be one of the following:
"year": Counts grouped by year
"month": Counts grouped by month
"day": Counts grouped by day
"hour": Counts grouped by hour
"minute_group": Counts grouped by minute groupings
"custom": Custom counts based on a where clause
where_clause (str): An optional where clause to filter the counts
count_all_companies (bool): Whether to count all records or only those associated with the company specified for the connection. Defaults to true.
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.GET_M3_TABLE_COUNTS('my-infor-connection-slug', 'FGLEDG', to_object(parse_json('{"date_field_name": "EGRGDT", "time_field_name": "EGRGTM", "year": 2025, "month": 1, "day": 5, "hour": 14, "minute_group_size": 15}'))))
GET_M3_TABLE_DATA
The GET_M3_TABLE_DATA function allows you to retrieve the data of the specified M3 table. However, due to restrictions with the EXPORTMI API, the results returned are limited to 10,000 records. The function takes the following parameters:
CONNECTION_SLUG (VARCHAR): The slug of the connection to query
TABLE_NAME (VARCHAR): The table to retrieve the data for
PARAMETERS (OBJECT): The object containing the field names and date fields to use for retrieving the table data. The object should contain the following fields:
date_field_name (str): The field to use for date filtering (should be in YYYYMMDD format).
time_field_name (str): The field to use for time filtering (should be in HHMMSS format).
year (int): The year for which to get data.
month (int): The month for which to get data.
day (int): The day for which to get data.
hour (int): The hour for which to get data.
start_minute (int): The starting minute of the group (e.g. 0, which will be set to 00:00:00)
end_minute (int): The ending minute of the group (e.g. 14, which will be set to 14:59:59)
where_clause (str): An optional where clause to filter the data
data_level (str): The level of data to retrieve. Can be one of the following:
"minute_group": Data grouped by minute intervals
"hour": Data grouped by hour
"month": Data grouped by day
"custom": Custom data based on a where clause
columns (str): A comma-separated list of columns to retrieve. If not specified, all columns will be retrieved.
max_records (int): The maximum number of records to fetch (default is 10,000)
Examples
select *
from table(OMNATA_INFOR_DATA_LAKE_PLUGIN.UDFS.GET_M3_TABLE_DATA('my-infor-connection-slug', 'FGLEDG', to_object(parse_json('{"date_field_name": "EGRGDT", "time_field_name": "EGRGTM", "year": 2025, "month": 1, "day": 5, "hour": 14, "start_minute": 0, "end_minute": 14}'))))