top of page

From Infor M3 to Snowflake: Unveiling the Secrets of a Data Extraction Project.


Snowflake Data Warehouse

We recently had an opportunity to work on a data engineering project to extract data from Infor M3, transform and load the data into a Snowflake Data Warehouse. Infor M3 is a cloud-based, manufacturing and distribution ERP system designed for manufacturers and distributors.


The client, in this specific case, is an existing AWS customer and was keen to utilise the AWS platform where possible. This requirement shaped the evaluation to initially focus on AWS Glue, as the key extraction tool. During the evaluation we encountered a number of blocks which caused us to pivot the approach.


This article provides an in-depth review of the evaluation process, our findings and an overview of the final data pipeline solution that best met the requirements of the client’s specific use case.


Infor Data Lake

As a large scale platform, Infor M3 presents a number of complexities in extracting data and loading it into a data warehouse. The Infor Data Lake is Infor's solution for storing data and is part of Infor OS. When migrating data out of Infor M3, the Infor Data Lake is the starting point or source.


Infor provides online documentation which is quite light in terms of detail. Infor also has a YouTube channel, which in my experience provides much more detailed information. There are three different approaches to extracting data from the Infor Data Lake:

  1. Custom JDBC Connector

  2. ION Data Lake Flows

  3. Compass APIs

Custom JDBC Connector with DBeaver

As a starting point we utilised DBeaver and the Custom JDBC Connector to connect to and query data in the Infor Data Lake. The objective of this was to validate the driver and also the security credentials, rather than being a final extraction solution.


The setup of the DBeaver connection was straight forward. The online Infor documentation for the Custom JDBC driver is very good and there is also a video available. The key points to remember are:

  • The ionapi credential file must be in the same folder as the driver.

  • The classname of the driver must be set to com.infor.idl.jdbc.Driver

Once the connection type is created in DBeaver we were able to easily query the data in the Infor Data Lake.


AWS Glue with custom JDBC Driver

As the client was already an AWS customer, they were keen to evaluate AWS Glue as a tool to extract the data from the Infor Data Lake and load into SnowFlake. AWS Glue does support the use of custom JDBC drivers. The use of AWS Glue would also have the potential to load data from other source systems, which was a known future requirement.


AWS has documentation available to assist with the configuration of the custom JDBC driver in AWS Glue. The high-level process with AWS Glue is:

  1. Create the custom connector.

  2. Create a connection from the connector.

  3. Create a job using the connection to load data from your source to your selected destination.

Using the ionapi credential file

The first step was to load the driver files into an S3 bucket, which is a straightforward process. Remembering that the Infor ionapi credential file must be stored in the same folder as the JDBC driver, we also loaded this file into the same S3 bucket.


An AWS custom connector was created specifying:

  • URL = jdbc:infordatalake://TENANTNNAME

  • Class = com.infor.idl.jdbc.Driver

Infor Connector Properties

Creating the connection, we specified the connection credential type to default.


Infor Connection Access

To test the connection we created a basic job to load data from a single Infor table and place it into an S3 bucket.

Infor S3 Bucket Test

When attempting to run the job from the connection the following error was encountered:


Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector

The credential file that was placed in the S3 bucket alongside the driver file was not being loaded. As a workaround we attempted to try loading the credential file into the AWS Secret Manager and specifying a secret.


Loading ionapi credential into secret manager

For the second AWS Glue approach we first loaded the JSON credential string into the AWS Secret Manager. We then created a new custom JDBC connection, this time specifying the use of a Secret. In trying to create a job from the new custom connection we encountered authentication errors again, as the driver was not able to access the credentials successfully.


Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector.


Appending the credential to the query string

Infor documentation provides multiple ways to pass the credentials to the driver. As we have discussed previously, the credentials can be placed in the ionapi credential file. Alternatively, you can append the credentials to the query string:


jdbc:infordatalake://TENANT_TST?ionApiCredentials=...


We created a new AWS Glue connector, this time adding the credentials as a parameter on the connection setup.

AWS Glue connector

You can see in the above screenshot that the URL parameter delimiter that we specified on the connection is being used to separate the base URL from the first parameter, rather than between multiple parameters. For this to work we would need the base URL and the first parameter to be separated by the “?” character.

JDBC URL Base

We attempted to run the job with the “&” and as expected we received the same error:

Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector.


Appending the credential to the query string with "?" as delimiter

We created a new connector this time specifying the "?" as the delimiter. We then created a new connection and provided the credentials as per the earlier example. This time the behaviour was different. We did not see the credentials added to the Connection URL preview.

Connection URL Preview

Again however, we experienced the same error with the credentials unable to be passed through.


Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector.


AWS Glue Outcome

After trialing multiple approaches with AWS Glue and working with AWS Support, we determined that AWS Glue with the custom JDBC driver would not be a workable approach. This was a major pivot in our planned approach to data extraction, from Infor M3 and the Infor Data Lake.


ION Data Lake Flows

Our next evaluation was to use the in-built ION Data Lake Flows. With this approach, we worked closely with Infor Support to manage the configuration on the Infor side. The documentation is a little light on, however there is a good video to assist with the setup.


When correctly configured the ION API Gateway will extract the data in the selected table and then load it into the specified S3 bucket in a JSON format. The first load is a full table load and then subsequent loads are delta loads.

Data Lake and into the S3 Bucket

This approach was able to successfully push data from the Infor Data Lake and into the S3 Bucket. Once in the S3 Bucket, loading the data into SnowFlake was straightforward.


Understanding the JSON Structure

The JSON structure is very simple with a flat structure that maps directly to the Infor Table Columns. As I noted earlier, the data is a delta load, the upserts can be identified from the CHNO column (Change Number), which is included in each JSON file for each table.


When CHNO=1 the record is an insert, when CHNO>1 the record is an update to an existing row. When a row is deleted the DELETED value will equal true.

CHNO column

Compass APIs

The Compass APIs provide a third approach to data extraction. In our case, we wanted to avoid custom coding and given the success of the ION Data Lake Flows, we decided not to further investigate the Compass API approach. Given the ease of the ION Data Lake Flows, I have confidence that this approach would also work well and in some ways provide a cleaner implementation.


Loading Data Into SnowFlake

Once data is loaded into an S3 bucket, an external stage from SnowFlake to the S3 can be created. While you can directly copy the data from S3 into SnowFlake, an external stage is a recommended SnowFlake approach. An external stage holds the permissions for the bucket, which assists with security management. From the external stage the copy into command can be used to move the data into a table in the Landing Layer, before further transformations are performed.

Snowflake to the S3 bucket

Conclusion

The process of evaluating the data extraction from the Infor Data Lake encountered a number of obstacles. These may have been specific to AWS Glue, as we were able to successfully use the custom JDBC driver with DBeaver. However, in our case the AWS Glue/JDBC driver was not a viable approach.


In this specific client case, the ION Data Lake Flows provided a reliable method of extracting data and loading into S3. The technical complexity has been kept low, as this method uses the in-built capability of the Infor Platform. Once in the S3 bucket, loading into SnowFlake is trivial using the copy into command. The downside of this approach is that it is specific to Infor, as a source system. This means that for a client with many different source systems, a different method of data extraction would be required to load data into SnowFlake or their cloud data warehouse.


If you have any questions about this use case you can get in touch with us via email at info@pivotanalytics.com.au or calling 1300 475 510.

1,264 views0 comments

ความคิดเห็น


Register Your Interest!

Exclusive CIO & IT Leader Lunch Event iMelbourne, 12th September 2024.

Thank you for your interest.

Modern search driven analytics is changing the way retailers do business.
Download our eBook to understand how we provide true self-service analytics with Search & AI

bottom of page