From Infor M3 to Snowflake: Unveiling the Secrets of a Data Extraction Project.
We recently had the opportunity to work on a data engineering project extracting data from Infor M3, then transforming and loading it into a Snowflake data warehouse. Infor M3 is a cloud-based ERP system designed for manufacturers and distributors.
The client in this case is an existing AWS customer and was keen to utilise the AWS platform where possible. This requirement shaped the evaluation, which initially focused on AWS Glue as the key extraction tool. During the evaluation we encountered a number of blockers that caused us to pivot the approach.
This article provides an in-depth review of the evaluation process, our findings and an overview of the final data pipeline solution that best met the requirements of the client’s specific use case.
Infor Data Lake
As a large-scale platform, Infor M3 presents a number of complexities in extracting data and loading it into a data warehouse. The Infor Data Lake is Infor's solution for storing data and is part of Infor OS. When migrating data out of Infor M3, the Infor Data Lake is the starting point, or source, for the extraction.
Infor provides online documentation, which is quite light on detail. Infor also has a YouTube channel which, in my experience, provides much more detailed information. There are three approaches to extracting data from the Infor Data Lake:
Custom JDBC Connector
ION Data Lake Flows
Compass APIs
Custom JDBC Connector with DBeaver
As a starting point we used DBeaver and the Custom JDBC Connector to connect to and query data in the Infor Data Lake. The objective was to validate the driver and the security credentials, rather than to provide a final extraction solution.
Setting up the DBeaver connection was straightforward. The online Infor documentation for the Custom JDBC driver is very good and there is also a video available. The key points to remember are:
The ionapi credential file must be in the same folder as the driver.
The classname of the driver must be set to com.infor.idl.jdbc.Driver
Once the connection type was created in DBeaver, we were able to query the data in the Infor Data Lake easily.
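The same connectivity check can also be scripted rather than run through DBeaver. Below is a minimal sketch using the Python jaydebeapi library; the tenant name, driver path and example table (MITMAS) are placeholders, and the .ionapi credential file is assumed to sit in the same folder as the driver jar, as noted above.

```python
import jaydebeapi  # pip install jaydebeapi (bridges Python to a JDBC driver)

# Placeholder values - replace with your tenant, driver path and table of interest.
# Remember: the .ionapi credential file must sit in the same folder as the driver jar.
JDBC_URL = "jdbc:infordatalake://TENANTNAME"
DRIVER_JAR = "/opt/infor/infor-datalake-jdbc-driver.jar"

conn = jaydebeapi.connect(
    "com.infor.idl.jdbc.Driver",  # driver class name from the Infor documentation
    JDBC_URL,
    jars=DRIVER_JAR,
)
try:
    cur = conn.cursor()
    cur.execute("SELECT * FROM MITMAS")  # MITMAS (item master) used purely as an example
    for row in cur.fetchmany(10):        # pull a handful of rows as a smoke test
        print(row)
    cur.close()
finally:
    conn.close()
```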
AWS Glue with custom JDBC Driver
As the client was already an AWS customer, they were keen to evaluate AWS Glue as the tool to extract data from the Infor Data Lake and load it into Snowflake. AWS Glue supports the use of custom JDBC drivers. AWS Glue would also have the potential to load data from other source systems, which was a known future requirement.
AWS has documentation available to assist with the configuration of the custom JDBC driver in AWS Glue. The high-level process with AWS Glue is:
Create the custom connector.
Create a connection from the connector.
Create a job using the connection to load data from your source to your selected destination.
Using the ionapi credential file
The first step was to load the driver files into an S3 bucket, which is a straightforward process. Remembering that the Infor ionapi credential file must be stored in the same folder as the JDBC driver, we also loaded this file into the same S3 bucket.
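For reference, a minimal sketch of that upload step using boto3 is shown below; the bucket, prefix and file names are hypothetical.

```python
import boto3

# Hypothetical bucket, prefix and file names - adjust for your environment.
BUCKET = "my-glue-assets-bucket"
PREFIX = "infor-jdbc/"

s3 = boto3.client("s3")

# The driver jar and the .ionapi credential file go to the same prefix,
# mirroring the "same folder" requirement from the Infor documentation.
for local_file in ("infor-datalake-jdbc-driver.jar", "datalake.ionapi"):
    s3.upload_file(local_file, BUCKET, PREFIX + local_file)
    print(f"Uploaded {local_file} to s3://{BUCKET}/{PREFIX}{local_file}")
```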
An AWS custom connector was created specifying:
URL = jdbc:infordatalake://TENANTNAME
Class = com.infor.idl.jdbc.Driver
When creating the connection, we set the connection credential type to the default.
To test the connection we created a basic job to load data from a single Infor table and place it into an S3 bucket.
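For context, a Glue Studio job built on a custom connector generates a PySpark script along the lines of the sketch below; the connection name, table and target bucket are hypothetical, and the exact connection options depend on how the connector was defined.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Infor Data Lake through the custom JDBC connection.
# "infor-datalake-connection" and "MITMAS" are placeholders for illustration.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options={
        "connectionName": "infor-datalake-connection",
        "dbTable": "MITMAS",
    },
    transformation_ctx="source",
)

# Land the extracted rows in S3 as JSON for the test.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-landing-bucket/infor/mitmas/"},
    format="json",
    transformation_ctx="sink",
)

job.commit()
```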
When we attempted to run the job from the connection, the following error was encountered:
Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector
The credential file placed in the S3 bucket alongside the driver file was not being loaded. As a workaround, we tried loading the credential file into AWS Secrets Manager and specifying a secret.
Loading the ionapi credential into Secrets Manager
For the second AWS Glue approach we first loaded the JSON credential string into AWS Secrets Manager. We then created a new custom JDBC connection, this time specifying the use of a secret. When we tried to create a job from the new custom connection, we again encountered authentication errors, as the driver was unable to access the credentials:
Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector.
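For completeness, loading the ionapi JSON into Secrets Manager is itself straightforward with boto3, as sketched below with a hypothetical secret name; the failure sat on the Glue side, where the connector could not make use of the secret.

```python
import boto3
from pathlib import Path

# Hypothetical secret name; the secret value is the raw JSON from the .ionapi file.
secrets = boto3.client("secretsmanager")
secrets.create_secret(
    Name="infor/datalake/ionapi",
    SecretString=Path("datalake.ionapi").read_text(),
)
```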
Appending the credential to the query string
The Infor documentation describes multiple ways to pass the credentials to the driver. As discussed previously, the credentials can be placed in the ionapi credential file. Alternatively, you can append the credentials to the query string:
jdbc:infordatalake://TENANT_TST?ionApiCredentials=...
We created a new AWS Glue connector, this time adding the credentials as a parameter in the connection setup.
However, the URL parameter delimiter specified on the connection (in this case "&") is used to separate the base URL from the first parameter, rather than only between multiple parameters. For this to work, the base URL and the first parameter would need to be separated by the "?" character.
We attempted to run the job with the "&" delimiter anyway and, as expected, received the same error:
Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector.
Appending the credential to the query string with "?" as delimiter
We created a new connector, this time specifying "?" as the delimiter. We then created a new connection and provided the credentials as in the earlier example. This time the behaviour was different: the credentials were not added to the Connection URL preview.
However, we again experienced the same error, with the credentials unable to be passed through:
Glue ETL Marketplace: Either user/password secretId should be provided for JDBC connector.
AWS Glue Outcome
After trialling multiple approaches with AWS Glue and working with AWS Support, we determined that AWS Glue with the custom JDBC driver was not a workable approach. This forced a major pivot in our planned approach to extracting data from Infor M3 and the Infor Data Lake.
ION Data Lake Flows
Our next evaluation was of the in-built ION Data Lake Flows. With this approach, we worked closely with Infor Support to manage the configuration on the Infor side. The documentation is a little light on detail; however, there is a good video to assist with the setup.
When correctly configured, the ION API Gateway extracts the data in the selected table and loads it into the specified S3 bucket in JSON format. The first load is a full table load; subsequent loads are delta loads.
This approach successfully pushed data from the Infor Data Lake into the S3 bucket. Once in the S3 bucket, loading the data into Snowflake was straightforward.
Understanding the JSON Structure
The JSON is very simple, with a flat structure that maps directly to the Infor table columns. As noted earlier, the data arrives as delta loads; upserts can be identified from the CHNO (Change Number) column, which is included in each JSON file for each table.
When CHNO=1 the record is an insert; when CHNO>1 the record is an update to an existing row. When a row is deleted, the DELETED value equals true.
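When processing a delta file downstream, those flags can be interpreted along the lines of the sketch below; the file name is hypothetical and the sketch assumes one JSON record per line.

```python
import json

def classify(record: dict) -> str:
    """Classify an Infor Data Lake delta record using the CHNO and DELETED fields."""
    if record.get("DELETED") in (True, "true"):
        return "delete"
    return "insert" if int(record.get("CHNO", 0)) == 1 else "update"

# Hypothetical local copy of a delta file pushed to S3 by the ION Data Lake Flow.
# Assumes one JSON record per line; adjust the parsing if your files arrive as arrays.
with open("mitmas_delta.json") as fh:
    for line in fh:
        record = json.loads(line)
        print(classify(record), record.get("CHNO"), record.get("DELETED"))
```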
Compass APIs
The Compass APIs provide a third approach to data extraction. In our case we wanted to avoid custom coding, and given the success of the ION Data Lake Flows, we decided not to investigate the Compass API approach further. I am confident that this approach would also work well and could, in some ways, provide a cleaner implementation.
Loading Data Into Snowflake
Once data is loaded into an S3 bucket, an external stage from Snowflake to the S3 bucket can be created. While you can copy data directly from S3 into Snowflake, an external stage is the recommended Snowflake approach: the stage holds the permissions for the bucket, which assists with security management. From the external stage, the COPY INTO command can be used to move the data into a table in the landing layer before further transformations are performed.
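A minimal sketch of that pattern using the Snowflake Python connector is shown below; the connection details, storage integration, stage and table names are placeholders, and the raw JSON is landed in a single VARIANT column before transformation.

```python
import snowflake.connector

# Placeholder connection details, stage and table names - adjust for your environment.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="LANDING",
    schema="INFOR",
)
cur = conn.cursor()

# External stage over the S3 landing bucket; the stage (via a storage integration)
# holds the permissions for the bucket.
cur.execute("""
    CREATE STAGE IF NOT EXISTS infor_stage
      URL = 's3://my-landing-bucket/infor/'
      STORAGE_INTEGRATION = my_s3_integration
      FILE_FORMAT = (TYPE = JSON)
""")

# Landing table with a single VARIANT column to hold the raw JSON records.
cur.execute("CREATE TABLE IF NOT EXISTS mitmas_raw (record VARIANT)")

# COPY INTO moves the staged files into the landing table.
# Add STRIP_OUTER_ARRAY = TRUE to the file format if the files arrive as JSON arrays.
cur.execute("COPY INTO mitmas_raw FROM @infor_stage/mitmas/ FILE_FORMAT = (TYPE = JSON)")

cur.close()
conn.close()
```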
Conclusion
The process of evaluating data extraction from the Infor Data Lake encountered a number of obstacles. These may have been specific to AWS Glue, as we were able to use the custom JDBC driver successfully with DBeaver. In our case, however, the AWS Glue/JDBC driver combination was not a viable approach.
In this specific client case, the ION Data Lake Flows provided a reliable method of extracting data and loading it into S3. The technical complexity is kept low, as this method uses the in-built capability of the Infor platform. Once the data is in the S3 bucket, loading it into Snowflake is trivial using the COPY INTO command. The downside of this approach is that it is specific to Infor as a source system. This means that a client with many different source systems would need a different method of data extraction to load data into Snowflake or their cloud data warehouse.
If you have any questions about this use case, you can get in touch with us via email at info@pivotanalytics.com.au or by calling 1300 475 510.