
End users love Business Intelligence tools. They provide graphs, moving targets, drill-downs, and drill-throughs. But much of the work in an analytical environment involves getting the data from operational systems into the data warehouse so that business intelligence tools can display those pretty pictures.

The ETL module is a scalable enterprise data integration platform with exceptional extract, transform, and load (ETL) capabilities. The module is based on a Data Transformation Engine. It operates at a higher level of abstraction than many visual development tools. Generally, developers can accomplish much more with the engine in far fewer statements than they can with 3GLs:

  • Simple to program: Based on the XSQL scripting language, allowing straightforward definition of complex ETL processes.
  • Powerful: An extensive set of more than 300 built-in functions and connectors to multiple databases and file formats, supporting complex control flow, data flow, event-driven logic, and logging.

This module consolidates data from different source systems that have different data organizations or formats. It has been specifically designed to move high data volumes and apply complex business rules to bring all the data together in a standardized and homogeneous environment. The key features of this tool are:

  • Readers/writers for access to most data sources.
  • Consolidate data sources into a central data warehouse or datamart.
  • Provides qualitative and quantitative metrics for job optimization.
  • Unicode support for all international character sets, so that data can be collected and stored no matter where in the world the work is done.
  • User-definable functions executed during the ETL process.
  • Changed Data Capture (CDC): identify and capture data that has been added or updated in the source since the last extraction (see the sketch below).
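
The document does not detail the CDC mechanism, so the following is a minimal sketch that assumes the source table carries a last-modified timestamp; it is written in Python rather than XSQL, and the table, columns, and timestamps are hypothetical:

    import sqlite3

    # A small in-memory source with a last-modified timestamp per row.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, last_modified TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        (1, 100.0, "2024-01-01 10:00:00"),
        (2, 250.0, "2024-02-15 09:30:00"),
    ])

    def capture_changes(conn, last_run_ts):
        """Return rows added or updated in the source since the previous ETL run."""
        cur = conn.execute(
            "SELECT id, amount, last_modified FROM sales WHERE last_modified > ?",
            (last_run_ts,))
        return cur.fetchall()

    changed = capture_changes(conn, "2024-01-31 00:00:00")  # only row 2 qualifies
    new_watermark = max(row[2] for row in changed)          # high-water mark for the next run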

The XSQL language allows the definition of complex rules to transform the raw data. Some examples are:

  • Data validation, including cross-validation between several columns (see the sketch below).
  • Encoding free-form values (e.g., mapping texts to codes).
  • Deriving a new calculated value.
  • Anonymizing values for data privacy requirements.
  • Transposing columns/rows.
  • Aggregation functions.
The transformed data is finally loaded into the end target, usually a data warehouse or datamart. Updates may replace the previous data, update existing data, or add cumulative information.
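
As an illustration of the first two kinds of rules above (cross-column validation and a derived value), here is a minimal sketch written in Python rather than XSQL; the field names are hypothetical:

    def transform(row):
        """Validate one raw record and derive a calculated value; None means reject."""
        # Cross-column validation: the ship date must not precede the order date.
        if row["ship_date"] < row["order_date"]:
            return None
        # Derive a new calculated value from existing columns.
        row["saleamount"] = row["qty"] * row["unitprice"] - row["discount_value"]
        return row

    raw = {"order_date": "2024-03-01", "ship_date": "2024-03-05",
           "qty": 3, "unitprice": 20.0, "discount_value": 5.0}
    clean = transform(raw)  # clean["saleamount"] == 55.0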

EXTRACTION

The tool provides connectors for extraction from multiple data sources, from plain text to databases, Excel, XML, and others.

Data sources are extracted by streaming and can be loaded on the fly into the destination database with no intermediate data storage. In most cases, advanced loading functions make it possible to integrate all ETL phases in a single process, increasing security and reliability.
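
A minimal sketch of this streaming approach, using a CSV source and an SQLite destination purely for illustration (no intermediate files are written; the module itself streams between its own connectors):

    import csv, io, sqlite3

    # Hypothetical CSV source, streamed row by row rather than staged on disk.
    source = io.StringIO("id,name,amount\n1,Alice,10.5\n2,Bob,7.2\n")

    dest = sqlite3.connect(":memory:")
    dest.execute("CREATE TABLE target (id INTEGER, name TEXT, amount REAL)")

    def rows(reader):
        """Generator: yields transformed rows one at a time (extract + transform)."""
        for rec in reader:
            yield int(rec["id"]), rec["name"].strip(), float(rec["amount"])

    # Load: the generator is consumed directly by the insert, so extract,
    # transform, and load run as a single streamed process.
    dest.executemany("INSERT INTO target VALUES (?, ?, ?)", rows(csv.DictReader(source)))
    dest.commit()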

An intrinsic part of extraction is parsing the extracted data to check whether it meets an expected pattern or structure; if not, the data may be rejected entirely or in part:

  • Relational Data Sources: JDBC/ODBC connectors for databases including IBM Informix, IBM DB2, Oracle, SQL Server, MySQL, or Postgres. XML-defined queries support defining relationships and retrieving data from multiple tables.
  • Non-Relational Data Sources: Direct Excel retrieval supporting multi-sheet ranges and named ranges. Multilingual encoded text, with native loaders for CSV and fixed-width files, and regular expressions when advanced text extraction is required. XQuery/XPath support for XML extraction.
  • Auxiliary Data Sources: Create in-memory data sources for small data sets such as a code table or a properties translation. This is a safe method for holding and transmitting encrypted data.
  • File System: Extract file attributes from a folder so that they can be processed as regular records.
  • Access to data (including XML and text) over HTTP/HTTPS, TCP, and FTP/FTPS.
  • Methods for uncompressing source files in different formats (zip, compress, gzip, bzip2, etc.), extracting data from archives, and manipulating files and directories (see the sketch below).
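
A minimal sketch of decompressing a text source and rejecting records that do not match the expected structure, using Python's standard gzip and csv modules for illustration; the file content and layout are hypothetical:

    import csv, gzip, io

    # Build a small gzip-compressed CSV in memory so the example is self-contained;
    # in practice the source would be a compressed file such as a .csv.gz extract.
    compressed = gzip.compress(b"id,amount\n1,10.5\nbad-row\n2,7.2\n")
    source = gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8")

    accepted, rejected = [], []
    for rec in csv.DictReader(source):
        # Reject records that do not match the expected pattern or structure.
        try:
            accepted.append((int(rec["id"]), float(rec["amount"])))
        except (TypeError, ValueError):
            rejected.append(rec)

    print(len(accepted), "accepted,", len(rejected), "rejected")  # 2 accepted, 1 rejected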

TRANSFORM

The transform stage applies a series of rules or functions to the extracted source data to derive the end data to be loaded into the target database. The tool provides the capability to:

  • Check data against hundreds or even thousands of business rules without performance limitations.
  • Functions for error logging, record rejection, and execution flow control.
  • A rich set of data transformation functions (illustrated in the sketch after this list), allowing:
      • Translation of coded values (e.g., if the source system stores 1 for on and 2 for off, but the warehouse stores O for on and F for off).
      • Derivation of a new calculated value (e.g., saleamount = qty * unitprice - discount_value).
      • Lookup and validation of relevant data from tables or reference files for slowly changing dimensions.
      • Anonymization, supporting the MD5 and Whirlpool hashing algorithms.
      • Simple or complex data validation; if validation fails, it may result in full, partial, or no rejection of the data.
      • Generation of surrogate-key values.
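
A minimal sketch of a few of these transformations (coded-value translation, anonymization by hashing, and surrogate-key generation), written in Python for illustration; the column names and code table are hypothetical:

    import hashlib
    import itertools

    STATUS_CODES = {"1": "O", "2": "F"}        # source code -> warehouse code
    surrogate_keys = itertools.count(start=1)  # trivial surrogate-key generator

    def apply_rules(row):
        """Apply a few typical transformation rules to one extracted record."""
        row["status"] = STATUS_CODES[row["status"]]   # coded-value translation
        # Anonymization via MD5; Whirlpool may also be available through
        # hashlib.new("whirlpool"), depending on the local OpenSSL build.
        row["customer"] = hashlib.md5(row["customer"].encode()).hexdigest()
        row["sk"] = next(surrogate_keys)              # surrogate-key value
        return row

    print(apply_rules({"status": "1", "customer": "Alice", "amount": 10.5}))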

Load data and anonymize in one step: Anonymization and normalization are performed on the fly during the loading of external data from files. For each line of the source file, the following steps are executed:

  • The line is read into memory.
  • Filtering operations are executed in memory, allowing anonymization and normalization. Using a locally defined function as the filter allows custom anonymization procedures (see the sketch below).
  • The resulting data is loaded into the table.
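
A minimal sketch of this one-step pattern, with a locally defined filter function supplying the anonymization and normalization; the filter, field names, and delimiter are hypothetical:

    import hashlib

    def anonymize_filter(rec):
        """Locally defined filter: anonymize and normalize one record in memory."""
        rec["name"] = hashlib.md5(rec["name"].encode()).hexdigest()
        rec["amount"] = round(float(rec["amount"]), 2)
        return rec

    def load_lines(lines, insert, record_filter):
        """One-step load: read each line, filter it in memory, insert the result."""
        for line in lines:
            ident, name, amount = line.rstrip("\n").split(";")
            insert(record_filter({"id": int(ident), "name": name, "amount": amount}))

    loaded = []
    load_lines(["1;Alice;10.456\n", "2;Bob;7.2\n"],
               insert=loaded.append, record_filter=anonymize_filter)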

LOAD

The load phase stores the data in the end target, usually the data warehouse (DW) or a staging database.

Inserting new records or updating existing ones is done by searching on the primary key (PK) columns: if the loaded data has the PK of an existing record, the record is updated; if the PK is new, the record is inserted. To optimize performance, two different updating algorithms can be chosen depending on the type of data. For fact tables, an insert is tried first, and if the PK already exists, an update is performed. For master tables, an update is tried first, and if no record is updated, an insert is performed.
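
A minimal sketch of the two updating strategies, using SQLite for illustration (the table and column names are hypothetical):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE target (pk INTEGER PRIMARY KEY, value REAL)")

    def upsert_fact(db, pk, value):
        """Fact-table strategy: try the insert first, update if the PK already exists."""
        try:
            db.execute("INSERT INTO target (pk, value) VALUES (?, ?)", (pk, value))
        except sqlite3.IntegrityError:                # PK already present
            db.execute("UPDATE target SET value = ? WHERE pk = ?", (value, pk))

    def upsert_master(db, pk, value):
        """Master-table strategy: try the update first, insert if nothing was updated."""
        cur = db.execute("UPDATE target SET value = ? WHERE pk = ?", (value, pk))
        if cur.rowcount == 0:                         # no existing record with that PK
            db.execute("INSERT INTO target (pk, value) VALUES (?, ?)", (pk, value))

    upsert_fact(db, 1, 10.0)    # new PK: the insert succeeds
    upsert_fact(db, 1, 12.5)    # existing PK: falls back to the update
    upsert_master(db, 2, 3.0)   # nothing to update, so the record is inserted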

The system also provides the capability to handle historical data. In this case, all changes to records are kept, making it possible to reproduce any previous report.
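
The document does not specify how history is stored, but one common way to keep every change is to version each record with validity timestamps; a minimal sketch under that assumption:

    import sqlite3
    from datetime import datetime, timezone

    db = sqlite3.connect(":memory:")
    # Every change produces a new row version; valid_to is NULL for the current version.
    db.execute("CREATE TABLE customer_hist (pk INTEGER, name TEXT, valid_from TEXT, valid_to TEXT)")

    def apply_change(db, pk, name):
        """Close the current version of the record and insert the new one."""
        now = datetime.now(timezone.utc).isoformat()
        db.execute("UPDATE customer_hist SET valid_to = ? WHERE pk = ? AND valid_to IS NULL",
                   (now, pk))
        db.execute("INSERT INTO customer_hist VALUES (?, ?, ?, NULL)", (pk, name, now))

    def as_of(db, pk, ts):
        """Return the record as it was at any previous point in time."""
        return db.execute(
            "SELECT name FROM customer_hist WHERE pk = ? AND valid_from <= ? "
            "AND (valid_to IS NULL OR valid_to > ?)", (pk, ts, ts)).fetchone()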

Target systems can be a database connected via JDBC, a file, or even a data stream delivered by posting to HTTP, by FTP, or by TCP.

Integration with Informix IWA: After loading data into the target Informix database, the ETL process can automatically call the process that updates the Warehouse Accelerator datamart data.

ETL PERFORMANCE

The tool supports massively parallel processing (MPP) for large data volumes. In any case, one should consider that simplicity is also performance, and the one-step “extract, apply transformation functions, and load” method is the fastest algorithm for ETL procedures.

In our test systems, a file with 1 million records can be extracted, anonymized with the Whirlpool algorithm, normalized, and loaded into a database table in less than 8 minutes, without using parallelization.
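For reference, 1 million records in under 8 minutes corresponds to a sustained rate of at least 1,000,000 / 480 s ≈ 2,080 records per second in a single, non-parallel process.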

OTHER APPLICATIONS

The following applications are designed to work hand-in-hand with ETL Process:

"ETL Process" is part of the Axional Analytics Suite line of products, developed by Diester.