End users love Business Intelligence tools. They provide graphs, moving targets, drill-downs, and drill-through. But much of the work in an analytical environment involves getting the data from operational systems into the data warehouse so that business intelligence tools can display those pretty pictures.
The ETL module is a scalable enterprise data integration platform with exceptional extract, transform, and load (ETL) capabilities. The module is based on a Data Transformation Engine. It operates at a higher level of abstraction than many visual development tools. Generally, developers can accomplish much more with the engine in far fewer statements than they can with 3GLs:
Simple to program: based on the XSQL scripting language, allowing straightforward definition of complex ETL processes.
Powerful: more than 300 built-in functions and connectors to multiple databases and file formats, supporting complex control flow, data flow, event-driven logic, and logging.
This module consolidates data from different source systems with different data organizations and formats. It has been specifically designed to move high data volumes and apply complex business rules, bringing all the data together in a standardized, homogeneous environment. The key features of this tool are:
Readers/writers for access to most data sources.
Consolidation of data sources into a central data warehouse or datamart.
Provides qualitative and quantitative metrics for job optimization.
Unicode support. Full support for international character sets, so data can be collected and stored from anywhere in the world.
User-definable functions executed during the ETL process.
Changed Data Capture. Identifies and captures data that has been added to, or updated in, the source systems.
The XSQL language allows the definition of complex rules to transform the raw data. Some examples are:
Data validation, including cross-validation across several columns.
Encoding free-form values (e.g., mapping texts to codes).
Deriving a new calculated value.
Anonymizing values for data privacy requirements.
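Since XSQL itself is proprietary, the rule types above can be illustrated with a hypothetical Python sketch (the field names and the code table are invented for this example):

```python
import hashlib

# Illustrative free-form-text -> code mapping (invented for this sketch).
STATUS_CODES = {"active": "A", "inactive": "I"}

def transform(rec):
    # Cross-column validation: the end date must not precede the start date.
    if rec["end_date"] < rec["start_date"]:
        raise ValueError("end_date precedes start_date")
    out = dict(rec)
    # Encode a free-form value into a code.
    out["status"] = STATUS_CODES[rec["status"].strip().lower()]
    # Derive a new calculated value.
    out["amount"] = rec["qty"] * rec["unit_price"]
    # Anonymize a personal identifier by one-way hashing.
    out["email"] = hashlib.md5(rec["email"].encode()).hexdigest()
    return out
```

Each rule type maps to one step: validation raises on bad input, encoding normalizes free text, derivation computes a new column, and anonymization replaces a sensitive value with its hash.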
Data is finally loaded into the end target, usually a data warehouse or datamart. Loading may replace the previous data, update existing data, or append cumulative information.
The tool provides connectors for extraction from multiple data sources, from plain text to databases, Excel, XML, and others.
Data is extracted by streaming and can be loaded on the fly into the destination database with no intermediate storage. Advanced loading functions allow, in most cases, all ETL phases to be integrated into a single process, increasing security and reliability.
An intrinsic part of extraction is parsing the extracted data to check whether it meets an expected pattern or structure. If not, the data may be rejected entirely or in part:
Relational Data Sources: JDBC/ODBC connectors for databases including IBM Informix, IBM DB2, Oracle, SQL Server, MySQL, and PostgreSQL. XML-defined queries support relationship definitions and multi-table data retrieval.
Non-Relational Data Sources: direct Excel retrieval supporting multi-sheet ranges and named ranges; multilingual encoded text, with native loaders for CSV and fixed-width files, or regular expressions when advanced text extraction is required; XQuery/XPath support for XML extraction.
Auxiliary Data Sources: Create in-memory data sources for small data sets such as a code table or properties translation. It’s a safe method for holding and transmitting encrypted data.
Extraction of file attributes from a filesystem folder, processed as regular records.
Access to data —including XML and text— over HTTP/HTTPS, TCP, and FTP/FTPS.
Methods for decompressing source files in different formats (zip, compress, gzip, bzip2, etc.), for extracting data from archives, and for manipulating files and directories.
The transform stage applies a series of rules or functions to the extracted source data to derive the final data to be loaded into the target database. The tool provides the capability to:
Check data against hundreds or even thousands of business rules without performance limitations.
Log errors, reject records, and control execution flow.
Apply a rich set of transformation functions allowing:
Translation of coded values (e.g., if the source system stores 1 for on and 2 for off, but the warehouse stores O for on and F for off).
Deriving a new calculated value (e.g., sale_amount = qty * unit_price – discount_value).
Looking up and validating relevant data from tables or reference files for slowly changing dimensions.
Anonymization, supporting the MD5 and Whirlpool hashing algorithms.
Simple or complex data validation. If validation fails, it may result in a full, partial or no rejection of the data.
Generating surrogate-key values.
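Surrogate-key generation can be sketched as follows (a hypothetical in-memory key map; a real implementation would look keys up in the dimension table):

```python
class SurrogateKeys:
    """Assigns sequential surrogate keys to natural (business) keys,
    reusing the same key when a business key appears again."""
    def __init__(self):
        self._keys = {}
        self._next = 1

    def key_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]
```

Surrogate keys decouple warehouse rows from source-system identifiers, which is what makes slowly changing dimensions possible: a new version of the same business key simply receives a new surrogate key.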
Load and anonymize data in one step: anonymization and normalization are performed on the fly, during the loading of external data from files. For each line of the source file, the following steps are executed:
Read line in memory.
Filtering operations are executed in memory, performing anonymization and normalization. Using a locally defined function as a filter allows custom anonymization procedures.
The resulting data are loaded into the table.
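Assuming an MD5 hash as the anonymization filter and using sqlite3 as a stand-in for the target database, the three steps above might look like this sketch:

```python
import hashlib
import sqlite3

def load_anonymized(lines, conn):
    """One-step load: each source line is read into memory, anonymized
    by a filter function, and inserted; no intermediate file is written."""
    conn.execute("CREATE TABLE IF NOT EXISTS person (id TEXT, city TEXT)")
    for line in lines:                                      # 1. read line
        raw_id, city = line.rstrip("\n").split(";")
        anon_id = hashlib.md5(raw_id.encode()).hexdigest()  # 2. filter in memory
        conn.execute("INSERT INTO person VALUES (?, ?)",
                     (anon_id, city))                       # 3. load into table
    conn.commit()
```

The sensitive identifier never reaches disk in clear form, since the filter runs between the read and the insert.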
The load phase stores the data into the end target, usually the data warehouse (DW) or a stage database.
Inserting new records or updating existing ones is done by searching the primary-key columns: if the loaded data has the PK of an existing record, that record is updated; if the PK is new, the record is inserted. To optimize performance, two different update algorithms can be chosen depending on the table type. For fact tables, an insert is tried first, and if the PK already exists the update is performed. For master tables, the update is tried first, and if no records are updated the insert is performed.
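The two strategies can be sketched in Python over sqlite3 (standing in for the target database; table and column names are illustrative):

```python
import sqlite3

def _update(conn, table, pk, row):
    sets = ", ".join(f"{c} = ?" for c in row if c != pk)
    vals = [v for c, v in row.items() if c != pk] + [row[pk]]
    return conn.execute(f"UPDATE {table} SET {sets} WHERE {pk} = ?", vals)

def upsert_fact(conn, table, pk, row):
    # Fact tables: most incoming rows are new, so try the INSERT first
    # and fall back to an UPDATE on a primary-key collision.
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    try:
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                     list(row.values()))
    except sqlite3.IntegrityError:
        _update(conn, table, pk, row)

def upsert_master(conn, table, pk, row):
    # Master tables: most incoming rows already exist, so try the UPDATE
    # first and INSERT only when no row matched the primary key.
    if _update(conn, table, pk, row).rowcount == 0:
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                     list(row.values()))
```

Trying the likely operation first means the common case costs a single statement, which is why the choice depends on whether the load is mostly new rows (facts) or mostly existing rows (master data).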
The system can also handle historical data. In this case, all changes to records are kept, allowing any previous report to be reproduced.
Target systems can be a database connected via JDBC, a file, or even a data stream sent over HTTP, FTP, or TCP.
Integration with Informix IWA: after loading data into the target Informix database, the ETL process can automatically trigger the update of the Warehouse Accelerator datamart data.
The tool supports massive parallel processing (MPP) for large data volumes. In any case, simplicity is also performance: the one-step "extract, apply transformation functions & load" method is the fastest algorithm for ETL procedures.
In our test systems, a file with 1 million records can be extracted, anonymized with the Whirlpool algorithm, normalized, and loaded into a database table in under 8 minutes, without parallelization.
Axional Analytics on IBM IWA
In the analytical world, queries can be time-consuming, and time lost waiting for answers is lost productivity. As organizations amass more and more data, even basic queries can take a substantial amount of time. That is why Axional Analytics is fully integrated with IBM Informix Warehouse Accelerator (IWA), a columnar in-memory database that can scan huge amounts of data in seconds.
Informix Warehouse Accelerator is an ideal system for implementing data warehouses. It provides extreme performance while removing most of the tuning required by traditional data warehouse systems.
The Informix accelerator has these key features:
Provides acceleration without manual tuning for each workload: no indexes to create, no index reorganization, no statistics to collect, no partitioning, no storage management.
Summary tables or materialized views are not needed; queries can run directly against the fact tables.
Scans billions of rows in seconds: the deep columnar data representation and query processing on compressed data remove the need for tuning.
Linear performance: query time depends on data volume, not on query complexity.
Axional Analytics also provides key features for tighter integration with the Informix Warehouse system:
Automatic snapshot of ETL updated data to Warehouse Accelerator.
SQL Statement optimization to ensure IWA compatibility.
Warnings about queries referencing tables or columns not present in the datamart.
Axional ROLAP Engine
The purpose of the ROLAP Engine is to respond to queries from cube end users. OLAP queries are performed on the server side through a Java interface. To increase performance, the Axional ROLAP Engine may run on specially configured servers.
With its specific architecture, Axional ROLAP engine includes the following features:
Minimize the amount of data movement between the RDBMS and the client application by:
Storing intermediate query results from multi-pass SQL (in temp or derived tables) and joining them in the RDBMS.
Pushing all calculations to RDBMS.
Pushing all filtering and grouping to RDBMS.
Pushing all filtering and grouping on calculations to RDBMS.
Optimize SQL based on RDBMS type.
Support multi-pass SQL. This is required to answer analytical questions that cannot be answered in a single pass of SQL.
Handle normalized and de-normalized data models efficiently.
Automatically resolve typical star/snowflake query errors and conflicts.
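Multi-pass SQL with an intermediate temp table can be sketched as follows (sqlite3 stands in for the RDBMS, and the table and column names are illustrative). The question "what share of total sales does each region contribute?" needs a first pass to aggregate per region and a second pass to relate each region to the grand total, with both passes executed inside the database:

```python
import sqlite3

def share_of_total(conn):
    """Two-pass query: pass 1 materializes an intermediate aggregate as a
    temp table inside the RDBMS; pass 2 joins it against its own total,
    so only the final result set travels to the client."""
    conn.execute("DROP TABLE IF EXISTS region_sales")
    conn.execute("""CREATE TEMP TABLE region_sales AS
                    SELECT region, SUM(amount) AS amt
                    FROM sales GROUP BY region""")                # pass 1
    return conn.execute("""SELECT r.region,
                                  ROUND(100.0 * r.amt / t.total, 1)
                           FROM region_sales r,
                                (SELECT SUM(amt) AS total
                                 FROM region_sales) t
                           ORDER BY r.region""").fetchall()       # pass 2
```

Keeping the intermediate result in the RDBMS is the point of the pushdown features listed above: the client never sees the per-region aggregate, only the final percentages.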
The Axional OLAP module uses a multidimensional view of aggregate data to provide quick access to strategic information for further analysis. Users can gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information. This allows everyone in the organization to see the corporate data warehouse from their particular point of view.
This module also provides users with the information they need to make effective decisions about an organization’s strategic directions. Its features range from basic navigation and browsing, to complex calculations, to more serious analyses–such as time series. As decision-makers gain experience with OLAP capabilities, they move from data access to information to knowledge.
The tool’s goal is to convert your business data into business intelligence. To achieve this objective, it uses a pre-configured datamart structure that at the same time offers a flexible configuration.
The Axional OLAP module insulates users from complex query syntax, modeling designs and elaborate joins. Its multidimensional view of data provides the foundation for analytical processing through flexible information access. Users are able to analyze data across any dimension, at different levels of aggregation, with equal functionality and ease.
The OLAP module works on facts, and facts are numbers. A fact could be a count of sales, the sum of sales amounts, or an average of sales amounts. Facts are also known as measures and are organized by dimensions, which are the ways the facts can be broken down. For instance, total sales might be broken down by geography, or by time. Dimensions also have hierarchies of levels.
The set of dimensions and measures is called a cube. Cubes facilitate multifaceted data analysis in response to complex business queries. Because they can have more than three dimensions (hypercubes), they enable in-depth analysis, allowing users to gain comprehensive and valuable business insights.
Axional Analytics allows a virtually unlimited number of dimensions to be added to the data structure (the OLAP cube), enabling detailed data analysis. Analysts can view data sets from different angles or pivots.
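A toy illustration of measures rolled up along chosen dimensions (the data is invented for the example):

```python
from collections import defaultdict

# Each fact row: (geography, year, sales_amount) -- two dimensions, one measure.
FACTS = [("EU", "2023", 100), ("EU", "2024", 150), ("US", "2023", 80)]

def rollup(facts, dims):
    """Aggregate the measure over the dimension columns whose indices are
    listed in `dims`; omitting a dimension drills up to a coarser level."""
    totals = defaultdict(int)
    for *keys, measure in facts:
        totals[tuple(keys[d] for d in dims)] += measure
    return dict(totals)
```

Calling `rollup(FACTS, [0])` breaks total sales down by geography only, while `rollup(FACTS, [0, 1])` drills down to geography and year, which is exactly the pivot operation described above.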
It uses a relational database that directly stores the information contained in the various cubes (the ROLAP model). This approach translates native OLAP queries into the appropriate SQL statements. Thanks to high-performance database tools such as Informix IWA, this approach performs as well as a MOLAP database.
This approach also enables easy implementation of in-memory analytics, eliminating the need to store pre-calculated data in OLAP cubes or aggregate tables. It offers business users faster analysis of large data sets, rapid insights, and minimal data management and IT involvement.
Through the ETL process, several cubes can be created, each with a specific set of dimensions and measures fitted to the requirements of a particular group of users. With cubes, managers gain insight into data through fast, accurate, and flexible access to various views of business metrics, transformed from raw data into meaningful information.
Two OLAP clients provide an easy-to-use interface that automatically retrieves and formats data, based on existing model definitions, for every query made by the user:
Drill down: zoom in on the details of a particular part of the cube.
Drill up: aggregate details to a summarized level.
Sort, filter, and drill into data dynamically.
View chart visualizations
Export data to Excel.
Advanced sorting and filtering
ANALYTICS WEB CONSOLE: AXIONAL OLAP
AXIONAL MOBILE OLAP