data sources
23/02/2023

New data sources at icaria Technology

icaria Technology is a company focused on data and metadata management. Our work involves many kinds of data stores: relational databases, data lakes, data warehouses, hierarchical databases, file-based sources, etc.

The ability to access these data sources is crucial to us. That is why we constantly make an effort to study these sources’ potential, investigating the correct strategy for managing their data. We aim to provide optimal support for the functions of both icaria TDM and icaria GDPR.

Over the past few weeks, we have released several features to this end. They affect both the icaria Technology architecture, which is shared by icaria TDM, icaria GDPR, and icaria Lean Factory, and the specific processes of these applications.

Big Data Improvements

icaria works with Big Data environments through different solutions. For example, in Cloudera Data Platform (CDP) environments, the usual approach is to combine Hive with direct file processing, generally through Hadoop and HDFS.

It is mainly in the latter, direct file processing, that the handling of Parquet and Avro files has been significantly improved.

Increasing performance with large volumes of data

Processing large volumes of data is a significant challenge. Doing the math, if we seek to deliver data on time, we quickly find that the processing speed must exceed a million records per second. With this objective in mind, we have: 

  • Optimized memory use and reduced object creation
  • Moved lightweight but long-running processes to the background and improved the handling of remote files

All this was done with the aim of speeding up the processing, dissociation, anonymization, and delivery of data on demand.

The recently deployed improvements provided a 50% reduction in execution time.
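
To make the arithmetic above concrete, here is a minimal sketch. The workload figures (3.6 billion records, a one-hour delivery window) and the function names are hypothetical, chosen only to illustrate how a target of a million records per second arises and what a 50% reduction in execution time means for throughput.

```python
def required_throughput(total_records: int, window_seconds: float) -> float:
    """Records per second needed to process total_records within the window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return total_records / window_seconds


def throughput_after_reduction(rate: float, time_reduction: float) -> float:
    """Effective throughput once execution time drops by time_reduction (0 to 1)."""
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in [0, 1)")
    return rate / (1 - time_reduction)


# Hypothetical workload: 3.6 billion records to deliver within one hour.
target_rate = required_throughput(3_600_000_000, 3600)
print(f"{target_rate:,.0f} records/s")  # 1,000,000 records/s

# Halving execution time for the same volume doubles the effective throughput.
improved_rate = throughput_after_reduction(500_000, 0.5)
print(f"{improved_rate:,.0f} records/s")  # 1,000,000 records/s
```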

New version of the icaria FileSQL JDBC driver

The new version of icaria FileSQL, the component of the icaria Technology architecture that lets us treat files as data sources, has been completely redesigned to make it easier to extend and to incorporate new file formats in the future. On the roadmap: fixed width, DLI, XML, JSON, and Excel.

Extended CSV handling

For now, compared with the previous version, we have extended and improved the handling of CSV, TSV, and other delimiter-separated files.
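
As a rough illustration of why CSV, TSV, and similar formats can share one code path, the sketch below parses the same records with two different delimiters. It uses Python's standard csv module, not icaria FileSQL; the function name and sample data are assumptions made for the example.

```python
import csv
import io


def read_delimited(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse delimiter-separated text into one dict per row, keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# The same two records, once as CSV and once as TSV (made-up sample data).
csv_rows = read_delimited("id,name\n1,Ana\n2,Luis\n")
tsv_rows = read_delimited("id\tname\n1\tAna\n2\tLuis\n", delimiter="\t")

# Only the delimiter differs; the parsed rows are identical.
print(csv_rows == tsv_rows)  # True
```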

Treatment of remote, compressed, and encrypted files

Processing time, storage, and security are precious resources. Therefore, the new JDBC driver incorporates the ability to:

  • Handle remote files through SSH, remote network locations, various file systems, and other access methods that will soon be enabled
  • Manage compressed files in the most common formats, which reduces network traffic and disk usage
  • Process encrypted files: because security is crucial, the driver can decrypt them in memory on the fly, dissociate their information, and deliver them to the destination environment without sensitive data
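
The idea of processing a file on the fly, without first writing a decompressed copy to disk, can be sketched as follows. This is not the icaria FileSQL implementation: it uses Python's standard gzip and csv modules, the function names are hypothetical, and the truncated SHA-256 hash is only a stand-in for a real dissociation rule. Decryption is left out because it would require a third-party library.

```python
import csv
import gzip
import hashlib
import io
from typing import Iterator


def mask(value: str) -> str:
    """Stand-in for a dissociation rule: replace the value with a short hash."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def stream_masked_rows(compressed: bytes, sensitive: set[str]) -> Iterator[dict[str, str]]:
    """Decompress in memory and yield rows one at a time with sensitive columns masked."""
    with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh):
            yield {k: mask(v) if k in sensitive else v for k, v in row.items()}


# A gzip-compressed CSV built in memory so the example is self-contained.
payload = gzip.compress(b"id,email\n1,ana@example.com\n2,luis@example.com\n")
rows = list(stream_masked_rows(payload, {"email"}))
print(rows[0]["id"], rows[0]["email"] != "ana@example.com")  # 1 True
```

Because rows are yielded one at a time, memory use stays roughly constant regardless of file size, which is the property that matters for large remote files.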

New data sources

The icaria Technology product team is constantly receiving requests to add support for new databases.

The latest addition to the list of compatible databases comes from Huawei: its GaussDB database. Given its characteristics, compatibility was achieved through an improved implementation of the JDBC driver, better management of new connections, and a more agile connectivity retry policy.
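
The article does not describe the retry policy itself. As one common possibility, the sketch below retries a connection with exponential backoff; the function names, the number of attempts, and the delays are all assumptions for illustration, not icaria's actual settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Hypothetical connection that fails twice before succeeding.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("database not reachable yet")
    return "connected"


delays: list[float] = []
result = connect_with_retry(flaky_connect, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # connected [0.5, 1.0]
```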

In addition, simultaneous compatibility with multiple Oracle Database versions has been improved. It now covers versions 9i through 21c, including the current long-term release, Oracle Database 19c.

Management of auto-generated fields

In relational databases, auto-generated fields are common, for example as primary keys. This type of field is a challenge when segmenting structures to deliver a coherent subset of information, such as one client across all systems.

Over the past few weeks, we have improved icaria TDM's ability to handle this problem using different approaches. The main relational databases reviewed on this occasion were Oracle, SQL Server, and DB2, for which repeated data delivery, deletion of consumed data, data generation, and other features have been enabled.
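
To show why auto-generated keys complicate the delivery of a coherent subset, here is a minimal sketch. It uses SQLite as a stand-in for the databases named above, with hypothetical customer and purchase tables; it is not how icaria TDM works internally. The target assigns its own keys, so the source keys must be remapped for the child rows to keep pointing at the right parent.

```python
import sqlite3


def deliver_subset(target, customers, purchases):
    """Insert customers, then purchases, translating source ids to target-generated ids."""
    id_map = {}
    for old_id, name in customers:
        cur = target.execute("INSERT INTO customer (name) VALUES (?)", (name,))
        id_map[old_id] = cur.lastrowid  # key generated by the target database
    for old_customer_id, item in purchases:
        target.execute(
            "INSERT INTO purchase (customer_id, item) VALUES (?, ?)",
            (id_map[old_customer_id], item),
        )
    return id_map


target = sqlite3.connect(":memory:")
target.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER REFERENCES customer(id),
    item TEXT
);
INSERT INTO customer (name) VALUES ('existing');  -- the target already holds data
""")

# Source rows carry their own keys (501, 777), which the target will not reuse.
id_map = deliver_subset(
    target,
    customers=[(501, "Ana"), (777, "Luis")],
    purchases=[(777, "book"), (501, "pen")],
)
print(id_map)  # {501: 2, 777: 3}
```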

Next steps

The roadmap set out by the icaria Technology team for the coming months regarding data source compatibility focuses on extending the functionality of icaria GDPR and icaria TDM in SAP S/4HANA and SAP R/3 environments, and on completing the development of the specific connector for Salesforce.

We aim to expand the current capacity for integrating data sources based on SAP and Salesforce with the rest of the organization's systems, achieving anonymization and generation of test data in all systems coherently and simultaneously.

In constant evolution

icaria Technology is constantly evolving and improving. In this article, we’ve introduced you to some of the most relevant recent advances regarding compatibility with data sources, but this isn’t the only thing we’ve been working on.

In future articles we will review the multi-node architecture we have been developing for months, which aims to achieve better scalability, parallelization, resilience, and task distribution. Multi-node execution will facilitate execution in public clouds (AWS, Azure, or GCP) and the use of auto-scalable containers through integration with Kubernetes.

We can assist you at icaria Technology. Get in touch with us and speak directly to our team about your project's needs and the tools that can enhance efficiency and quality in processes.
