icaria Technology is a company focused on data and metadata management. Our work deals with many kinds of data stores: relational databases, data lakes, data warehouses, hierarchical databases, file-based sources, and more.
The ability to access these data sources is crucial to us. That is why we continually study each source's capabilities and investigate the right strategy for managing its data. We aim to provide optimal support for the functions of both icaria TDM and icaria GDPR.
In recent weeks we have released several features toward this goal, both in the shared icaria Technology architecture (used by icaria TDM, icaria GDPR, and icaria Lean Factory) and in the specific processes of these applications.
icaria integrates with Big Data environments through different solutions. In Cloudera Data Platform (CDP) environments, for example, the usual approach is a mixed use of Hive and direct file processing, generally through Hadoop and HDFS.
It is mainly in the latter that the handling of Parquet and AVRO files has been significantly improved.
Processing large volumes of data is a significant challenge. Doing the math, if we seek to deliver data on time, we quickly find that the processing speed must exceed a million records per second. With this objective in mind, we have:
All this was done with the aim of speeding up the processing, dissociation, anonymization, and delivery of data on demand.
The recently deployed improvements provided a 50% reduction in execution time.
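As an illustration of the kind of streaming treatment that makes such throughput possible (a minimal sketch, not icaria's implementation; the masking rule and field names are hypothetical), records can flow through a generator pipeline so that no full dataset is ever held in memory:

```python
import hashlib

def read_records(n):
    """Simulate a source of n records; in production this would stream from HDFS or a database."""
    for i in range(n):
        yield {"id": i, "name": f"customer-{i}"}

def anonymize(records):
    """Replace the sensitive field with a deterministic hash (dissociation)."""
    for rec in records:
        digest = hashlib.sha256(rec["name"].encode()).hexdigest()[:12]
        yield {"id": rec["id"], "name": digest}

# The pipeline processes records one at a time, end to end.
processed = sum(1 for _ in anonymize(read_records(100_000)))
print(processed)  # 100000
```

A deterministic hash keeps the dissociated values consistent across runs, so the same source record always maps to the same anonymized value.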
The new version of icaria FileSQL, the technological complement of the icaria Technology architecture with which we treat files as data sources, has been completely redesigned to make it easier to extend and to incorporate new file formats in the future. The roadmap includes fixed-width, DLI, XML, JSON, and Excel files.
For now, compared with the previous version, we have added and improved support for CSV, TSV, and other files with delimiter-separated values.
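icaria FileSQL's internals are not public, but the general technique of treating a delimited file as a SQL-queryable source can be sketched with the Python standard library: load the rows into an in-memory SQLite table and query them with plain SQL (the sample data and column names are invented):

```python
import csv
import io
import sqlite3

# A TSV sample standing in for a real file on disk.
tsv_data = "id\tcountry\n1\tES\n2\tFR\n3\tES\n"

rows = list(csv.reader(io.StringIO(tsv_data), delimiter="\t"))
header, body = rows[0], rows[1:]

# Create a table matching the file's header and load every row.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE t ({', '.join(header)})")
conn.executemany(f"INSERT INTO t VALUES ({', '.join('?' * len(header))})", body)

# The file can now be queried like any relational source.
count = conn.execute("SELECT COUNT(*) FROM t WHERE country = 'ES'").fetchone()[0]
print(count)  # 2
```

Changing the `delimiter` argument is all it takes to handle CSV, TSV, or any other single-character separator, which is why these formats can share one code path.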
Processing time, storage, and security are precious resources. Therefore, the new JDBC driver incorporates the ability to:
The icaria Technology product team is constantly receiving requests to add support for new databases.
The latest addition to the list of compatible databases comes from Huawei: its GaussDB database. Given its characteristics, compatibility was completed with an improved implementation of the JDBC driver, management of new connections, and a more agile connection-retry policy.
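The details of the new retry policy are internal to the driver; as a generic sketch of the technique it names (exponential backoff with a capped number of attempts; the function names and timings are illustrative):

```python
import time

def connect_with_retry(connect, attempts=4, base_delay=0.1):
    """Call `connect` until it succeeds, sleeping base_delay * 2**i between failures."""
    for i in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if i == attempts - 1:
                raise  # exhausted the budget: surface the error
            time.sleep(base_delay * 2 ** i)

# Demo: a flaky endpoint that fails twice before succeeding.
calls = {"n": 0}
def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("database not ready")
    return "connection"

print(connect_with_retry(flaky_connect, base_delay=0.01))  # connection
```

Doubling the delay between attempts avoids hammering a database that is briefly unavailable, while the attempt cap keeps a genuinely dead endpoint from blocking the pipeline indefinitely.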
In addition, compatibility across concurrent Oracle Database versions has been improved, covering versions 9i through 21c, including the current long-term release, Oracle Database 19c.
In relational databases, the use of auto-generated fields (for primary keys, for example) is common. Such fields pose a challenge when segmenting structures to deliver a coherent subset of information, such as a single client across all systems.
In recent weeks, icaria TDM's ability to manage this problem with different approaches has been improved. The main relational databases reviewed on this occasion were Oracle, SQL Server, and DB2, for which repeated data delivery, deletion of consumed data, data generation, and other operations have been enabled.
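One common way to handle auto-generated keys when delivering a subset (not necessarily icaria TDM's exact approach) is to let the target database assign fresh identity values and remap foreign keys on the fly. A minimal sketch of that remapping, with invented table and column names:

```python
# Source subset: customers with auto-generated ids, and orders referencing them.
customers = [{"id": 101, "name": "Ana"}, {"id": 205, "name": "Luis"}]
orders = [{"id": 9, "customer_id": 101}, {"id": 10, "customer_id": 205}]

# Simulate the target assigning fresh identity values on insert.
next_id = 1
id_map = {}
inserted_customers = []
for cust in customers:
    new_id = next_id
    next_id += 1
    id_map[cust["id"]] = new_id  # remember old id -> new id
    inserted_customers.append({"id": new_id, "name": cust["name"]})

# Rewrite foreign keys so the subset stays coherent in the target.
inserted_orders = [
    {"id": o["id"], "customer_id": id_map[o["customer_id"]]} for o in orders
]
print(inserted_orders)  # [{'id': 9, 'customer_id': 1}, {'id': 10, 'customer_id': 2}]
```

The old-to-new mapping is what keeps the delivered subset coherent: every reference that pointed at a source key is rewritten to point at the key the target actually generated.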
Regarding data warehouse compatibility, the roadmap set out by the icaria Technology team for the coming months focuses on extending the functionality of icaria GDPR and icaria TDM in SAP S/4HANA and SAP R/3 environments, while completing the development of the specific connector for Salesforce.
We aim to expand the current capacity for integrating data sources based on SAP and Salesforce with the rest of the organization's systems, achieving anonymization and generation of test data in all systems coherently and simultaneously.
icaria Technology is constantly evolving and improving. In this article, we’ve introduced you to some of the most relevant recent advances regarding compatibility with data sources, but this isn’t the only thing we’ve been working on.
In future articles we will review the multi-node architecture we have been developing for months, whose objective is to achieve better scalability, parallelization, resilience, and task distribution. Multi-node execution will facilitate running in public clouds (AWS, Azure, or GCP) and the use of auto-scalable containers through integration with Kubernetes.