26/09/2022

Data subsetting: everything you need to know

Data subsetting emerged from two needs in test environments: saving storage space, and therefore cost, and complying with regulations. Those regulations require not only that data be protected but also that only the minimum necessary amount of data be used.

Continue reading to find out more about data subsetting and what it brings to data handling in development environments.

What is data subsetting?

Data subsetting is the process of extracting a reduced set of data from a production database and transferring it to a non-production environment.

By extracting some data and discarding the rest, data subsetting aims to reduce the dataset to a more manageable size while keeping the data that is relevant for use.

Data subsetting works by applying filters that select the right data, producing a relevant and consistent set that can be used in software testing.
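As a minimal sketch of filter-based subsetting, the Python example below builds a small in-memory SQLite database, filters one table, and then pulls only the related rows that reference the filtered ones, so the subset stays consistent. The schema (`customers`, `orders`), the `country` filter and the `subset` function are illustrative assumptions, not part of any particular tool.

```python
import sqlite3

# Hypothetical two-table "production" schema, built in memory for illustration.
prod = sqlite3.connect(":memory:")
prod.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'ES'), (2, 'Bob', 'US'), (3, 'Eva', 'ES');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.5), (13, 1, 8.0);
""")

def subset(conn, country):
    """Return the filtered customers plus only the orders that reference them."""
    customers = conn.execute(
        "SELECT id, name, country FROM customers WHERE country = ? ORDER BY id",
        (country,),
    ).fetchall()
    ids = [row[0] for row in customers]
    if not ids:
        return customers, []
    # One "?" placeholder per selected customer id.
    placeholders = ",".join("?" * len(ids))
    orders = conn.execute(
        "SELECT id, customer_id, total FROM orders "
        f"WHERE customer_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    return customers, orders

customers, orders = subset(prod, "ES")
```

Following the relationship from the filtered parent rows to their child rows is what keeps the reduced set usable: no order in the subset points at a customer that was left behind.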

In a way, data subsetting treats test data the way statistics treats a population: it works with representative samples instead of the whole.

Some of the benefits of data subsetting include:

  • Shorter time-to-market for software solutions, as the time required for testing is minimised.
  • Less storage space needed for development environments. Datasets measured in terabytes can be reduced to gigabyte-sized sets.
  • Data management that avoids the proliferation of sensitive data and ensures compliance with data protection laws.

Advantages of data subsetting

Constant deliveries

As noted above, data subsetting separates a reduced set of data from a production database and transfers it to a non-production environment.

In this way, valuable data remains directly available through a process that can be scheduled and run periodically. In other words, subsetting ensures that data is delivered to the selected environments as often as necessary.

Data subsetting also allows multiple testers and developers to work in parallel, as data delivery can take place without interrupting the operations in the destination environment. This contributes to achieving greater efficiency, avoiding processes that slow down deliveries such as data editing, movement or erasure. 

User independence

Because the data subsetting process delivers the data users need on demand, they do not depend on a third party to carry it out. They can simply take the data they need when they need it.

Test automation

Using the right data subsetting tools, companies can save themselves a large number of manual tasks and reduce the resources needed for software testing. 

How it works

An ideal process to deliver the useful data to the developer or tester would be as follows:

  • Location of valuable data in all environments in which this may be found.
  • At the same time, location of sensitive data that should be concealed.
  • Application of measures to protect sensitive data.
  • Extraction and delivery of datasets that are useful to developers and testers. 
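The four steps above can be sketched as a small pipeline. This is an illustrative outline only, assuming in-memory rows and hypothetical names (`ROWS`, `SENSITIVE_FIELDS`, `locate`, `protect`, `deliver`); a real process would work against databases and use a more careful protection method than simple redaction.

```python
# Hypothetical in-memory "production" rows; all field names are illustrative.
ROWS = [
    {"id": 1, "email": "ana@example.com", "plan": "premium", "active": True},
    {"id": 2, "email": "bob@example.com", "plan": "free", "active": False},
    {"id": 3, "email": "eva@example.com", "plan": "premium", "active": True},
]
# Assumed outcome of the "locate sensitive data" step.
SENSITIVE_FIELDS = {"email"}

def locate(rows, predicate):
    """Locate the valuable data: keep only rows useful for the test at hand."""
    return [row for row in rows if predicate(row)]

def protect(rows, sensitive_fields):
    """Protect sensitive data: replace each sensitive field with a placeholder."""
    return [
        {key: ("<redacted>" if key in sensitive_fields else value)
         for key, value in row.items()}
        for row in rows
    ]

def deliver(rows):
    """Deliver the dataset; in this sketch it is simply returned as a list."""
    return list(rows)

valuable = locate(ROWS, lambda row: row["plan"] == "premium" and row["active"])
dataset = deliver(protect(valuable, SENSITIVE_FIELDS))
```

The original rows are left untouched: `protect` builds new dictionaries, so the source data is never modified by the delivery.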

With this process in mind, we can outline the following steps:

First, it is sometimes difficult to know what needs to be moved, because the data you are looking for must meet complex criteria, so it is important to find an efficient way to identify it. At icaria Technology, this prior data identification step is handled by a search engine.

Once this valuable data has been located, the subsetting plan begins. It determines the structures to be transferred, the conditions that select the instances, the entities to be transferred in full and the valid paths (from source to destination).
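To make the idea of a subsetting plan more concrete, here is a hypothetical way to represent one in Python. The classes `EntityRule` and `SubsettingPlan`, their fields and the database names are assumptions made for illustration; they do not describe the format icaria TDM actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EntityRule:
    table: str                       # structure to be transferred
    condition: Optional[str] = None  # row filter; None means "transfer the whole entity"

@dataclass
class SubsettingPlan:
    source: str                      # where the data comes from
    destination: str                 # where the data is delivered
    rules: List[EntityRule] = field(default_factory=list)

    def to_queries(self):
        """Render one SELECT statement per entity in the plan."""
        queries = []
        for rule in self.rules:
            sql = f"SELECT * FROM {rule.table}"
            if rule.condition:
                sql += f" WHERE {rule.condition}"
            queries.append(sql)
        return queries

# Hypothetical plan: a filtered entity plus one that is transferred in full.
plan = SubsettingPlan(
    source="prod_db",
    destination="test_db",
    rules=[
        EntityRule("customers", "country = 'ES'"),
        EntityRule("countries"),
    ],
)
queries = plan.to_queries()
```

The conditions here are treated as trusted configuration written by whoever defines the plan; they are interpolated into SQL as-is and should never come from untrusted input.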

At this point, our icaria TDM software has already come into play through its search engine and the creation of the subsetting plan. The platform, designed to simplify data management in testing environments, is capable of segmenting the datasets and dissociating complex structures, applying the most advanced data subsetting practices to guarantee quality datasets.

The tool itself identifies the different types of information that must be located and then creates the Data Map for sensitive data, which guides the process toward meeting the legal requirements.
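A very simplified way to picture a map of sensitive data is a dictionary from column names to the kind of sensitive information found in them. The sketch below assumes sampled column values and two hand-written regular expressions; the names `PATTERNS`, `SAMPLES` and `build_data_map` are hypothetical, and real detection would need far more robust rules than these.

```python
import re

# Illustrative patterns only; real classifiers are considerably more thorough.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,}$"),
}

# Hypothetical sampled values per column.
SAMPLES = {
    "customers.name": ["Ana", "Bob"],
    "customers.contact": ["ana@example.com", "bob@example.com"],
    "customers.mobile": ["+34 600 123 456", "600-123-457"],
}

def build_data_map(samples):
    """Label a column when every sampled value matches the same pattern."""
    data_map = {}
    for column, values in samples.items():
        for label, pattern in PATTERNS.items():
            if values and all(pattern.match(value) for value in values):
                data_map[column] = label
                break
    return data_map

data_map = build_data_map(SAMPLES)
```

Columns that match no pattern, such as `customers.name` here, are simply absent from the map, which is one reason pattern matching alone is not enough to find every sensitive field.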

icaria TDM tool's additional features

Some of the icaria TDM tool's additional features include the following:

  • The capacity to automatically identify the relationships between entities of the data model, even if they are not documented, and the possibility of defining these in icaria Studio to guide the subsetting process, even when the relationships do not exist physically in the database. 
  • The possibility of defining different data delivery strategies, which is decisive for constant deliveries, where conflicts may occur between the data already in the destination environment and the data being delivered.
  • Different delivery retry policies, necessary to ensure the delivery of data into databases with active referential integrity and models prone to circular references.
  • Configuration specific to data domains, allowing complexity to be managed incrementally and knowledge to be reused.

In fact, MDA is always present in our working philosophy. If you are interested in this subject, you can find a more in-depth explanation here.
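To illustrate why retry policies matter when referential integrity is active, the following sketch delivers rows into an in-memory SQLite database with foreign keys enforced. Rows rejected by a constraint are queued and retried on the next pass. The function `deliver_with_retries`, the `max_passes` parameter and the schema are assumptions for illustration, not the policies icaria TDM implements.

```python
import sqlite3

dest = sqlite3.connect(":memory:")
dest.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
dest.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL REFERENCES customers(id));
""")

# Rows arrive in an order that violates the foreign key: the order comes first.
BATCH = [
    ("orders", (100, 1)),
    ("customers", (1, "Ana")),
]

def deliver_with_retries(conn, batch, max_passes=3):
    """Insert rows, re-queuing those rejected by a constraint for the next pass."""
    pending = list(batch)
    for _ in range(max_passes):
        if not pending:
            break
        failed = []
        for table, values in pending:
            placeholders = ",".join("?" * len(values))
            try:
                conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
            except sqlite3.IntegrityError:
                failed.append((table, values))  # parent row may arrive later
        pending = failed
    conn.commit()
    return pending  # rows that could not be delivered

undelivered = deliver_with_retries(dest, BATCH)
```

A simple retry like this resolves ordering problems, where a child row arrives before its parent. Genuinely circular references usually need something more, such as deferred constraint checking or inserting with a temporarily empty reference and updating it afterwards.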

Dissociation during the subsetting process

A key step in any data subsetting process is not only the extraction of complete structures but also their dissociation.

Dissociation consists of replacing sensitive data with fictitious but realistic data while maintaining referential integrity, even across databases built on different technologies. The resulting dataset complies with data protection law and can still be used in testing processes.
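One common way to keep references intact while replacing sensitive values is deterministic pseudonymization: the same input always maps to the same substitute, so records in different systems still match each other. The sketch below uses a keyed hash (HMAC) for this. It is an assumption-laden illustration, not icaria's dissociation method; the key, the function names and the sample rows are hypothetical, and the tokens it produces are not realistic-looking values.

```python
import hashlib
import hmac

# Assumption: a secret key that is kept outside the test environment.
SECRET_KEY = b"demo-key-not-for-production"

def pseudonymize(value, prefix="user"):
    """Map a sensitive value to a stable fictitious token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"

def dissociate(rows, field_name):
    """Return copies of the rows with the given field replaced by its token."""
    return [{**row, field_name: pseudonymize(row[field_name])} for row in rows]

# Two hypothetical systems that share the same customer identifier.
crm_rows = [{"email": "ana@example.com", "segment": "gold"}]
billing_rows = [
    {"email": "ana@example.com", "invoice": 501},
    {"email": "bob@example.com", "invoice": 502},
]

crm_masked = dissociate(crm_rows, "email")
billing_masked = dissociate(billing_rows, "email")
```

Because both datasets are masked with the same key, the customer in the CRM rows still joins to the right invoice in the billing rows, even though the original email no longer appears in either.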

In this way, companies ensure that they comply with all the requirements of the GDPR and data protection in testing environments.

What is the result of the data subsetting process using icaria TDM? Lower costs, compliance with the law and higher-quality software test management.

Want to know more about how data subsetting reduces waiting times and operating costs in testing processes? Contact the icaria team or request an icaria TDM demo and see for yourself.
