Data transformation is the conversion and alignment of data sets to each other or to a defined schema. It usually takes place after data extraction. The goal might be to integrate data sets or to load them into another IT system. In process mining, data transformation is a component of data preprocessing.
Why is data transformation important?
To integrate data successfully, the data needs to be uniform, or standardized. Different source systems, table schemata, or data types can create variations in data. Data transformation removes these variations and ensures that the relations between data are preserved and carried over during data integration. As a rule, during transformation, data is adjusted either to other data or to a specific target format.
If most of the data is already in the same format, you can define a target format based on that data. Only the data that doesn’t correspond to this format needs to be adjusted.
If a specific format is required due to database or software limitations, all data must be transformed according to the specified target format.
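As a minimal sketch of the first case, the snippet below treats the most common date format in a column as the target format and converts only the values that deviate from it. The list of candidate formats and the sample values are assumptions made for illustration; a real data set may need other formats and a rule for ambiguous values.

```python
from collections import Counter
from datetime import datetime

# Hypothetical candidate formats; extend this list for real data.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y"]

def detect_format(value):
    """Return the first candidate format that parses the value, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None

def align_to_majority(values):
    """Use the most common format as the target and convert only the rest."""
    detected = [detect_format(v) for v in values]
    counts = Counter(fmt for fmt in detected if fmt is not None)
    if not counts:
        return None, list(values)  # nothing recognized, nothing to align
    target = counts.most_common(1)[0][0]
    aligned = []
    for value, fmt in zip(values, detected):
        if fmt is None or fmt == target:
            # Already in the target format, or unrecognized: left untouched.
            aligned.append(value)
        else:
            aligned.append(datetime.strptime(value, fmt).strftime(target))
    return target, aligned

dates = ["2023-01-15", "2023-02-01", "03.04.2023", "2023-05-20"]
target_format, aligned_dates = align_to_majority(dates)
```

Here only the third value is rewritten, because the other three already share one format.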
How is data transformed?
Before data can be transformed, it must be extracted. The one exception is data stored in database systems, which can be transformed directly in the database using instructions such as SQL commands.
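The following sketch shows what transforming data directly in the database could look like. It uses an in-memory SQLite database from Python's standard library; the table name, the columns, and the cleanup rules are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (country, amount_cents) VALUES (?, ?)",
    [(" de", 1250), ("DE ", 990), ("fr", 4000)],
)

# Transform in place: trim whitespace and upper-case the country codes.
conn.execute("UPDATE orders SET country = UPPER(TRIM(country))")

# Transform on read: derive an amount in currency units from the stored cents.
rows = conn.execute(
    "SELECT country, amount_cents / 100.0 AS amount FROM orders ORDER BY id"
).fetchall()
conn.close()
```

The `UPDATE` changes the stored values, while the `SELECT` leaves them as they are and only reshapes the result, which may be preferable when the source table must stay unchanged.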
After data has been extracted, define a target format for it. To convert one format to the other, you need to know the specifications of both the source format and the target format. Using fixed definitions and assignments, you can convert the data in the source file and assign it values that correspond to the target format. Then, you can save the transformed string of values or characters as a new file, the converted output file.
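One way to express such fixed definitions and assignments is a pair of lookup tables: one that assigns source fields to target fields, and one that assigns source values to target values. The field names and codes below are hypothetical and only illustrate the idea.

```python
# Hypothetical definitions: source field -> target field.
FIELD_MAP = {"cust_no": "customer_id", "ctry": "country", "stat": "status"}

# Hypothetical assignments: per target field, source value -> target value.
VALUE_MAP = {"status": {"A": "active", "I": "inactive"}}

def convert_record(source_record):
    """Convert one record from the source format to the target format."""
    target_record = {}
    for source_field, value in source_record.items():
        target_field = FIELD_MAP.get(source_field)
        if target_field is None:
            continue  # the field has no counterpart in the target format
        # Replace the value if an assignment exists; otherwise keep it.
        value = VALUE_MAP.get(target_field, {}).get(value, value)
        target_record[target_field] = value
    return target_record

source = {"cust_no": "1001", "ctry": "DE", "stat": "A", "internal_flag": "x"}
converted = convert_record(source)
```

Keeping the definitions in data rather than in code makes it easier to review them against the source and target specifications.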
Data transformation also involves handling empty data values. Empty values occur if, for instance, an attribute is not applicable to an object. You have to decide whether the value for such attributes should be empty or “NULL.” How you handle these values depends on the transformation target format. In databases, for example, you should enter “NULL” values, since empty values can lead to errors during transformation.
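A minimal sketch of this decision for a database target: empty or whitespace-only strings are mapped to Python's `None`, which the standard `sqlite3` module stores as NULL. The table, the records, and the helper name `empty_to_none` are assumptions for illustration.

```python
import sqlite3

def empty_to_none(value):
    """Map empty or whitespace-only strings to None, stored as NULL by sqlite3."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return None
    return value

# Hypothetical extracted records: two of them lack a department.
records = [("Alice", "Sales"), ("Bob", ""), ("Carol", "   ")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [tuple(empty_to_none(v) for v in row) for row in records],
)

# Count the rows whose department ended up as NULL.
null_count = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE department IS NULL"
).fetchone()[0]
conn.close()
```

With NULL in place, queries can test for missing values with `IS NULL` instead of comparing against several spellings of "empty".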
These are the steps in a data transformation:
- Extract the data.
- Identify the appropriate target format.
- Define the target format.
- Convert the extracted data to the target format.
- Save the converted data as a new file.
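The steps above can be sketched end to end as follows. The source layout (a semicolon-delimited file with decimal commas), the target layout (a comma-delimited file with decimal points), and all field names are assumptions made for this example.

```python
import csv
import os
import tempfile

# Steps 2 and 3: the target format, defined here as an assumption.
TARGET_FIELDS = ["order_id", "amount"]

def extract(path):
    # Step 1: read the semicolon-delimited source file.
    with open(path, newline="", encoding="utf-8") as src:
        return list(csv.DictReader(src, delimiter=";"))

def convert(rows):
    # Step 4: rename the fields and switch decimal commas to decimal points.
    return [
        {"order_id": row["OrderNo"], "amount": row["Amount"].replace(",", ".")}
        for row in rows
    ]

def save(rows, path):
    # Step 5: write the converted data to a new file.
    with open(path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=TARGET_FIELDS)
        writer.writeheader()
        writer.writerows(rows)

# Create a small sample source file in a temporary directory.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "source.csv")
target_path = os.path.join(workdir, "target.csv")
with open(source_path, "w", newline="", encoding="utf-8") as f:
    f.write("OrderNo;Amount\n1;12,50\n2;7,00\n")

save(convert(extract(source_path)), target_path)

with open(target_path, newline="", encoding="utf-8") as f:
    output = f.read()
```

The source file is left unchanged; the converted data lives only in the new output file.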