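Data preprocessing is the preparation of data for analysis. This preparation consists of four core activities: cleansing the data, transforming it, linking data sets together, and reducing the data to what is relevant.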
These operations take place after data extraction, in which data is retrieved from a source system. Data transformation is an important part of the ETL (extract, transform, load) process.
Data cleansing is the stage where you identify incomplete, incorrect, and irrelevant data. In the next step, you replace, modify, or delete the unusable data. If nonsensical or missing values remain in the data set, they can distort the overall picture and produce implausible or misleading results.
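A minimal sketch of such a cleansing pass, using only the Python standard library. The records, the field names (`order_id`, `amount`, `country`), and the rule that a negative amount is unusable are assumptions made for illustration. Real rules depend on the data set and should be agreed with the data owners.

```python
from typing import Optional

# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"order_id": "A1", "amount": "19.90", "country": "DE"},
    {"order_id": "A2", "amount": "", "country": "DE"},       # missing amount
    {"order_id": "A3", "amount": "-5.00", "country": "FR"},  # implausible value
    {"order_id": None, "amount": "7.50", "country": "FR"},   # missing identifier
]


def clean_record(record: dict) -> Optional[dict]:
    """Return a cleaned copy of the record, or None if it is unusable."""
    if not record.get("order_id"):
        return None  # incomplete: no identifier
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        return None  # incomplete or malformed amount
    if amount < 0:
        return None  # assumed rule: negative amounts are invalid
    # Keep the record, with the amount converted from text to a number.
    return {**record, "amount": amount}


cleaned = [r for r in map(clean_record, raw_records) if r is not None]
rejected_count = len(raw_records) - len(cleaned)
```

This sketch simply drops unusable records. Replacing or correcting values is equally valid, and counting the rejected records makes the effect of the cleansing visible.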
Data transformation includes normalizations and aggregations that make data sets more meaningful for a given analysis target. Aggregations, for example, can make subsequent visualizations of the data more useful. This step also harmonizes data from different sources by unifying units and data schemas.
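The three ideas in this step (unifying units, normalizing, aggregating) can be sketched as follows. The measurements, the `site` and `weight` fields, and the choice of min-max scaling as the normalization are assumptions for illustration. Other normalizations may suit a given analysis better.

```python
from collections import defaultdict

# Hypothetical measurements from two sources that report weight in different units.
measurements = [
    {"site": "north", "weight": 2.0, "unit": "kg"},
    {"site": "north", "weight": 500.0, "unit": "g"},
    {"site": "south", "weight": 1.5, "unit": "kg"},
    {"site": "south", "weight": 3000.0, "unit": "g"},
]

# Conversion factors to the common unit (kilograms).
TO_KG = {"kg": 1.0, "g": 0.001}


def to_kg(row: dict) -> dict:
    """Unify units: express every weight in kilograms."""
    return {"site": row["site"], "weight_kg": row["weight"] * TO_KG[row["unit"]]}


def min_max_normalize(values: list) -> list:
    """Normalize: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


unified = [to_kg(row) for row in measurements]
normalized = min_max_normalize([row["weight_kg"] for row in unified])

# Aggregate: total weight per site, for example as input to a chart.
totals = defaultdict(float)
for row in unified:
    totals[row["site"]] += row["weight_kg"]
```

Converting units before normalizing or aggregating matters here: scaling or summing the raw values would mix grams and kilograms.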
After the data transformation, different data sets can be linked together to create a uniform picture for analysis. Because the transformation has made the data uniform, the data sets can be joined via a common attribute.
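As a rough illustration, linking two data sets via a common attribute might look like the sketch below. The `customers` and `orders` data sets and the shared key `customer_id` are hypothetical. Collecting unmatched records in a separate list, rather than dropping them silently, is one possible design choice.

```python
# Hypothetical data sets that share the attribute "customer_id".
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"order_id": "A1", "customer_id": 1, "amount": 19.9},
    {"order_id": "A2", "customer_id": 2, "amount": 5.0},
    {"order_id": "A3", "customer_id": 3, "amount": 7.5},  # no matching customer
]

# Index one data set by the common attribute for fast lookups.
customers_by_id = {c["customer_id"]: c for c in customers}

linked = []     # orders enriched with customer data
unmatched = []  # orders whose customer_id has no counterpart

for order in orders:
    customer = customers_by_id.get(order["customer_id"])
    if customer is None:
        unmatched.append(order)
    else:
        linked.append({**order, "customer_name": customer["name"]})
```

The lookup only works if both data sets encode the key in the same way, which is why the link follows the transformation step.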
Removing unnecessary and unimportant data from the data set makes any calculations more efficient and removes factors that could distort the result.
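A minimal sketch of such a reduction, assuming that only two hypothetical fields (`order_id`, `amount`) matter for the analysis and that exact duplicates count as unnecessary. Which fields and rows are actually dispensable is a decision for the analysis at hand.

```python
# Hypothetical records with fields that are not needed for the analysis.
rows = [
    {"order_id": "A1", "amount": 19.9, "source_system": "erp", "note": "ok"},
    {"order_id": "A1", "amount": 19.9, "source_system": "crm", "note": ""},
    {"order_id": "A2", "amount": 5.0, "source_system": "erp", "note": "ok"},
]

# Assumption: only these fields are relevant for the analysis.
RELEVANT_FIELDS = ("order_id", "amount")


def reduce_rows(records: list, keep: tuple) -> list:
    """Keep only the relevant fields and drop rows that become duplicates."""
    seen = set()
    reduced = []
    for record in records:
        slim = tuple(record[field] for field in keep)
        if slim in seen:
            continue  # identical in all relevant fields: redundant
        seen.add(slim)
        reduced.append(dict(zip(keep, slim)))
    return reduced


reduced_rows = reduce_rows(rows, RELEVANT_FIELDS)
```

Dropping fields first and deduplicating second means that rows differing only in an irrelevant field collapse into one.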
Tip: Coordinate any steps involving the removal or modification of data with the relevant departments in advance. Arbitrary modification or removal can lead to distortions, too.
Related Terms: ETL, Process Mining, Data Transformation, Data Extraction