Data extraction is the process of retrieving data from a source system. In the context of process mining, this means that event data is extracted from an IT system so that it can be transformed and analyzed.
What are the extraction methods?
There are different methods for extracting data, depending on the IT system and the required data format. With some systems, the data can be exported to a file format such as CSV with the push of a button. With other programs, you’ll have to use their API (Application Programming Interface). In this case, the data connection takes place at the level of the source code. APIs often differ from each other in data structure, formats, objects, variables, and remote calls, so each one needs to be addressed specifically. Information about these differences can usually be found in the API documentation.
How does data extraction work?
How data extraction works depends on which extraction method you select.
If you export data manually using a graphical user interface, you only need to select and export the required data, tables, and so on.
However, if the data is exported using a program’s API, the procedure is usually as follows:
- Evaluate the formats, data structure, objects, variables, and remote API calls.
- Query the API for the required data.
- Save the response data in the desired format.
All of these steps are usually implemented in a query script or workflow. After data extraction, the data is transformed if needed.
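As an illustration, the API-based steps could be sketched as a small Python query script. This is a minimal sketch under stated assumptions, not the procedure for any particular system: it assumes a JSON API that returns pages of the form {"events": [...], "has_more": true}, a page query parameter, and the field names case_id, activity, and timestamp. All of these are hypothetical; a real API defines its own structure in its documentation.

```python
import csv
import json
from typing import Callable, Iterable, Iterator, TextIO
from urllib.request import urlopen

# Hypothetical field names; take the real ones from the API documentation.
FIELDS = ["case_id", "activity", "timestamp"]


def fetch_page(url: str) -> dict:
    """Query the API and decode the response (assumes the API returns JSON)."""
    with urlopen(url) as response:
        return json.load(response)


def extract_events(base_url: str,
                   fetch: Callable[[str], dict] = fetch_page) -> Iterator[dict]:
    """Yield events page by page until the API reports no further pages.

    The "page" parameter and the "events"/"has_more" keys are assumptions.
    """
    page = 1
    while True:
        payload = fetch(f"{base_url}?page={page}")
        yield from payload["events"]
        if not payload.get("has_more"):
            break
        page += 1


def save_as_csv(events: Iterable[dict], out: TextIO) -> int:
    """Save the response data as CSV and return the number of rows written."""
    # Fields not listed in FIELDS are dropped; missing ones are left empty.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for event in events:
        writer.writerow(event)
        count += 1
    return count
```

A call might look like `save_as_csv(extract_events("https://example.invalid/events"), open("events.csv", "w", newline=""))`, where the URL is a placeholder. Passing the fetch function as a parameter keeps the query step separate from the saving step, so the script can be tried against sample responses before it is pointed at a live system.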