3.3 Applied theories and their contributions
3.3.5 Applied data quality criteria
As explained in section 2.2.1, data quality criteria play an important role in determining data or information quality. What does this mean in practice? According to Morbey (2011, pp. 25–26), data quality management recognises seven data quality criteria that high-quality data must satisfy and by which the quality of data can be assessed. These criteria are machine-verifiable, and corrections can be made where necessary. The criterion of horizontal completeness is fulfilled if no defective or useless data remain and the business requirements were taken into account when the data were collected. To ensure syntactic correctness, all data entered must be stored in a standardized format. The consistency criterion verifies that there are no discrepancies within the data records and that the records do not violate any business rules.
Under the criterion of accuracy (including topicality), data must be up to date with respect to both time and context. Furthermore, the spelling of names and designations in particular must be accurate. A further criterion for high-quality data is freedom from repetition, for which the data stock must be checked for redundancies. The integrity criterion checks whether reference data or data links are missing. The last data quality criterion is fulfilled if the data are consistent across systems (Morbey, 2011, p. 26).
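Because the criteria are described as machine-verifiable, a few of them can be illustrated with a minimal sketch. The example below is not taken from Morbey (2011); the record layout, the field names, the reference set of countries and the date format are assumptions chosen purely for illustration, and the completeness check is a simplified stand-in for horizontal completeness.

```python
import re
from collections import Counter

# Hypothetical customer records; all field names and values are assumptions.
RECORDS = [
    {"id": 1, "name": "Anna Muster", "birth_date": "1980-05-17", "country": "CH"},
    {"id": 2, "name": "", "birth_date": "17.05.1980", "country": "CH"},
    {"id": 2, "name": "Ben Beispiel", "birth_date": "1975-11-02", "country": "XX"},
]
COUNTRIES = {"CH", "DE", "AT"}  # assumed reference data
REQUIRED = ("id", "name", "birth_date", "country")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed standardized format


def check_records(records, countries):
    """Return a list of (record index, criterion, message) findings."""
    findings = []
    id_counts = Counter(r.get("id") for r in records)
    for i, r in enumerate(records):
        # Completeness (simplified): every required field is present and non-empty.
        for field in REQUIRED:
            if r.get(field) in (None, ""):
                findings.append((i, "completeness", f"missing {field}"))
        # Syntactic correctness: dates follow the standardized format.
        if not ISO_DATE.fullmatch(str(r.get("birth_date", ""))):
            findings.append((i, "syntactic correctness", "birth_date not YYYY-MM-DD"))
        # Freedom from repetition: an id must not occur more than once.
        if id_counts[r.get("id")] > 1:
            findings.append((i, "freedom from repetition", "duplicate id"))
        # Integrity: the referenced country must exist in the reference data.
        if r.get("country") not in countries:
            findings.append((i, "integrity", "unknown country reference"))
    return findings


for finding in check_records(RECORDS, COUNTRIES):
    print(finding)
```

Each finding names the violated criterion, so a correction step could be attached per criterion. Consistency with business rules and cross-system consistency are omitted because they depend on rules and systems that are specific to each organisation.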
Wang and Strong (1996) explored a further way of looking at data quality by means of a two-phase survey in which they asked data users and consumers about their perception of the dimensions of data quality. In a theoretical framework, they first defined four basic aspects of data quality that data must fulfil; in a later step, they assigned to these aspects the dimensions derived from the second survey.
• Accessibility: The data are available to the data consumer in a form that allows them to be accessed and retrieved.
• Comprehensibility: The data are understandable and interpretable and must therefore not be recorded or stored in a language foreign to the user.
• Topicality: The data are relevant and topical for use in a decisionmaking process.
• Accuracy: The data are correct, factual and reliable (Wang & Strong, 1996, p. 9).
In the first phase of the survey, 25 people from companies and 112 students who had previously dealt with the topic in a professional capacity were surveyed. Participants were asked to list the characteristics (in addition to the four aspects of accessibility, comprehensibility, topicality and accuracy) that came to mind in relation to data quality. They were then asked to draw up a list of 32 characteristics that they had learned either from literature studies on the discipline or from discussions with researchers (Wang & Strong, 1996, p. 10). A total of 179 different characteristics were collected, the larger part of which is listed in Table 3.5.
Table 3.5 Data quality attributes from an applied study (Wang & Strong, 1996, p. 11)
Ability to be Joined with | Ability to Download | Ability to Identify Errors | Ability to Upload
Acceptability | Access by Competition | Accessibility | Accuracy
Adaptability | Adequate Detail | Adequate Volume | Aestheticism
Age | Aggregatability | Alterability | Amount of Data
Auditable | Authority | Availability | Believability
Breadth of Data | Brevity | Certified Data | Clarity
Clarity of Origin | Clear Data Responsibility | Compactness | Compatibility
Competitive Edge | Completeness | Comprehensiveness | Compressibility
Concise | Conciseness | Confidentiality | Conformity
Consistency | Content | Context | Cost
Cost of Accuracy | Cost of Collection | Creativity | Critical
Current | Customizability | Data Hierarchy | Data Improves Efficiency
Data Overload | Definability | Dependability | Depth of Data
Detail | Detailed Source | Dispersed | Distinguishable Updated Files
Dynamic | Ease of Access | Ease of Comparison | Ease of Correlation
Ease of Data Exchange | Ease of Maintenance | Ease of Retrieval | Ease of Understanding
Ease of Update | Ease of Use | Easy to Change | Easy to Question
Efficiency | Endurance | Enlightening | Ergonomic
Error-Free | Expandability | Expense | Extendibility
Extensibility | Extent | Finalization | Flawlessness
Flexibility | Form of Presentation | Format | Integrity
Friendliness | Generality | Habit | Historical Compatibility
Importance | Inconsistencies | Integration | Integrity
Interactive | Interesting | Level of Abstraction | Level of Standardization
Localized | Logically Connected | Manageability | Manipulate
Measurable | Medium | Meets Requirements | Minimality
Modularity | Narrowly Defined | No lost information | Normality
Novelty | Objectivity | Optimality | Orderliness
Origin | Parsimony | Partitionability | Past Experience
Pedigree | Personalized | Pertinent | Portability
Preciseness | Precision | Proprietary Nature | Purpose
Quantity | Rationality | Redundancy | Regularity of Format
Relevance | Reliability | Repetitive | Reproducibility
Reputation | Resolution of Graphics | Responsibility | Retrievability
Revealing | Reviewability | Rigidity | Robustness
Scope of Info | Secrecy | Security | Self-Correcting
Semantic Interpretation | Semantics | Size | Source
Specificity | Speed | Stability | Storage
Synchronization | Time-independence | Timeliness | Traceable
Translatable | Transportability | Unambiguity | Unbiased
Understandable | Uniqueness | Unorganized | Up-to-Date
Usable | Usefulness | User Friendly | Valid
Value | Variability | Variety | Verifiable
Volatility | Well-Documented | Well-Presented
The second stage of the survey focused on classifying the collected characteristics. To underpin the reliability of the survey, a broad spectrum of people with divergent views was included in the evaluation. The survey targeted graduates of a US university who are employed in different industries, departments and management levels and who regularly make decisions on the basis of data. A total of 1,500 people were randomly selected from the original 3,200 graduates. The 179 attributes were slightly modified for the second survey, leaving 118 attributes. In addition, similar characteristics were grouped into 20 dimensions. The rating scale ranged from 1 (extremely important) to 9 (not important) (Wang & Strong, 1996, p. 13).
The result was then presented in the form of a ranking. According to the survey, participants found credibility to be the most important dimension. The following list is a free translation of the ranking developed by Wang and Strong (1996, pp. 14–15).
1. credibility
2. added value
3. relevance
4. accuracy
5. interpretability
6. intelligibility
7. accessibility
8. objectivity
9. topicality
10. completeness
11. traceability
12. reputation
13. consistency
14. cost efficiency
15. ease of use
16. variety of data & sources
17. conciseness
18. access security
19. (appropriate) amount of data
20. flexibility
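The step from individual ratings to a ranking such as the one above can be sketched as follows. This is a minimal illustration under the assumption that dimensions are ordered by their mean rating, which the text does not state explicitly. The ratings are made up for the example and are not data from Wang and Strong (1996).

```python
from statistics import mean

# Made-up ratings on the 1-9 scale (1 = extremely important, 9 = not important).
# These values are illustrative only and are NOT survey data.
ratings = {
    "flexibility": [5, 4, 6, 5],
    "credibility": [1, 2, 1, 3],
    "relevance": [2, 3, 2, 3],
}


def rank_dimensions(ratings):
    """Order dimensions from most to least important by mean rating."""
    means = {dimension: mean(values) for dimension, values in ratings.items()}
    # A lower mean indicates higher importance on this scale.
    return sorted(means, key=means.get)


for position, dimension in enumerate(rank_dimensions(ratings), start=1):
    print(position, dimension)
```

Because 1 denotes the highest importance, sorting in ascending order of the mean puts the most important dimension first.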
Rohweder et al. (2015, pp. 27–28) adopt the dimensions of Wang and Strong (1996, pp. 14–15) with some changes. First, they distinguish two classes, useful and unusable information, in order to classify the quality of information. Information is useful if it can be used for its intended purpose; it is unusable if no use can be made of it. Their terminology also deviates marginally (Rohweder et al., 2015, p. 27): they speak throughout of information quality, whereas the original treatment of the topic spoke of data quality. This fine distinction was already examined in subsection 3.3.3, where both terms were found to be congruent (Gebauer & Windheuser, 2015, p. 87). A further difference from Wang and Strong (1996) lies in the number and the selection of the dimensions: Wang and Strong (1996) define 20 dimensions, whereas Rohweder et al. (2015, pp. 28–29) define 15. Among others, the dimensions access security, ease of operation, traceability, flexibility, accuracy, cost efficiency and variety of data and data sources were removed because, according to the authors, they have little impact on quality, and they were replaced by the dimensions free of error and ease of manipulation (Rohweder et al., 2015, pp. 27–28).
IQ (centre of the figure), surrounded by four categories:
• system-supported: accessibility, workability
• purpose-dependent: actuality, added value, completeness, appropriate scope, relevance
• presentation-related: comprehensibility, clarity, consistent presentation, clear interpretability
• inherent: high reputation, correctness, objectivity, credibility
Figure 3.4 The 4 IQ categories and 15 IQ dimensions (adapted from Rohweder et al., 2015, p. 30)
Figure 3.4 shows information quality at the centre, surrounded by the four categories (system-supported, inherent, presentation-related and purpose-dependent) and the 15 dimensions, all of which are essential for a functioning IT system. In addition, each of the four categories represents one facet of an information system: system, content, representation and use. If, for example, there is a problem with the IQ dimension clarity, the analysis and the solution must address the representation (Rohweder et al., 2015, pp. 30–31). High information quality is therefore not achieved if only one of the four categories, each of which encompasses several dimensions, is fulfilled. Quality ultimately rests on the assurance of all four categories (Rohweder et al., 2015, pp. 29–30).
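The relationship between dimensions, categories and facets described above can be made concrete with a small lookup structure. The sketch below is an illustration only. The names `IQ_MODEL`, `locate` and `is_high_quality` are hypothetical, and the pairing of each category with a facet is inferred from the parallel order in which the text lists them.

```python
# Category -> (facet of the information system, dimensions), as read from Figure 3.4.
# The category-to-facet pairing is an assumption inferred from the text's ordering.
IQ_MODEL = {
    "system-supported": ("system", ["accessibility", "workability"]),
    "inherent": (
        "content",
        ["high reputation", "correctness", "objectivity", "credibility"],
    ),
    "presentation-related": (
        "representation",
        ["comprehensibility", "clarity", "consistent presentation",
         "clear interpretability"],
    ),
    "purpose-dependent": (
        "use",
        ["actuality", "added value", "completeness", "appropriate scope",
         "relevance"],
    ),
}


def locate(dimension):
    """Return the (category, facet) pair to which an IQ dimension belongs."""
    for category, (facet, dimensions) in IQ_MODEL.items():
        if dimension in dimensions:
            return category, facet
    raise KeyError(dimension)


def is_high_quality(deficient_dimensions):
    """True only if none of the four categories contains a deficient dimension."""
    deficient = set(deficient_dimensions)
    return all(
        not deficient & set(dimensions) for _, dimensions in IQ_MODEL.values()
    )


print(locate("clarity"))
print(is_high_quality(["clarity"]))
```

A problem reported for the dimension clarity is thus traced to the presentation-related category and the representation facet, and a single deficient dimension is enough for the overall check to fail.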
4 | Empirical studies on designing for consistency of evaluation in EGIT
This chapter presents the research design of the dissertation project and the studies that led to its findings. Each study uses the procedure best suited to the project. All studies are combined in a Design Science Research (DSR) project, which section 4.1 explains in detail. The studies in section 4.2 and section 4.3 go through the so-called relevance cycle and show that the problem space is in fact relevant. The studies in section 4.4 and section 5, in turn, work out the sought artifact in different steps and in different contexts.
The first study “The role of IT governance in digital operating models” in section 4.4 aims to clarify the role of IT governance in the enterprise. In particular, new digital business models are discussed. The study was carried out using both a qualitative and a quantitative approach. This quantitative survey uses the same data set that is used for the study “Designing Organizational Structure In The Age Of Digitization” (Schwer & Hitz, 2018).
A purely qualitative approach has been chosen for the study “IT-Budgeting Processes in Swiss Banks and How They Are Influenced by Rapidly Changing Regulatory Requirements” (Hitz, Krey, Albath, Wyss, & Thoma, 2018).
Finally, the two preliminary studies form the basis for the last study in section 5. A qualitative research approach is chosen to determine the principles of data definition. Subsequently, the dependencies between principles of data definition, data governance and experienced data quality are determined with a quantitative study.
Clarification of contribution. In the studies carried out, the second-named authors mainly shared the effort of data collection and validation. All data preparation, analysis, research and presentation at conferences were carried out by the first-named author. In addition, co-authors who were part of the IGA project of this dissertation project also acted as reviewers. The published articles were shortened to meet the requirements of the journals. Where necessary, the content of the studies is therefore discussed in greater detail in this dissertation. In particular, it is clarified how the individual studies contributed to defining and verifying the principles for data definition and thus supported the entire design process.