
3.3 Applied theories and their contributions

3.3.5 Applied data quality criteria

As already explained in section 2.2.1, data quality criteria play an important role in determining data quality and, by extension, information quality. What does this mean in practice? According to Morbey (2011, pp. 25-26), data quality management recognizes seven data quality criteria that high-quality data must satisfy and that make the quality of data recognizable. These criteria are machine-verifiable, and corrections can be made if necessary. The criterion of horizontal completeness is fulfilled if no defective or useless data remain and the business requirements were taken into account when the data were collected. To ensure syntactical correctness, all data entered must be saved in a standardized format. The consistency criterion verifies that there are no discrepancies within the data records and that the records do not violate any business rules.

Data must be up-to-date in time and context in the sense of the quality criterion accuracy (incl. topicality). Furthermore, the spelling of names and designations in particular must be accurate. A further criterion for high-quality data is freedom from repetition, for which the data stock must be checked for redundancies. The integrity criterion checks whether reference data or data links are missing. The last data quality criterion is fulfilled if the data are uniform across systems (Morbey, 2011, p. 26).
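Because these criteria are described as machine-verifiable, a few of them can be illustrated with a minimal Python sketch. The sketch does not come from Morbey (2011). The sample records, the field names (`id`, `name`, `postcode`, `updated`, `customer_id`), the four-digit postcode format and the three-year age limit are all hypothetical assumptions, and only five of the seven criteria are covered.

```python
import re
from datetime import date

# Hypothetical sample data; every field name and value is an assumption.
customers = [
    {"id": 1, "name": "Anna Meier", "postcode": "8400", "updated": date(2024, 5, 1)},
    {"id": 2, "name": "", "postcode": "84-00", "updated": date(2015, 1, 1)},
    {"id": 1, "name": "Anna Meier", "postcode": "8400", "updated": date(2024, 5, 1)},
]
orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 7}]


def check_completeness(rows, required):
    """Indexes of rows in which a required field is missing or empty."""
    return [i for i, r in enumerate(rows) if any(not r.get(f) for f in required)]


def check_syntax(rows, field, pattern):
    """Indexes of rows whose field does not match the standardized format."""
    return [i for i, r in enumerate(rows)
            if not re.fullmatch(pattern, str(r.get(field, "")))]


def check_timeliness(rows, field, today, max_age_days):
    """Indexes of rows whose date field is older than the allowed age."""
    return [i for i, r in enumerate(rows) if (today - r[field]).days > max_age_days]


def check_duplicates(rows):
    """Indexes of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def check_integrity(children, fk, parents, pk):
    """Indexes of child rows whose reference points to no parent row."""
    keys = {p[pk] for p in parents}
    return [i for i, c in enumerate(children) if c[fk] not in keys]


report = {
    "completeness": check_completeness(customers, ["id", "name", "postcode"]),
    "syntax": check_syntax(customers, "postcode", r"\d{4}"),
    "timeliness": check_timeliness(customers, "updated", date(2025, 1, 1), 3 * 365),
    "duplicates": check_duplicates(customers),
    "integrity": check_integrity(orders, "customer_id", customers, "id"),
}
print(report)
```

Each check returns the positions of the offending records, so a correction step could be attached to any of them. Consistency with business rules and uniformity across systems would need domain rules and a second data source, and are therefore left out of this sketch.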

In their study, Wang and Strong (1996) explored a further way of looking at data quality by means of a two-phase survey, in which they asked data users and consumers about their opinions and perceptions of the dimensions of data quality. In a theoretical framework, they had previously defined four basic aspects of data quality that data must fulfil. In a further step, they assigned to these aspects the dimensions identified in the second survey.

• Accessibility: The data are available to the data consumer in a form that allows them to be accessed and retrieved.
• Comprehensibility: The data are understandable and interpretable and must therefore not be captured or stored in a language that is foreign to the user.
• Topicality: The data are relevant and topical for use in a decision-making process.
• Accuracy: The data are correct, factual and reliable (Wang & Strong, 1996, p. 9).

In the first phase of the survey, 25 people from companies and 112 students who had previously worked on this topic in a professional capacity were questioned. Participants were asked to list the characteristics (in addition to the four aspects of accessibility, comprehensibility, topicality and accuracy) that came to mind in relation to data quality. They were then asked to draw up a list of 32 characteristics that they had learned either from literature studies on the discipline or from discussions with researchers (Wang & Strong, 1996, p. 10). In total, 179 different characteristics were collected (see Table 3.5).

Table 3.5 Data quality attributes from an applied study (Wang & Strong, 1996, p. 11)

Ability to be Joined with | Ability to Download | Ability to Identify Errors | Ability to Upload
Acceptability | Access by Competition | Accessibility | Accuracy
Adaptability | Adequate Detail | Adequate Volume | Aestheticism
Age | Aggregatability | Alterability | Amount of Data
Auditable | Authority | Availability | Believability
Breadth of Data | Brevity | Certified Data | Clarity
Clarity of Origin | Clear Data Responsibility | Compactness | Compatibility
Competitive Edge | Completeness | Comprehensiveness | Compressibility
Concise | Conciseness | Confidentiality | Conformity
Consistency | Content | Context | Cost
Cost of Accuracy | Cost of Collection | Creativity | Critical
Current | Customizability | Data Hierarchy | Data Improves Efficiency
Data Overload | Definability | Dependability | Depth of Data
Detail | Detailed Source | Dispersed | Distinguishable Updated Files
Dynamic | Ease of Access | Ease of Comparison | Ease of Correlation
Ease of Data Exchange | Ease of Maintenance | Ease of Retrieval | Ease of Understanding
Ease of Update | Ease of Use | Easy to Change | Easy to Question
Efficiency | Endurance | Enlightening | Ergonomic
Error-Free | Expandability | Expense | Extendibility
Extensibility | Extent | Finalization | Flawlessness
Flexibility | Form of Presentation | Format | Integrity
Friendliness | Generality | Habit | Historical Compatibility
Importance | Inconsistencies | Integration | Integrity
Interactive | Interesting | Level of Abstraction | Level of Standardization
Localized | Logically Connected | Manageability | Manipulate
Measurable | Medium | Meets Requirements | Minimality
Modularity | Narrowly Defined | No lost information | Normality
Novelty | Objectivity | Optimality | Orderliness
Origin | Parsimony | Partitionability | Past Experience
Pedigree | Personalized | Pertinent | Portability
Preciseness | Precision | Proprietary Nature | Purpose
Quantity | Rationality | Redundancy | Regularity of Format
Relevance | Reliability | Repetitive | Reproducibility
Reputation | Resolution of Graphics | Responsibility | Retrievability
Revealing | Reviewability | Rigidity | Robustness
Scope of Info | Secrecy | Security | Self-Correcting
Semantic Interpretation | Semantics | Size | Source
Specificity | Speed | Stability | Storage
Synchronization | Time-independence | Timeliness | Traceable
Translatable | Transportability | Unambiguity | Unbiased
Understandable | Uniqueness | Unorganized | Up-to-Date
Usable | Usefulness | User Friendly | Valid
Value | Variability | Variety | Verifiable
Volatility | Well-Documented | Well-Presented

The second stage of the survey focused on the classification of the collected characteristics. To underpin the reliability of this survey, a broad spectrum of people with divergent views was included in the evaluation. The survey targeted graduates of a US-American university who are employed in different industries, departments and management levels and who regularly make decisions on the basis of data. A total of 1,500 people were randomly selected from the original 3,200 graduates. The 179 attributes were slightly modified for the second survey, leaving 118 attributes. In addition, similar characteristics were grouped into 20 dimensions. The rating scale ranged from 1 (extremely important) to 9 (not important) (Wang & Strong, 1996, p. 13).

The result was then presented in the form of a ranking. According to the survey, most participants found credibility to be the most important dimension. The following list is a sense-for-sense translation of the ranking developed by Wang and Strong (1996, pp. 14-15).

1. credibility
2. added value
3. relevance
4. accuracy
5. interpretability
6. intelligibility
7. accessibility
8. objectivity
9. topicality
10. completeness
11. traceability
12. reputation
13. consistency
14. cost efficiency
15. ease of use
16. variety of data & sources
17. conciseness
18. access security
19. (appropriate) amount of data
20. flexibility
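How such a ranking can be derived from ratings on the scale from 1 (extremely important) to 9 (not important) is sketched below. The text does not state how Wang and Strong (1996) aggregated the ratings, so ordering by mean rating is an assumption made here, and the rating values are hypothetical illustration data.

```python
from statistics import mean

# Hypothetical ratings per dimension (1 = extremely important, 9 = not important).
ratings = {
    "credibility": [1, 2, 1, 3],
    "flexibility": [5, 6, 4, 7],
    "relevance": [2, 2, 3, 1],
}


def rank_dimensions(ratings):
    """Order dimensions by ascending mean rating (lower mean = more important)."""
    means = {dim: mean(values) for dim, values in ratings.items()}
    # Ties are broken alphabetically so that the order is deterministic.
    return sorted(means, key=lambda dim: (means[dim], dim))


ranking = rank_dimensions(ratings)
for position, dim in enumerate(ranking, start=1):
    print(f"{position}. {dim}")
```

Because low values mean high importance on this scale, the sort is ascending. A scale running in the opposite direction would require a descending sort.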

Rohweder et al. (2015, pp. 27-28) use the dimensions of Wang and Strong (1996, pp. 14-15) but make some changes. First, the dimensions are classified into two forms, useful information and unacceptable information, in order to classify the quality of information. Information is useful if it can be used for its intended purpose. It becomes unusable if no use can be made of it. Their terminology also deviates marginally (Rohweder et al., 2015, p. 27): they speak throughout of information quality, whereas the original treatment of this topic spoke of data quality. However, this fine distinction was already examined in subsection 3.3.3, where both terms were found to be congruent (Gebauer & Windheuser, 2015, p. 87). A further difference from the dimensions of Wang and Strong (1996) lies in both the number and the selection of the dimensions. Wang and Strong (1996) define 20 dimensions, while Rohweder et al. (2015, pp. 28-29) settle on 15. Among others, the dimensions access security, ease of operation, traceability, flexibility, accuracy, cost efficiency and variety of data and data sources were removed because, according to the authors, they have little impact on quality, and were replaced by the dimensions free of error and ease of manipulation (Rohweder et al., 2015, pp. 27-28).

[Figure: information quality (IQ) at the centre, surrounded by four categories with their dimensions]

system-supported
• accessibility
• workability

purpose-dependent
• actuality
• added value
• completeness
• appropriate scope
• relevance

presentation-related
• comprehensibility
• clarity
• consistent presentation
• clear interpretability

inherent
• high reputation
• correctness
• objectivity
• credibility

Figure 3.4 Four IQ categories and 15 IQ dimensions (adapted from Rohweder et al., 2015, p. 30)

Figure 3.4 shows the source of information at the centre, surrounded by the four categories (system-supported, inherent, presentation-related and purpose-dependent) and the 15 dimensions, all of which are essential for a functioning IT system. In addition, each of the four categories represents one facet of an information system: system, content, representation and use. If, for example, there is a problem with the IQ dimension clarity, the analysis and problem solution must address the representation (Rohweder et al., 2015, pp. 30-31). It is therefore not sufficient for high information quality if only one of the four categories, each of which encompasses several dimensions, is fulfilled. Quality ultimately rests on the assurance of all four aspects (Rohweder et al., 2015, pp. 29-30).
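The rule that all four categories must be assured, and that a weak dimension points to the facet where the analysis has to start, can be sketched in Python as follows. The category-to-dimension mapping is taken from Figure 3.4, and the category-to-facet mapping follows the order in which the text lists categories and facets. The numeric scores between 0 and 1 and the threshold of 0.7 are hypothetical assumptions and not part of Rohweder et al. (2015).

```python
# Dimensions per category as listed in Figure 3.4.
IQ_CATEGORIES = {
    "system-supported": ["accessibility", "workability"],
    "inherent": ["high reputation", "correctness", "objectivity", "credibility"],
    "presentation-related": ["comprehensibility", "clarity",
                             "consistent presentation", "clear interpretability"],
    "purpose-dependent": ["actuality", "added value", "completeness",
                          "appropriate scope", "relevance"],
}

# Facet of the information system that each category stands for.
FACET = {
    "system-supported": "system",
    "inherent": "content",
    "presentation-related": "representation",
    "purpose-dependent": "use",
}


def failing_facets(scores, threshold=0.7):
    """Facets in which at least one dimension scores below the threshold.

    Dimensions without a score are treated as not fulfilled (score 0.0).
    """
    failing = set()
    for category, dimensions in IQ_CATEGORIES.items():
        if any(scores.get(dim, 0.0) < threshold for dim in dimensions):
            failing.add(FACET[category])
    return sorted(failing)


def is_high_quality(scores, threshold=0.7):
    """High information quality requires all four categories to be fulfilled."""
    return not failing_facets(scores, threshold)


# Hypothetical assessment: every dimension is fine except clarity.
scores = {dim: 0.9 for dims in IQ_CATEGORIES.values() for dim in dims}
scores["clarity"] = 0.4
print(failing_facets(scores))
print(is_high_quality(scores))
```

With these sample scores, the weak clarity value makes the presentation-related category fail, so the sketch reports the representation facet and denies high quality overall, although the other three categories are fulfilled.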

4 Empirical studies on designing for consistency of evaluation in EGIT

This chapter presents the research design of the dissertation project and refers to the studies that led to the findings. For each individual study, the procedure most useful for the project is used. All studies are combined in a Design Science Research (DSR) project, which section 4.1 explains in detail. The studies in section 4.2 and section 4.3 go through the so-called relevance cycle. They demonstrate that the problem space is in fact relevant. The studies in section 4.4 and section 5, in turn, work out the sought artifact in different steps and in different contexts.

The first study, “The role of IT governance in digital operating models” in section 4.4, aims to clarify the role of IT governance in the enterprise. In particular, new digital business models are discussed. The study was carried out using both a qualitative and a quantitative approach. The quantitative survey uses the same data set as the study “Designing Organizational Structure In The Age Of Digitization” (Schwer & Hitz, 2018).

A purely qualitative approach has been chosen for the study “IT-Budgeting Processes in Swiss Banks and How They Are Influenced by Rapidly Changing Regulatory Requirements” (Hitz, Krey, Albath, Wyss, & Thoma, 2018).

Finally, the two preliminary studies form the basis for the last study in section 5. A qualitative research approach is chosen to determine the principles of data definition. Subsequently, the dependencies between principles of data definition, data governance and experienced data quality are determined with a quantitative study.

Clarification of contribution. In the studies carried out, the second-named authors mainly shared the effort of data collection and validation. All data preparation, analysis, research and conference presentations were carried out by the first-named author. In addition, co-authors who were part of the IGA project of this dissertation project were involved as reviewers. The published articles were shortened to meet the requirements of the journals. Where necessary, the content of the studies is therefore discussed in greater detail in this dissertation. In particular, it is clarified how the individual studies contributed to defining and verifying the principles for data definition and thus supported the entire design process.

4.1 Overview of the research framework and methodology
