Model for back analysis of objects from patterns

5.4 NETWORK ANALYSIS FOR BUSINESS PROCESS IN SAP (L2)

5.4.6 Model for back analysis of objects from patterns

As we have shown, we can identify a set of patterns P1, ..., Pd from the original dataset X = {x1, x2, ..., xn}, where xk = {xk1, xk2, ..., xkm} ∈ R^m. In the realized experiment, X = D1. We can identify the business meaning of each pattern. The dataset X was defined as an object-attribute table (each object is a vector of attributes), where the attributes were calculated from the context of the business process and from the log of the business process, which provided the data for the initial log.

Every pattern Pj is defined by a representative vector Tj = {tjA, tjB, tj1, ..., tjm}. This representative vector defines the mean parameters of the pattern members.

It is important to perceive a pattern in both of its aspects: first as a set of real representatives (in the given context) and second as a set of descriptive rules (in our case, the representative vector). If we find a pattern in the behavior of the business process (let us assume in the range from time C1 to C2), it can be interesting to look for this pattern in a reduced or extended date/time range of the same business process in the same context.

5.4.6.1 Finding original records for pattern from original dataset

First we show how to obtain the original record(s) for pattern Pr from the same dataset D1. We transform the original dataset X into the normalized dataset X’ = {x’1, x’2, ..., x’n}, where

$$x'_{kj} = \frac{x_{kj}}{\max_j}, \qquad \forall k \in \{1, \dots, n\},\ j \in \{1, \dots, m\} \qquad (21)$$

(max_j is defined as the maximal value of attribute j).
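The normalization in (21) can be sketched in a few lines of Python. This is only an illustration: the function names `column_maxima` and `normalize` and the sample values are hypothetical, and the sketch assumes non-negative attributes and leaves a column whose maximum is 0 at 0, which the text does not specify.

```python
from typing import List, Sequence


def column_maxima(dataset: Sequence[Sequence[float]]) -> List[float]:
    """Return max_j, the maximal value of every attribute (column)."""
    return [max(column) for column in zip(*dataset)]


def normalize(dataset: Sequence[Sequence[float]],
              maxima: Sequence[float]) -> List[List[float]]:
    """Apply formula (21): x'_kj = x_kj / max_j."""
    # Assumption: a column whose maximum is 0 stays 0 (avoids division by zero).
    return [
        [value / m if m != 0 else 0.0 for value, m in zip(row, maxima)]
        for row in dataset
    ]


# Hypothetical object-attribute table with n = 3 objects and m = 3 attributes.
X = [[2.0, 10.0, 1.0],
     [4.0, 5.0, 0.0],
     [1.0, 20.0, 3.0]]

maxima = column_maxima(X)
X_norm = normalize(X, maxima)
```

Keeping `maxima` as a separate value is deliberate: the same maxima are needed again whenever an object from outside X has to be normalized "by the original dataset".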

We define the distance of a member xk of dataset X’ from pattern Pr as follows:

$$d(x_k, P_r) = \sum_{j=1}^{m} \left( x'_{kj} - t_{rj} \right)^2 = \sum_{j=1}^{m} \left( \frac{x_{kj}}{\max_j} - t_{rj} \right)^2 \qquad (22)$$

The real object that best represents the pattern Pr (or its representative vector) is the xk for which d(xk, Pr) is minimal. If pattern Pr has i members, we can take the i objects with the smallest d(xk, Pr).

The principle is shown in Fig. 27.
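A minimal sketch of this selection, assuming the dataset is already normalized by (21) and that only the m attribute components of the representative vector enter the distance. The names `distance` and `nearest_objects` and all sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def distance(x_norm: Sequence[float], t: Sequence[float]) -> float:
    """Formula (22): sum of squared differences to the representative vector."""
    return sum((xj - tj) ** 2 for xj, tj in zip(x_norm, t))


def nearest_objects(dataset_norm: Sequence[Sequence[float]],
                    t: Sequence[float],
                    i: int) -> List[Tuple[int, float]]:
    """Return (row index, distance) for the i objects closest to the pattern."""
    scored = [(k, distance(row, t)) for k, row in enumerate(dataset_norm)]
    scored.sort(key=lambda pair: pair[1])  # ascending by distance
    return scored[:i]


# Hypothetical normalized dataset and representative vector T_r.
X_norm = [[0.5, 0.5],
          [1.0, 0.25],
          [0.5, 1.0]]
T_r = [0.5, 0.5]

closest = nearest_objects(X_norm, T_r, i=2)
```

With i = 1 the function returns the single object that best represents the pattern; with i equal to the number of pattern members it returns the candidate member list.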

Result: we tried to identify the members of patterns 1 to 11 using the presented concept. For patterns with one member, the correct user vector was identified; for patterns with more members, we found the correct members (those with minimal distance).

5.4.6.2 Decision support: finding pattern for new object in dataset

Once patterns P1, ..., Pd have been identified from the original dataset, we may need to analyze a new object yk = {yk1, yk2, ..., ykm} ∈ R^m, to find out which pattern it fits best and whether its representative behavior also fits this pattern. The principle of the procedure is shown in Fig. 26.

Similarly to 5.4.6.1, we transform the original dataset X into the normalized dataset X’ = {x’1, x’2, ..., x’n} (formula 21) and calculate max_j for all attributes. Then we calculate the distance of the new object y’k (normalized by the original dataset) from every pattern P1, ..., Pd and find the pattern Pi with minimal distance d(yk, Pi), i ∈ {1, ..., d}.

Fig. 26. Principle of finding pattern for new object

The distance d(yk, Pi) is calculated by the same method as in (22):

$$d(y_k, P_i) = \sum_{j=1}^{m} \left( \frac{y_{kj}}{\max_j} - t_{ij} \right)^2 \qquad (23)$$
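The assignment of a new object to its nearest pattern can be sketched as follows. The function name `nearest_pattern`, the pattern identifiers and all numbers are hypothetical; the only assumption taken from the text is that the new object is normalized with the maxima of the original dataset, so its normalized values may exceed 1.

```python
from typing import Dict, Sequence, Tuple


def nearest_pattern(y: Sequence[float],
                    maxima: Sequence[float],
                    representatives: Dict[int, Sequence[float]]) -> Tuple[int, float]:
    """Return (pattern id, distance) of the pattern closest to y, using formula (23)."""
    # Normalize the new object by the maxima of the ORIGINAL dataset.
    y_norm = [yj / mj if mj != 0 else 0.0 for yj, mj in zip(y, maxima)]
    distances = {
        pattern_id: sum((a - b) ** 2 for a, b in zip(y_norm, t))
        for pattern_id, t in representatives.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical maxima from the original dataset and two representative vectors.
maxima = [4.0, 20.0]
representatives = {1: [0.5, 0.5], 2: [1.0, 0.25]}

best_id, best_distance = nearest_pattern([2.0, 12.0], maxima, representatives)
```

Returning the distance together with the pattern id makes it possible to judge afterwards whether the best fit is actually a close fit.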

Result: We selected an existing user from the original dataset, and it fit the correct pattern (as we expected).

Then we collected data from the previous year for the user and analyzed the distances of this new object to the patterns. The object fits best with pattern 1. The representative parameters of pattern 1 were compared with the representative values of this new object, and they were found to be consistent.

5.4.6.3 Finding original records for pattern from extended/reduced original dataset

Next we show how to obtain the original record(s) for pattern Pr from a dataset X1, where X1 is a time-extended or time-reduced version of dataset X. A time-extended dataset is a dataset from the same business process but scanned (logged) during a wider time frame. A time-reduced dataset is a dataset from the same business process but scanned (logged) during a shorter time frame.

Fig. 27. Principle of identifying nearest objects for pattern Pr

We expect that the pattern represents a given behavior and that this behavior can also be found in the reduced or extended dataset. But we must keep in mind that a pattern is defined by a set of attributes. An attribute can be representative (it describes a property that represents the cluster and is calculated, for example the mean total process time of one case, the mean of the maximum or minimum time, or the number of used order types) or cumulative (it describes a value that accumulates and directly depends on the number of records in the cluster, such as the absolute number of activities or the number of used orders). We call some attributes marginal if they represent the value of some margin or extreme, for example a max/min value. These attributes tend to be representative, but in large datasets they can easily be changed by an extreme or erroneous record.

As the extended/reduced dataset covers a different base of inspected activities (and objects as well), we can take into account only those pattern attributes that we call representative: they do not depend on the number of logged activities (if the process does not change). Representative attributes are also presented in normalized form, which means that in some cases they can be valid for the reduced or extended dataset.

The types of the attributes used are shown in Table 24 (R – representative, C – cumulative, M – marginal).

Attribute            Type
ActivitiesNR         C
TimeTotal            C
TimeAverage          R
TimeMax              M
TimeMin              M
Role                 C
r1                   R
r2                   R
r3                   R
r4                   R
r5                   R
r6                   R
r7                   R
r8                   R
r9                   R
r10                  R
NrRoles Roles        R
NrInvoice            C
NrOrders PO          C
NrVendors Vendors    C
AvBus Process        R
AvAppr Proces        R

Table 24. Patterns – types of attributes (R/C/M) in experiment 5.4
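Restricting the distance to attributes of a chosen type can be sketched as below. This is an illustrative variant only: the function name `masked_distance` and the sample values are hypothetical, and only the first five attributes of Table 24 are used.

```python
from typing import Dict, Iterable

# Attribute types taken from the first five columns of Table 24.
ATTRIBUTE_TYPES: Dict[str, str] = {
    "ActivitiesNR": "C",
    "TimeTotal": "C",
    "TimeAverage": "R",
    "TimeMax": "M",
    "TimeMin": "M",
}


def masked_distance(x_norm: Dict[str, float],
                    t: Dict[str, float],
                    keep: Iterable[str] = ("R",)) -> float:
    """Squared distance over the attributes whose type is listed in keep."""
    allowed = set(keep)
    return sum(
        (x_norm[name] - t[name]) ** 2
        for name, kind in ATTRIBUTE_TYPES.items()
        if kind in allowed
    )


# Hypothetical normalized object and representative vector.
x = {"ActivitiesNR": 0.9, "TimeTotal": 0.8, "TimeAverage": 0.5,
     "TimeMax": 0.25, "TimeMin": 0.0}
t = {"ActivitiesNR": 0.1, "TimeTotal": 0.2, "TimeAverage": 0.5,
     "TimeMax": 0.75, "TimeMin": 0.0}

d_representative = masked_distance(x, t)                         # R only
d_with_marginal = masked_distance(x, t, keep=("R", "M"))         # R and M
```

In this made-up example the object differs strongly in the cumulative attributes but matches the pattern exactly on the representative one, which is the situation the distinction between R, C and M is meant to expose.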

We experimentally used a dataset X1, which we constructed from dataset D1 by filtering for invoices created in 2017 only. The dataset X1 has 144 966 activities.

We used the same procedure for finding the original records as in 5.4.6.2, but for dataset X1. All attributes of the patterns were used in this experiment.

Pattern Result (the most similar records)

1 Original pattern 1 in D1 has 75 members.

The nearest user to the pattern 1 representative in X1 has distance 0,0956; the 75th user (sorted by distance) has distance 0,318.

The 75 nearest users from X1 were checked for membership in pattern 1 in D1: 54 are members, 21 are not.

Conversely, we analyzed the 21 users from D1 that were not found near the representative: 21 of them had more than 20% of their activity in 2018 (their activities in 2018 were not taken into X1).

2 Original pattern 2 in D1 has 45 members.

The nearest user to the pattern 2 representative in X1 has distance 0,05288; the 45th user (sorted by distance) has distance 0,2129.

The 45 nearest users from X1 were checked for membership in pattern 2 in D1: 29 are members, 16 are not.

Conversely, we analyzed the 16 users from D1 that were not found near the representative: 5 of them had more than 20% of their activity in 2018 (their activities in 2018 were not taken into X1) and 1 of them was active only in 2018.

3 Original pattern 3 in D1 has 69 members.

The nearest user to the pattern 3 representative in X1 has distance 0,0880; the 69th user (sorted by distance) has distance 0,2315.

The 69 nearest users from X1 were checked for membership in pattern 3 in D1: 48 are members, 21 are not.

Conversely, we analyzed the 21 users from D1 that were not found near the representative: 11 of them had more than 20% of their activity in 2018 (their activities in 2018 were not taken into X1) and 5 of them were active only in 2018.

4 distance) has distance 0,122302.

The 42 nearest users from X1 were checked for membership in the pattern in D1: 26 are members, 16 are not.

Conversely, we analyzed the 16 users from D1 that were not found near the representative: 12 of them had more than 20% of their activity in 2018 (their activities in 2018 were not taken into X1) and 4 of them were active only in 2018.

That is the reason why other users were selected as the nearest.

The same user 10 was found (distance near 0). The second nearest user has distance 1,76.

The same technical user 12 was found (distance near 0).

The nearest user (distance 0,519) was the same user 27 as was found in pattern 7. Another near node was found (distance 0,7), although the original pattern had one member.

The same user 29 was found (distance near 0). The second nearest user has distance 0,67.

The same user 36 was found as the nearest (distance 0,927), but another 10 users lie within a further distance of 0,1.

The resolution for finding this pattern is very low.

The same user 98 was found (distance near 0,32). The second nearest user has distance 0,59.

11 Another user 58 was found (distance near 0,25).

The original user is far from the pattern 11 representative: the user has only one activity in 2017 and 10 activities in 2018 (these are not present in the reduced dataset X1).

Table 25. Finding original record in experiment 5.4
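The membership comparison reported in Table 25 (how many of the nearest users in X1 are members of the original pattern in D1, and which original members were not found) amounts to a set comparison. A minimal sketch follows, with a hypothetical function name `membership_overlap` and made-up user identifiers.

```python
from typing import List, Sequence, Tuple


def membership_overlap(nearest_ids: Sequence[str],
                       original_members: Sequence[str]
                       ) -> Tuple[List[str], List[str], List[str]]:
    """Compare the nearest users in X1 with the members of the pattern in D1.

    Returns (found, not_found, missing):
      found     - nearest users that are members of the original pattern
      not_found - nearest users that are not members of the original pattern
      missing   - original members that are not among the nearest users
    """
    original = set(original_members)
    found = [u for u in nearest_ids if u in original]
    not_found = [u for u in nearest_ids if u not in original]
    missing = sorted(original - set(nearest_ids))
    return found, not_found, missing


# Hypothetical identifiers: 4 nearest users in X1, 4 members of the pattern in D1.
nearest = ["u01", "u02", "u07", "u09"]
members = ["u01", "u02", "u03", "u04"]

found, not_found, missing = membership_overlap(nearest, members)
```

When the number of nearest users equals the number of pattern members, `not_found` and `missing` have the same length, which matches the paired counts in the table (for example 21 and 21 for pattern 1).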

Regardless of whether the pattern had one member or many members, the distances at which users were found lay in the interval <0; 0,3>.

Summary of the visual shape of the distance distribution graphs from 5.4.6.3: several typical curves occur, as described in Table 26.

1 This curve represents zero distance of one node to the given pattern (the original pattern represented one outlier node). The next set of nodes differs by smaller increments, and on the other side there is a set of nodes with growing distance (we expect they are also outliers, but from other clusters).

2 This curve represents a pattern that covers a large set of nodes in the original dataset. We can see that the distance grows slowly. On the other side there is a set of nodes with growing distance (we expect they are also outliers, but from other clusters).

Table 26. Typical curves for the distribution of distances to a pattern
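The two curve types can be told apart numerically by sorting the distances and looking at the gap between the nearest and the second nearest object. The following sketch is only one possible way to do this; the function names and all distance values are hypothetical.

```python
from typing import List, Sequence


def distance_curve(distances: Sequence[float]) -> List[float]:
    """Return the distances sorted ascending (the curve of the distribution graph)."""
    return sorted(distances)


def nearest_gap(curve: Sequence[float]) -> float:
    """Gap between the nearest and the second nearest object on a sorted curve."""
    if len(curve) < 2:
        raise ValueError("need at least two distances")
    return curve[1] - curve[0]


# Hypothetical curve of type 1: one node at distance 0, the rest far away.
outlier_curve = distance_curve([1.76, 0.0, 2.1, 1.9])

# Hypothetical curve of type 2: many nodes at slowly growing distance.
large_curve = distance_curve([0.12, 0.1, 0.11, 0.9, 0.13])

gap_outlier = nearest_gap(outlier_curve)
gap_large = nearest_gap(large_curve)
```

A large gap indicates that the pattern is resolved to a single object, while a small gap means that several objects are nearly equally close to the representative.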

In document: Martin Kopka, SELF CITATIONS for PHD-THESIS Analysis of process data and their social aspects, VŠB – Technical University of Ostrava, Faculty of Electrical Engineering and Computer Science, 2018 (pages 51-56)