A RELATIONAL APPROACH TO INDEXING

(1)

DIPLOMA THESIS

Antonín Prukl

A RELATIONAL APPROACH TO INDEXING

RELA NÍ P"ÍSTUP K INDEXACI Department of Software Engineering Supervisor: Prof. RNDr. Jaroslav Pokorný, CSc.

Study Program: Computer Science

(2)

I hereby declare that I have written this diploma thesis on my own, and used no other than the named sources and aids. I agree with lending the thesis.

Prague, April 20, 2007 Antonín Prukl

(3)

V nLkterých pFípadech však nemusí být množina metod poskytovaná databázovým systémem dostaEující. Jednou z možností, jak implementovat nový index v relaEním SNBD, je využít tabulek daného systému. Tento pFístup nevyžaduje zmLny v jádru databázového systému a je tak dostupný vývojáFKm i v pFípadL, že cílový SNBD není distribuován jako open source. V rozšiFitelné databázové architektuFe je tak vyžadována pouze možnost pFidat nový datový typ do stávajícího SNBD.

V této práci byl zmínLným zpKsobem integrován UB-strom do SNBD Oracle. RelaEní tabulky související s indexem byly navrženy dvLma rKznými zpKsoby, zároveO byly zkoumány EtyFi metody pro vyhodnocení relevantních SQL dotazK. V rámci experimentK bylo pak implementované Fešení relaEního indexu porovnáno s nativním nasazením téhož indexu.

KlíEová slova: databázové systémy, relaEní indexace, UB-strom, benchmarkování

Title: A Relational Approach to Indexing Author: Antonín Prukl

Department: Department of Software Engineering Supervisor: Prof. RNDr. Jaroslav Pokorný, CSc.

Supervisor's e-mail address: jaroslav.pokorny@mff.cuni.cz

Abstract: In order to achieve efficient evaluation of SQL queries, database systems provide its users with set of integrated index access methods.

When a new access method is required for various reasons, one of the possibilities to implement such method in a relational DBMS is the way of exploiting relational tables of given database system. This approach does not involve any internal changes of database system kernel and thus it is available to all developers even when the target DBMS is not distributed as an open source. In the terms of extensible database architecture, only the availability to extend existing DBMS with a new data type is required.

In this work, UB-Tree index has been integrated into Oracle DBMS in such way. Index related tables have been designed in two different ways and four alternatives to evaluate relevant queries have been proposed and studied. Finally, several experiments have been done to compare performance of an access method implemented via the relational approach and a native kernel integration of the same method.

Keywords: database systems, relational indexing, UB-Tree, benchmarking

(6)

In database area, working with high amount of data often brings a requirement for speeding up the access time to searched entries with usage of some specialized secondary data structures. This is obvious mainly in case when the amount of searched data is very small in comparison with the volume of all data. Such structures that are used for direct access to a small subset of data instead of sequential passing through all data are called indexes.

Currently database systems provide a set of integrated indexes, e.g. B-Tree index, bitmap index etc. However, for many applications this may not be sufficient as they may require a specialized "tailor-made" access to data to improve their performance significantly.

Thus a possibility to implement and integrate new custom access method is needed.

1.1 Problem Outline

The design of extensible architectures represents an important area in database research.

Relational database servers gained advanced functionality by introducing the object- relational data model with abstract data types. Thus, object-relational database systems can be naturally employed as platforms to design an integrated user-defined database solution.

As custom data types can be stored in relational tables along with the native ones, some applications may require a specialized index structures to be built on these data types to effectively handle frequent operations. As an example, we may consider a custom data type polygon representing a polygonal object in n-dimensional space, and custom predicate INTERSECTS which tests intersection of two objects of given type. Object- relational queries can be expressed in usual declarative fashion, e.g. "SELECT * FROM polygon_table p WHERE p.polygon_object INTERSECTS :query_region". Provided only with a functional implementation which evaluates the INTERSECTS predicate, the built-in optimizer of underlying database system has to include a full-table scan into the execution plan to perform given selection. In consequence, the resulting performance will be very poor for highly selective query regions.

Other needs for custom index types may arise when native data types are regarded in an application-specific manner, e.g. when table entries with standard data types are considered as points of a multidimensional space (this is equivalent to the case when range searches according to multiple attributes are frequent) or when an application carries out sophisticated access to text or LOB table items.

C

H A P T E R

1

Introduction

(7)

systems provide developers with extensible indexing frameworks. An object-relational index type encapsulates stored functions for creating and dropping a custom index and for opening and closing index scans. Although the embedding of a custom index type is thus well supported, the actual implementation of its low-level functionality within a fully-fledged database kernel can be fairly complicated.

1.2 Implementation of New Access Method

When a new type of database index access method is needed for whatever reason, its actual implementation can be done according to three basic approach types. Particularly they are the integrating, the generic, and the relational approach as presented in [1]. This section discusses their advantages and disadvantages in relation to the difficulty of their implementation, the expected performance and their availability.

1.2.1 Integrating Approach

Outline: A new index access method is hard-wired directly into the kernel of an existing database system.

Implementation: Two types of implementation are distinguished: the Extending Approach and the Enhancing Approach. The enhancing approach is the easier one - many properties get inherited from an access method that already exists in the kernel;

e.g. B-Trees can be enhanced to become a functional B-Trees. On the other hand the extending approach stands for real adding of a new access method which comprises sophisticated support for transactions (concurrency control, locking, recovery services etc).

Performance: The expected performance is the best possible in comparison with other approach types.

Availability: Code maintenance is a very complex task and requires access to low-level kernel components which is nearly impossible when the target database is not distributed as open source.

1.2.2 Generic Approach

Outline: To overcome the restrictions of the integrating approach, such called Generalized Search Tree (GiST) has been proposed as a generic way of implementation of a new index. GiST serves as a high-level framework to plug in block-based tree structures. It has to be built only once into a database kernel and already includes support for transactions.

Implementation: New index integration is quite easy; however the actual implementation of the GiST itself remains a very complex task.

(8)

Performance: Although the framework induces some overhead, GiST-based access methods can still be of high performance.

Availability: Due to its complex implementation, GiST exists only as a research prototype and it is an open question, if and when a comparable functionality will be a standard component of major commercial database systems.

1.2.3 Relational Approach

Outline: Custom index structure is mapped into a relational schema organized by built-in access methods and all the operations are done on top of a relational query language.

Implementation: No extensions to the database kernel are required and therefore an index can be implemented with less effort when comparing with other approach types.

Performance: The performance is questionable; however it should be sufficient as mentioned in [1].

Availability: By design, a relational access method is supported by any object-relational database system. It requires the same functionality as an ordinary database user or a relational database application.

1.3 Goals

In this work a new access method via the relational approach will be implemented.

Particularly, the chosen index type is the UB-Tree which stands for a promising structure in the field of multidimensional access methods (MAMs). MAMs in common have high impact on different database application domains like data warehousing, data mining, or geographical information systems. However, they have not made their way into commercial database systems on a broad scale yet. The only exception is Transbase DBMS [2, 9] which comprises the native kernel implementation of just the UB-Tree.

All the advantages and drawbacks of relational index implementation will be studied in this work; then it will be compared with the native kernel integration, mainly with respect to performance issues.

(9)

2.1 Relational Access Method

The basic idea of a relational access method is to delegate the management of persistent data to an underlying relational database system by implementing the index definition and manipulation on top of its SQL interface. In other words, an access method is called a relational access method, if any index-related data are exclusively stored in and retrieved from relational tables.

2.1.1 Basics

Relational access methods rely on the exploitation of the built-in functionality of existing database systems. Instead of extending any database kernel component, just the native data definition language (DDL) and data manipulation language (DML) with common object-relational enhancements in the sense of SQL:1999 (mostly the object types and collections) are employed to process updates and queries related to index data. This approach can be used for implementation of both basic services of all-purpose database systems and also very specialized application-specific extensions.

In other words, the SQL layer of the DBMS is used as a virtual machine for management of persistent data. It also means that a relational access method immediately benefits from any improvement of the underlying DBMS.

2.1.2 Relational Storage of Index Data

Relational access methods are designed to operate on relations rather than on dedicated disk blocks which is common to standard block-oriented access methods of a DBMS kernel. The actual persistent storage and block-oriented management of the relations are delegated to the underlying database server. The relational access method and the database system cooperate to maintain and retrieve the index data and all the functionality of the DBMS including concurrent transactions and recovery can be reused.

In order to support queries on index tables, a relational access method can employ any built-in secondary indexes, including hash indexes, B-trees, and bitmap indexes.

Alternatively, payload data can be included into clustering by organizing index tables in a cluster or by storing them in index-organized tables.

Relational Index

(10)

2.1.3 Operations on Relational Access Method

According to previous specifications, a common block-oriented access method can be transformed to a relational access method by simple replacing each invocation of the underlying block manager by an SQL-based DML operation (e.g. calling of a function

"blocks.get(block_id)" would be replaced by "SELECT * FROM blocks WHERE id = :block_id"). Thus the original procedural style of index operations remains unchanged, whilst all I/O requests are newly handled by the DBMS.

Such simple scenario however reduces the DBMS to a plain block manager and most of its functionality remains unexploited. To maximize the architecture-awareness, two types of declarative operations have been proposed in [1] in order to reduce the possible number of DML operations submitted from a procedural environment - particularly the cursor-bound and cursor-driven operations.

Cursor-bound operation stands for a query or an update related to a relational access method such that the corresponding I/O requests on the index data can be performed by submitting O(1) DML statements, i.e. by sequentially and concurrently opening constant number of cursors provided by the underlying DBMS. Its main advantages are:

Declarative semantics. Operations are bound to the DML engine of the DBMS rather than to user-defined implementation code, therefore the DBMS gains responsibility for significant parts of the query processing. Thus the formal verification of the semantics is simplified if we can rely on the given implementation of SQL layer.

Query optimization. Whereas the database engine optimizes the execution of single closed-form DML statements, a joint execution of multiple independently submitted queries is very difficult to achieve. By using only a constant number of cursors, the DBMS captures significant parts of the operational semantics at once.

Cursor Minimization. The CPU cost of opening variable number of cursors or submitting several DML statements out of a stored procedure may become very high.

For cursor-bound operations, the relatively high cost of opening and fetching multiple database cursors remains constant with respect to the complexity of the operation.

Cursor-driven operation is a special case of cursor-bound operation where the result can be retrieved as an immediate output of a single cursor provided by the DBMS.

Particularly a query or an update related to a relational access method can be divided into two consecutive phases:

1 Procedural phase: In the first phase, index parameters are read, query specifications are retrieved and data structures required for the actual query execution may be prepared by user-defined procedures and functions. Additional DML operations on user data or index data are not permitted.

2 Declarative phase: In the second phase, only a single DML statement is submitted to the DBMS, yielding a cursor on the final results of the index scan which requires no post-processing by user-defined procedures or functions.

(11)

execution plans. However, the ability to take advantages of cursor-driven operations heavily relies on the expressive power of the underlying SQL interface, often including availability of recursive queries.

2.2 Generic Schemes of Relational Index

In [1] two generic schemes for a relational storage of index data have been identified;

particularly they are the navigational scheme and the direct scheme. This section discusses their main properties, advantages and disadvantages.

2.2.1 Navigational Scheme of Index Tables

Let P = (T, R₁, …, R_N) be a relational access method with a primary data table T and index related tables R1, …, RN.Pis called navigational scheme ( t T) ( ri Ri, 1

i n): at least one riis associated with rows { 1, …, m} tand m> 1.

Therefore, a row in an index table of a navigational index may logically represent many objects stored in the primary table. This is typical in case of hierarchical structures that are mapped to a relational schema. In other words, an index table contains data that are recursively traversed at query time in order to determine the resulting rows. To implement a navigational query as a cursor-driven operation, a recursive version of SQL like SQL:1999 is required.

Although the navigational scheme offers a straightforward way to simulate any hierarchical index structure on top of a relational data model, it suffers from the fact that navigational data are locked like any other primary data. As two-phase locking on index tables is too restrictive, the possible level of concurrency is unnecessarily decreased. For example, uncommitted node splits in a hierarchical directory may lock entire sub-trees against concurrent updates.

A similar overhead exists with logging, as atomic actions on navigational data, e.g. node splits, are not required to be rolled back in order to keep the index tables consistent with the data table. Therefore, relational access methods implementing the navigational scheme are only well suited for read-only or single-user environments.

2.2.2 Direct Scheme of Index Tables

Let P = (T, R1, …, RN) be a relational access method with a primary data table T and index related tables R1, …, RN. Pis called direct scheme ( t T) ( ri Ri, 1 i n): each riis associated with a single row t.

(12)

It means that each row in the primary table is directly mapped to a set of rows in the index tables. Inversely, each row in an index table exclusively belongs to a single row in the primary table.

The drawbacks of the navigational scheme with respect to concurrency control and recovery are not shared by the direct scheme, as row-based locking and logging on the index tables can be performed on the granularity of single rows in the primary table. For example, an update of a single row r in the primary table requires only the synchronization of index rows exclusively assigned to r. As the acquired locks are restricted to r and its exclusive entries in the index tables, they do not unnecessarily block concurrent operations on other primary rows. In contrast to navigational indexes, the direct scheme inherits the high concurrency and efficient recovery of built-in tables and indexes.

(13)

3.1 Multidimensional Data

Idea of multidimensional indexing arises from the fact that data (records) can be considered as points of a multidimensional vector space. In terms of relational databases, each row of a database table can relate to a point of a multidimensional space, where each domain is represented by an attribute. Therefore such table stands for a subset of the space which is defined by Cartesian product of table columns.

3.1.1 Applications

Multidimensional approach can bring lots of benefits into various database applications.

Often it is wise to consider and treat data as points of a multidimensional space. In some cases the mapping of data to a vector space is straightforward (geographical information systems, CAD databases etc.), in other cases the mapping is more or less synthetic but still useful (data warehousing, data mining, systems for information storage and retrieval, archives etc.). The common identifier for all such applications is that searching according to several criteria (i.e. according to more than one database attribute) is required quite often.

Let us consider following example: a database application that is used in a shopping company comprises table sales with attributes product_type_id, sales_date, branch_id (and possibly other ones). A common database query could be as follows:

SELECT * FROM SALES

WHERE PRODUCT_TYPE_ID BETWEEN 10 AND 20 AND BRANCH_ID BETWEEN 15 AND 18

AND SALES_DATE BETWEEN '14.05.2007' AND '21.05.2007'

It is likely that the result set of such query will be quite small in relation to the count of all entries in the sales table; in other words the selectivity of such query is small. This is the case when it would be wise to create an index on the columns of the table. The easiest solution available in all database systems is to create separate indexes on each attribute. An effective query plan would select temporary subsets of the table searched by particular attribute (with use of particular index) and then the final result would arise as an intersection of the temporary subsets.

Such principle of separate indexes is shown in Figure 1 (gray lines in upper tables correspond to search conditions according to particular attributes whilst the lower table stands for the result based on the intersection of all three attribute-related conditions).

Multidimensional Indexing

(14)

Figure 1: Intersection of temporary result sets within query employing separate indexes

1113

1715

13 1618

14.05.2007 16.05.2007 21.05.2007 19.05.2007 14

16

11 16.05.2007

18

13 19.05.2007

product_type_id branch_id sales_date

In real applications users usually want only few rows to be returned in the result set (they would hardly list thousands of entries to find the required ones). It means that even with growing count of attributes used in a search condition the expected output is still of approximately the same size. The main problem of above approach is that many rows from temporary result sets are often filtered out because they do not belong to the intersection.

Moreover, with more attributes in a search condition, the expected intersection is of smaller size and more and more rows are filtered out because they seldom fulfill all the conditions. When the count of entries filtered out becomes comparable with the count of all entries in a table then it is questionable whether the sequential passing of the whole table would not be faster.

3.1.2 Relational Databases

In multidimensional databases, objects are indexed according to several or many independent attributes. However, as mentioned in previous section, this task cannot be effectively handled by using many standalone indexes. Thus special indexing structures which would naturally index vectors of values instead of indexing single values have been required.

Common approach available in nearly all database systems is indexing of compound keys, i.e. several attributes are indexed by a single index - usually a compound B-Tree index (see chapter 3.2.2 "B-Tree with Compound Keys"). This method is more effective then utilization of separate indexes, however it still involves filtering of relatively high number of entries from temporary result sets.

(15)

several attributes at once; for example the KD-Tree, the R-Tree and its modifications, the UB-Tree etc. However, the integration of any such access method into the kernel of a commercial database system is a very costly task (see chapter 1.2 "Implementation of New Access Method"). Thus most of such advanced access methods are still in the state of research prototype or are available only as database plug-ins with restricted usability.

Usually only a specific data types can be indexed (geometric objects of CAD or GIS databases), concurrency control and recovery services may not be presented at all. The only exception is the UB-Tree that was integrated into Transbase DBMS [2].

In this work, a cheaper way of developing new access method (particularly the UB-Tree via the relational approach) is investigated and compared with its native implementation.

3.1.3 Range Query

Range query (window query respectively) in a vector space is usually represented by a hyper-box in given space. The ranges of a query box QB are defined by two boundary points, the lower bound QB_low = [a₁, a₂, …, a_n] and the upper bound QB_up = [b₁, b₂, …, b_n] where a₁ b₁,a₂ b₂, …,a_n b_n. The purpose of a range query is to return all points located inside the query box, i.e. to return all points osatisfying a_i o_i b_i,i [1,n], as outlined in Figure 2.

Figure 2: Range query in 2-dimensional space

D1

D2

QBup = [b1,b2] QBlow = [a1,a2]

(16)

3.2 Common Index Methods

The most common method used for indexing of a single attribute is the B-Tree and its modifications. Its concept can be extended into the B-Tree with Compound Keys so that it is feasible to index more attributes at once. Basics of these approaches are described in this section.

3.2.1 B-Tree

B-Tree is a balanced search tree. Its internal nodes can have variable number of child nodes within some pre-defined range as mentioned later. The tree is balanced which means that all leaf nodes are at the same depth. Following rules have to be valid for proper B-Tree of degree m:

1 the root has at least 2 descendants unless it is a leaf

2 all inner nodes except from the root have at least m/2 and at most mdescendants 3 all branches are of the same length

4 all nodes except from the root have at least m/2 - 1 and at most mdata entries 5 data in a node are organized as p₀, (k₁,p₁,d₁), … (k_n,p_n,d_n) where:

p_iis a pointer to a descendant

kiis a key (the keys are ordered in ascendant or descendant order) distands for associated data

(ki,pi,di) stands for data entry

6 let us consider U(pi) to be a sub-tree which is pointed to by pi, then k U(pi-1): k<ki

k U(pi): k>ki

A modification to above principles is a B+ Tree. It differs in points (5) and (6) of B-Tree definition in following way:

5 data are stored in leaves only (or are referenced from leaves only), inner nodes comprise only keys and pointers

6 let us consider U(pi) to be a sub-tree which is pointed to by pi, then k U(p_i-1): k k_i

k U(p_i): k>k_i

Way of searching follows from above definitions and is performed in the typical manner, analogous to that in a binary search tree. Starting at the root, the tree is traversed top to bottom, choosing the child pointer whose separation values are on either side of the value that is being searched. The tree is traversed down to a leaf node (B+ Tree) or to a node containing searched data (B-Tree).

(17)

Figure 3: Insertions and deletions in a B-Tree

3.2.2 B-Tree with Compound Keys

B-Tree presented in previous section does not allow indexing of multidimensional data.

The easiest extension of B-Tree which enables this type of indexing is to consider its keys as a chained sequence of values of those attributes that are subject to index. When comparing two keys, they are compared linearly item by item.

(18)

Such extension is available in most database systems. Unfortunately, it brings some disadvantages. Probably the biggest one is an asymmetry in the order of the attributes.

The first attribute is always the main one and serves for clustering of the vector space. It means that all data in the index are ordered only according to the first attribute, and not according to the other ones. Only in case when the first attribute (or all the previous attributes in general) has the same value for more entries, the index is sorted also according to following attribute for these entries. Therefore it is suggested to use an attribute with the smallest range of values at the first place so that the count of duplicities is as big as possible.

The asymmetry causes that in a range query many branches of the tree have to be searched through and this impacts the overall efficiency negatively. A solution would be to create several indexes, each for a different order of the attributes. However such approach is hardly affordable because of its enormous disk space demands.

(19)

3.3 UB-Tree

UB-Tree [5, 4, 2] is one of the access methods that are natively used for indexing of multidimensional data - this section discusses its features and suitability for multidimensional queries.

3.3.1 Z-curve

The basic idea of the UB-tree is to use a space filling curve to map a multidimensional universe to one-dimensional space. Points of the universe are ordered according to such called Z-curve which preserves multidimensional clustering - it means that points that are close to each other in the original universe (using standard L2-metric) are in general also close to each other on the Z-curve. Figure 4 shows the Z-curve for 2-dimensional universe of size 8×8.

Figure 4: Linear ordering of 2-dimensional space with a Z-curve

63 62

61 60 59 57 58 56

55 54

53 52 51 50

49 48

47 46

45 40 41 42 43

44 39 38

37 36 35 34

33 32

31 30

29 28 27 25 26 24

23 22

21 20 19 18

17 16

15 14

13 8 9 10 11

12 7 6

5 4 3 2

1 0 0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

3.3.2 Z-value

Z-value (also called Z-address) is an ordinal number representing the position of a multidimensional point on the Z-curve. For proper determination of the Z-value, the universe has to be finite in each dimension. Let d be the count of dimensions of the universe and xi = xi,0 … xi,s-1; i [1,d] be the binary record of the value of a multidimensional point in dimension i. Then the Z-value can be counted according to following formula:

= =

= ¹ +

0 1

1

, 2

)

( ^s

j d i

i d j j

xi

x Z

(20)

Above equation simply represents bit interleaving of values of the point in each dimension as shown in Figure 5 where step stands for bit position in given dimension (the most relevant bit position has its step equal to 0).

Figure 5: Computation of Z-value by bit interleaving bitstring 1

011

0…1 step 0

bitstring d 101

1…1 step 2

…

1…0 step 1

z-value

3.3.3 Z-region

Z-region [ : ] is a part of a multidimensional universe corresponding to all points of the universe within an interval on the Z-curve. The interval is defined by two boundary Z-values and (where < ). The upper bound is called region address. Set of Z- regions creates a disjunctive partitioning of the whole universe. Figure 6 shows Z-region [4 : 20] and partitioning of 2-dimensional universe into 5 Z-regions [0 : 3], [4 : 20], [21 : 35], [36 : 47] and [48 : 63].

Figure 6: Z-regions in 2-dimensional space

4 20 21

35 36

47 48

63

(21)

3.3.4 Tree Structure

The structure of UB-Tree is similar to the standard B-Tree (B+ Tree modification actually). Leafs of UB-Tree represent the Z-regions containing indexed objects, whilst inner nodes of UB-Tree represent such called super Z-regions. A super Z-region comprises all (super) Z-regions that lie entirely inside it. Therefore the UB-Tree structure is determined by a nested Z-region hierarchy as shown in Figure 7.

Figure 7: Hierarchical UB-Tree structure

0:63

0:31 32:63

0:3 4:9 10:31 32:35 36:41 48:63

O1 O2O3 O4 indexed objects

Z-regions super Z-regions

Algorithms that handle insertions, deletions and updates are the same as for B-Trees - the only difference is that at first the Z-value is computed based on indexed attributes of a database entry and then such Z-value is used as a key in subsequent "B-Tree operation".

3.3.5 Formal Definition

Since the UB-Tree stands for a smart extension of the B+ Tree to handle multidimensional objects, their formal definitions are quite similar. Particularly, the first 5 points of B+ Tree definition (see chapter 3.2.1 "B-Tree") are shared by the UB-Tree, just the last one is different:

6 let us consider U(p_i) to be a sub-tree which is pointed to by p_i, and Z(d) to be a function that computes Z-value for a multidimensional point d, then

k U(p_i-1): k k_i& d U(p_i-1): Z(d) k_i k U(p_i): k>k_i& d U(p_i): Z(d)>k_i

3.3.6 Range Query

Unlike the operations of insert, update and delete, a range query cannot be simply forwarded to the B-Tree. As mentioned previously, the range query in a multidimensional space is defined by two boundary points QB_low = [a₁, a₂, …, a_n] and QBup = [b1,b2, …,bn] and its purpose is to return all points lying inside such query box.

In terms of UB-Tree a range query can also be defined as a search through all UB-Tree Z-regions that intersect given query box as shown in Figure 8.

(22)

Figure 8: Z-regions intersecting a query box in 2-dimensional space

Following algorithm was presented by Markl in [2]: let us consider ql and qbto be the Z- addresses related to QBlow or QBup respectively. At first the Z-region containing ql is located (note that qldoes not need to relate to a point existing in the database - we only need to know which Z-region it belongs in). This Z-region is searched for relevant objects that really lie inside the query box, the others are filtered out. Then a subsequent Z-region intersecting the query box is retrieved and processed etc. The algorithm iterates until the upper bound of currently processed Z-region is higher than qh.

The crucial part is the way of obtaining a subsequent Z-region intersecting the query box. In Markl's work this is done by calculation of the Z-value for next intersection point of the Z-curve with the query box based on the currently processed Z-region; then a Z- region containing the computed Z-value is obtained.

Similar approach has been employed in this work as well. Please refer to the chapter describing the UB-Tree index implementation (see chapter 4 "Implementing the Relational UB-Tree Index") for more details on both the above algorithm and the function used for calculation of next intersection point get_next_zvalue (see chapter 4.5.1

"Get Next Z-value").

3.3.7 Processing Multidimensional Objects

In database area, indexing of multidimensional objects is often simplified to indexing of their minimum bounding boxes (MBB). A MBB is the smallest cube (in a multidimensional space) which completely covers the original object, usually with sides parallel to the axis. This approach reduces demanding computation of objects intersection, position etc. An example of MBB in 2-dimensional space can be seen in Figure 9.

Figure 9: Minimum bounding box in 2-dimensional space

Unfortunately, even indexing of MBB in the UB-Tree brings problems. When a UB-Tree page (i.e. a Z-region) has to be split because the count of items inside it exceeds the maximal allowed count, we may face a problem of choosing the divider for splitting algorithm.

(23)

found on the Z-curve between these objects. The trouble is that the Z-curve may go forth and back from one object to the other one and therefore such divider cannot be found so that one object lies entirely on its "left side" and the other object on its "right side". Thus at least one of the objects would belong to both Z-regions that originate from the splitting of the original Z-region and we may identify situations where even many objects would belong to both Z-regions. Then the splitting algorithm would not work as expected and high redundancy would be involved in such UB-Tree.

Similar problem arises when one MBB lies inside another MBB or when two MBB intersect each other. A simple example of two problematic objects is shown in Figure 10.

Figure 10: Intersection of Z-curve with two minimum bounding boxes

A possible solution is to consider a MBB with boundaries QBlow = [a1, a2, …, an] and QB_up = [b₁, b₂, …, b_n] to be a point of higher dimension with coordinates [a₁,a₂, …, a_n, b₁,b₂, …, b_n]. The range query mentioned in previous section has to be modified in this case as well. Thorough information about this approach can be found in [3].

(24)

4.1 Oracle Database Platform

Relational UB-Tree index was implemented with Oracle database platform [8] as the underlying DBMS (particularly the free version Oracle 10g XE has been employed in this work). It has been chosen for three main reasons:

it natively supports object extensions via its PL/SQL language

it supports recursive SQL queries which are required for optimal implementation of navigational type of a relational index

SQL query plans, processing etc. can be influenced by a programmer with a set of integrated tools (e.g. optimizer hints)

PL/SQL (Procedural Language/SQL) stands for proprietary server-based procedural extension to the SQL database language. Its syntax strongly resembles that of Ada and supports variables, arrays, conditions, loops and exceptions. It also includes features associated with object-orientation. More details about the syntax and usage of PL/SQL can be found at [11].

Recursive SQL queries offer a way to traverse tree-like structures with one SQL statement. In Oracle they have proprietary syntax which differs from the SQL: 1999 standard. It comprises two clauses used for definition of recursive traversing, particularly START WITH <condition> and CONNECT BY <condition>. In general, evaluation of such query is done in following way:

1 Oracle selects the root row(s) of the hierarchy - i.e. those rows that satisfy the condition of the START WITH clause.

2 Oracle selects the child rows of each root row. Each child row must satisfy the condition of the CONNECT BY clause with respect to one of the root rows.

3 Oracle selects successive generations of child rows. At first it selects the children of the rows returned in step 2, and then the children of those children, and so on.

4 If the query contains a WHERE clause, Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause.

The syntax of a recursive statement is as follows:

SELECT <what> FROM <table>

WHERE <filter>

START WITH <condition>

CONNECT BY <condition>

C

H A P T E R

4

Implementing the Relational UB-Tree

Index

(25)

4.2 Common Properties of UB-Tree Index

In this work several approaches to a relational UB-Tree index implementation are presented. This section describes basic features that are common for all chosen methods, whilst particular implementation details are mentioned in subsequent sections.

4.2.1 Multidimensional Tuple

Multidimensional tuple is the base element in probably all applications designed to handle and store multidimensional data. It represents a logical point in a multidimensional space which should be involved in multidimensional indexing. The tuple is compound of items representing the value of the point in corresponding dimensions of the space.

Concerning the UB-Tree, type of each item of a tuple has to be numeric or any type that can be converted to numeric type (e.g. date, enumeration, string of fixed length and fixed number of allowed characters etc.). Moreover, the range of values in each dimension (i.e.

the minimal and the maximal value of an item) has to be known in advance. These restrictions arise from the characteristic of the Z-curve - it is necessary to know both where the Z-curve begins (the 0 point) and where it ends (actually the length of the maximal Z-value has to be known in some algorithms mentioned later, however these restrictions are identical); otherwise it would not be possible to assign appropriate Z- value to a tuple.

Without loss of generality only the numeric values for tuple items are considered and ranges of values of items in all dimensions are supposed to have the same length - it equals to the nearest higher exponent of 2 of maximal range of a domain, i.e.:

length = min({2^exp | exp N, 2^exp max_dim-min_dim dim})

4.2.2 Defining the Constraints

As mentioned in previous section, restrictions for the range of the value have to be defined in all dimensions of a multidimensional tuple. However, this task cannot be achieved by simple definition of some standard database constraints on primary data table as if it could be done in case the UB-Tree index was implemented directly into the database kernel similarly to [2].

Therefore a separate table has to be created to hold the restrictions. This table consists of just 3 columns - an identifier of a dimension, its lower bound and its upper bound:

CREATE TABLE ub_constraints ( dimension_id NUMBER, lower_bound NUMBER, upper_bound NUMBER );

(26)

4.2.3 Integrating the Index with a Database

A multidimensional tuple serves as an interface between tables storing the primary data and tables storing the index data. It means that if a row is inserted into (or updated in) the primary table, then a tuple representing row data that are subject to multidimensional index should be generated and index data based on the value of such tuple along with an identifier of primary data row should be inserted into (or updated in) the index table.

This task can be easily achieved by defining appropriate AFTER TRIGGERs on the primary table. The only restriction is that the identifier of a row in the primary table is supposed to be just one NUMBER; in case the PRIMARY KEY of the primary table consists of more attributes or is of a different type, an alternative numeric identifier for a primary data row has to be created. Then it is necessary to define required constraints via the ub_constraints table mentioned in previous section. An example can be seen in Appendix A.

Another way of integrating the index into a database would be exploiting of an extensible indexing architecture of given DBMS. Currently many commercial database systems provide an interface which enables developers to register custom secondary access methods, however the effort of implementing such type of index is behind the scope of this work because the goal is mainly to compare a relational index with its native implementation - a custom index type implementation via the extensible framework would not bring any advantages in comparison with usage of the AFTER TRIGGERs.

In this work several, ways of relational UB-Tree implementation are presented and for higher transparency each of them serves as a black box for index data maintenance.

Therefore a common interface is defined for all approaches. It provides a set of functions to keep index data consistent with primary data whenever the primary data are changed, and also a function to obtain identifiers of rows in primary table based on the query specification (the actual usage of the index in a SELECT statement).

Particularly the functions for index data maintenance which should be used in AFTER TRIGGERs are insert_tuple(tuple, id), update_tuple(tuple, id) and delete_tuple(id), whilst the function inside_query_box(lower_bound, upper_bound) serves for obtaining identifiers of primary data rows based on a query box determined by its lower and upper boundaries (which both are actually multidimensional tuples). The index is then supposed to be used in following way:

SELECT primary.*

FROM TABLE(inside_query_box(Type_tuple(X1,X2,…,Xn), Type_tuple(Y1,Y2,…,Yn))) index

LEFT JOIN PRIMARY_TABLE primary ON index.id = primary.id

More about the real processing of above concept can be found in subsequent sections describing the particular UB-Tree index implementations. Short specification of user data types is presented in Appendix B.

(27)

4.3 UB-Tree via the Direct Scheme

One of the ways proposed in [1] to implement a relational index is the direct scheme (see chapter 2.2.2 "Direct Scheme of Index Tables") where data of the index are exclusively related to primary data and do not form any specific structure. Description of this approach for developing the UB-Tree index can be found in this section.

4.3.1 Basic Concept

According to the definition of the direct scheme, each row in the index table is associated with only one row in the primary table. In this implementation of the relational UB-Tree, the mapping of primary data into index data is moreover bijective and quite simple. Only the actual Z-value of a tuple is computed and this Z-value is stored to the index table along with the identifier of the primary row.

The idea of obtaining identifiers of primary data lying inside a query box is also straightforward - the query box is partitioned into several continuous Z-regions and the index table is queried to get all rows which Z-value lie between the boundaries of such Z-regions.

4.3.2 Index Table

As outlined in previous section, the index table consists of just 2 columns - the Z-value and the identifier of a primary data row:

CREATE TABLE ub_index ( z_value Type_z_value, primary_id NUMBER );

The PRIMARY KEY constraint should be defined on the z_value attribute to exploit the power of the underlying database engine when performing a SELECT query with multiple "WHERE z_value BETWEEN" conditions based on the partitioning of the query box. For the ease of implementation, if we do not want to implement a custom index type for our Type_z_value data type, a functional index can be defined instead of the primary key on the column z_value where the function simply transforms the z_value into a string; consequently, all concerned query conditions should be transformed to be in form "WHERE z_value.to_string() BETWEEN" so that the functional index could be exploited.

The primary_id attribute could be defined as FOREIGN KEY to the PRIMARY KEY of the primary table and an ON DELETE CASCADE constraint could be used instead of AFTER DELETE trigger for deleting a row from the index table when a row in the primary table is deleted, however this approach is not in compliance with the black box concept and strict separating of primary and index data.

(28)

If this type of UB-Tree index is used in an environment where changes and deletions of rows in the primary table are frequent, it may be wise to define a standard secondary database index on the primary_id attribute, because the ub_index table is queried according to this attribute when an INSERT or UPDATE of the UB-Tree index structure is triggered. On the other hand this is not recommended in environments with majority of insertions because such secondary index could become very large and the insertions would be slower just because of updates of this index.

4.3.3 Inserting, Updating and Deleting a Tuple

According to the main principle of this approach, all these operations are quite simple:

in case of inserting or updating, the actual Z-value of a tuple being processed is computed and is stored to the index table

in case of deleting only the row with the corresponding identifier is deleted from the index table

4.3.4 Querying Tuples

The main idea is to partition the query box into several continuous Z-regions. Then the index table is queried to get all rows which Z-value lies between the boundaries of such Z-regions. The actual query statement could be as follows:

SELECT primary_id FROM ub_index u,

TABLE(decompose_query_box(:lower_bound, :upper_bound)) d WHERE u.z_value BETWEEN d.lower AND d.upper

The function decompose_query_box generates a temporary table that contains rows with lower and upper bound specification of particular Z-regions covering the query box which is defined by 2 multidimensional tuples representing its lower_bound and upper_bound boundaries.

The easiest way to decompose a query box is repeated calling of functions get_next_zvalue (see chapter 4.5.1 "Get Next Z-value") and get_next_zvalue_out (see chapter 4.5.2 "Get Next Z-value Out"). This approach guarantees that the query box is covered by sequence of optimal Z-regions, which means that:

each of such Z-regions is as long as possible it does not exceed the query box in any dimension

However, the experiments have shown that this is also the least efficient way. Even though both of the functions are of linear time complexity in relation to the bit length of the maximal Z-value, the characteristics of the Z-curve cause that a query box is decomposed into huge number of Z-regions and thus the whole computation and query evaluation take a lot of time.

(29)

dimension has been divided into approximately 25.000 Z-regions. It means that the average length of a Z-region has been about 4. Although this simple experiment cannot impact the actual result in general case (the query box can be covered just by few Z- regions even though it is much larger), we may estimate that this is not the right way.

The main problem in above scenario is that the average length of a Z-region is too short.

Therefore it would be wise to define minimal allowed length of a Z-region providing that the Z-region may exceed the boundaries of the query box in some dimensions.

Whole space could be divided by a grid into multidimensional cubes with size equal to 2^const in each dimension, where const is an integer higher or equal to 1. Each such cube is covered by just 1 continuous Z-region which length is considered to be the minimal allowed one. It means that the actual minimal length equals to (2^const)^dim, where dim is the count of dimensions of the space. Moreover, if we define total ordering of the cubes according to the Z-curve and a query box has non-empty intersection with several subsequent cubes in relation to such ordering, then the related Z-region covering a part of the query box has its length equal to the sum of lengths of the subsequent cubes. Such Z-regions generate an extended query box.

If this logic is applied to the same query box as in the example mentioned in one of the previous paragraphs, and in case the const equals to 3, then the box is completely covered just by 1 Z-region. An example is shown in Figure 11.

Figure 11: Query box partitioned into 9 Z-regions and its extended query box partitioned into 4 Z-regions when const=1

Above approach requires post-filtering of selected tuples to ensure that they really belong to the original query box. Therefore an optimal const should be found so that both the number of Z-regions covering the extended query box is rather small and the number of tuples discarded because of post-filtering is not significant in relation to the number of all tuples lying inside the extended query box. A discussion about choosing the proper const can be found in Experiments section of this work (see chapter 5.2.1

"Optimal Constant for Extended Query Box in Direct Scheme").

(30)

The actual algorithm for decompose_query_box function utilizes the method "Divide and conquer" where the whole space is divided into 2 multidimensional sub-cubes of the same size and each sub-cube is then processed recursively. When a sub-cube with minimal allowed size is being processed and in case it has non-empty intersection with the query box, then its boundaries are sent to the output.

For higher effectiveness both positive and negative pruning are employed in this algorithm. It means that if the cube being currently processed does not intersect the query box, then the further processing of such cube is skipped. On the other hand if the cube is nearly whole covered by a part of the query box (it means that all minimal sub- cubes of such cube intersect the query box), then it is whole sent to the output.

Simplified pseudo code of this algorithm which does not consider optimization of the query box intersection with subsequent sub-cubes is as follows:

function decompose_query_box(lower_bound, upper_bound) { function decompose(query_box, cube) {

if (cube does not intersect query_box) return;

if (cube is minimal or cube is covered by query_box) { send cube boundaries to the output;

return;

}

decompose(query_box, lower-sub-cube);

decompose(query_box, upper-sub-cube);

}

query_box = BOX(lower_bound, upper_bound);

cube = BOX(minimal_z_value, maximal_z_value);

decompose(query_box, cube);

}

For estimation of time complexity of above algorithm following definitions are needed:

The basic cube is a multidimensional cube with size equal to 2^int in each dimension, where int is an integer equal to or higher than 0; moreover such cube has to be filled by just one continuous Z-region. The length of such Z-region can be easily counted and equals to (2^int)^dim.

Let cnt be the count of maximal basic cubes which Z-regions completely cover the extended query box. From the construction of the extended query box follows that the smallest possible size of a basic cube in the extended query box is 2^const and the length of related Z-region is (2^const)^dim.

Let nbe the bit length of maximal Z-value.

The time complexity of each execution of decompose function is O(n) because:

The condition "cube does not intersect query_box" is counted as the result of the function get_next_zvalue (see chapter 4.5.1 "Get Next Z-value") which has complexity O(n).

The condition "cube is minimal" is of constant complexity because just an integer representing the minimal length is added to a Z-value (constant complexity) and then two Z-values are compared (again constant complexity).

(31)

because it utilizes the function inside_box (see chapter 4.5.3 "Inside Box") executed on both the boundaries of the cube.

Because of both positive and negative pruning the recursive dividing of the cube is executed at most cnt-times in each level of depth of nested calling of the function, in other cases one of the conditions evaluates to true and the function ends. The maximal depth of nested calling can be also counted and equals to n/const however the const factor can be omitted in this computation.

Therefore the overall time complexity of decompose_query_box function is O(cnt*n*n)

= O(cnt*n²) which means that the time complexity is quadratically dependent on the length of maximal Z-value, and linearly dependent on the complexity of the extended query box.

(32)

4.4 UB-Tree via the Navigational Scheme

Another way proposed in [1] to implement a relational index is the navigational scheme (see chapter 2.2.1 "Navigational Scheme of Index Tables") where data of the index create a hierarchical structure. Description of this approach for developing the UB-Tree index can be found in this section.

4.4.1 Basic Concept

According to the definition of the navigational scheme, each row in the index table is associated with one or more rows in the primary table. The chosen concept is similar to the implementation of the relational R-tree in [1].

The rows in the index table form a hierarchical structure that stands for the natural way of implementing the UB-Tree which is essentially also hierarchical. Each row contains logic identifier of the page of UB-Tree, its level in the tree, reference to the page that stand for its direct descendant and boundaries of related Z-region. On the lowest level the reference actually contains identifier of a row in the primary table, the lower boundary contains the Z-value of indexed data whilst the upper boundary is null.

There are three ways of obtaining identifiers of primary data lying inside a query box - either a recursive SQL query statement can be used, or the UB-tree structure can be traversed programmatically or a database cursor can be opened to obtain required data.

4.4.2 Index Table

The index table for the navigational UB-Tree is designed in following way:

CREATE TABLE ub_index ( page_id NUMBER, level NUMBER, son_id NUMBER,

z_lower Type_z_value, z_upper Type_z_value );

The page_id attribute stands for the identifier of a logical UB-Tree page. Let max be the maximal count of items in one logical UB-Tree page. Then there can be several rows in the table with equal page_id up to the value of max. This concept has been chosen because it allows flexible assignment of the page size and the query to obtain searched objects can be written in a smart way, as mentioned later.

(33)

page_id as the PRIMARY KEY. However several boundary sets consisting of son_id, z_lower and z_upper would have to be stored along with each row and thus the change of maximal items count in UB-Tree page would be quite difficult because it involves changes in the code. Moreover queries on this table would be less transparent as the WHERE clause would comprise several conditions related to each boundary set.

To minimize disk access cost during most SQL queries, clustering of the table is defined according to page_id attribute. This approach ensures that entries belonging to one logical UB-Tree page are stored in one physical cluster on the disk.

The PRIMARY KEY constraint is composite and comprises the attributes page_id and son_id. Another INDEX is defined on the son_id attribute because the updates of the UB-Tree structure related to INSERT, UPDATE or DELETE of a row in the primary table involve number of queries according to this attribute.

4.4.3 Inserting a Tuple

At first a logical UB-Tree page has to be found where the z-value of a tuple being inserted belongs. This can be easily achieved with recursive SQL query:

SELECT * FROM ub_index WHERE level = 1

START WITH page_id = 1 CONNECT BY

PRIOR son_id = page_id

AND :tuple_z-value BETWEEN z_lower AND z_upper

If the count of items in a logical page exceeds max, then the page has to be split into two pages and a new divider has to be inserted into the parent page. This may involve recursive splitting of the ancestor pages up to the root. The algorithm is similar to the algorithm of splitting standard B-tree page. The interesting part is the way of choosing the divider. Similar algorithm to the one proposed in [2] has been used - a divider that causes the least possible partitioning of the space is chosen. If we consider the definition of the basic cube mentioned in one of the previous chapters (see chapter 4.3.4 "Querying Tuples"), then the Z-region covering the original page is divided into two Z-regions that cover maximal possible basic cubes.

4.4.4 Deleting a Tuple

The tuple is simply deleted from the index table according to its identifier.

DELETE FROM ub_index

WHERE son_id = id AND page_lev = 0

If the count of items in the concerned logical UB-Tree page is lower than max/2, then some items has to be transferred to the page from a neighbor page, provided that the count of items in the neighbor page is sufficient; otherwise these two pages have to be merged and their divider has to be removed from the parent page. This may involve recursive merging of the ancestor pages up to the root. The algorithm is similar to the algorithm of merging standard B-tree pages and does not include any special solution.

A RELATIONAL APPROACH TO INDEXING

DIPLOMA THESIS

Antonín Prukl