
Charles University in Prague
Faculty of Mathematics and Physics

MASTER THESIS

Bc. Jiří Novotný

Analysis of results of particle phenomena modeling in computer games

Department of Software and Computer Science Education

Supervisor of the master thesis: Mgr. Jiří Boldyš, Ph.D.

Study programme: Informatics

Specialization: Software Systems

Prague 2014


I would like to express my gratitude to my supervisor Mgr. Jiří Boldyš, Ph.D., for his unwavering support and encouragement throughout the entire process. I deeply appreciate the time he spent on reviewing and giving me useful comments. Without his supervision and constant help this thesis would not have been possible.


I declare that I carried out this master thesis independently, and only with the cited sources, literature and other professional sources.

I understand that my work relates to the rights and obligations under the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that the Charles University in Prague has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 paragraph 1 of the Copyright Act.

In Prague, July 31, 2014 Jiří Novotný


Title: Analysis of results of particle phenomena modeling in computer games (in Czech: Analýza výsledků modelování částicových jevů v počítačových hrách)

Author: Bc. Jiří Novotný

Department: Department of Software and Computer Science Education

Supervisor of the master thesis: Mgr. Jiří Boldyš, Ph.D., Institute of Information Theory and Automation of the ASCR

Abstract: The main topic of this thesis is the design of a method that allows comparing the outputs of a manually configured particle system with a template video of the observed phenomenon. For simplicity, the thesis focuses exclusively on smoke effects and their comparison.

Specifically, descriptors describing the overall color of the examined input videos are proposed, as well as descriptors of the shape and the temporal behavior of the smoke, based on the result of the image segmentation method called Alpha Matting, which distinguishes foreground objects from the background, with the objects being described including their possible transparency.

The thesis is accompanied by template recordings of smoke that were filmed specifically for this purpose. The result of the proposed method is supported by a demonstration of the gradual improvement of a simulation of a selected smoke effect.

Keywords: digital image processing, particle phenomena, smoke effect similarity


Title: Analysis of results of particle phenomena modeling in computer games

Author: Bc. Jiří Novotný

Department: Department of Software and Computer Science Education

Supervisor: Mgr. Jiří Boldyš, Ph.D., Institute of Information Theory and Automation of the ASCR

Abstract: The main subject of this thesis is to design a method that is able to compare the outputs of a manually set up particle system with a video template of a simulated particle phenomenon. For the sake of simplicity, this work focuses exclusively on smoke effects and their comparison.

Specifically, descriptors depicting the color scheme of the examined input videos are proposed, as well as shape descriptors and time variability descriptors, which are based on the image segmentation method called Alpha Matting. This method is able to distinguish foreground objects from the background, while the foreground objects are described together with their transparency.

Video template recordings captured specifically for the purpose of this work are included. At the end, the resulting method is supported by an example of a gradually improving smoke effect simulation.

Keywords: digital image processing, particle phenomena, smoke effect similarity


CONTENTS

Introduction 3

Motivation . . . 3

Goals . . . 3

1 Theoretical Background 4
1.1 Particle Systems . . . 4

1.1.1 Brief History . . . 5

1.1.2 Fundamental Model . . . 5

1.1.3 Rendering Tools . . . 6

1.2 Image Processing . . . 6

1.2.1 Basic Terms . . . 7

1.2.2 Color Spaces . . . 7

1.2.3 Image Moments . . . 8

1.3 Alpha Matting . . . 9

1.3.1 Methods with User Interaction . . . 10

2 Proposed Method 12
2.1 Simplifying Assumptions . . . 12

2.1.1 Video Template Restrictions . . . 12

2.1.2 Visual Stationarity . . . 12

2.2 Problem Analysis . . . 13

2.3 Image and Video Features . . . 13

2.3.1 Color Descriptors . . . 14

2.3.2 Shape Descriptors . . . 16

2.3.3 Time Variability Descriptors . . . 18

2.4 Similarity Measures . . . 19

2.4.1 Color Distance . . . 19

2.4.2 Shape Distance . . . 20

2.4.3 Time Descriptors Distance . . . 21

2.4.4 Overall Similarity . . . 22

2.5 Method Overview . . . 22


3 Implementation 24

3.1 Source Code Description . . . 24

4 User Guide 26
4.1 Installation and Manipulation . . . 26

4.2 Computing Similarities . . . 26

4.2.1 Finding Out More Details . . . 27

4.3 Interpretation of Results . . . 28

4.3.1 Color Similarity . . . 28

4.3.2 Shape Similarity . . . 28

4.3.3 Time Variability Similarity . . . 29

5 Results 30
5.1 Alpha Matting . . . 30

5.2 Color Similarity . . . 33

5.2.1 Color Matrix Verification . . . 33

5.2.2 Color Matrix Examples . . . 33

5.2.3 Color Similarity Experiment . . . 34

5.3 Shape Similarity . . . 35

5.3.1 Shape Matrix Verification . . . 35

5.3.2 Maximal Moment Order . . . 36

5.3.3 Normalized Image Moments . . . 36

5.3.4 Shape Matrix Examples . . . 37

5.4 Time Variability Similarities . . . 42

5.4.1 Implementation Verifications . . . 42

5.4.2 Real Examples . . . 43

5.5 Overall Similarity . . . 43

5.5.1 Smoke Simulation . . . 46

Conclusion 52
Drawbacks and Future Work . . . 52

Bibliography 54

List of Figures 57

List of Tables 58

Appendix A Content of Attached DVD 59


INTRODUCTION

Motivation

Modern video games are full of phenomena such as fire, smoke, explosions, sparks, dust or abstract visual effects like glowing plasma guns, trails of magic wands etc. These phenomena are typically simulated using particle systems that are controlled manually by graphic designers, who must set up a lot of parameters. It is a relatively tedious process and the outcome strongly depends on the experience and subjective approach of the graphic artist.

My aim is to facilitate the work of graphic designers by developing a method that helps compare the result of a manually set up particle system with a video template of the simulated real phenomenon (e.g. smoke or fire). This comparison should also serve as feedback for the gradual improvement of the particle system.

Goals

The main goal of this thesis is to propose a method capable of comparing two video streams of particle effects that gives a reasonable output for graphic artists modeling the particle phenomena. Although a fully automated approach may come to mind, this work focuses on creating a similarity metric only, rather than presenting a completely autonomous system that could model particle effects purely on the basis of video templates. This is because such a problem is too general and has not been studied in the literature yet, so the goals have to be more realistic. However, the metric should be robust enough to allow possible future extensions, perhaps toward a fully automated approach.

An integral part of this work is obtaining video templates that can be used for experimental purposes and testing the similarities.


CHAPTER 1

THEORETICAL BACKGROUND

This chapter contains theoretical foundations that are essential to understand the further text. Section 1.1 briefly describes the history of particle systems, their common characteristics and basic use in computer graphics applications.

The fundamental terms from digital image processing are defined in Section 1.2.

Knowledge of these concepts is assumed in the following chapters, where they are not re-explained. Finally, Section 1.3 discusses the problem of the foreground and background separation from images and videos in terms of a transparency map computation, which is also referred to as alpha matting.

1.1 Particle Systems

A particle system is a computer graphics method that uses a huge amount of tiny graphical objects (called sprites) to simulate fuzzy phenomena such as fire, clouds and water. Modeling of these effects has proved difficult with other existing techniques of computer image synthesis [20]. Samples of the particle effects which I have created in CryEngine 3¹ are shown in Figure 1.1.

Figure 1.1: Examples of particle effects generated in CryEngine 3: (a) a smoke, (b) a fire.

¹ CryENGINE® is a game engine designed by the German game developer Crytek.


1.1.1 Brief History

Particle systems have a long history in video games and computer graphics. Very early video games in the 1960s already used 2D pixel clouds to simulate explosions.

One of the first publications about the use of dynamic particle systems was written after the completion of the visual effects for the motion picture Star Trek II: The Wrath of Khan by William Reeves in 1983 [20]. Reeves described the basic data structure for representing a particle and elementary motion operations. Neither of them has changed much since. The first implementations on parallel processors were done by Karl Sims in the early 1990s [21]. The further development of GPU-based techniques has enabled real-time particle systems, which resulted in a massive expansion of particle phenomena into the video-gaming industry [22].

1.1.2 Fundamental Model

An object is represented as a collection of many primitive particles that define its volume. Over a period of time, particles change form and move. Also, new particles are generated ("born") and old particles are removed from the system (they "die"). Fuzzy objects represented by a particle system are not deterministic, since their form and shape are not completely specified. Instead, stochastic processes are used to affect the particle system behavior.

The following text does not aim to provide complete information about particle systems, but it gives brief characteristics that I believe are necessary to understand the basic functioning. A more detailed description and implementation notes can be found in [20] and [23].

Particles A particle can be thought of as a point in the three-dimensional space with some specific properties. Common particle attributes are position, velocity, size, color (together with transparency or an alpha-texture affecting the resulting entity shape) and life span. New particles are generated and controlled by an emitter. The emitter acts as a source of the particles; its position and distribution determine where the particles are generated. Each new particle is assigned its individual attributes (with an element of randomness, e.g. the life span is set to 100 frames ± 25%).

Particle System A particle system (in terms of entities) is a collection of a number of individual elements (particles) together with attributes which control the behavior of the elements in a non-deterministic way with the passage of time. Common data properties of particle systems are: a list of particles, position, emission rate, forces (they affect the behavior) and blending (or some other kind of rendering settings).

Simulation Phase Over the simulation phase, new particles are born (the amount of newly created particles is based on the emission rate and the interval between frame updates) and their parameters are initialized. Then, a particle lifetime test is performed: any particle that exceeded the prescribed life span is removed. The remaining particles are moved and transformed according to their dynamic attributes, external forces (gravity, wind, friction, attractors, . . . ) and predefined laws. This can be based on a real physical simulation. Collisions between particles and specified scene objects are often computed, which leads to particle interactions like bouncing or absorption. Even interactions between particles can be taken into account, but they are still computationally costly and therefore rarely used [22].
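To make the simulation phase concrete, the following MATLAB sketch shows one update step of a minimal particle system. It is only an illustration under assumed data structures (a struct array P with fields pos, vel and life, and an emitter struct with fields rate, pos, dir and life); it is not the CryEngine 3 implementation.

% A minimal, illustrative simulation step; P is assumed to be a struct
% array with fields pos, vel, life; emitter has fields rate, pos, dir, life.
function P = simulationStep(P, emitter, dt)
    % Birth: the number of new particles follows the emission rate.
    nNew = round(emitter.rate * dt);
    for k = 1:nNew
        p.pos  = emitter.pos + 0.1 * randn(1, 3);           % randomized start position
        p.vel  = emitter.dir + 0.05 * randn(1, 3);          % randomized initial velocity
        p.life = emitter.life * (1 + 0.25 * (2*rand - 1));  % life span +/- 25 %
        P(end + 1) = p;                                     %#ok<AGROW>
    end
    % Lifetime test: remove particles that exceeded their life span.
    for k = 1:numel(P)
        P(k).life = P(k).life - dt;
    end
    P = P([P.life] > 0);
    % Movement: integrate an external force (gravity) and the velocity.
    g = [0, 0, -9.81];
    for k = 1:numel(P)
        P(k).vel = P(k).vel + g * dt;
        P(k).pos = P(k).pos + P(k).vel * dt;
    end
end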

Rendering Phase After the simulation phase has passed, each particle is rendered. Particles can be rendered as points, textured polygons, or primitive geometry. The most common form is a textured billboard (quadrilateral) that is always facing the camera. Together with alpha blending (an operation of composing image transparency channels with other layers), these billboards yield good results for fire, smoke, etc. The main properties that contribute to the final rendered image are: color (which can also be animated; for example, the color of a fire particle is orange near the flame source and darker as the particle moves farther away), transparency, blur (smoke particles are often blurred to a point that the viewer cannot distinguish individual particles) and glow (which makes particles look incandescent, e.g. fire, laser and plasma guns).

Particle Hierarchy A particle system may be composed of many smaller particle systems. The child particle systems then inherit properties of the parents. Also, different particle systems are usually combined together to create one complex effect (like fire mixed with smoke and sparks).

1.1.3 Rendering Tools

Particle systems are nowadays a common part of many modeling and rendering programs such as 3D Studio Max, Blender, Maya and Cinema 4D. These tools allow graphic artists to have instant feedback on how a particle system will look with specified constraints and attributes. Particle systems are also included in different game engines like Unreal Engine, Unity3D or Havok.

For the experimental purposes of this thesis, I have chosen the CryEngine 3 SDK² version, which is available on Crytek's website: http://www.crytek.com/. Its main advantages are the price (free for non-commercial purposes) and the relatively high visual fidelity of the results.

1.2 Image Processing

The extensive area of digital image processing refers to processing images by means of a computer. It is a far-reaching discipline investigating subjects such as mathematical characterizations of images, image improvements, analysis, descriptions and many others. The overall knowledge of this field is perfectly covered in two well known books. The first is Digital Image Processing by Pratt [1] and the second (with the same title) is written by Gonzalez et al. [2]. In the following text, only the essential terms are laid down.

² Software Development Kit


1.2.1 Basic Terms

Definition. An image function (or just image) is any piece-wise continuous real function f(x, y) defined on a compact bounded support D ⊂ R×R and having a finite nonzero integral [12].

Note. In practice, a discrete version of some image function is assumed. This is indicated by a capital letter, e.g. I.

Definition. A digital image is any H × W matrix I with elements from the finite set of integers {0, 1, 2, . . . , L − 1}, where L > 1. Usually L = 256 and each pixel value is stored in one byte [12].

Definition. A video function (or just video) is any image function with an additional real parameter t (time). The discrete version of a video function I(x, y, t) has t ∈ Z and 1 ≤ t ≤ F, where F indicates the number of frames.

Definition. Pearson's Correlation Coefficient of two M × N images I and J is defined (based on [3]) as

$$ r(I, J) = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N}\big(I(m,n) - E(I)\big)\cdot\big(J(m,n) - E(J)\big)}{\sqrt{\sum_{m=1}^{M}\sum_{n=1}^{N}\big(I(m,n) - E(I)\big)^{2}}\cdot\sqrt{\sum_{m=1}^{M}\sum_{n=1}^{N}\big(J(m,n) - E(J)\big)^{2}}} $$

where E(I) denotes the mean value of an image I:

$$ E(I) = \frac{1}{M \cdot N}\sum_{m=1}^{M}\sum_{n=1}^{N} I(m, n) $$

This coefficient is used to measure the similarity (linear dependency) between features and has a value from [−1; 1]. (Note that in MATLAB, the built-in implementation of this coefficient for two matrices is the corr2 function.)
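As an illustration only (not part of the thesis code), the coefficient can be evaluated directly in MATLAB; the file names below are hypothetical:

% A sketch of Pearson's correlation of two equally sized gray-scale images,
% assuming hypothetical file names and double-valued matrices.
I = im2double(imread('frameA.png'));
J = im2double(imread('frameB.png'));
dI = I - mean(I(:));                 % deviations from the image mean E(I)
dJ = J - mean(J(:));                 % deviations from the image mean E(J)
r = sum(dI(:) .* dJ(:)) / (sqrt(sum(dI(:).^2)) * sqrt(sum(dJ(:).^2)));
% r matches corr2(I, J) up to floating point error.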

1.2.2 Color Spaces

A color space is an abstract mathematical model covering the way colors can be represented as numeric vectors. Various color spaces are available; definitions of different color spaces can be found in [15]. I have chosen the HSV³ color space for representing colors in this thesis for the following reasons: it is more natural for graphic designers to describe colors by the HSV model than by the RGB⁴ one, and color features extracted in the HSV space can capture the distinct characteristics of computer graphics better. This is noted in the work of Chen et al. [16].

³ Acronym for Hue, Saturation and Value.

⁴ Acronym for Red, Green and Blue.


1.2.3 Image Moments

Image moments are often used in pattern recognition applications to describe geometrical shapes of investigated objects. In general, moments are commonly used in statistics to characterize the distribution of random variables. They provide fundamental geometric properties (e.g. area, centroids, skewness, kurtosis) of a distribution [10]. If we consider a binary or gray level image as a two-dimensional density distribution function, the use of moments for image analysis is straightforward. In this manner, moments may be used to extract properties that have analogies in statistics [13].

Definition. The geometric moment m_pq of an image f(x, y), where p, q are non-negative integers and (p + q) is called the order of the moment, is defined [12] as

$$ m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q} f(x, y)\, dx\, dy $$

Definition. The central geometric moment μ_pq of an image f(x, y), where p, q have the same meaning as above, is defined [12] as

$$ \mu_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)\, dx\, dy, $$

where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$ are the components of the centroid. Central moments provide a description invariant to translations of the object.

Definition. The normalized central geometric moment is defined [12] as

$$ \nu_{pq} = \frac{\mu_{pq}}{m_{00}^{\frac{p+q}{2}+1}} $$

The main difference of normalized moments is their scaling invariance.

Transformations of these definitions to the discrete form are quite intuitive. For the sake of clarity and a straightforward implementation they are listed below. Detailed reasoning can be found in [12], from where the following definitions are also taken.

Definition. The geometric moment m_pq of a discretized H × W image I(x, y) is defined as

$$ m_{pq} = \sum_{y=1}^{H}\sum_{x=1}^{W} x^{p} y^{q} I(x, y) $$

Definition. The central moment μ_pq of a discretized H × W image I(x, y) is defined as

$$ \mu_{pq} = \sum_{y=1}^{H}\sum_{x=1}^{W} (x - \bar{x})^{p} (y - \bar{y})^{q} I(x, y), $$

where $\bar{x}$ and $\bar{y}$ are the same as above.
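For illustration, the discrete moments can be computed directly from these definitions. This is only a sketch with hypothetical helper names, not the implementation described in Chapter 3:

function m = geomMoment(I, p, q)
    % Discrete geometric moment m_pq of a gray-level image I.
    [H, W] = size(I);
    [X, Y] = meshgrid(1:W, 1:H);               % pixel coordinates
    m = sum(sum((X.^p) .* (Y.^q) .* I));
end

function mu = centralMoment(I, p, q)
    % Discrete central moment mu_pq, invariant to translations.
    m00 = geomMoment(I, 0, 0);
    xc  = geomMoment(I, 1, 0) / m00;           % centroid x
    yc  = geomMoment(I, 0, 1) / m00;           % centroid y
    [H, W] = size(I);
    [X, Y] = meshgrid(1:W, 1:H);
    mu = sum(sum(((X - xc).^p) .* ((Y - yc).^q) .* I));
end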


Figure 1.2: An illustration of a matte composition according to Equation 1.1: (a) α, (b) F, (c) 1 − α, (d) B, (e) C, (f) additional scribbles.

1.3 Alpha Matting

The accurate extraction of foreground objects from still images and videos plays an important role in many applications of digital image processing, especially in film production and image editing disciplines. The name matting refers exactly to this problem of separating a foreground object from the background.

This task was first described mathematically in 1984 by Porter et al. [24].

They introduced an alpha channel image which is further used for a linear interpolation when rendering a given foreground over an arbitrary background. The observed composite image C can be described by their model as:

$$ C(x, y) = \alpha(x, y)\cdot F(x, y) + \big(1 - \alpha(x, y)\big)\cdot B(x, y) \tag{1.1} $$

where α is the alpha matte image, F and B represent the foreground and background images respectively. The alpha matte image can take any value from the range [0; 1]. If α(x, y) = 1, the pixel (x, y) is called definitive foreground and when α(x, y) = 0, the pixel is called definitive background. Any value in between denotes a mixed pixel. An accurate estimation of alpha values for mixed pixels is needed for full separation of the foreground and the background.
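As a brief illustration (not part of the thesis code), the composition of Equation 1.1 can be written in MATLAB as follows, assuming F and B are RGB images with values in [0; 1] and alpha is a single-channel matte of the same spatial size:

% Compose an image according to Equation 1.1; repmat expands the matte
% over the three color channels.
A = repmat(alpha, [1, 1, 3]);
C = A .* F + (1 - A) .* B;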

The term alpha matting is used when referring to matting with emphasis on obtaining the alpha matte only. An example of image composition using an alpha matte can be seen in Figure 1.2.

With this model, the matting problem can be understood as the inverse process of composition. Given a composite image C, the method must determine a foreground F, a background B and an alpha matte α. For a three-channel image (for example in the RGB color space) these three equations can be written:

$$ C_R = \alpha\cdot F_R + (1 - \alpha)\cdot B_R $$

$$ C_G = \alpha\cdot F_G + (1 - \alpha)\cdot B_G $$

$$ C_B = \alpha\cdot F_B + (1 - \alpha)\cdot B_B $$

They yield an under-constrained problem (7 unknown variables need to be determined from only 3 known variables), which has no unique solution. Thus some additional constraints are necessary. Cho et al. [25] claim that there are two main ways to make the matting problem tractable:

1. Known background — the input image (or video) is captured against a single constant-colored background, which substantially reduces the complexity of the matting problem. This technique is also known as chroma keying or green/blue screening.

2. A user interaction — the user provides some additional information, typically through a scribble-based interface or a trimap image (they will be described further).

Apart from these two approaches, there are a couple of special techniques that use extra information such as flash and non-flash image pairs, camera arrays or synchronized video streams. These methods are discussed in detail in Section 7 of the survey by Wang et al. [26], but for the purpose of this thesis they are not convenient, nor are the chroma keying methods. The main idea is to obtain template data as simply as possible, without the need for additional equipment or laboratory set-up scenes with a known background. This led to the choice of methods with user guidance.

Binary Segmentation If the alpha values are constrained to be only 0 or 1, the alpha matting problem is reduced to the well known classic problem of binary segmentation (and the related background subtraction / foreground detection). Matting is more complex and more general than the problem of binary segmentation and yields results that are closer to reality when assuming foreground objects with fuzzy boundaries. Nevertheless, my first attempts were based on video background subtraction, which turned out to be useless because this method cannot handle subtle particle effects with high transparencies; this is further discussed in Section 5.1.

1.3.1 Methods with User Interaction

Without any additional constraints, Equation 1.1 has an infinite number of solutions. For example, a trivial solution that satisfies the composition equation could be obtained by setting α(x, y) = 1 and F(x, y) = C(x, y) for every pixel (x, y). This solution simply means the whole image is understood as a foreground object. This is probably not a desired output that would fulfill human expectations. To correctly separate semantically meaningful foreground objects, the majority of methods with user interaction start with a three-level image called a trimap.

Trimap and scribbles A trimap is an image containing three regions: definitive foreground, definitive background and unknown pixels. It is usually manually specified by the user and given to the algorithm together with a composite image.


Recently proposed algorithms based on spectral matting [28] offer an automated extraction of the alpha matte without any additional user input. However, these methods have a number of limitations and they typically cannot handle images with highly-textured backgrounds. In practice, user-specified trimaps are still necessary to achieve high quality results of the alpha matted image [27].

Instead of requiring a carefully designed trimap, some algorithms allow the user to specify an image with a few foreground and background scribbles as additional input. This implicitly creates a very rough trimap, where the majority of pixels are marked as unknown. An example of an input scribbled image is shown in Figure 1.2f.

The accuracy of a trimap or an additional scribbled image is a key factor for the performance of a matting algorithm. The unknown region should be as thin as possible to achieve the best results [26].

Obtaining a matte from video clips of dynamic foreground objects is often called video matting. To some extent, it is an extension of image matting methods, but video matting is generally much harder to implement. Various challenges must be overcome: temporal coherence between the frames has to be maintained and fast processing of large data sets is needed. A comprehensive review of existing image and video matting techniques and systems can be found in [26] and [27].


CHAPTER 2

PROPOSED METHOD

2.1 Simplifying Assumptions

From the beginning, it was quite clear that this task is very general and that I should focus only on a restricted area. The characteristics of individual particle effects are greatly diverse: sparks and dust fade out very quickly, while smoke and fire remain visible for a longer period; fire changes its color depending on the distance from the edges, while smoke's color stays more or less constant; etc.

I have chosen to focus on smoke effects only, because their properties seem to be simple enough to start with, they can be easily captured with a digital camera and they are often present in predefined particle system libraries in various forms.

2.1.1 Video Template Restrictions

Another decision was how to capture the input data. I have decided that video templates of the observed particle effect will be captured from one point of view, because this is the simplest solution and it makes the method more practical, without the need of laboratory set-up scenes (but at least a tripod or another fixation of the camera is desirable). The same particle effect (as if watched from the identical point of view) is then mimicked within the particle system and the result of the simulation is stored as a video file (or a sequence of frames).

Cheap low-end digital cameras typically do not provide an option to capture videos in a raw format and they perform some lossy compression (often JPEG is used), which can produce annoying artifacts. As this is also my case, the proposed method should not be affected by these artifacts.

2.1.2 Visual Stationarity

I am assuming that the observed particle effect does not change its natural characteristics over time. A particle phenomenon with this attribute will be referred to as stationary, which is analogous to stationary stochastic processes in time series analysis [11]. Informally, the basic characteristics (mean, variance, . . . ) of a stationary process do not change when shifted in time.

Figure 2.1: An example of visual stationarity of particle effects: (a) non-stationary smoke, (b) stationary smoke.

The strict statistical definition will not be followed. Instead, a more intuitive understanding of the stationarity will be sufficient. Visual stationarity means that the followed particle object has roughly the same characteristics during the whole video clip. For instance, a cigarette smoke captured including the phase of creation or extinction is not stationary, while a video sequence including only the phase of smoke progression (without fierce changes) is considered to be stationary (see Figure 2.1).

2.2 Problem Analysis

The examined problem can be shortly summarized as follows: given two video sequences of smoke effects, find a suitable similarity comparison.

In order to compare two videos appropriately, a method for describing the raw data must be specified so that only features of interest are highlighted for subsequent computer processing. This description is often called feature extraction or feature selection [1], [2]. It deals with extracting attributes that yield some quantitative information and can distinguish primitive characteristics of the compared videos. Some features are built upon natural characteristics of the visual appearance, the others result from specific manipulations on images or video frames.

For depicting the general appearance of the captured scene, some color or image intensity features are needed. While describing the smoke lobes, major attention must be paid to the regions where only smoke is present; this leads to the use of some image segmentation technique (the alpha matting method has been chosen, see Section 1.3). In the following sections, the selection of particular image and video features and the problem of their similarity are described more deeply.

2.3 Image and Video Features

The selection of significant features that would be sufficient to compare images or videos has been motivated by an analysis of particle systems. The appearance of the resulting particle effect is influenced by the individual particles and by the procedure of their movements and overlapping. The blending logic of the rendering part ensures that the resulting shape is strongly affected by the color of individual particles. For the same reason, the appearance of the whole effect is partly determined by the shape of the individual particles. Hence it is reasonable that both of these aspects (color and shape) must be taken into account while describing properties of images and videos depicting particle objects.

In addition, color-only based methods suffer from false positives, i.e. images with completely different content that just happen to have a similar color composition are described with roughly the same feature values. Thus, in practice it is necessary to combine color features with texture or shape feature techniques [14].

2.3.1 Color Descriptors

There are several methods for describing color features. An overview of image moment techniques can be found in [6]. A nice review of histogram, moment and wavelet-based indexing techniques is given by Mandal et al. [5]. Quite innovative color correlograms are described in the article by Huang et al. [8].

Despite the wide number of methods, I decided to use the procedure proposed by Stricker et al. [4], because it is simple and computationally fast and also (according to their study) it seems to be robust enough and more efficient than whole-histogram-based techniques. For my purposes, I have adjusted their method to handle whole videos rather than one image.

Dominant Color Statistics Stricker et al. produced a new approach to color indexing. Instead of computing with the complete color distributions, their description contains only dominant color features. From probability theory we know that a probability distribution can be uniquely characterized by its moments, resp. central moments (this can be found e.g. in [10]). So if the color distribution is interpreted as a probability distribution, then the color distribution can be characterized by its moments.

They proposed to store only the first three moments for each color channel (i.e. 9 floating point numbers per HSV image). The first moment is the average, so the average of pixel intensities is stored per color channel. The second and third central moments are the variance and the skewness. The standard deviation and the third root of the skewness are stored for each color channel in order to have three values with the same units, which makes them numerically more comparable.

Let V_i(x, y, t) be the i-th color channel value of the discrete video function V(x, y, t), where i, x, y, t ∈ N, x ≤ W (width), y ≤ H (height), t ≤ F (number of frames), and let D = F · W · H (a normalization factor, as the sums are iterated over all the pixels of each frame). Then the entries of the histogram feature vector related to the i-th channel (the average, the standard deviation and the skewness factor, respectively) are:

$$ E_i^V = \frac{1}{D}\sum_{t=1}^{F}\sum_{x=1}^{W}\sum_{y=1}^{H} V_i(x, y, t) $$

$$ \sigma_i^V = \sqrt{\frac{1}{D}\sum_{t=1}^{F}\sum_{x=1}^{W}\sum_{y=1}^{H}\big(V_i(x, y, t) - E_i\big)^2} $$

$$ s_i^V = \sqrt[3]{\frac{1}{D}\sum_{t=1}^{F}\sum_{x=1}^{W}\sum_{y=1}^{H}\big(V_i(x, y, t) - E_i\big)^3} $$

and the histogram feature vector of the channel i can be written as h_i = (E_i, σ_i, s_i). Unlike the original Stricker's method I added the sum over all frames. This can be used because of the stationarity assumption introduced in Section 2.1.2. Particular vector entries (features) have analogies in statistics:

• The meaning of E_i is the average of the i-th color channel. Assuming the HSV color space (the reasoning for this choice can be found in Section 1.2.2), averages of the Hue, Saturation and Value channels are computed.

• Standard deviations σ_i measure how intensity values spread out from the average: a low standard deviation indicates that the values tend to be very close to the mean, i.e. pixel intensities of the i-th color channel do not vary much over time and space (see Figure 2.2a, where the standard deviation is zero, i.e. no variance). A high standard deviation indicates that the values are spread out over a large range (pixel intensities of the i-th color channel differ a lot in time or space, for instance Figure 2.2b, where σ = 0.5).

• Skewness factors s_i can be intuitively understood as a measure of the asymmetry of the distribution of intensity values about its mean value. According to the definition, a negative skew factor corresponds to the mass of the distribution being more concentrated on the right (higher values), with a longer left tail. This corresponds to the number of pixels having an intensity value above the average being greater than the number of pixels having intensity values below the average (see Figure 2.2d). A positive skew factor indicates the exact opposite (Figure 2.2e).

Let's take a closer look at the necessity of these three individual features. For the sake of simplicity, a video with one frame and one gray-level channel can be assumed. The mean value itself is not a sufficient descriptor, as can be seen in Figure 2.2. All the images have the same mean value E = 0.5, but the pictures are different.

The standard deviation feature can differentiate images 2.2a and 2.2b but not the pair of images 2.2b, 2.2c, nor the pair 2.2d, 2.2e. In the case of 2.2b and 2.2c, it is because these two images have only permuted pixels and no method based on pixel intensities only can really distinguish between them.

The skewness factor can be crucial when the compared images have the same mean values and the same standard deviations at the same time; this is shown in Figures 2.2d and 2.2e. Both pictures are computed such that their standard deviations are the same. They can be distinguished by taking the skewness factors into account. In the first case, the factor is a negative number; in the second case, the factor has a positive value.

More detailed information on the impact of identical pixel statistics on the visual adequacy of images can be found in [9], which is also a very inspirational text about the analysis and synthesis of texture images based on statistical features and complex wavelet coefficients.

Figure 2.2: Images with the same mean value, panels (a) to (e).
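A sketch of the computation of the color feature matrix C^V described above, assuming V is a cell array of F frames already converted to HSV (each an H x W x 3 matrix of doubles); it mirrors, but is not necessarily identical to, the colorMatrix function described in Chapter 3:

% Rows of C are E, sigma, s; columns correspond to the H, S and V channels.
F = numel(V);
C = zeros(3, 3);
for i = 1:3
    X = [];
    for t = 1:F
        ch = V{t}(:, :, i);
        X = [X; ch(:)];                      %#ok<AGROW> all pixels of channel i
    end
    E = mean(X);
    C(1, i) = E;                             % average
    C(2, i) = sqrt(mean((X - E).^2));        % standard deviation
    C(3, i) = nthroot(mean((X - E).^3), 3);  % third root of the skewness
end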

2.3.2 Shape Descriptors

In general, geometric properties are also needed to identify objects [4]. A survey presenting the existing approaches of shape-based feature extraction is framed in the book [19] by Yang et al. and also in the review by Zhang et al. [17].

Because the shape of a fuzzy particle object is not well defined, I tried to find a convenient approach. It turned out that any particle effect in an image can be interpreted as a form with semitransparent parts. This led to the usage of image matting methods (described more in Section 1.3). Image matting algorithms yield a suitable foreground estimation that can be encoded into one gray-level image where background pixels have the value 0, definitive foreground pixels have the value 1 and semitransparent pixels have a value in between, according to their opacity. This so-called alpha-channel image is further used to analyze the geometrical shape of the object (an object is understood as a set of non-background pixels).

Segmented particle objects typically have translucent blurred boundaries, which makes the use of boundary-based description methods (shape features are extracted from contours) complicated. From the region-based methods, I have chosen image moment features, which appeared to be satisfactory descriptors [17].

One unpleasant property of the image moments is their increasing impact with a greater distance from the centroid, due to the projection onto monomials. In my case, this is partly compensated by the typical attribute of smoke particle effects: the amount of smoke decreases from the center towards zero at the borders.

Image Moment Features

Moments of images have proven to be a very efficient tool in pattern recogni- tion. Especially they are useful as local descriptors of segmented objects. Over the years, various forms and combinations of moments have appeared [6]. For instance, various constructions of moments invariant to rotation, translation and scaling are listed in the book by Flusser et al. [12]. This book is also an exhaustive introduction to moments and their applications in pattern recognition in general.


For my purposes, I have decided to use ordinary geometric and central moments, because I don't require any special properties. In fact, I need to discriminate between rotated or scaled images to depict differences of particle effects. Geometric and central moments also present a relatively low computational cost [6], which affected my choice as well.

Figure 2.3: Examples of separated smoke lobes.

Low-Order Moments The low-order moments provide well-known geometri- cal properties of a distribution. I will describe fundamental moments to illustrate the applicability to a shape representation. We can consider alpha-channel in- put (gray-scaled image with the values from 0 to 1) for example with segmented smoke lobe foreground (e.g. this can seen in Figure 2.3). The moment value of this distribution may be easily explained in terms of simple shape characteristics:

• Zeroth Order Moment {m00} represents the total "mass" of the given object. When computed for a binary image, the area of the segmented object is calculated. This can be easily seen by substituting into the definition equation: $m_{00} = \sum_{y=0}^{M-1}\sum_{x=0}^{N-1} I(x, y)$. It follows that the total mass of a particle object can be compared using this feature.

• First Order Moments {m10, m01} are used to locate the center of mass of the object, which has coordinates $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$. If an object is positioned such that its center of mass is placed in the center of the image, the geometric moments are equal to the central moments and μ10 = μ01 = 0. Thus first order moments (resp. centers of mass) are useful for describing the position of a particle phenomenon and its changes.

• Second Order Moments {m02, m11, m20} are often called moments of inertia, which really captures their meaning; for example, m02 and m20 describe the distribution of mass of the image with respect to the coordinate axes. Their more exact description can be found e.g. in [13] and [12].

Instead of using these moments directly as features, I have decided to use a few high-order moments together as one shape descriptor, which will be described further.

High-Order Moments While the lower order moments have intuitively clear semantics (they carry physical meanings associated with the region pixel distribution), the higher order moments cannot be interpreted so easily; it is difficult to associate them with a physical interpretation. Polynomials of high degree do not differ dramatically, and since the geometric (resp. central) image moments are nothing but the image function projections onto the monomials $\{x^k y^l\}$ (resp. $\{(x - \bar{x})^k (y - \bar{y})^l\}$), it is suggested to use image moments only up to some small fixed order. For a detailed discussion of the maximal used order, see Section 5.3.2.


Extending to Videos For each frame, the following image features are computed and stored in the shape feature vector (belonging to the t-th frame):

$$ s_t = \big(m_{00},\ \bar{x},\ \bar{y},\ \{\mu_{pq}\}\big) $$

where p, q ∈ N such that 2 ≤ p + q ≤ N, and N ∈ N is the maximal moment order taken into account. Notice that central moments μ_pq are utilized because they are invariant under translations, which makes them suitable to describe shape properties independently of the position. Information about the translation (resp. the centers of mass) is already included in $\bar{x}$ and $\bar{y}$ and can be used separately.

In order to register the changes through the whole video, I suggest using two basic statistical measures: the mean value and the standard deviation. (Another point of view on the description of the dynamics in video is described in Section 2.3.3.) Let V be a video with F frames and let $s_t^j$ be the j-th component of the shape feature vector $s_t$. Then the mean value factor and the standard deviation factor of the video are defined as follows:

$$ E_j^V = \frac{1}{F}\sum_{t=1}^{F} s_t^j $$

$$ \sigma_j^V = \sqrt{\frac{1}{F}\sum_{t=1}^{F}\big(s_t^j - E_j^V\big)^2} $$

The 2 × (Q + 3) shape feature matrix of the video V can then be introduced:

$$ S^V = \begin{pmatrix} E_1^V & E_2^V & \cdots & E_{Q+3}^V \\ \sigma_1^V & \sigma_2^V & \cdots & \sigma_{Q+3}^V \end{pmatrix} $$

Q is the number of image moments from order 2 up to order N, which can be computed as

$$ Q = \frac{(N + 1)\cdot(N + 2)}{2} - 3 $$

because all the moments μ_pq up to the order N ≥ p + q can be written into an (N + 1) × (N + 1) matrix (including zeros) where column numbers correspond to p and rows correspond to q; only one half (including the diagonal) is used. The number 3 is then subtracted because the μ_00, μ_01 and μ_10 moments are not used.

Defined in this way, the columns of a shape feature matrix contain the corresponding shape features and the rows are mean values and standard deviations. (For example, $S_{2,1}^V$ corresponds to the standard deviation of the mass feature m_00.) Several examples of real shape feature matrices are available in Section 5.3.4.
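A sketch of assembling the shape feature matrix S^V, assuming A is a cell array of alpha-matte frames, N is the maximal moment order, and geomMoment/centralMoment are the illustrative helpers sketched in Section 1.2.3; the thesis implementation itself is the shapeMatrix function described in Chapter 3:

N = 4;                                   % maximal moment order (example value)
F = numel(A);
feats = [];                              % one shape feature vector per frame (rows)
for t = 1:F
    I   = A{t};
    m00 = geomMoment(I, 0, 0);
    xc  = geomMoment(I, 1, 0) / m00;
    yc  = geomMoment(I, 0, 1) / m00;
    s   = [m00, xc, yc];
    for p = 0:N
        for q = 0:(N - p)
            if p + q >= 2
                s(end + 1) = centralMoment(I, p, q);   %#ok<AGROW>
            end
        end
    end
    feats = [feats; s];                  %#ok<AGROW>
end
S = [mean(feats, 1); std(feats, 1, 1)];  % rows E_j^V and sigma_j^V (1/F normalization)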

2.3.3 Time Variability Descriptors

In order to describe the variability of particle effects over time, I tried a very simple approach, which has proven to be suitable. Again, the alpha-matted video frames are used. With these frames, the time average and the time standard deviation images are computed. Let V be a gray-channel (alpha matte) video sequence; then I_E denotes the time average image (a pixel value is the average of its values over the time) and I_σ denotes the standard deviation image (a pixel value is the standard deviation of its values over the time), defined as follows:

$$ I_E(x, y) = \frac{1}{F}\sum_{t=1}^{F} V(x, y, t) \qquad I_\sigma(x, y) = \sqrt{\frac{1}{F}\sum_{t=1}^{F}\big(V(x, y, t) - I_E(x, y)\big)^2} $$

Both of these images have clear semantics. The I_E image is a normalized sum of the alpha-channel frames containing the segmented semitransparent particle object, so it is obvious that I_E(x, y) > 0 if and only if there exists t such that 1 ≤ t ≤ F and at the same time V(x, y, t) > 0. This simply means that nonzero pixels of I_E are exactly at those places where the observed particle effect was present in at least one frame.

The explanation of the I_σ image is very analogous. The fundamental difference is that its pixel values represent the variability in time of a particular pixel. Zero means no variance (the value of the pixel was always the same) and higher values correspond to higher variance; hence the I_σ image to a certain extent reflects the time dynamics of a particle effect. I suggest using these two images as time variability features for further comparison.
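A sketch of both time images, assuming A is a cell array of F alpha-matte frames stored as H x W doubles; the thesis implementation itself is the timeImages function described in Chapter 3:

F  = numel(A);
V  = cat(3, A{:});                                   % H x W x F stack of matte frames
IE = mean(V, 3);                                     % time average image I_E
Is = sqrt(mean((V - repmat(IE, [1, 1, F])).^2, 3));  % standard deviation image I_sigma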

2.4 Similarity Measures

In order to depict similarities and dissimilarities of the features, one must introduce a metric or at least some similarity function. An overview of different metrics and similarity functions that are often used in digital image processing is given in [3].

2.4.1 Color Distance

Color features can be easily compared by the function introduced in [4]. Let U and V be two videos with r color channels; then we define the similarity as

$$ d_{color}(U, V) = \sum_{i=1}^{r}\Big( w_{i1}\,\big|E_i^U - E_i^V\big| + w_{i2}\,\big|\sigma_i^U - \sigma_i^V\big| + w_{i3}\,\big|s_i^U - s_i^V\big| \Big), $$

where w_ij ≥ 0 (i ≤ r, j ≤ 3) are specific weights. The function d_color is not a metric in the mathematical sense because two non-identical color distributions may possibly have the same similarity value of 0. This is the reason why d_color is often referred to as a similarity function, not a similarity metric. However, Stricker et al. [4] showed that this method is fairly robust. Also, it is easy to see the basic properties: d_color(U, U) = 0, d_color(U, V) ≥ 0 and d_color(U, V) = d_color(V, U).

Setting of Weights The weights within the function d_color can be exploited to adjust the similarity function for a given use. Assuming the HSV color space (more in Section 1.2.2), the weights can be written as a 3 × 3 matrix W, and a color feature matrix C^V (belonging to a video V) can be defined as follows:

$$ W = \begin{pmatrix} w_{H1} & w_{S1} & w_{V1} \\ w_{H2} & w_{S2} & w_{V2} \\ w_{H3} & w_{S3} & w_{V3} \end{pmatrix} \qquad C^V = \begin{pmatrix} E_H & E_S & E_V \\ \sigma_H & \sigma_S & \sigma_V \\ s_H & s_S & s_V \end{pmatrix} $$

Let ⊙ be the operator of pointwise matrix multiplication; then the similarity function can be simplified to

$$ d_{color}(U, V) = \sum_{*}\Big[ W \odot \big|C^U - C^V\big| \Big], $$

where the asterisk simply signifies summing up all the elements of the matrix together.

When working with HSV images, the Hue is desired to match more accurately than the Saturation and the Value because the Hue is perceived more sensitively. This can be done by setting all the w_H weights to a higher value than the others.

In order to emphasize the average color (for example, if the lighting conditions were roughly the same during the recording), the weights can be set to w_∗1 > w_∗2 and w_∗1 > w_∗3. Conversely, if the lighting conditions differ more throughout the video, w_∗1 can be set to small values. A balanced comparison can be achieved by setting all the weights to 1.

A preview of the testing videos and their corresponding color feature values is given in Section 5.2, where the achieved results are also discussed.
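A sketch of the color distance itself, assuming C1 and C2 are 3 x 3 color feature matrices as above and W is a non-negative 3 x 3 weight matrix; it mirrors, but is not necessarily identical to, the colorDistance function of Chapter 3:

W = ones(3, 3);                          % balanced comparison
dcolor = sum(sum(W .* abs(C1 - C2)));    % pointwise product, then sum of all elements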

2.4.2 Shape Distance

Let U and V be two compared videos; the 2 × 4 shape distance matrix can be defined as

$$ D_{shape}(U, V) = \begin{pmatrix} \big\|S_{1,1}^U - S_{1,1}^V\big\| & \big\|S_{1,2}^U - S_{1,2}^V\big\| & \big\|S_{1,3}^U - S_{1,3}^V\big\| & \sum_{i=4}^{Q+3}\big\|S_{1,i}^U - S_{1,i}^V\big\| \\ \big\|S_{2,1}^U - S_{2,1}^V\big\| & \big\|S_{2,2}^U - S_{2,2}^V\big\| & \big\|S_{2,3}^U - S_{2,3}^V\big\| & \sum_{i=4}^{Q+3}\big\|S_{2,i}^U - S_{2,i}^V\big\| \end{pmatrix} $$

where the relative distance

$$ \|a - b\| = \frac{|a - b|}{|a| + |b| + \varepsilon} $$

is used to compensate for the unpleasant fact that magnitudes of higher image moments may grow exponentially with respect to increasing order. This also fixes potential numerical instabilities and the inherently greater influence of higher order moments. The constant ε is used to avoid a zero in the denominator; in practice it is set to some very small positive number.

Basically, the zeroth and first order geometric moments (masses and centroids) are left for a separate comparison, because they can be easily interpreted and weighted. The higher order moments {μ_pq} (more precisely, their mean values and standard deviations) are compared correspondingly and then summed.

The following similarity (or distance) function is by its very nature based on the difference of moments technique noted in [5]:

$$ d_{shape}(U, V) = \sum_{i=1}^{2}\sum_{j=1}^{4} w_{ij}\cdot D_{shape}(U, V)_{i,j} $$

where w_ij ≥ 0 (i ≤ 2, j ≤ 4) are specific weights used to balance the similarity function in the same manner as in Section 2.4.1. If the weights are expressed in the form of a matrix, the distance function can be rewritten using the pointwise multiplication operator ⊙ for matrices, which is more accessible for implementation purposes:

$$ W = \begin{pmatrix} w_{E1} & w_{E2} & w_{E3} & w_{E3} \\ w_{\sigma 1} & w_{\sigma 2} & w_{\sigma 3} & w_{\sigma 3} \end{pmatrix} \qquad d_{shape}(U, V) = \sum_{*}\Big[ W \odot D_{shape}(U, V) \Big], $$

where the sum with the asterisk signifies summing up all the elements of the matrix together.
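A sketch of the shape distance, assuming S1 and S2 are 2 x (Q+3) shape feature matrices, W is a 2 x 4 weight matrix, and the concrete epsilon value is an assumption; the thesis implementation itself is the shapeDistance function of Chapter 3:

eps0   = 1e-12;                                % the small positive constant epsilon
rel    = @(a, b) abs(a - b) ./ (abs(a) + abs(b) + eps0);
R      = rel(S1, S2);                          % element-wise relative distances
Dshape = [R(:, 1:3), sum(R(:, 4:end), 2)];     % the 2 x 4 shape distance matrix
W      = ones(2, 4);
dshape = sum(sum(W .* Dshape));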

2.4.3 Time Descriptors Distance

This section describes the similarity descriptors based on the two images I_E and I_σ (defined in Section 2.3.3) and Pearson's correlation coefficient.

Let U and V be two compared videos; then the mean value time descriptor T_E and the standard deviation time descriptor T_σ are defined as

$$ T_E(U, V) = r(U_E, V_E) \qquad T_\sigma(U, V) = r(U_\sigma, V_\sigma) $$

where r is Pearson's correlation coefficient defined in Section 1.2.1. Notice that the commutativity of the coefficient also implies the commutativity of these descriptors.

It should be noted that this coefficient may not be defined, namely when the denominator from its definition is equal to 0. This occurs if and only if at least one of the images compared by this coefficient has a standard deviation equal to zero. In terms of our images, it means that at least one of the following conditions is satisfied: a) the standard deviation of the U_E or the V_E image is equal to zero; b) the standard deviation of U_σ or V_σ is equal to zero. The first case cannot happen because there is always some smoke in at least one frame of the alpha matte video. The second case could possibly happen for videos where all the alpha-matte frames are exactly the same. Obviously, this is not likely for videos of real smoke effects; however, for these cases the value of r can be defined as zero.

Both descriptors are by definition coefficients with a value from the interval [−1; 1]. The value of 0 stands for completely uncorrelated images (without any similarity), values greater than zero denote that the images are positively correlated (the higher the value, the higher the similarity; exactly identical images have the coefficient 1), and negative values indicate negative correlation. If one of the two images is complementary to the other (e.g. U(x, y) = 1 − V(x, y)), then the value is exactly equal to −1.

Let lt(T) = (T + 1)/2 be the linear transformation of a correlation coefficient T onto the range [0; 1]; then the time variability distance of two videos U and V can be defined as

$$ d_{time}(U, V) = 2 - lt\big(T_E(U, V)\big) - lt\big(T_\sigma(U, V)\big) $$

Values of the time distance function are obviously in the range [0; 2], where 0 corresponds to the highest similarity (no distance between the images). Examples and practical values are given in Section 5.4.2.
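A sketch of the time variability distance, assuming IE1/Is1 and IE2/Is2 are the time images of the two compared videos; the thesis implementation itself is the timeDistance function of Chapter 3:

lt    = @(T) (T + 1) / 2;          % map [-1, 1] onto [0, 1]
TE    = corr2(IE1, IE2);           % mean value time descriptor
Ts    = corr2(Is1, Is2);           % standard deviation time descriptor
dtime = 2 - lt(TE) - lt(Ts);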


2.4.4 Overall Similarity

The overall similarity of two videos U and V is simply defined as the weighted sum of individual similarity functions:

$$ d(U, V) = w_c\cdot d_{color} + w_s\cdot d_{shape} + w_t\cdot d_{time} $$

The weights w_c, w_s and w_t are all greater than or equal to zero and are used to regulate the impact of the individual components. The usage of this similarity function is summarized in Section 5.5.

2.5 Method Overview

The problem of comparing a template video with mimicked simulations from a particle system can be summarized into one method, which is recapitulated in Figure 2.4 (darker color marks a user interaction).

First, the user captures a video template and creates a simulation in a particle system. If needed, basic adjustments such as resizing video frames or converting separate frames into a video can be made using the provided tools (preprocessing phase). On both videos (the template and the simulation), the alpha matting algorithm is performed to obtain the alpha mattes (the user has to provide a trimap file together with the video input); the result is stored for possible future use.

In the next phase, color statistics are computed on the original video clips and the shape and time statistics on the alpha matte ones. The original video clips are used for the color statistics in order to depict the overall color distribution of the scene and to avoid possible complications with highly transparent smokes: in that case, very low alpha matte values may not be precise enough to allow some reasonable combination with the original frames so that only the smoke colors are depicted.

The extracted features are then used to form a similarity comparison, which can be further used by the user for improving the simulation (the bottom-up arrow), and the whole process can be iterated.


Figure 2.4: Method overview diagram. The diagram connects the blocks Video Template, Particle System, Preprocessing, Alpha Matting, Trimaps, Alpha Statistics, Color Statistics, Features and Comparison; blocks belonging to user interaction are distinguished from the automated process.


CHAPTER 3

IMPLEMENTATION

My choice of the development environment has been partly determined by the fact that the selected alpha matting algorithms (see Section 5.1) are already implemented in MATLAB¹. Another conceivable alternative was the OpenCV library, which is mainly designed for real-time computer vision purposes and contains a lot of useful tools from the digital image processing area. In order to stay consistent with the existing code, I have chosen the MATLAB environment.

The built-in functions for manipulating images and videos are fully sufficient for the purposes of this thesis. The implemented code has been successfully tested with the MATLAB R2013b version.

The source code does not contain any special programming tricks and is accompanied by comments; thus only the most important files and functions are listed and described in the following text.

3.1 Source Code Description

The root directory of the code implemented in MATLAB is the ‘Source Code’ folder, which is further divided into subdirectories containing portions of the code.

Each MATLAB file '*.m' consists of only one main function with the name equal to the file name, which maintains a good source code culture.

The overall similarity function is implemented in the file similarity.m placed in the root folder and only combines other existing parts. It is a key file for the users, who should utilize only this function (and optionally some video processing utilities).

All the files contain only straightforward implementations and are fully commented in the code. A short description of the individual folders and their selected files follows:

• 'AlphaMatting' — this folder contains the code used for alpha matting. Two subfolders, 'ClosedForm' and 'LearningBased', contain the original code for the closed form matting and the learning based matting (respectively), which is left as it was. The function runMattingVideo is only an adaptation of the closed form matting to whole videos, which are processed frame-by-frame.

¹ MATLAB is a cross-platform numerical computing environment and programming language developed by MathWorks.

• ‘Color’ — contains functions implementing the color features and their comparison. The colorMatrix function computes a color feature matrix of one given video file. The colorDistance function compares two given color feature matrices by the color similarity function. These two functions are further combined in the function colorVideosSimilarity.

• 'Shape' — geometrical and central image moments are implemented in the functions geometricalMoment and centralMoments, both in the subdirectory 'ImageMoments'. The shape feature matrix for an individual video can be obtained using the shapeMatrix function. Two shape feature matrices are compared by the shape similarity function in the file shapeDistance.m. Two input videos can be compared by the function shapeVideosSimilarity, which only combines the two previously mentioned functions.

• 'Time' — this directory contains an implementation of the time variability descriptors and their distance. The function timeImages computes the time average image and the standard deviation image. The time similarity function is implemented in timeDistance.m. Two input videos are compared by a function implemented in timeVideosSimilarity.m.

• 'Verifications' — source codes of the verification tests for the particular descriptors involved in Section 5. In order to run them properly, the video template database is expected to be in the 'Data' directory, placed on the same level in the folder hierarchy as the root folder 'Source Code'.

• ‘VideoUtils’ — contains two helper functions able to merge images in a specified folder into an AVI video file.


CHAPTER 4

USER GUIDE

4.1 Installation and Manipulation

All the source code is written in MATLAB, so there is no need to compile or install it. The only prerequisite is to have a MATLAB environment installed on the computer; at least the MATLAB R2013b version is desirable, but the code should work even with older versions (without a guarantee).

The source code is placed in the ‘Source Code’ folder, which can be copied to any desirable location as needed. When working with provided video template database, the best practice is to have the ‘Data’ folder on the same level in the hierarchy as the ‘Source Code’ directory.

In order to run all the provided functions comfortably from the source code root directory, the addPaths function can be executed first to load all the paths needed into the MATLAB environment. This is not necessary when using the main similarity.m function only.

4.2 Computing Similarities

The overall similarity function can be computed using the similarity.m func- tion, which has the following header:

similarity(Video1, Trimap1, Video2, Trimap2, Wc, Ws, Wo)

Video1 and Video2 are file names of the input video clips, Trimap1 and Trimap2 are the file names of their trimaps used for the alpha matte computations. The last three arguments Wc, Ws, Wo are optional and signify the weights: a 3×3 matrix Wc (color similarity weights), a 2×4 matrix Ws (shape similarity weights) and a 3-dimensional vector Wo (weights of the overall similarity). The meaning of these weights is described in Section 2.4. All the weights are set to 1 by default.

The input videos are considered to be in an uncompressed RGB AVI format. Sequences of images (for example outputs of a particle system) can be converted into the desired AVI file by the frames2avi (converts RGB images) and frames2avigs (converts gray-scale images) functions declared in the 'VideoUtils' folder. Their first argument is a folder containing the images to be converted and the second (optional) argument is the extension of the converted images. Two self-explanatory examples of their usage follow:

>> frames2avi('../SeparateRGBFrames/', 'bmp');

>> frames2avigs('../SeparateGrayScaleFrames/', 'png');

The trimap image is considered to be an RGB bitmap, where the definitive foreground pixels are white and the definitive background pixels are black. Any other color signifies a mixed pixel. (See Section 1.3 for the explanation of the individual pixel types.) Attention must be paid when creating trimap files in image editors: some brushes create fuzzy boundaries, which is not desirable (only pure black and white colors are recognized); pencil tools are much better for this purpose. A preview of a trimap image is shown in Figure 4.1.

Figure 4.1: An example of a trimap (on the right).

A short example, how the similarity function can be used:

>> V1 = '../Data/Cigarette/C09/Frames.avi';

>> T1 = '../Data/Cigarette/C09/trimap.bmp';

>> V2 = '../Data/Simulations/Sim05/Frames.avi';

>> T2 = '../Data/Simulations/Sim05/trimap.bmp';

>> d = similarity(V1, T1, V2, T2);

4.2.1 Finding Out More Details

The similarity function can provide more details than just the value of the overall similarity function shown above. The full output argument list is:

[d, dc, ds, dt, C1, C2, S1, S2, T1, T2] = similarity(...)

where:

− d refers to a value of the overall similarity function

− dc, ds and dt are values of the color, shape and time similarity functions respectively

− C1 and C2 are color feature matrices of the input videos Video1 and Video2

− S1 and S2 are shape feature matrices (of Video1 and Video2 respectively)


− T1 and T2 contain the time images so that T1(:, :, 1) contains the time average image and T1(:, :, 2) contains the standard deviation image, both for Video1. The meaning of T2 is analogous.

For instance, if just the components of the overall similarity function are required, the following command can be executed:

>> [~, dc, ds, dt] = similarity(V1, T1, V2, T2);

4.3 Interpretation of Results

In general, the value of the overall similarity function stands for the similarity of the compared videos and it is influenced by the color similarity, shape similarity and time distance functions (see Section 2.4.4). The closer it is to zero, the more similar the videos are (neither this value nor its individual components can be negative).

In practice, it is hard to follow only the value of this function in order to get better results. It is suggested to also use the values of its components for a detailed understanding of why the overall similarity changed. These components are briefly recapitulated in the following sections.

4.3.1 Color Similarity

The final value of the color similarity function is influenced by its individual components listed in Section 2.3.1 and also by the setting of the weights (Paragraph 2.4.1).

If the value is close to zero, the two videos are very similar in terms of color. In general, higher values signify greater color distance between investigated videos.

Changes of the values obtained from the color similarity function can be examined using the color feature matrices (C1 and C2, see Section 4.2.1) of the compared videos. These matrices consist of three rows containing the average features, the standard deviation features and the skewness factors. The columns correspond to the particular H, S and V entries. The meaning of these values is fully described in Section 2.3.1.

A distance between two given color feature matrices can be computed by the colorDistance function:

>> d = colorDistance(C1, C2)

4.3.2 Shape Similarity

Values of the shape similarity function are also influenced by its components and by the weights. In order to track the changes of the shape similarity function, the shape feature matrices S1 and S2 can be analyzed. These shape matrices consist of two rows, where the first row contains the average values of the shape features and the second row contains the standard deviations of the shape features. The first three columns of this matrix can be evaluated more intuitively: the first column contains the mass, which captures the amount of smoke, and the second and third columns include the centroids (x and y respectively), which can be used to compare the positions of the smokes. For a better understanding of the shape feature matrix components, see Section 2.3.2.


A distance between two given shape feature matrices can be computed by the shapeDistance function:

>> d = shapeDistance(S1, S2)

4.3.3 Time Variability Similarity

Values of the time similarity function are influenced by two factors: the mean value time descriptor T_E and the standard deviation time descriptor T_σ. As these two descriptors themselves do not provide any intuitive meaning, the best practice when analyzing the changes of the values of the time variability function is to do a visual comparison of the corresponding time average images and time standard deviation images. For instance, the time average image of Video1 (see Section 4.2.1) can be shown in MATLAB as follows:

>> imshow(T1(:, :, 1));

The corresponding time images of the compared videos should be as visually similar as possible in order to decrease the value of the time similarity function.

The exact semantics of the time images is described in Section 2.3.3.
