
Report on the thesis of Zuzana Kúkelová “Algebraic methods in computer vision”






13 August 2013

Many computer vision problems today are expressed as the problem of finding extreme values of some objective. When the objective has the property that it is a function of a small number of parameters (say of the order of 10), and is expressed as a sum of many terms, it is often the case that for a small subset of the terms, the exact minimizer is the solution to a system of polynomial equations, sometimes called a “minimal problem”. This property allows the simple but effective RANSAC formulation to be employed, which has formed the core of numerous successful computer vision systems.
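To make the paradigm concrete, the RANSAC loop over a minimal solver can be sketched as follows; this is a generic illustration in my own notation, not code from the thesis, and all names are mine:

```python
import random

def ransac(data, minimal_solver, sample_size, residual, threshold,
           iterations=1000):
    """Generic RANSAC loop.  A minimal solver is run on small random
    subsets; it may return several real solutions (the roots of the
    underlying polynomial system).  The model explaining the most
    data points is kept."""
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = random.sample(data, sample_size)
        for model in minimal_solver(sample):
            inliers = [d for d in data if residual(model, d) < threshold]
            if len(inliers) > len(best_inliers):
                best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

For instance, with `data` a set of 2-D points, `minimal_solver` the exact line through two points, and `residual` the vertical distance, the loop recovers a line despite gross outliers; the polynomial-system solvers of the thesis slot in as `minimal_solver` in the same way.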

The importance of this paradigm in turn confers a special importance on methods for the solution of systems of polynomial equations, which is the subject of this thesis. Thus the thesis is very relevant to the current needs of the scientific community.

The high-level objective of the work could be said to be to broaden the class of minimal problems that can be solved. This is achieved in two ways: first, several new minimal problems are introduced, or new solutions to already-stated minimal problems are derived; second, a way to automatically generate such solutions from the problem statement is proposed.

To begin with the most significant contributions, I find the new minimal solutions to plane-plus-parallax, and to the “6-point one calibrated camera” problem to have potential for useful application. The thesis sketches some potential applications of these methods, but I can foresee that others might emerge in the future. Although these might appear to be somewhat esoteric cases, such special setups can often be exploited in the construction of practical systems, yielding considerable improvements in accuracy.

The automated generation of solvers is another very valuable endeavour. The methods adopted in this work, which largely amount to enumeration over a number of high-level choices, are relatively straightforward, but clearly effective. However, there is also a contribution in simply addressing this problem, and setting up the framework for possibly more comprehensive generators. This is a ripe avenue for further science.

Sometimes the choice of e.g. elimination strategy can lead to a widening of the set of critical configurations of the solvers. For example, where the nullspace of a system linear in monomials is parameterized by some basis vectors, the parameterization sometimes assumes one coordinate is unity (e.g. sentence preceding eq (7.29)). This appears a relatively benign choice, as the basis can always be rotated arbitrarily, but it would be useful to discuss whether it has any practical consequence. At the other extreme, it is stated after (7.49) that the elimination procedure discussed there assumes seven elements of the fundamental matrix to be nonzero. This seems an extreme disadvantage, but also appears not to be necessary: it would appear sufficient to assume that just one of the seven is nonzero. One might also imagine generating code for the two cases, e.g. f11 = 0 and f11 ≠ 0, and choosing between them within RANSAC, say.
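The case-split I am suggesting amounts to no more than running both generated solvers on the same minimal sample and pooling their candidate models, leaving RANSAC's inlier scoring to select between them. A minimal sketch, with purely illustrative names:

```python
def pooled_candidates(sample, case_solvers):
    """Run every case-specific generated solver (e.g. one derived
    under the assumption f11 != 0, another under f11 == 0) on the
    same minimal sample, and pool all candidate models.  The
    surrounding RANSAC scoring then keeps whichever candidate
    explains the most data, so no a-priori choice of case is needed."""
    candidates = []
    for solve in case_solvers:
        candidates.extend(solve(sample))
    return candidates
```

The cost is roughly doubled per-sample solver time, which RANSAC's scoring step usually dominates anyway.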

In converting systems to polynomial eigenvalue problems (see p95), the constraints on the vector of monomials are ignored. It is mentioned on p140 (para 6) that these constraints need not be applied because they will automatically be satisfied in a minimal problem, which seems plausible, but at the time of writing I do not see that this must be true in general, so I would welcome some discussion of it at the point where the PEP is introduced. And, if true, it would be good to have discussion of the related technique in which the monomial vector is expressed as a linear combination of the nullspace of the input equations, viewed as a linear system in the monomials; the monomial constraints may then often be expressed as a collection of sparse quadratic or cubic equations. Of course such a system need not be simpler than the original, but it might be worth investigating within an automated solver generator.
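For the reader unfamiliar with the PEP machinery at issue: the standard route is companion linearization, which turns (C0 + λC1 + … + λ^d Cd)v = 0 into an ordinary eigenvalue problem. A minimal sketch, assuming the leading coefficient matrix is invertible (names and structure are mine, not the thesis's):

```python
import numpy as np

def pep_eigenvalues(C):
    """Eigenvalues lam of (C[0] + lam*C[1] + ... + lam^d*C[d]) v = 0
    via companion linearization, assuming C[d] is invertible.
    The stacked vector z = [v; lam*v; ...; lam^(d-1)*v] satisfies
    M z = lam z for the block companion matrix M built below."""
    d, n = len(C) - 1, C[0].shape[0]
    M = np.zeros((d * n, d * n))
    # Upper block rows: shifted identities implementing z_{i+1} = lam*z_i.
    for i in range(d - 1):
        M[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = np.eye(n)
    # Last block row: -C[d]^{-1} C[i], encoding the polynomial itself.
    Cd_inv = np.linalg.inv(C[-1])
    for i in range(d):
        M[(d - 1) * n:, i * n:(i + 1) * n] = -Cd_inv @ C[i]
    return np.linalg.eigvals(M)
```

Note that the eigenvectors of M contain the monomial vector v stacked with its multiples, which is exactly where the ignored monomial constraints would have to be checked.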

The thesis includes several experimental evaluations of the various proposed algorithms, and for the most part these are useful and instructive. In some cases the new algorithms show little or no accuracy advantage over previous methods, but generally offer moderate speed improvements. In my view, this is a perfectly acceptable state of affairs—the theoretical contributions are in themselves sufficient to justify the importance of the work. One experimental choice which I find somewhat puzzling is the use of the mean number of inliers as a function of the number of RANSAC samples (e.g. Fig. 7.6). This is merely a function of the distribution of inliers, which might be just as well characterized by numerical statistics such as quantiles or boxplots. Doing so would allow more compact presentation of these results.
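The numerical summary I have in mind is nothing more elaborate than a five-number summary of the inlier counts over repeated runs; a sketch (illustrative only, not from the thesis):

```python
import statistics

def inlier_summary(inlier_counts):
    """Five-number summary (min, Q1, median, Q3, max) of inlier
    counts over repeated RANSAC runs -- the compact alternative to
    plotting mean inliers against the number of samples."""
    q1, median, q3 = statistics.quantiles(inlier_counts, n=4)
    return (min(inlier_counts), q1, median, q3, max(inlier_counts))
```

One such tuple per method replaces an entire curve, and several methods then fit in a single small table.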

The opening chapters comprise a tutorial in methods for the solution of systems of polynomial equations, and form a very useful primer for the computer vision practitioner.

I consider this in itself a useful contribution, even though it is essentially a synthesis of existing knowledge. Here, and throughout the thesis, there is some (approximate) repetition of material, but in practice this repetition is useful for the reader, as it makes the sections more self-contained, and avoids too many backreferences.

A common theme in the thesis is the idea that where some high-level decisions are to be made (e.g. choosing a monomial ordering, choosing which variables to eliminate), it is useful to search over several alternatives in an automated manner. The thesis reports which option was chosen, but it would be welcome to see some more data on how much advantage was gained by these choices. For example, in the several sections of Ch. 7 entitled “Construction of the multiplication matrix”, it would be good to know exactly how much simpler the chosen solution is. Figure 7.24 answers a related question, but it would be interesting to provide some further data.
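The data I would like to see falls out of the enumeration essentially for free: tabulating the cost of every variant, rather than only reporting the winner, quantifies the gain from the automated search. A sketch with illustrative names:

```python
def compare_variants(variants, cost):
    """Enumerate the high-level choices (monomial ordering, variables
    to eliminate, ...) and tabulate the cost of each -- e.g. the
    elimination template size -- so the advantage of the chosen
    variant over the alternatives can be reported alongside it."""
    costs = {name: cost(v) for name, v in variants.items()}
    best = min(costs, key=costs.get)
    return best, costs
```

Reporting the full `costs` table (or just its spread) would answer how much simpler the chosen solution is.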

A small number of typographical errors were also found, which I shall convey separately to the author.

The author of the thesis proved to have an ability to perform research and to achieve scientific results. I recommend the thesis for presentation with the aim of receiving the Degree of Ph.D.

Andrew Fitzgibbon
Principal Researcher

Microsoft Research, Cambridge, UK

