Appraisal of the current version of the thesis


Appraisal of the current version of the thesis "Bayesian Model Selection",

by Phuong Thi Thanh Truong, B.Sc., FEI VSB-TUO

In statistics, model selection is the task of selecting a statistical model from a set of candidate models, given data. The thesis presents two Bayesian model selection methods and then applies them to the linear regression model.

The first method is model selection with posterior odds, in which each pair of models is compared by the ratio of their posterior model probabilities. The second method is model selection with the marginal likelihood, in which two models are compared by the Bayes factor, defined as the ratio of their marginal likelihoods.
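For reference, the two criteria are linked by Bayes' theorem: the posterior odds equal the Bayes factor multiplied by the prior odds, so with equal prior model probabilities both methods rank the models identically. In standard notation (the symbols M_1, M_2 for the models, y for the data and theta_k for the parameters of model k are chosen here and are not necessarily those of the thesis):

```latex
\frac{p(M_1 \mid y)}{p(M_2 \mid y)}
  = \underbrace{\frac{p(y \mid M_1)}{p(y \mid M_2)}}_{B_{12}}
    \times \frac{p(M_1)}{p(M_2)},
\qquad
p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k .
```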

The thesis is divided into three major parts. The first part collects the probabilistic background and general Bayesian model selection criteria. The second part covers the Bayesian multiple linear regression model and applies the second model selection method to it. The third part consists of three examples of linear regression with artificial data. In each example, two models are fitted to a given artificial data set, and model selection is then performed using the second method.

Here are some remarks on specific details.

 The thesis is well structured overall, but the paragraphs within each section need to be reorganized.

 The example of conditional probability in Subsection 2.1.2 is unnecessary.

 Subsection 2.2.2 presents Bayesian decision making, but it seems to play no role in the rest of the thesis.

 The notation for the parameters and the observed data is inconsistent between Subsection 2.2.1 and the other subsections. This may be because the notation was taken from different references.

 Some plots are poorly presented and some captions are ambiguous. Most of the figures are never referred to in the text.

 There are many typos in the formulas. Some sentences suggest that the author did not fully understand what he/she was writing.

 All examples presented in Section 4 are wrong. In each example, the author must compare the two models on the same simulated data set, not fit each model to a different data set. In Examples 1 and 2, the data sets were simulated from model 1, so the method should favor model 1 over model 2; however, the author concluded that model 2 is better in both examples. In Example 3, models 1 and 2 have the same formula (?).
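To illustrate the comparison the examples should perform, here is a minimal Python (NumPy) sketch: a single data set is simulated from model 1, and both candidate models are scored on that same data set via their marginal likelihoods. The two models, the conjugate prior beta ~ N(0, tau2 I), the known noise variance sigma2 = 1 and the helper name `log_marginal_likelihood` are illustrative assumptions, not the models used in the thesis.

```python
import numpy as np


def log_marginal_likelihood(X, y, sigma2=1.0, tau2=100.0):
    """Log p(y | M) for y = X beta + eps, eps ~ N(0, sigma2 I), beta ~ N(0, tau2 I).

    Integrating beta out analytically gives y ~ N(0, sigma2 I + tau2 X X^T).
    """
    n = len(y)
    cov = sigma2 * np.eye(n) + tau2 * X @ X.T
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, n)
z = rng.uniform(0.0, 10.0, n)  # hypothetical covariate unrelated to y
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)  # ONE data set, simulated from model 1

X1 = np.column_stack([np.ones(n), x])  # model 1: y ~ 1 + x (the generating model)
X2 = np.column_stack([np.ones(n), z])  # model 2: y ~ 1 + z

# Both models are evaluated on the same y; log B12 > 0 favors model 1.
log_b12 = log_marginal_likelihood(X1, y) - log_marginal_likelihood(X2, y)
print(f"log Bayes factor B12 = {log_b12:.1f}")
```

Because model 1 generated the data, the log Bayes factor should come out clearly positive here; a conclusion in favor of model 2 in a setting like this would be a reason to recheck the computation.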

Suggestion: The thesis would have been more valuable if it had analyzed some real data sets. The three examples with simulated data are meaningless. Countless real data sets are available in the regression literature. The author should take at least one of them, apply the methods described, and then compare the results with traditional model selection methods such as stepwise methods, AIC, BIC, …
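As a possible starting point for the suggested comparison, the sketch below computes AIC and BIC for two ordinary least-squares fits on one data set. The simulated data and the two models are placeholders for whatever real data set the author chooses, and the Gaussian likelihood with the noise variance estimated by maximum likelihood is an assumption; `gaussian_ic` is a hypothetical helper. Lower values indicate the preferred model, and by the Schwarz approximation (BIC2 - BIC1) / 2 roughly approximates log B12, which links BIC directly to the Bayes factor method of the thesis.

```python
import numpy as np


def gaussian_ic(X, y):
    """Return (AIC, BIC) of an OLS fit with Gaussian errors and unknown variance."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2_hat = rss / n  # maximum-likelihood estimate of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1.0)
    k = p + 1  # regression coefficients plus the noise variance
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik


# Placeholder data; in practice, replace with a real regression data set.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0.0, 10.0, n)
z = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

X1 = np.column_stack([np.ones(n), x])  # candidate model 1
X2 = np.column_stack([np.ones(n), z])  # candidate model 2

aic1, bic1 = gaussian_ic(X1, y)
aic2, bic2 = gaussian_ic(X2, y)
print(f"AIC: {aic1:.1f} vs {aic2:.1f}; BIC: {bic1:.1f} vs {bic2:.1f}")
print(f"Schwarz approximation of log B12: {(bic2 - bic1) / 2:.1f}")
```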

Conclusion: To some extent, the content of the thesis might be adequate for an M.Sc. thesis. There is no major problem with the methods and models presented, since they are already established in the literature. However, Section 4 contains a serious mistake, as pointed out above. Nevertheless, I still recommend the current version for defense.

Ostrava, 24/05/2019 Tien Thach

FEI VSB-TUO