Academic year: 2022


Appraisal of the current version of the thesis "Bayesian Model Selection",

by Phuong Thi Thanh Truong, B.Sc., FEI VSB-TUO

In statistics, model selection is the task of selecting a statistical model from a set of candidate models, given data. The thesis presents two Bayesian model selection methods and then applies them to the linear regression model.

The first method is model selection with posterior odds, in which each pair of models is compared by the ratio of their posterior model probabilities. The second method is model selection with marginal likelihood, in which two models are compared using the Bayes factor, that is, the ratio of their marginal likelihoods.
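For reference, the two criteria are linked by a standard identity. The notation below (models $M_1, M_2$, data $y$, model parameters $\theta_k$) is chosen here for illustration and is not necessarily the notation used in the thesis:

```latex
\[
\underbrace{\frac{p(M_1 \mid y)}{p(M_2 \mid y)}}_{\text{posterior odds}}
= \underbrace{\frac{p(y \mid M_1)}{p(y \mid M_2)}}_{\text{Bayes factor}}
\times \underbrace{\frac{p(M_1)}{p(M_2)}}_{\text{prior odds}},
\qquad
p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k .
\]
```

With equal prior model probabilities the prior odds equal 1, so the two methods lead to the same ranking of the models.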

The thesis is divided into three major parts. The first part collects the probabilistic background and the Bayesian model selection criteria in general. The second part covers the Bayesian multiple linear regression model and applies the second method of Bayesian model selection to it. The third part is a collection of three examples of linear regression models with artificial data. In each example, two models are fitted to given artificial data, and model selection is then performed using the second method.

Here are some remarks on certain details.

• The thesis is well organized overall, but the paragraphs within each part need to be reorganized.

• The example of conditional probability in Subsection 2.1.2 is not necessary.

• Subsection 2.2.2 presents Bayesian decision making, but it seems to play no role in the rest of the thesis.

• The notation for parameters and given data in Subsection 2.2.1 and elsewhere is not consistent. This might be due to the use of different reference sources.

• Some plots are not well presented and some captions are ambiguous. Most of the figures are not mentioned in the text.

• There are many typos in the formulas. Some sentences suggest that the author did not fully understand what he/she was writing.

• All examples presented in Section 4 are wrong. In each example, the author must compare the two models on the same simulated data set, not each model on a different data set. In Examples 1 and 2, since the data sets were simulated from model 1, the method should support model 1 rather than model 2. However, the author concluded that model 2 is better in both examples. And in Example 3, models 1 and 2 have the same formula (?).
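The last remark can be illustrated with a minimal sketch of the intended procedure: simulate one data set from model 1, then evaluate the marginal likelihood of both models on that same data set. Everything specific here is an assumption made for illustration and is not taken from the thesis: the two models (intercept plus slope versus intercept only), the known noise variance `sigma2`, the zero-mean Gaussian prior with variance `tau2` on the coefficients, and the function name `log_marginal_likelihood`.

```python
import numpy as np

def log_marginal_likelihood(X, y, sigma2=1.0, tau2=10.0):
    """Log evidence of y for y = X b + e with b ~ N(0, tau2 I), e ~ N(0, sigma2 I).

    Integrating out b gives y ~ N(0, sigma2 I + tau2 X X^T), so the evidence
    is a multivariate normal density evaluated at y.
    """
    n = len(y)
    cov = sigma2 * np.eye(n) + tau2 * (X @ X.T)   # marginal covariance of y
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)

# ONE data set, simulated from model 1 (illustrative coefficients).
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)

X1 = np.column_stack([np.ones(n), x])   # model 1: intercept + slope
X2 = np.ones((n, 1))                    # model 2: intercept only

# Both models are scored on the same y; a positive value favours model 1.
log_bf_12 = log_marginal_likelihood(X1, y) - log_marginal_likelihood(X2, y)
print(f"log Bayes factor (model 1 vs model 2): {log_bf_12:.1f}")
```

Because the data are generated from model 1 and both models see the same `y`, the log Bayes factor should come out clearly positive, which is the behaviour the examples in Section 4 were expected to show.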

Suggestion: The thesis would have been more valuable if it had presented some real data sets. The three examples with simulated data are meaningless. Plenty of real data sets are available in the regression literature. The author should find at least one real data set, apply the presented methods to it, and then compare the results with traditional model selection methods such as stepwise methods, AIC, and BIC.
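As a rough sketch of what the comparison with traditional criteria could look like, the snippet below computes AIC and BIC for two least-squares fits under a Gaussian error model. The data are simulated stand-ins, since no real data set is given in this appraisal; the helper name `gaussian_aic_bic` and both candidate models are hypothetical.

```python
import numpy as np

def gaussian_aic_bic(X, y):
    """Return (AIC, BIC) of an ordinary least-squares fit with Gaussian errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2_hat = rss / n                              # ML estimate of the noise variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2_hat) + 1.0)
    k = p + 1                                         # coefficients plus the noise variance
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Stand-in data; a real data set would replace this block.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)

X1 = np.column_stack([np.ones(n), x])   # candidate 1: intercept + slope
X2 = np.ones((n, 1))                    # candidate 2: intercept only

aic1, bic1 = gaussian_aic_bic(X1, y)
aic2, bic2 = gaussian_aic_bic(X2, y)
print(f"AIC: {aic1:.1f} vs {aic2:.1f}   BIC: {bic1:.1f} vs {bic2:.1f}")
```

For both criteria the smaller value indicates the preferred model, so the resulting ranking can be set side by side with the ranking given by the Bayes factor on the same data.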

Conclusion: To some extent, the content of the thesis might be equivalent to that of an M.Sc. thesis. There is no major problem with the methods and models presented in the thesis, since they already exist in the literature. However, there is a serious mistake in Section 4, as pointed out above. Nevertheless, I still recommend the current version for the defense.

Ostrava, 24/05/2019 Tien Thach

FEI VSB-TUO
