5.3 Cluster analysis - using different sets of attributes

In this part we cluster both the European and world countries based on three different sets of attributes. Initially we used only two sets. The first set contains socio-demographic and economic indicators, such as population density, age, GDP per capita, extreme poverty, number of hospital beds per thousand people, prevalence of smoking, diabetes and cardiovascular disease, and the other attributes of this kind in our dataset. The second set contains COVID-19 attributes only, such as the number of cases, deaths and tests, the numbers of ICU and hospitalized patients, the rate of testing and the stringency index. Because we were not able to find an overlap between these two clusterings, we examined individual COVID-19 indicators to see whether one indicator (or a few of them together) could bridge the gap and explain why countries with different economic levels and different socio-demographic profiles still had the same or similar progress in containing the COVID-19 pandemic. The third clustering we decided to keep here is the one created using the stringency index, which measures how strict the measures are.

5.3.1 Clustering world countries using stringency index

First we had to create a dataset that keeps only the stringency index attribute, together with the country name and ISO code, and drops all other attributes.

world = world[['location', 'iso_code', 'stringency_index']]
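Once only the stringency index remains, each country is described by a single number, so the clustering reduces to grouping values along one axis. The following is a minimal sketch of that idea, assuming k-means with three clusters (this section does not name the algorithm that was used). The function `kmeans_1d`, the country keys and the stringency values are all hypothetical and serve only as an illustration. In the real pipeline the values would come from the `stringency_index` column of `world`.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical stringency values on a 0-100 scale, for illustration only.
stringency = {"A": 12.0, "B": 15.5, "C": 48.0, "D": 52.3,
              "E": 55.1, "F": 81.0, "G": 85.6, "H": 79.4}
labels, centroids = kmeans_1d(list(stringency.values()), k=3)
clusters = dict(zip(stringency, labels))
```

The cluster numbers themselves carry no meaning; only the grouping does. To interpret a cluster as "strict" or "mild", compare the values in `centroids`.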

On this graph we see that almost all of the Americas belong to the same cluster, meaning they had a similar stringency index and therefore similarly strict measures. Australia, some countries in Europe and some countries in Africa and Asia belong to the same cluster. Countries in this cluster had stricter measures at the time, either because they had many cases or simply because they wanted to impose strict measures to prevent an increase. Cluster one contains only a few countries, spread all over the world, that had either no measures or very few of them. Finally, we have the countries in cluster zero. They are mostly in Asia, Europe and Africa. They had some measures imposed, but these were relatively mild.

Figure 5.21: Clustering world countries using stringency index.

The graph below shows the stringency index each country had at the time.

Figure 5.22: Stringency index.

5.3.2 Clustering world countries using COVID-19 indicators

In this case we have four clusters. Cluster number three consists of the very small countries and sovereign states mentioned before, which cannot be displayed on the map due to the limitations of this library. These countries, as we have previously seen, always cluster together, mostly because of their very small size. The largest cluster is cluster zero. It covers the entire African continent, almost all of Asia, parts of Europe, Canada and Australia. These are the countries that did not have a full-blown pandemic at the time.

Countries in cluster two were hit severely: they were facing a lot of cases at this time and there was also a strong need to impose stricter measures. If we compare this with the clustering that uses only the stringency index, we see that it is not the strictness of measures that plays the most important role, but the overall number of cases, which seems to be very difficult to control even with the strictest measures. The final cluster is cluster number one. These are countries that had a lot of cases but did not approach the pandemic in the same way: their governments did not impose strict measures and did not perform many tests.

world = world[['location', 'iso_code', 'total_cases', 'new_cases', 'new_cases_smoothed',
               'total_deaths', 'new_deaths', 'new_deaths_smoothed',
               'total_cases_per_million', 'new_cases_per_million', 'new_cases_smoothed_per_million',
               'total_deaths_per_million', 'new_deaths_per_million', 'positive_rate',
               'new_deaths_smoothed_per_million', 'icu_patients', 'icu_patients_per_million',
               'hosp_patients', 'hosp_patients_per_million', 'weekly_icu_admissions',
               'weekly_icu_admissions_per_million', 'weekly_hosp_admissions',
               'weekly_hosp_admissions_per_million', 'total_tests', 'new_tests',
               'total_tests_per_thousand', 'new_tests_per_thousand', 'new_tests_smoothed',
               'new_tests_smoothed_per_thousand', 'tests_per_case', 'stringency_index']]

Figure 5.23: Clustering world countries using COVID-19 indicators.
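The COVID-19 indicators selected above live on very different scales (total case counts in the millions next to a positive rate between 0 and 1), and several of them are missing for many countries. This section does not describe how the columns were prepared before clustering, so the following is only a sketch of one common preparation, assumed here for illustration: fill missing values with the column median and standardize each column to zero mean and unit variance. The helper `standardize` and the sample values are hypothetical; the column names are taken from the selection above.

```python
from statistics import mean, median, pstdev

def standardize(rows, columns):
    """Median-impute missing values (None), then z-score each column."""
    scaled = [dict() for _ in rows]
    for col in columns:
        # Assumes every column has at least one observed value.
        observed = [r[col] for r in rows if r[col] is not None]
        fill = median(observed)
        filled = [r[col] if r[col] is not None else fill for r in rows]
        mu, sigma = mean(filled), pstdev(filled)
        for out, value in zip(scaled, filled):
            # A constant column carries no information; map it to 0.
            out[col] = (value - mu) / sigma if sigma else 0.0
    return scaled

# Hypothetical snapshot values, for illustration only; None marks a missing value.
rows = [
    {"total_cases_per_million": 1200.0, "positive_rate": 0.02, "stringency_index": 40.0},
    {"total_cases_per_million": 25000.0, "positive_rate": 0.15, "stringency_index": 75.0},
    {"total_cases_per_million": 8000.0, "positive_rate": None, "stringency_index": 60.0},
    {"total_cases_per_million": 300.0, "positive_rate": 0.01, "stringency_index": 20.0},
]
columns = ["total_cases_per_million", "positive_rate", "stringency_index"]
scaled = standardize(rows, columns)
```

Without a step of this kind, a distance-based clustering would be dominated by the columns with the largest raw magnitudes, such as `total_cases` or `total_tests`.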

5.3.3 Clustering world countries using socio-demographic and economic indicators

When looking at the clustering performed using socio-demographic and economic elements, we see a clear division into three clusters. The first cluster, that is cluster number two, consists of the wealthiest countries in the world. These countries are very well developed, have a strong healthcare system and have enough money to support and protect their citizens. This cluster covers most of Europe (with the exception of a few countries in the Balkans), North America, Australia, China and a few countries in South America. The next cluster is cluster number one. It consists of the poorest countries in the world. Their healthcare systems are underdeveloped and they lack the funds to provide the necessary medical equipment and protection to their citizens. This includes hospitalization, testing, medicine, and access to food and water, as well as a lack of overall infrastructure. The final cluster is cluster zero, which consists of countries that fall somewhere in between. They are poor countries, but they are more developed and have better infrastructure. These countries can be found all over the world.

Figure 5.24: Clustering world countries using socio-demographic and economic indicators.

5.3.4 Clustering European countries using stringency index

Most of Europe had been tightening its measures at this time, with the exception of Scandinavia, where countries had either no measures at all or very mild ones. France and the UK had the strictest measures at the time.

Figure 5.25: Clustering European countries using stringency index.

The graph below shows the stringency index each country had at the time.

Figure 5.26: Stringency index.

5.3.5 Clustering European countries using COVID-19 indicators

This graph shows us that looking at just one date at a time can give a false picture of how the pandemic has progressed in some countries. There are four different clusters, but none of them corresponds to the clusters created from the stringency index or from the socio-demographic and economic attributes. This is because the pandemic has been spreading in waves, and there was very little that even very developed countries could do to prevent or at least contain the spread of the virus.

Figure 5.27: Clustering European countries using COVID-19 indicators.
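The statement that the COVID-19 clusters do not correspond to the other clusterings is based on comparing the maps. One way to put a number on such a comparison, offered here only as a sketch and not as the method used in this work, is the adjusted Rand index: it is 1 when two clusterings group the countries identically (regardless of how the clusters are numbered) and close to 0 or negative when they agree no better than chance. The function name and the label lists below are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same items, corrected for chance."""
    n = len(labels_a)
    # Count pairs of items placed together in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings put everything together
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical cluster labels for the same eight countries, for illustration only.
covid_clusters = [0, 0, 1, 1, 2, 2, 3, 3]
stringency_clusters = [0, 1, 0, 1, 0, 1, 0, 1]
agreement = adjusted_rand_index(covid_clusters, stringency_clusters)
```

In this made-up example no pair of countries shares a cluster in both clusterings, so `agreement` is negative, which is the numerical counterpart of two maps that show no overlap.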

5.3.6 Clustering European countries using socio-demographic and economic indicators

Because we now have less data, looking only at Europe and not at the entire world, some countries fall into the same cluster even though they did not when we ran the cluster analysis for the entire world. We have one cluster of small countries and sovereign states that cannot be shown on this map due to the technical limitations of this library. The second cluster contains all other countries. It therefore seems that, when the rest of the world is disregarded, all countries in Europe fall within the same category in terms of socio-demographic and economic elements.

Figure 5.28: Clustering European countries using socio-demographic and economic indicators.
