1.3.1 Medicine and drug development

Drug discovery and development

Bringing a new drug to market has been shown to take, on average, more than a decade and to cost between $350 million and $2.7 billion. Much of this money is effectively wasted, because only a small fraction of the candidate compounds ever makes it into the final product. There are two main ways artificial intelligence can be used when seeking a treatment for a new disease.

The first is to discover an entirely new drug, which then has to go through months (or years) of clinical trials. The second is to repurpose existing drugs, which can reach patients much faster because these drugs have already been through clinical trials.[9]

Personalized treatments

Researchers at the Hospital Clinic in Barcelona (Spain) have developed a tool that was ‘trained’ on more than a trillion anonymized data points retrieved from the clinic’s electronic health records system. Initial studies have shown that the tool was able to correctly predict the trajectory of the disease in individual patients. This is one example of a tool that can help medical staff plan and prepare in advance, whether that means increasing the number of staff members or ordering additional supplies of medication. It also enables personalized treatments: the study showed an improvement at day five of treatment in 93.3% of the personalized-treatment patients, compared to 59.9% of those on standard of care.

Additionally, at day five, 2% of the personalized-therapy patients had died versus 17.7% of those on standard of care, and twenty-eight-day mortality was 20% versus 44.2%. The total number of patients included in the study was fewer than 300. These are very small numbers and additional research is needed, but even so, the study sheds light on personalized treatments and opens new paths for artificial intelligence in healthcare.[10]

1.3.2 Banking

Banks are also looking for ways to profit from the data they have. JP Morgan Chase & Co. is the largest American multinational investment bank, with more than 240,000 employees serving millions of customers.[11] It implemented a system called COiN, short for Contract Intelligence, created to process and analyze various documents. In just a few seconds COiN was able to analyze 12,000 annual commercial credit agreements; done manually, the same amount of work took employees 360,000 hours. It was also noted that the system was successful in reducing human errors.[12]

1.3.3 Crime

Data mining systems are also created to prevent crime, to predict where and when it may occur and to counter terrorism. Predpol is a company that has created a machine learning system that predicts crime type, location and time.[13] It analyzes existing data to recommend where police patrols should be increased; the company calls this real-time epidemic-type aftershock sequence crime forecasting.[14] Predpol is already used in several American cities. In Washington there was a 22 percent drop in residential burglaries after implementing Predpol.

Another study concluded that it resulted in a 7.4% reduction in crime volume.[14]
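The ‘epidemic-type aftershock sequence’ idea is borrowed from seismology: each past event temporarily raises the expected rate of similar events nearby, on top of a constant background rate. The sketch below is only a minimal, one-dimensional illustration of such a self-exciting intensity; it is not Predpol’s actual model, and the parameter values (mu, alpha, beta) are made up for demonstration.

```python
import math

def etas_intensity(t, past_event_times, mu=0.2, alpha=0.8, beta=1.5):
    """Illustrative self-exciting (ETAS/Hawkes-style) event rate at time t.

    mu    - constant background rate of events
    alpha - how much each past event boosts the rate
    beta  - how quickly that boost decays as time passes
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in past_event_times if ti < t
    )
    return mu + excitation

# Burglaries (in days) reported in one grid cell; cells with the highest
# predicted rate would be recommended for additional patrols.
events = [1.0, 1.5, 6.0, 6.2, 6.3]
print(f"expected rate on day 7: {etas_intensity(7.0, events):.3f}")
```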

1.3.4 Retail

It is more than eight years now since Target, an American retail corporation, made headlines for creating a data mining system so good that it figured out a teenage girl was pregnant before her father did. The company built a system that profiled customers based on their purchases and sent them coupons for relevant products. The teenager received coupons for pregnancy-related products, and that is how her father found out as well.[15] Searching for patterns and identifying relationships between the items that people buy is known as market basket analysis.[16] Once such patterns are identified, they can affect the promotions that retailers offer, the recommendations they make and even the placement of items in the store (be it physical or online).[17]
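As an illustration of market basket analysis, the following minimal sketch counts item pairs over a handful of made-up transactions and prints the association rules whose support and confidence exceed illustrative thresholds; a real analysis would run an algorithm such as Apriori over point-of-sale data.

```python
from itertools import combinations
from collections import Counter

# Toy transactions; in practice these would come from point-of-sale records.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "butter", "diapers"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))

# Keep rules A -> B whose support and confidence exceed illustrative thresholds.
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6
for (a, b), count in pair_counts.items():
    support = count / n                      # fraction of baskets with both items
    if support < MIN_SUPPORT:
        continue
    for lhs, rhs in ((a, b), (b, a)):
        confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
        if confidence >= MIN_CONFIDENCE:
            print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as “bread -> butter” found this way could then inform which promotions are offered, which recommendations are shown and where the two items are placed in the store.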

1.3.5 Politics

The Facebook–Cambridge Analytica data scandal is the most prominent recent case of unethical use of data. It involved using the personal data of millions of Facebook users without their consent to create profiles for political advertising. The main reasons for the huge public outrage were the lack of consent for using personal data and the fact that the data was used to sway elections, creating influence that goes well beyond regular targeted ads designed to make you buy one product or another.[18]

Cambridge Analytica used Amazon Mechanical Turk to give people a paid task, but in order to get paid it was necessary to download a Facebook app called This Is Your Digital Life. The app collected the survey responses along with all of the user’s Facebook data as well as the data of all of their friends.[19] While only about 260,000 users downloaded the app, Cambridge Analytica managed to harvest the data of up to 87 million Facebook profiles (mostly from America).[20] Other data was gathered as well, and different models were created to figure out the best way to find target users and influence their behaviour. This data was used to inform targeted political advertising.[18]

The company targeted users who were more prone to impulsive anger or conspiratorial thinking than average citizens.[18] Cambridge Analytica would create fake Facebook groups and post videos and images to generate maximum engagement and, in turn, influence the users’ opinions and behaviours. The company started operating in 2014, influencing the 2016 elections in the USA. In 2018 Christopher Wylie, a former Cambridge Analytica employee, disclosed the inside information about how the company operated and the way it managed to sway the elections.[21]

1.4 CRISP-DM

1.4.1 Methodology

The methodology used in this project is called the Cross-Industry Standard Process for Data Mining (CRISP-DM). It consists of six phases in total:

1. Business understanding The goal is to determine business objectives and data mining goals and to create a plan for the project. It is necessary to discuss these goals thoroughly with the stakeholders of the project; otherwise we may end up wasting time creating solutions that were not required and do not contribute to the business.

2. Data understanding The goal is to collect the data, describe it, explore it and verify the quality of the data. This can include creating a textual summary of the data, statistics or graphs to visualize the data.

3. Data preparation The goal of this stage is to select, clean and format the data that will be used for modelling. Data preparation consists of all activities needed to create the final dataset we will use to build our models. It is considered one of the most time-consuming parts of the project.

4. Modelling The goal is to create a model, so during this stage we explore multiple options and select an appropriate modelling technique. The technique used will depend on the task we are trying to solve, the size of the dataset and the type of data we are working with (a short code sketch illustrating phases 2 to 5 follows this list).

5. Evaluation In this phase we evaluate the results obtained in the modelling phase. If the models are not sufficiently precise, we may decide to create new models, examine the data again or establish new business goals, which may lead to more accurate and precise models.

6. Deployment The knowledge we have obtained from the models is to be deployed so that the end user can benefit from it. In most cases it is actually the user, not the person who created the models, who carries out the deployment. Deployment will look different across domains and organizations, whether it be a set of new guidelines for the company, recommendations for new treatment of patients or new software deployed to increase efficiency in sales.
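The following minimal sketch walks a toy dataset through phases 2 to 5; scikit-learn’s iris data stands in for real business data, and the model choice and parameters are illustrative only, not a prescription of CRISP-DM itself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Phase 2 - data understanding: collect the data and summarise it.
iris = load_iris(as_frame=True)
df = iris.frame
print(df.describe())          # basic statistics for each column
print(df.isna().sum())        # quality check: missing values per column

# Phase 3 - data preparation: clean the data, select features, split it.
df = df.dropna()
X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Phase 4 - modelling: choose a technique suited to the task and the data.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Phase 5 - evaluation: if performance is insufficient, loop back to an
# earlier phase (different features, a different model, or revised goals).
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.2f}")
```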

This process is not linear; during the course of the project it is possible to go back from one stage to another. It is especially common to go back and forth between business understanding and data understanding, because each helps us understand the other better. The same is often seen between data preparation and modelling, because depending on which algorithms we decide to use, we may have to alter our data slightly. Finally, if we evaluate our model and decide that we are not happy with it, we can go back to business understanding to get a better picture of what we wanted in the first place. This process can be seen in the image below.

Figure 1.4: Process diagram showing the relationship between the phases of CRISP-DM.