University of Economics, Prague

Master’s Thesis

2020 Tereza Kalendová

University of Economics, Prague

Faculty of Business Administration

Master’s Field: International management

Title of the Master’s Thesis:

A Machine Learning Approach to Startup Success Prediction in the Context of Venture Capital Industry

Author: Tereza Kalendová

Supervisor: Ing. Mgr. et Mgr. Štěpán Bahník, Ph.D.

Declaration of Authenticity

I hereby declare that the Master’s Thesis presented herein is my own work, or fully and specifically acknowledged wherever adapted from other sources. This work has not been published or submitted elsewhere for the requirement of a degree programme.

Prague, August 26, 2020 signature

Acknowledgments

I would like to express my gratitude to my supervisor Ing. Mgr. et Mgr. Štěpán Bahník, Ph.D. for his patience and guidance throughout the writing of this thesis. Moreover, I would like to thank my family for their constant support throughout my studies.

Title of the Master’s Thesis:

A Machine Learning Approach to Startup Success Prediction in the Context of Venture Capital Industry

Abstract:

Startups play a fundamental role as drivers of innovation and growth in today’s world economies, while their failure rate remains strikingly high. Venture capitalists, as one of the major sources of financing for early-stage companies, are continuously seeking to identify promising companies in order to reach high returns. However, their decision-making is characterized as a time-consuming, labour-intensive and ineffective process. While machine learning methods have the potential to aid and improve the decision-making of venture capitalists, their utilization in venture capital is still very limited. The goal of the thesis is to apply machine learning methods to predict startups’ success with a focus on the needs of the venture capital industry. For that purpose, a unique approach to the definition of startup success is introduced. Four machine learning classification methods are applied to the preprocessed dataset. Overall, the results of all models show the potential of using machine learning algorithms to predict the success of new ventures. The random forest proved to be the best predictor of the set, achieving almost 90% accuracy. Furthermore, the most important indicators of startups’ success are identified. The thesis extends the literature on predictive modelling in venture capital and shows that machine learning methods could support the investment decisions of venture capitalists.

Key words:

Venture capital, machine learning, startup, success prediction

List of Tables

Table 1: Descriptive statistics – binary variables

Table 2: Confusion matrix

Table 3: Class distribution

Table 4: Linear regression significant estimates, impact on success variable

Table 5: Confusion matrix of logistic regression

Table 6: Confusion matrix of random forest

Table 7: Confusion matrix of extreme gradient boosting

Table 8: Confusion matrix of SVM

Table 9: Models’ performance metrics

List of Figures

Figure 1: Venture capital fund structure

Figure 2: Startup lifecycle

Figure 3: Venture capital investment process

Figure 4: Median time from initial venture capital financing to IPO exit in the United States from 2007 to 2019 (in years)

Figure 5: Boxplots of continuous variables

Figure 6: Data sampling and cross-validation

Figure 7: Random forest

Figure 8: Gradient Boosting Decision Tree Schema

Figure 9: Support Vector Machines Schema

Figure 10: Feature importance in Random forest

Figure 11: Feature importance of XGB

Figure 12: ROC curves

Introduction

Startup economies are growing all over the globe, with a striking growth rate of 20% over the last two years and a total worth of nearly $3 trillion (The Global Startup Ecosystem Report, 2019). This impressive volume, comparable with the GDP of the UK, reflects the increasing importance of startups in the world’s economies and their vital role in driving innovation. Startups and their founders are not only creators of completely new business models but also challengers of incumbents and trendsetters in various areas, from the labor market to biotech. Undoubtedly, startups are the disrupters of today’s world and deserve the attention of all economic actors.

However, the world of startups has its pitfalls. In fact, only a fraction of startups manage to travel the thorny road from their foundation to becoming a stable company generating profits. Some sources even state that up to 90% of startups eventually fail (Krishna, Agrawal & Choudhary, 2016). The environment in which startups operate and grow is very complex, and many challenges prove insurmountable for founders. Primarily, most young companies need to raise capital to be able to turn the founders’ ideas into reality. One of the most important sources of financing in the startup ecosystem is venture capital (VC).

Venture capital specializes in the financing of early-stage companies, accepting a high level of risk caused by the uncertainty of the outcome of the investment. In return, venture capital receives an ownership share in the companies and a chance of finding a Holy Grail among startups. VC-funded companies currently represent some of the world’s major players, especially in the field of technology. Famous examples of VC-backed companies are technological giants such as Google, Apple or Microsoft. Gornall & Strebulaev (2015) revealed that 43% of all public U.S. companies founded after 1979 are VC-backed companies. Nevertheless, this is the shiny side of the venture capital industry. According to Zacharakis & Meyer (2000), VC-backed firms fail at a rate of around 20%, and another 20% of VC investments fail to provide any return to the VC. Gage (2012) even states that 30-40% of high-potential U.S. startups fail and, if the “no return” cases are also counted, the share reaches 95%. Therefore, there is room for improvement in VC investment selection. This is underlined by studies revealing high inefficiencies in the VC decision-making process.

The VC decision-making process is characterized by a lack of innovation, intense time pressure and a large amount of information to evaluate, which results in a time-consuming and labor-intensive process (Fried & Hisrich, 1994; Zacharakis & Meyer, 2000). Moreover, since VCs are strongly linked to networks of contacts and systems of referrals, decisions are often made based on subjective feelings. This represents a considerable risk associated with the deal, as decisions might be biased. Therefore, optimization of the VC investment decision-making process could bring serious improvements to the venture capital industry.

One of the natural ways to support the VC decision-making process is to seek help in data. The amount of generated data is growing exponentially, and many businesses continuously leverage newly available information extracted from data. Data mining with the use of machine learning helps people and organizations process large amounts of data to identify patterns and make predictions about future events. While machine learning models have the potential to aid and improve the decisions of VCs, the utilization of machine learning and data-driven decisions in the VC industry is still very limited.

The aim of this thesis is to apply machine learning methods to predict startups’ success with a focus on the needs of the venture capital industry. Based on the findings, implications for VC investors as well as startups will be provided, including an assessment of key indicators of startup success. The thesis aims to contribute to the existing literature by providing an outlook on the utilization of machine learning in the VC industry and by linking the application of machine learning prediction models to the needs of VC.

The first chapter of the thesis is dedicated to the introduction of the venture capital industry and the startup ecosystem and provides a theoretical base for the design of machine learning models suitable for the VC environment. The second section of the theoretical part describes the basics of machine learning and its utilization in VC and provides a literature review of previous works on startup success prediction. In chapters 2 and 3, the research problem, research questions and approach of the empirical part are specified. Chapter 4 covers the procedure of collecting and pre-processing the data, including their descriptive statistics. In chapter 5, an overview of the machine learning models used for the prediction of startup success is provided. Chapter 6 presents the results of the empirical analysis, including a discussion of the results. Finally, chapter 7 assesses the limitations of the study.

1. Literature review

1.1. Introduction to Startups and Venture Capital

1.1.1. Startup

The term “startup” does not have any particular definition widely used among researchers. Whether a company counts as a “startup” is perceived differently by various stakeholders as well as researchers, which causes inconsistency in the definition throughout the literature. However, the characteristics of a startup company that surface repeatedly within the literature can be summarized by three crucial criteria: novelty, innovation and growth.

Freeman and Engel (2007, p. 94) describe a startup as a “young venture with liabilities of newness and smallness.” In general, a new venture is considered a startup for up to 10 years of its existence (Bormans et al., 2019). Nevertheless, the age of the company provides no information to distinguish a startup from any other traditional early-stage business. What differentiates a startup is its innovative product, service or business model and its pursuit of growth. Ries (2011) sees innovation as an essential part of the definition and emphasizes that the meaning of this word should be understood broadly. A startup embodies innovation in various manners, from bringing new technological miracles to the market, employing existing technology or know-how in different spheres, and introducing a new business model, to selling an existing product or service in new locations. The key is that innovation drives the company’s success (Ries, 2011).

Ries (2011) further states one more important aspect of the definition: the extreme uncertainty in which startups operate. The context of high uncertainty is another differentiator of startups from most new businesses, because traditional companies often clone existing businesses that have been proven to yield profits and lack the innovation mentioned above. The risk is limited in these cases. The uncertain nature of startups is pivotal for this thesis, which aims to uncover startups’ future and thus reduce the uncertainty.

Finally, high risk is strongly linked to the last essential characteristic of the startup. According to most researchers, rapid growth is what defines the startup (Freeman & Engel, 2007; Blank & Dorf, 2012; Graham, 2012). The startup needs to aim to scale up with the intention of increasing its turnover, number of employees or the markets in which it operates (Bormans et al., 2019). Blank & Dorf (2012, p. 17) state that the startup is “a temporary organization in search of a scalable, repeatable, profitable business model.” Graham (2012, p. 1) even claims “…it is not necessary for a startup to work on technology, or take venture funding, or have some sort of “exit.” The only essential thing is growth. Everything else we associate with startups follows from growth.”

Rapid growth and keeping the position of innovator require substantial capital. Startups’ founders usually do not have sufficient funds to finance their companies alone. Therefore, founders need to seek outside financing. The forms of startup financing will be described in the following sections.

1.1.2. Venture Capital

Startups represent young and small companies with very limited tangible assets, operating under high levels of uncertainty and expecting years of negative profits. Furthermore, information asymmetry is present between the founders and potential investors (Gompers & Lerner, 2001). These specifics prevent startups from receiving financial support in the form of bank loans or other debt financing. The high risk inherent in startups would require interest rates that exceed those lenders are allowed to charge on loans. Startups also usually lack hard assets against which the debt could be secured (Zider, 1998). Because of these structures and rules of the financial markets, venture capital has developed as an important intermediary providing capital to startups, and for many of them, venture capital is the only potential source of financing.

The term “venture capitalists” is often applied to all investors financing startup ventures, including stand-alone individuals as well as professionals investing on behalf of others (Freeman & Engel, 2007). For the purpose of this thesis, the term will be used to refer to the class of professional investors, in line with Gompers & Lerner’s (2001, p. 146) definition of venture capital: “Venture capital are independent, professionally managed, dedicated pools of capital that focus on equity or equity-linked investments in privately held, high growth companies.” Put differently, venture capitalists specialize in locating, evaluating, and selecting high-risk but potentially high-profit private companies (Silviera & Wright, 2007) and invest in them in exchange for equity or an ownership stake.

Besides providing capital, venture capitalists enter into partnerships with startup founders and take an active role in advising the firm, retaining important control rights within the company. Venture capitalists nurture the project, monitor strategy and investment decisions, and provide social capital, such as access to their network of consultants, investment bankers or lawyers, market access, or a supply of personnel (Gompers & Lerner, 1999; Freeman & Engel, 2007). The assistance provided by VCs is often considered the key input into the startup (Silviera & Wright, 2007), adding value even after the company goes public (Brav & Gompers, 1997).

The reason why venture capitalists invest in startups and help them grow is simple. After the company reaches a sufficient size and credibility, and of course grows in value, the venture capitalist sells the investment either through public markets (Initial Public Offering) or through an acquisition. The shared objective of entrepreneurs and venture capitalists is to create liquid equity value. Put simply, startups are born to be sold (Freeman & Engel, 2007).

A question that naturally follows is how success should be defined for a startup and its founder when the final destination results in the founders’ loss of control over the company or in the disappearance of the business in a merger or acquisition. The definition of startup success is discussed in section 1.1.7 Startup success and failure.

1.1.3. Venture Capital industry

To understand the venture capital industry, it is important to further discuss the whole “venture cycle.” For the most part, the venture capital cycle was described in the previous section. Gompers and Lerner (2001) summarize the venture cycle as

“… it starts with raising a venture fund; proceeds through the investment in, monitoring of, and adding value to firms; continues as the venture capital firm exits successful deals and returns capital to its investors; and renews itself with the venture capitalist raising additional funds.” (Gompers & Lerner, 2001, p. 152)

However, the starting and closing phases of the cycle need further explanation. For these phases, the venture fund’s structure and the mechanics of fundraising are central.

Venture capital funds are financial intermediaries between sources of funds, represented typically by institutional investors, and startups (Cumming & Johan, 2013). Investors in VC funds are usually very large institutions such as pension funds, insurance companies, financial firms, or university endowments (Zider, 1998). Venture capital funds pool their invested capital and connect it to entrepreneurs seeking funding. These institutional investors are called limited partners and establish a limited partnership with the venture capital fund managers, called general partners. Limited partners are passive investors with limited liability, while general partners are venture capitalists responsible for the day-to-day operations and management of the fund (Cumming & Johan, 2013). The basic intermediation structure of venture capital funds is depicted in Figure 1.

Figure 1: Venture capital fund structure

Source: Cumming & Johan, 2013, p. 4

A limited partnership is structured over a 10-year horizon (with a possible extension of an additional 3 years) because a venture capital fund’s investments typically last 2-7 years. Over the first few years, venture capitalists select promising investments and then support the companies and endeavor to increase investment value for the remaining years of the fund’s life (Cumming & Johan, 2013).

Incentives for venture capitalists are threefold. First, when the VC fund exits an investment and its stock is sold at a premium above the investment cost, the general partners usually receive 20% of the capital gain. Secondly, general partners receive a yearly management fee (around 2.5%) based on the size of the fund. Thus, their fees increase if the value of the portfolio companies increases and more capital is raised from limited partners. Thirdly, venture capital managers’ reputation improves if their investments perform well (Freeman & Engel, 2007).
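
As a rough illustration of the first two incentives, the following Python sketch computes the management fees and the share of capital gain for a hypothetical fund. The fund size, exit proceeds and fund life are invented for illustration. The sketch also assumes that the fee is charged on the full fund size every year and that the 20% share applies to the total capital gain without any hurdle rate, which actual partnership agreements may handle differently.

```python
def gp_compensation(fund_size, exit_proceeds, invested_capital, years,
                    fee_rate=0.025, carry_rate=0.20):
    """Return (total management fees, share of capital gain) for the general partners.

    Simplifying assumptions (not taken from the cited sources):
    - the yearly fee is charged on the full fund size for every year of the fund's life
    - the share of capital gain is paid on the total gain, with no hurdle rate
    """
    management_fees = fund_size * fee_rate * years
    # No share is paid when the fund exits below the invested amount.
    capital_gain = max(exit_proceeds - invested_capital, 0.0)
    gain_share = capital_gain * carry_rate
    return management_fees, gain_share


# Hypothetical fund: $100M raised and invested, $250M returned after 10 years.
fees, gain_share = gp_compensation(
    fund_size=100e6, exit_proceeds=250e6, invested_capital=100e6, years=10
)
print(f"Management fees: ${fees / 1e6:.1f}M")
print(f"Share of capital gain: ${gain_share / 1e6:.1f}M")
```

Under these assumptions, the general partners of the hypothetical fund would receive roughly $25 million in fees and $30 million from the capital gain, which shows why both fund size and exit value matter to them.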

To sum up, all parties benefit from growth in value: the venture capital industry needs to provide a sufficient return on capital to attract institutional investors, attractive returns for the venture capitalists themselves, and a sufficient increase in the value of the entrepreneurs’ companies to attract their great ideas generating superior returns (Zider, 1998).

1.1.4. Startup lifecycle

Venture capital is not the only source of startup financing. Other forms of financing are often more suitable for a company and its investors, depending on the startup’s stage of development. In fact, venture capital investors focus on the middle part of the S-curve depicted in Figure 2. They avoid both extremes of the startup lifecycle: the early stages, when the uncertainty is too large, and the later stages, when growth rates slow dramatically (Zider, 1998). In this section, the various development stages and corresponding forms of financing will be described.

The lifecycle of a startup begins with a founding event. Founders commit effort and their reputations to the transformation of their idea into a new business organization. This initial stage of a company’s life cycle is called the seed stage. During the seed stage, business plans and prototype versions of the product or service are developed (Freeman & Engel, 2007). The seed stage is typically associated with a substantial need for capital and negative profits. The entrepreneurs often finance the company from their own savings or obtain financing from the so-called “FFF”: family, friends, and “fools” (Cumming & Johan, 2013). As the entrepreneur moves towards establishing a going concern, angel investors step in. “Angels” are professional individual investors and represent a common source of capital for a startup before it obtains more formal venture capital financing (Wong, 2002).

The second period commences when the product development is completed and the company requires further funding for marketing, manufacturing and sales of the product. This period is known as the early stage or startup period. The company approaches the break-even point and begins to generate its first positive profits. At this time, the startup seeks first-round funding (or series A) of institutional investment from venture capitalists (Freeman & Engel, 2007).

With continued success, the startup moves into the expansion phase and requires more capital to finance increased production capacity and market or product development and to provide additional working capital (Cumming & Johan, 2013). More funding rounds from venture capitalists follow until the company matures into the later stage. The later stage is also called “mezzanine” and indicates that the company is close to a transition from being a private company to being publicly quoted. This event is called an initial public offering (“IPO”), and it is “the first time a company sells its shares for sale in the public market” (Cumming & Johan, 2013, p. 7).

Figure 2: Startup lifecycle

Source: Cumming & Johan, 2013, p. 7

1.1.5. Decision process of VC investors

One of the goals of this thesis is to provide VC investors with a tool to support their investment decisions and to contribute to the simplification of the decision-making process. Therefore, one needs to understand how venture capitalists actually make investment decisions. This exploration consists of two related areas: the process by which VCs evaluate potential investments and the criteria VCs use to evaluate an investment. In this section, the models of decision-making processes are introduced, followed by a review of the key decisive factors in the investment process.

Undoubtedly, the decision to invest is a difficult task even for experienced investors (Shepherd et al., 2002). Asymmetric information among actors makes the initial decision to invest a crucially important part of the venture capital cycle (Fried & Hisrich, 1994). Consequently, a substantial body of literature has attempted to understand and structure the investment selection decision processes of venture capitalists. Most of the studies have paid attention to the commonalities among VC firms, striving to identify one universal decision-making process. Two studies and their proposed frameworks have received particular attention from scholars: the study by Tyebjee and Bruno (1984), proposing a five-step model for investment activities, and the later conceptualization of this model with added modifications proposed by Fried and Hisrich (1994). Even though both studies date back to the end of the 20th century, the frequent application of the proposed models in current literature demonstrates their continued relevance. For better understanding, both models are graphically summarized in Figure 3.

The first step of Tyebjee and Bruno’s (1984) model is deal origination, in which venture capitalists discover potentially promising investment opportunities. Due to the high number of investment prospects, VCs often rely on various intermediaries to connect them with new ventures. In the second step, screening reduces the overall number of potential deals to a manageable set of projects. VC firms typically have small staffs facing a large number of potential deals; therefore, they use screening criteria to reduce that number significantly. In the third step, evaluation, VC managers analyze and assess the potential return and risk associated with a particular deal. When the outcome of the deal evaluation is favorable, venture capitalists proceed to the fourth step and structure the terms of the deal, such as the amount, form and price of the investment. Structuring typically results in a signed contract between the entrepreneurs and the venture capitalist. The last step consists of post-investment activities, a broad range of activities that venture capitalists provide until they exit the investment.

Fried and Hisrich (1994) modelled the decision-making process in six stages. The whole process also starts with origination. In this case, the authors emphasize that venture capitalists wait for deals (investment proposals) to come to them rather than actively seeking the deals themselves. Therefore, most proposals come to VCs by referral. The next step is VC firm-specific screening. Venture capital funds usually specialize in certain areas and thus have specific criteria on industries, geographic location, stage of financing or investment size. If the proposals pass the firm-specific screening, the third step is generic screening, in which VCs go through business plans. Neither screening should take a significant amount of time, and thus both offer an effective way to reduce the number of potential investment opportunities. The next two steps consist of first- and second-phase evaluation. During the first-phase evaluation, a comprehensive analysis of the company is compiled, including a long list of activities that VCs undertake to evaluate the project. The most frequent activities are interviewing all members of the management team, touring facilities, and contacting the entrepreneur’s former business associates, existing outside investors, and current or potential customers. The objective of the second phase is to determine the obstacles to the investment and how they can be overcome. Before entering this phase, VCs are required to have an approximate understanding of the structure of the deal and its price. The final stage is closing. Details of the structure are finalized, and contracts are negotiated and signed. However, Fried and Hisrich (1994) further mention that the last three stages are not so easily distinguishable in all VC funds.
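
The firm-specific screening step lends itself to a simple rule-based filter. The sketch below is a hypothetical illustration in Python, not a procedure taken from Fried and Hisrich (1994): the `Proposal` fields, the `FundCriteria` values and the example deals are all assumptions, chosen to mirror the four criteria named above (industry, geographic location, stage of financing and investment size).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical investment proposal reaching a VC fund."""
    name: str
    industry: str
    country: str
    stage: str
    amount_usd: float


@dataclass(frozen=True)
class FundCriteria:
    """Hypothetical firm-specific criteria of a single VC fund."""
    industries: frozenset
    countries: frozenset
    stages: frozenset
    min_amount_usd: float
    max_amount_usd: float


def passes_firm_specific_screen(p: Proposal, c: FundCriteria) -> bool:
    """Return True only if the proposal matches every firm-specific criterion."""
    return (
        p.industry in c.industries
        and p.country in c.countries
        and p.stage in c.stages
        and c.min_amount_usd <= p.amount_usd <= c.max_amount_usd
    )


# Invented criteria and deal flow, for illustration only.
criteria = FundCriteria(
    industries=frozenset({"software", "biotech"}),
    countries=frozenset({"US", "CZ"}),
    stages=frozenset({"series_a", "series_b"}),
    min_amount_usd=1e6,
    max_amount_usd=10e6,
)

deal_flow = [
    Proposal("Alpha", "software", "US", "series_a", 3e6),
    Proposal("Beta", "retail", "US", "series_a", 2e6),    # industry outside focus
    Proposal("Gamma", "biotech", "CZ", "seed", 0.5e6),    # stage and size outside focus
]

shortlist = [p.name for p in deal_flow if passes_firm_specific_screen(p, criteria)]
print(shortlist)  # ['Alpha']
```

Only proposals that survive such a filter would move on to generic screening, where the business plan itself is read.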

Figure 3: Venture capital investment process

Source: (A) Tyebjee and Bruno (1984), (B) Fried and Hisrich (1994)

While the above-mentioned studies address the sequence and model the structure of the process, a study by Gompers et al. (2020) provides detailed information on VCs’ practices in each stage and compares their importance in the final decision. The study works with a sample of 681 VC firms, providing a comprehensive picture of many areas of the current decision-making process in the VC industry. The authors classify VCs’ decision process into 3 stages: pre-investment screening (including sourcing, evaluating and selecting investments), structuring, and post-investment monitoring and advising. Pre-investment screening is the most relevant stage for the purpose of this thesis because machine learning models are well suited to this initial selection. Thus, structuring and post-investment activities will not be discussed further.

During the pre-investment screening, VCs first generate a deal flow, a stream of their potential investments. VCs use several sources of investment opportunities but, as already mentioned, networks and referrals are the most prominent ones. According to Gompers et al. (2020), over 30% of deals are generated through professional networks, 20% are referred by other investors and 8% are referred by existing portfolio companies. VCs proactively self-generate almost 30% of deals, and 10% come inbound from company management. Based on these results, Gompers et al. (2020) highlight the importance of active deal generation on the VCs’ side and even mention the recent trend of quantitative sourcing in the VC industry. They further add that only a few VC firms in their sample use this method. This finding supports the need for wider utilization of quantitative techniques during the sourcing stage, to which this thesis attempts to contribute. All potential investments are then carefully selected based on the selection factors and further evaluated using forecasting and valuation techniques. The selection factors are further discussed in the next section.

Gompers et al. (2020) also consider which of the decision-making activities are more important for value creation. The results show that deal flow, deal selection, and post-investment activities all add value, but the deal selection is ranked as the most important of the three by venture capitalists.

1.1.6. Investment criteria of VC investors

In the previous section, the investment decision process was divided into several stages, each characterized by a specific set of activities. Two of those stages, screening and evaluation (or selection) of the project, involve the application of investment criteria. Investment criteria represent key decisive factors in the rejection or acceptance of a deal. Exploration of these factors is fundamental for startup founders aiming to receive funding from venture capitalists. Therefore, an extensive stream of literature has been devoted to the identification of these key decisive factors within the investment process; it is summarized in this section.

It is important to mention that each venture capital firm, and even each venture capital manager, weights the various criteria differently. Therefore, the summary provided in this thesis should be read as a sample of investment criteria proven to be significant for investment decision-making in various prior empirical studies.

Several studies on this topic show that investment criteria can be grouped into the four most relevant areas: human capital, product/service, market and financial considerations (Khanin et al., 2008; Kollmann & Kuckertz, 2009; Dhochak & Sharma, 2016). Human capital, which has been repeatedly mentioned in previous studies, appears to be an especially important criterion. Average importance is assigned to product- and market-related criteria, and lower importance to the fund and the terms of a deal’s structure (Kollmann & Kuckertz, 2009). Each of these areas is discussed in more detail in the following paragraphs.

Human capital characteristics

The human capital criteria include all characteristics related to startups’ founders and their teams. A startup’s management team is the main driver of the project and, according to various studies, VCs place the utmost importance on its competency. MacMillan et al. (1985) revealed that, out of the ten most important decision criteria for US venture capitalists, half are related to the personality or experience of the entrepreneurs. When evaluating the characteristics of entrepreneurs and startups’ top management, VCs consider psychological characteristics as well as cognitive capabilities. For example, perseverance, commitment, attention to detail, and high risk tolerance are appreciated characteristics (Khanin et al., 2008). Strong leadership capabilities are also considered highly relevant (Robinson, 1987). VCs further assess whether the management team consists of experienced and qualified managers with different functional backgrounds and a strong track record (Robinson, 1987; Muzyka et al., 1996).

Market characteristics

However, some in-depth studies of VC investment criteria show that market-related characteristics, such as market growth or competition, are even more important than team characteristics (Khanin et al., 2008). Dhochak and Sharma (2016) built a model demonstrating that the macro-economic environment, regulatory environment and industry characteristics have the highest driving power and name them “base key factors” in the investment decision-making process. Other studies show that VCs are concerned about a startup’s sufficient access to the market (Tyebjee & Bruno, 1984), a startup’s ability to satisfy a market need (MacMillan et al., 1985) or sufficient growth of the market (Muzyka et al., 1996). Furthermore, the degree of competitive threat in a market is another underlying criterion for most venture capitalists (MacMillan et al., 1987). Kollmann and Kuckertz (2009) summarize the important market criteria into three characteristics: market volume, growth and acceptance.

Product or service characteristics

Khanin et al. (2008) outline the findings of studies on the product/service criteria applied by VCs as follows. The product or service should be unique or sufficiently differentiated from other offerings available in the market, should be proprietary, and should have a functioning prototype and superiority over competitors’ products. In other words, the product should be innovative and patentable and have a unique selling proposition (Kollmann & Kuckertz, 2009).

Financial characteristics

Financial characteristics include return on investment, the deal’s fit with the investment strategy, and exit possibilities. Estimated returns from an investment in a startup need to justify venture funding, and the price of the equity stake has to be attractive to make the investment happen. Venture capitalists also investigate the exit options to ensure that timely liquidation of their investment is possible (Khanin et al., 2008).


1.1.7. Startup success and failure

The definition of startup success has been approached from various angles in the literature. In order to properly define the dependent variable of the prediction model (i.e. startup success), the views on startup success and failure are gathered and described in the following paragraphs.

The measurement of startup success depends on the company’s state of development in the startup lifecycle. Based on that, various possible definitions of startup success emerge. Witt (2004) considers two ways of assessing startup success and suggests several possible definitions for each of them.

Firstly, he attributes the success of a startup to the success of a founder. For an entrepreneur, the transformation of a business idea and planning phase into a business startup might be considered a success. However, the fact that he or she has been capable of moving his or her vision to the next stage is more about the founder’s commitment and has nothing to do with the actual success of a company. Similarly, the entrepreneur might consider a startup successful based on personal criteria for evaluating the startup’s performance. These criteria involve objective success criteria, such as an increase in company value or salary, but also the subjective expectations and feelings of the entrepreneur.

These founder-related measures of success are too subjective and unstable and will not be considered further in this thesis.

The second group of Witt’s (2004) measures of startup success are company-related measures. These involve the survival of a startup, company’s growth rates and profits.

Survival as a measure of startup success is easily accessible from most of the available historical databases. Nevertheless, the measure has some disadvantages. For example, the fact that a company still exists after some period does not necessarily mean that it is healthy and successful.

The growth of the company is one of the fundamental attributes of a startup company; thus, it should be an ideal measure. Common indicators of a company’s growth are sales or the number of employees. For example, Tavoletti (2013) assesses startup success by “the potential of early international growth.” An issue with this measure is the company size effect: small startups experience higher growth rates than older and larger companies. Therefore, both relative and absolute growth measures need to be used when measuring startup success.
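The size effect can be illustrated with a small numerical sketch. The headcounts below are hypothetical values chosen only for illustration, and the two helper functions are not taken from any of the cited studies; they simply contrast a change measured in absolute units with the same change measured relative to the starting value.

```python
def absolute_growth(start: float, end: float) -> float:
    """Change in the indicator (e.g. number of employees) in absolute units."""
    return end - start


def relative_growth(start: float, end: float) -> float:
    """Change in the indicator as a fraction of its starting value."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - start) / start


# Hypothetical headcounts, chosen only to illustrate the size effect.
small = (5, 15)      # a young startup growing from 5 to 15 employees
large = (500, 600)   # an older company growing from 500 to 600 employees

for name, (start, end) in (("small", small), ("large", large)):
    print(name, absolute_growth(start, end), relative_growth(start, end))
```

In this toy case the small company leads on relative growth while the large company leads on absolute growth, which is why neither measure is sufficient on its own.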

The last of Witt’s (2004) company-related success measures is profit, which can be assessed only at later stages of a startup’s development cycle, when it reaches profit breakeven. Other researchers agree on the accuracy of this measure of startup success. According to Lussier and Pfeifer (2001), a company can be classified as successful when it has generated at least industry-average profits for the last three years. Stucki (2013) uses two binary measures of success in his study: survival and achievement of profit breakeven. Sharchilev et al. (2018) view revenues as a perfect success metric, arguing that “generating revenues is the ultimate financial goal of every company.” On the other hand, they consider the limitations of this metric for empirical studies, because revenues and other financial performance indicators are often concealed from the public. Furthermore, it might take “up to eight years for an average company to become profitable” (Sharchilev et al., 2018).

Witt (2004) shares the same concerns, mentioning the trade-off between growth and profitability, as some startups achieve higher growth rates at the expense of profitability.

Dempwolf et al. (2014) take another approach to startup success: the number and size of the investments a startup receives. Sharchilev et al. (2018) likewise arrive at this metric as the most suitable measure of startup success. According to them, a startup securing funding rounds is a representation of potential business value and reflects the decision of investors who are experts in the field.

The decision of investors and their point of view on startup success are of interest to this thesis. As mentioned in previous chapters, for venture capitalists a startup represents an investment from which they expect to receive high returns and a successful exit. In the eyes of venture capitalists, the ultimate success of a startup is measured by the sale of its shares through an IPO or to another company through a merger or an acquisition. Either of these events is usually considered a success for the startup as well, since it yields extensive amounts of money to its founders, investors and early employees (Guo, Lou & Pérez-Castrillo, 2015). Conversely, if a startup is unsuccessful and fails, venture capitalists exit through redemption or liquidation. That is, a startup “redeem the shares of the venture capitalist on demand pursuant to a contractual put right” or “the venture capitalist receive a cash distribution upon the liquidation of a startup” (Smith, 2005). The failure of a startup is briefly discussed in the next paragraphs.

In general, the majority of startups fail. Researchers estimate that the startup failure rate ranges from 50% to 90% (Song et al., 2008; Boss, 2010; Krishna, Agrawal & Choudhary, 2016). Song et al. (2008) concluded that after four years of operations, only 36% of new technology ventures in the United States had survived, and after five years, the survival rate had dropped to only 22%. Krishna, Agrawal and Choudhary (2016) even state that the industry standard is a failure of 9 out of 10 startups. VC-backed startups fail at a much lower rate (20%) than average (Zacharakis & Meyer, 2000). However, Zacharakis and Meyer (2000, p. 323) state that “another 20% of the VC's portfolio fails to provide any return to the VC.”

The reasons for such high failure rates have been explored in various studies. Their findings show that failure is usually caused by internal factors within the startup rather than external ones (Triebel et al., 2018). Triebel et al. (2018) state that the most common reasons for startup failure may be aggregated into five areas: market-related, team or personal-specific, capital procurement, the technological concept, and others. They also emphasize that usually a variety of reasons play a role in startup failure. The major causes are team-internal reasons (35%) and market obstacles (26%).


A study by CB Insights (2019), a major platform aggregating data about the startup ecosystem, presents the 20 most common reasons based on an analysis of 101 failed startups. The three most frequently mentioned reasons are “no market need” (42%), “ran out of cash” (29%) and “not the right team” (23%).

Nevertheless, these studies evaluate startups retrospectively, after they fail. The purpose of this thesis is quite the opposite: to determine which criteria lead to the success of a startup in the future. These criteria are often called success factors.

1.1.8. Startup success factors

Startup success factors should overlap with the investment criteria of venture capitalists; otherwise, venture capitalists would not choose the best investment opportunities. However, empirical studies on the relationship between investment criteria and success are still missing (Kollmann & Kuckertz, 2009). Therefore, a brief summary of existing findings on startup success factors is presented in this section.

Many studies agree that startup success depends on various aspects. The variability of results further indicates that there is no single dominant criterion determining startup success.

Some studies focus on the impact of specific startup characteristics. Weking et al. (2019) examine whether, and which, startup business models contribute to startup success in the form of survival. The results reveal that the business model is an influencing variable and show that the Freemium and Subscription business models significantly contribute to startup survival. Spiegel et al. (2015) assess the networks of startup founders and conclude that socially well-connected founders are more successful with their businesses. Dessyana and Riyanti (2017) also pay attention to startup founders and evaluate whether their personalities contribute to startup success. Their results confirm that “the higher the degree of entrepreneurial self-efficacy, the higher of the success for business startup” (Dessyana & Riyanti, 2017, p. 67).

Other studies examine a wide range of startup characteristics and factors and compare their impacts with each other. Kalkati (2002) analyzes 38 criteria identified by venture capitalists who experienced both success and failure in high-tech startups. His findings reinforce venture capitalists’ belief in the importance of team and founder characteristics by concluding that entrepreneurial quality plays a critical role in startup success. Furthermore, resource-based capabilities, such as managerial, technical, or marketing capabilities, and the venture’s competitive strategy also have a significant effect on venture success. The study also reveals that product uniqueness is not the necessary factor bringing success; rather, it is the ability of a startup to “meet the unique requirements of customers” (Kalkati, 2002, p. 447). Most of the market-related factors did not show a significant impact, except for market growth rate and stimulating an existing market. That means that startup entrepreneurs “can achieve initial success more easily and rapidly in the growing market and by stimulating existing market instead of creating a new market” (Kalkati, 2002, p. 450). Finally, none of the financial consideration criteria showed a significant impact.

Song et al. (2008) conducted a meta-analysis and evaluated 24 possible success factors of new technology ventures. Eight factors proved to have a significant impact, including team-related factors (industry and marketing experience), resources (financial resources, supply chain integration, firm age and size), market scope and patent protection of the product.


1.2. Introduction to Machine Learning

The amount of data about our world is growing rapidly every year. Digital technology has become a part of our daily existence and continuously generates and collects data in various fields, from science to our personal lives. Datasets are getting larger not only in volume (number of observations) but also in complexity (number of observed attributes), resulting in much more structure in the data. To leverage the ever-growing databases, data must be processed and turned into knowledge in a smart and effective way. This is where data mining and machine learning come into play.

Data mining is defined as any process that discovers patterns and unexpected structures in data, revealing some useful information (Ratner, 2017; Witten et al., 2017). Alpaydin (2014, p. 2) mentions the analogy with mining:

“…large volume of earth and raw material is extracted from a mine, which when processed leads to a small amount of very precious material; similarly, in data mining, a large volume of data is processed to construct a simple model with valuable use.” (Alpaydin, 2014, p. 2)

Besides that, the process of discovery must be automatic or semiautomatic, meaning that computer programs seek regularities or patterns in databases automatically (Witten et al., 2017). This is done by applying machine learning algorithms; thus, machine learning provides the technical basis of data mining (Witten et al., 2017). Some authors equate data mining with the application of machine learning algorithms to large databases (Mitchell, 1999; Alpaydin, 2014).

Ratner (2017) summarizes three conceptual components that define today’s data mining: (1) statistics, (2) big data and (3) machine learning. Mitchell (1999) highlights that machine learning algorithms are central to the data mining process but also notes that the process involves other important steps, such as building and maintaining the database, data formatting and cleaning, or human expert knowledge for results extraction. To sum up, data mining is a combination of human-computer interaction, statistical analysis, databases and machine learning algorithms (Mitchell, 1999).

The term machine learning was introduced by the American computer scientist A. L. Samuel in the late 1950s, when he published a self-learning checkers-playing program. However, wider adoption of machine learning did not happen until after 2000, when advancements in computing power and the ubiquitous availability of big data accelerated its use. Before explaining what machine learning is, it is important to mention its relation to the term artificial intelligence, as the two terms very often appear side by side in today’s literature and media. Machine learning is a subset of artificial intelligence and one of the ways in which the broader concept of artificial intelligence is now expected to be achieved (Mitchell, 1997).


Machine learning may be defined as the ability of a computer (a machine) to learn the structure in data without being explicitly programmed (Ratner, 2017). In other words, algorithms and statistical models are used to perform many mathematical operations on the sample data and automatically learn the relations and rules within them. The core task of ML is to make inferences (going from particular observations to general descriptions) from the sample data (also called training data). Thus, ML uses the theory of statistics to build models that describe the patterns in data, which, if found, allow generalization to complex problems. The models can be used for prediction, description, or both (Alpaydin, 2014). Predictive models “forecast what will happen in new situations from data that describe what happened in the past” (Witten et al., 2017, p. xxiii), while descriptive models describe the structure of data and support the explanation and understanding of a problem (Alpaydin, 2014; Witten et al., 2017).

There are two major types of machine learning problems, which differ in the structure of the input data. Supervised learning requires training data composed of examples of input variables (X) and corresponding output variables (Y), and the task is to learn the mapping from the input to the output (Y = f(X)). Once the algorithm has learned the rule mapping input to output from the training dataset, it can predict or classify the output for new input data, making correct predictions for novel instances (Alpaydin, 2014). Examples of such methods are classification (the output is a category) and statistical regression (the output is a number).
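The idea of learning a mapping Y = f(X) from labelled examples can be sketched with one of the simplest classifiers, a 1-nearest-neighbour rule. This is a minimal illustration only, not one of the models used in this thesis; the function name `predict_1nn` and the toy data are assumptions introduced here.

```python
import math


def predict_1nn(train_X, train_y, x):
    """Return the label of the training example closest to x (Euclidean distance)."""
    distances = [math.dist(row, x) for row in train_X]
    nearest = distances.index(min(distances))
    return train_y[nearest]


# Hypothetical toy training data: two numeric inputs per observation (X)
# and a binary category as the output (Y).
train_X = [(1.0, 1.0), (1.5, 2.0), (8.0, 9.0), (9.0, 8.5)]
train_y = [0, 0, 1, 1]

# Classify two novel instances that were not part of the training data.
print(predict_1nn(train_X, train_y, (1.2, 1.4)))
print(predict_1nn(train_X, train_y, (8.5, 9.2)))
```

Here the "learned rule" is simply the stored training set; more elaborate classifiers compress the same kind of input-output examples into an explicit model.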

In the case of unsupervised learning, the dataset contains only input variables without any desired output. The aim is to find patterns in the input space that occur more often than others. Those regularities form groupings of inputs. An example of an unsupervised method is clustering (Bishop, 2006; Alpaydin, 2014). In this thesis, the focus is placed on classification techniques, and the models used are described in more detail in the Methodology chapter.
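For contrast with the supervised case, clustering can be sketched with a bare-bones k-means procedure on one-dimensional data. The text names clustering only in general; k-means is chosen here purely as a familiar example, and the values, the initial centers and the function name `kmeans_1d` are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign each value to its nearest center,
    then move each center to the mean of the values assigned to it."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical inputs without any output labels.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, groups = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(groups)
```

No labels are supplied at any point; the two groupings emerge only from the regularities in the inputs.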

Data mining with the use of machine learning has found practical application in various fields, and many great inventions and applications we use in our day-to-day lives would not exist without machine learning. The ability of ML to learn from past experience and predict future output represents great potential for its application in the venture capital industry, where historical data about startups have accumulated over the past decades and the prediction of companies’ success is of crucial importance to investors. The utilization of machine learning in the venture capital industry is discussed in the next section.

1.2.1. Utilization of ML in the VC

Rapid data growth in the financial industry and its specifics, such as the need for predictive analytics, suggest that the use of machine learning techniques could bring positive technological advancements for the future of this sector. Indeed, a large body of academic literature is devoted to the application of machine learning algorithms to various financial data, and many companies have already incorporated ML into their processes. Examples of use cases include asset management (e.g. securities investing and trade execution) and the assessment of bank creditworthiness (Guida, 2019; Johnson et al., 2019).

By contrast, sources discussing ML in the venture capital sector are scarce. While some articles on this topic have recently been published by established online publishers (e.g. Forbes, TechCrunch), only a few VCs have publicly announced the incorporation of ML into their operations. More importantly, there is a very limited amount of academic literature considering the use of ML in the context of the venture capital industry, although machine learning surely has a place in this sector. Corea (2019, a) lists ways in which machine learning could support VC investors, such as (a) helping to spot market gaps and general trends, (b) obtaining intelligence on the competitive landscape, (c) creating more accurate pricing and valuation models, (d) identifying potential acquirers and, finally, (e) helping to find startups with high success potential. Overall, machine learning in the VC industry has the potential to help investors make more informed decisions and to automate them.

In their study, Tewari et al. (2020) propose the use of machine learning in three areas of VCs’ decision making, which overlap with the above-mentioned examples: automation of deal sourcing, automation of market and competition research, and determination of the probability of success of an investment opportunity based on specific assessment parameters (discussed in chapter 1.1.6 Investment criteria of VC investors).

The researchers agree that the VC investment process is slow, expensive, lacking in innovation and, since it relies on human decision-making, often biased (Corea, 2019, b; Tewari et al., 2020). The idea of aiding VC investors’ decision process through the use of ML represents an under-explored area in the academic literature and deserves further attention. Furthermore, improvements in the decision-making process could benefit not only venture capitalists but also the world of innovation, by increasing the chance that promising, high-quality companies get funded (Corea, 2019, b).

1.2.2. Prediction of business success

The prediction of startup success is the use case of ML in the VC industry on which this thesis focuses. Therefore, a review of previous research and methods applied to the prediction of the success of mature companies, as well as startups, is presented in this section.

Predicting the success or failure of a business has been a topic of interest for academics and researchers for decades because of its effect on the many actors involved (such as shareholders, employees, or suppliers). However, traditional statistical methods were not well suited to these predictions due to fairly restrictive assumptions, and thus new methods, such as machine learning, became increasingly investigated in the late 1990s (Daubie & Meskens, 2002). Nevertheless, the results of Barasa’s (2007) literature survey indicate that the effectiveness of classical statistical models and intelligent models is more or less the same.


A comparison of the effectiveness of machine learning algorithms used for the prediction of business failures was conducted by Aktan (2011). Five ML techniques were applied to a sample of 180 production industry firms and the financial ratios taken from their annual reports: Bayesian models, k-nearest neighbor, artificial neural networks, support vector machines and decision trees. The results indicate that the decision tree method outperforms the other models. On the other hand, Huang et al. (2008) utilized a hybrid financial analysis model, including static and trend analysis models, to construct and train neural networks. In this case, neural networks outperformed the other models, including decision trees.

One stream of research has focused on the prediction of business failure from companies’ annual reports using text mining and machine learning methods. Qiu (2007) confirmed that predictive models can be successfully built using the textual content of annual reports, adding value to the numerical estimates of financial performance. The support vector machine classification method was effective in catching the textual differences among firms with different financial characteristics. Hajek et al. (2014) estimated corporate financial performance using sentiment analysis of annual reports. Again, support vector machines provided the highest prediction accuracy and suggested the existence of a non-linear relationship between sentiment and financial performance.

More recently, literature considering the success prediction of early-stage companies, that is, startups, has emerged. The researchers’ concern here is how to define startup success. In comparison to established businesses, early-stage startups do not generate profit and their historical financial data are unstable. Therefore, each researcher has defined startup success differently. For example, Huang (2016) defined a successful startup strictly as one that (i) is acquired, (ii) goes public through an IPO or (iii) is valued at $1B or more (Unicorn). Sharchilev et al. (2018) consider startup success to be the ability to attract a further round of investment after securing seed or angel funding. In the research of Xiang et al. (2012), acquisition is the sole criterion of success.

Furthermore, existing research on the application of ML to the prediction of startup success can be divided into two groups according to the source and type of data. The first group focuses on large datasets with structured data about startups, their characteristics and facts about their funding. The second group extracts data from social networks, such as Twitter, and applies sentiment analysis techniques.

Considering the first group, the main sources of data about startups are startup databases such as Crunchbase or AngelList (Huang, 2016; Krishna et al., 2016; Dellermann et al., 2017). The goal of the research by Krishna et al. (2016) was to create a predictive model for startups and build it on key events at various stages in the life of a startup. Classification techniques including Random forests, Decision trees, Bayesian networks, Naive Bayes and Logistic regression were applied with mixed results, indicating that the Random forest and Logistic regression models show the highest accuracy and precision. Huang (2016) applied supervised binary classification, where the response label 1 indicated the success of the startup. The methods used were Kernel Support Vector Machines, Adaptive boosting and Random forest, of which Random forest generally performed with the highest accuracy. Finally, Dellermann et al. (2017) propose using a so-called hybrid intelligence method, in which inputs from machine intelligence and collective intelligence are combined to predict whether a startup will receive series A funding. This research suggested not relying solely on either machine or human prediction but using a combined approach.
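The accuracy and precision figures reported in these studies follow the standard definitions for binary classification, which can be written out in a few lines. The labels below are hypothetical and do not reproduce any result from the cited papers; as in Huang (2016), label 1 stands for a successful startup.

```python
def accuracy(y_true, y_pred):
    """Share of all observations whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Share of observations predicted as positive that are truly positive."""
    true_labels_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not true_labels_of_predicted_pos:
        return 0.0  # convention used here when nothing is predicted positive
    hits = sum(t == positive for t in true_labels_of_predicted_pos)
    return hits / len(true_labels_of_predicted_pos)


# Hypothetical true and predicted labels (1 = success, 0 = no success).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

print(accuracy(y_true, y_pred))
print(precision(y_true, y_pred))
```

Precision is of particular interest in an investment setting, because it describes how many of the startups flagged as successful actually turn out to be so.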

Regarding the second group, the approaches of researchers vary. Zhang et al. (2017) collected data on the social engagement on Twitter or Facebook of companies that were actively fund-raising on AngelList (a social platform connecting investors and entrepreneurs). They further used ML techniques, including decision trees, support vector machines and k-nearest neighbors, to predict the ability of a startup to successfully raise funding based on metrics such as the number of tweets or new followers. The results show that active engagement on social media is highly correlated with crowdfunding success.

Saura et al. (2019) observed user-generated content including the #startups hashtag on Twitter. They applied sentiment analysis with a support vector machine algorithm and found that the key topics with a positive impact on startup success are mentions of startup tools, technology-based startups and the attitude of the founders.


2. Research problem and questions

Previous works summarized in the theoretical part of the thesis covered the decision-making process of venture capitalists (Tyebjee & Bruno, 1984; Fried & Hisrich, 1994) and how they make decisions about their future investments, including the criteria investors consider when evaluating companies (e.g. Khanin et al., 2008; Kollmann & Kuckertz, 2009; Dhochak & Sharma, 2016). Some of the works described challenges in the decision-making process and identified a strong need for innovation and the use of a data-driven approach to support the decision process (Gompers et al., 2020; Corea, 2019, b).

At the same time, a stream of literature addressing the application of machine learning models to the prediction of startup success has emerged recently (e.g. Krishna et al., 2016; Dellermann et al., 2017). Naturally, the prediction of startup success is one of the most interesting use cases of machine learning in the VC decision-making process, due to its potentially large impact on the industry. However, the connection between the application of ML to the prediction of startup success and the VC industry, where the outcomes of these methods are most relevant, is missing almost completely in the literature.

Therefore, the aim of the thesis is to approach this problem from the VCs’ perspective and build a prediction model that is relevant to VC investors.

In this context, the following research questions will be addressed throughout the next chapters:

RQ1: Can ML models, built to support the decision-making process of VCs, provide a reliable prediction of the future success of startup companies?

RQ2: What are the key indicators of startup success, which VCs should consider when evaluating their prospects?


3. Research approach

Based on the understanding of the venture capital industry and the nature of the dataset described in the next section, the research approach was defined. The underlying idea reflects the venture capital point of view on startup success and, thus, on the success of venture capital itself. When designing the research approach, inspiration was taken from the paper by Arroyo et al. (2019), in which the authors applied a time-aware analysis to startup success prediction, linking the topic to a real-world setting.

The goal of all venture capitalists is to receive high returns on their investments, which is measured by the sale of the company’s shares through an IPO or an acquisition. These two kinds of exit represent the success of a startup for the VC fund. Undoubtedly, the time horizon of an investment is another major concern of venture capitalists when choosing an investment. Venture capitalists are financial intermediaries with commitments to the limited partners, the investors in VC funds, and limited partnerships are structured over a pre-defined time horizon, typically 10 years (Cumming & Johan, 2013). Therefore, the concern of venture capitalists is not only whether the startup becomes successful but also when. The longer the exit times are, the higher the risk that the VC will not be able to generate returns for limited partners in a timely manner. Based on these facts, a unique definition of startup success (= target variable), relevant for VC investors, was designed:

The successful investments are startups that managed to proceed from the series A funding to an acquisition or an IPO in less than 7 years from the initial investment (series A).

The definition consists of three crucial features:

• Starting point. The starting point of the “evaluation” period during which a startup turns out to be successful or not is the series A funding. It is the first stage of venture capital financing and thus the beginning of the potential venture capital investment in the startup.

• Period. The length of the period is set to 7 years, considering the usual duration of limited partnerships of venture capitalists and the average duration of exits. The median time from initial venture capital financing to IPO exit in the United States from 2007 to 2018 is displayed in Figure 4.

• Success determinants. The ultimate startup success determinant is an acquisition or an IPO of the startup.
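The three features of the definition can be sketched as a labelling rule for the target variable. This is an illustrative sketch, not the implementation used in the thesis: the function names, the handling of 29 February and the assumption that an exit dated before the series A round does not count as success are introduced here only for illustration.

```python
from datetime import date
from typing import Optional

WINDOW_YEARS = 7  # length of the evaluation period in the definition above


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_successful(series_a: date, exit_date: Optional[date]) -> bool:
    """True if an acquisition or IPO occurred less than 7 years after series A.

    exit_date is the date of the acquisition or IPO, or None if neither
    has been recorded for the startup.
    """
    if exit_date is None:
        return False
    # Assumption: an exit dated before the series A round is not counted.
    return series_a <= exit_date < add_years(series_a, WINDOW_YEARS)


# Hypothetical startups.
print(is_successful(date(2010, 3, 1), date(2015, 6, 30)))  # exit after about 5 years
print(is_successful(date(2010, 3, 1), date(2018, 1, 15)))  # exit after almost 8 years
print(is_successful(date(2010, 3, 1), None))               # no exit recorded
```

Because the definition reads "in less than 7 years", the sketch treats an exit exactly on the seventh anniversary of the series A round as falling outside the window.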


Figure 4: Median time from initial venture capital financing to IPO exit in the United States from 2007 to 2019 (in years)

Source: Statista, 2020


4. Data

Similarly to previous research, the source of data in this thesis is the crunchbase.com database. Crunchbase.com is one of the largest and most comprehensive databases of the startup ecosystem, including data on companies, investors, key people and events. It is a community-based database, which on the one hand bears the risk of unverified inputs but, on the other hand, has grown into a giant source of valuable information acknowledged by major players across sectors. The raw dataset used in this thesis is a Daily CSV Export of crunchbase.com acquired in April 2020 via Academic Research access, received exclusively for the purpose of this research.

The full export consisted of 18 tables out of which 7 were used for the synthesis of the final dataset. The tables used were: organizations, people, degrees, jobs, funding_rounds, acquisitions and ipos.

In the next sections the data pre-processing including data selection, cleaning and transformation is described. Further, the main characteristics of the final dataset are presented in section Descriptive statistics.

4.1. Data pre-processing

Data pre-processing plays a crucial role in all machine learning projects. The quality of data has a significant impact on the performance of models, as noisy or irrelevant data distort the classification results (Kotsiantis et al., 2006). Due to the need for a large number of observations and a thorough examination of variables in machine learning models, pre-processing of the crunchbase.com database took a considerable amount of time but was an essential phase of this thesis.

The organizations table represented the core dataset of the final sample and in its initial form consisted of 943 216 observations of companies and 41 variables. All other tables were linked to the organizations table either directly by the organization ID or through another table (e.g. jobs dataset → people dataset → organizations dataset). The three fundamental parts of data pre-processing are described below, followed by a summary of the transformation of variables into the final sets.
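The indirect linking (jobs → people → organizations) can be sketched with plain dictionaries. The rows and the column names `org_id`, `person_id` and `job_id` are hypothetical stand-ins and are not claimed to match the actual columns of the Crunchbase export.

```python
# Hypothetical miniature versions of three of the tables.
organizations = [
    {"org_id": "o1", "name": "Alpha"},
    {"org_id": "o2", "name": "Beta"},
]
people = [
    {"person_id": "p1", "org_id": "o1"},
    {"person_id": "p2", "org_id": "o1"},
    {"person_id": "p3", "org_id": "o2"},
]
jobs = [
    {"job_id": "j1", "person_id": "p1", "title": "CEO"},
    {"job_id": "j2", "person_id": "p2", "title": "CTO"},
    {"job_id": "j3", "person_id": "p3", "title": "Founder"},
    {"job_id": "j4", "person_id": "p9", "title": "Advisor"},  # no matching person
]

# Step 1: people -> organizations lookup.
person_to_org = {p["person_id"]: p["org_id"] for p in people}

# Step 2: route each job through its person to the organization.
jobs_per_org = {o["org_id"]: [] for o in organizations}
for job in jobs:
    org_id = person_to_org.get(job["person_id"])
    if org_id in jobs_per_org:  # skip jobs that cannot be linked
        jobs_per_org[org_id].append(job["title"])

print(jobs_per_org)
```

Records that cannot be traced back to an organization, such as the last job above, simply drop out of the linked result in this sketch.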

4.1.1. Data cleaning

The major concerns of the data cleaning process were the elimination of redundant data, missing values, duplicates and conflicting data, and the handling of outliers. At the very beginning, all redundant variables bearing no value or being too granular for the analysis were excluded from the organizations table, lowering the number of columns to about a half. These included variables such as legal_name (irrelevant) or address (high granularity).
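Two of these cleaning steps, dropping redundant variables and removing duplicates, can be sketched as follows. Only legal_name and address are named in the text; the key column `org_id`, the sample rows and the choice to keep the first occurrence of a duplicate are assumptions made for this illustration.

```python
REDUNDANT_COLUMNS = {"legal_name", "address"}  # the two examples named in the text


def clean(rows, key="org_id"):
    """Drop redundant columns and keep only the first row for each key value."""
    seen = set()
    cleaned = []
    for row in rows:
        if row[key] in seen:
            continue  # duplicate observation of the same organization
        seen.add(row[key])
        cleaned.append({k: v for k, v in row.items() if k not in REDUNDANT_COLUMNS})
    return cleaned


# Hypothetical rows; the third one duplicates the first.
rows = [
    {"org_id": "o1", "name": "Alpha", "legal_name": "Alpha Inc.", "address": "1 Main St"},
    {"org_id": "o2", "name": "Beta", "legal_name": "Beta LLC", "address": "2 Side St"},
    {"org_id": "o1", "name": "Alpha", "legal_name": "Alpha Inc.", "address": "1 Main St"},
]

cleaned = clean(rows)
print(cleaned)
```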
