Program Evaluation and Cost-Benefit

Academic year: 2023

5IE475

Program Evaluation and Cost-Benefit Analysis

LECTURE 9

Regression Discontinuity Design

Klára Kalíšková

(2)

Readings for this week

Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., & Vermeersch, C. M. (2011). Impact evaluation in practice. World Bank Publications.

Blundell, R., & Costa Dias, M. (2008). Alternative approaches to evaluation in empirical microeconomics. IZA Discussion Paper No. 3800.

Lee, D. S. (2008). Randomized experiments from non-random selection in US House elections. Journal of Econometrics, 142(2), 675-697.

Matsudaira, J. D. (2008). Mandatory summer school and student achievement. Journal of Econometrics, 142(2), 829-850.

Van der Klaauw, W. (2002). Estimating the effect of financial aid offers on college enrollment: A Regression–Discontinuity Approach. International Economic Review, 43(4), 1249-1287.

Angrist, J. D., & Lavy, V. (1999). Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement. The Quarterly Journal of Economics, 114(2), 533-575.

(3)

REGRESSION DISCONTINUITY DESIGN

(4)

Regression Discontinuity Design

Principle

• Based on a special type of natural experiment: the probability of participation in a program changes discontinuously with some continuous variable X (the forcing variable)

• i.e., there is a clear threshold for eligibility

• Examples:

– Old-age benefit that is available only to people aged 65 and over

– Higher income tax rate for income above CZK 1,200,000 per year

– Test score threshold of 60% for passing a course

(5)

Regression Discontinuity Design

Example

• Old-age benefit that is only for people aged 65+ who do not work

• Estimate the impact of this benefit on employment probability

– Treatment = eligibility for the welfare benefit

– Forcing variable = age

– Outcome = employment

• The probability of participation in treatment changes discontinuously at the age-65 threshold

– People aged 65 and over have a positive probability of receiving the benefit; those below 65 have a probability of receiving it equal to 0

(6)

Regression Discontinuity Design

Principle

• Any discontinuity of the outcome at the threshold is evidence of an effect of the program

– If employment probability drops from 0.6 to 0.3 at the age-65 threshold, this is evidence that the welfare benefit has a negative employment effect.

• Note: the forcing variable (X) itself may be associated with the outcome, but this association must be smooth (continuous)

– Employment probability can change with age, but continuously: older people are less likely to work, but without the policy the probability of working would not jump at the age of 65 (the relationship must be smooth around the threshold)

– If a discontinuity is present, it is attributed to the policy!

(7)

Regression Discontinuity Design

Estimation

• Regression Discontinuity Design (RDD) compares individuals (cities, firms, classes, regions, …) who are close to the threshold

– Compare the outcomes of those who are just eligible (eligible, but close to not being eligible) with those who are just not eligible (not eligible, but close to being eligible)

– Example: to estimate the employment impact of the old-age benefit, compare the employment probability of individuals aged 65 (just eligible) with that of individuals aged 64 (just not eligible)

• Those who are just not eligible provide a good control group for those who are just eligible

– The two groups should be similar in observable and unobservable characteristics, and would have similar outcomes in the absence of treatment

(8)

Regression Discontinuity Design

Estimation

(9)

REGRESSION DISCONTINUITY:

SHARP DESIGN

(10)

RD – Sharp design

Definition

• The probability of participation in the program changes from 0 to 1 at the threshold

– People on one side of the threshold have zero probability of getting the treatment (no one participates in the program), while people on the other side have probability equal to 1 (all participate)

– Participation depends only on X (the forcing variable); it is NOT affected by individual decisions or other factors

• Estimate the treatment effect by a simple comparison of outcomes of those just below and just above the threshold!

• Potential problem:

– Manipulation -> Can people influence whether they are below or above the threshold?
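A minimal Python sketch of the sharp design on simulated data. The data-generating process, the true effect of 2.0, the window of 0.1, and all function names are illustrative assumptions, not part of the lecture.

```python
import random
from statistics import fmean

def sharp_treatment(x, threshold):
    """Sharp design: treatment is fully determined by the forcing variable."""
    return 1 if x >= threshold else 0

def simulate_sharp(n=20000, threshold=0.0, effect=2.0, seed=0):
    """Simulate (x, d, y): y depends smoothly on x and jumps by `effect` when treated."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        d = sharp_treatment(x, threshold)
        y = 1.0 + 0.5 * x + effect * d + rng.gauss(0.0, 1.0)
        rows.append((x, d, y))
    return rows

rows = simulate_sharp()
window = 0.1  # how far from the threshold we look (assumed value)
just_below = [y for x, d, y in rows if -window <= x < 0.0]
just_above = [y for x, d, y in rows if 0.0 <= x <= window]

# Simple comparison of mean outcomes on the two sides of the threshold
estimate = fmean(just_above) - fmean(just_below)
print(f"estimated effect: {estimate:.2f} (true effect in the simulation: 2.0)")
```

Because the outcome also trends smoothly in x, the simple difference is slightly off the true jump; the narrower the window, the smaller that distortion.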

(11)

RD – Sharp design

Example – Lee (2008)

• Estimate the effect of being an incumbent politician on the probability of winning the next election

– Treatment = being the incumbent (winning the election in year t-1)

– Outcome = probability of winning the election in year t

– Theory: incumbents may use the resources and privileges of the office to gain an advantage in the next election

• Can we simply compare the winning chances of incumbents and opponents?

– No, incumbents already won before and are thus potentially better/more popular than their opponents

• We need a better control group:

– Compare candidates who just won and candidates who just lost the previous election (in year t-1) -> those close to the election threshold

– Forcing variable = vote share margin (how close to winning a candidate was in t-1)

(13)

RD – Sharp design

Example – Lee (2008)

• Regression discontinuity idea:

– In elections where the result is very tight, the candidate who just barely won is very similar to the candidate who just barely lost in observable and unobservable characteristics

– But the one who barely won becomes an incumbent, while the other one does not

• It is a sharp design, because the probability of becoming an incumbent shifts from 0 to 1 at the election threshold

• Compare the results in the next election of those who barely won and those who barely lost the previous election

(14)

RD – Sharp design

Example – Lee (2008): Results

(15)

REGRESSION DISCONTINUITY:

FUZZY DESIGN

(16)

RD – Fuzzy design

Main idea

• Fuzzy design:

– The forcing variable (X) is not the only factor that determines participation in the program; there are other (potentially unobserved) factors

• The probability of participation does not jump from 0 to 1 at the threshold, but changes e.g. from 0 to 0.7 (or from 0.6 to 0.2)

– Example: unemployed people below a certain age threshold are eligible for a training program, but not all of them take advantage of it

– The probability of participation in the training program drops from 0.6 to 0 at the age threshold

• We again estimate the effect by a simple comparison of outcomes of those who are just above the threshold and those who are just below

– Example: compare the average unemployment rate of people just below the age threshold (just eligible) with the average unemployment rate of people just above it (just not eligible)
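The slide describes a comparison of outcomes on the two sides of the threshold. A common way to turn that comparison into an effect of participation itself is to divide the jump in the outcome by the jump in the participation probability (a Wald-type ratio). The lecture does not state this step explicitly, so the sketch below is an assumption about how one might proceed; the simulated data, the 0.6 participation rate, the true effect of -1.0, and the function name are all illustrative.

```python
import random
from statistics import fmean

def fuzzy_rd_wald(x, d, y, threshold, h):
    """Jump in mean outcome divided by jump in participation rate, within h of the threshold."""
    below = [i for i, xi in enumerate(x) if threshold - h <= xi < threshold]
    above = [i for i, xi in enumerate(x) if threshold <= xi <= threshold + h]
    jump_y = fmean([y[i] for i in above]) - fmean([y[i] for i in below])
    jump_d = fmean([d[i] for i in above]) - fmean([d[i] for i in below])
    return jump_y / jump_d

# Simulated data: x is age measured relative to the threshold (threshold = 0).
# People below the threshold are eligible and participate with probability 0.6;
# nobody above the threshold participates. Participation changes y by -1.0.
rng = random.Random(1)
x, d, y = [], [], []
for _ in range(40000):
    xi = rng.uniform(-1.0, 1.0)
    eligible = xi < 0.0
    di = 1 if (eligible and rng.random() < 0.6) else 0
    yi = 5.0 + 0.2 * xi - 1.0 * di + rng.gauss(0.0, 0.5)
    x.append(xi)
    d.append(di)
    y.append(yi)

effect = fuzzy_rd_wald(x, d, y, threshold=0.0, h=0.1)
print(f"estimated effect of participation: {effect:.2f}")
```

Without the division, the raw difference in outcomes would understate the effect of participation, because only about 60% of the eligible group is actually treated.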

(17)

RD – Fuzzy design

Difference between sharp and fuzzy design

• The difference between sharp and fuzzy design is that in fuzzy design we estimate the effect only for “compliers”

– Compliers are those who take advantage of the program because they are offered it, but would not take advantage of it otherwise

– Example: compliers are those who enroll in the training program because they are eligible, but would not seek out the training if they were not eligible (i.e., if they were above the age threshold)

(18)

RD – Fuzzy design

Example: Matsudaira (2008)

• Effect of a remedial summer school on test scores in the next year

– Remedial summer school is mandatory if a student fails the year-end exam, has a bad teacher assessment, or has other problems

– Most students who fail the year-end exam go to summer school, but not all of them (e.g. not those with a good teacher assessment), and some who do not fail go anyway (if they have a very bad teacher assessment)

– The result of the year-end exam is not the only factor predicting who goes to the remedial summer school -> fuzzy design

• Does the summer school help students in their further studies? Should we have this kind of policy?

– Compare the next-year test scores of students just failing the year-end exam (treatment group) with those just passing it (control group)

• The effect is estimated only for compliers

– Those who are mandated to attend summer school because their test score is below the passing threshold (but who would not be mandated to attend if their score were just above the threshold)

(20)

RD – Fuzzy design

Example: Van der Klaauw (2002)

• Effect of financial aid on whether admitted students accept the college's offer

– Colleges often provide financial aid to the best students to motivate them to enroll – does this monetary incentive work?

– Can we simply compare students who are offered aid and those who are not?

No: students with financial aid are usually smarter and thus have better outside options at other schools as well

– SAT scores determine eligibility for financial aid

Students above the SAT score threshold have a much higher probability of receiving financial aid than those right below the threshold (but there are other factors -> fuzzy design)

– Compare the acceptance decisions of students who are just below and just above the SAT threshold for aid eligibility

They are similar in observable and unobservable characteristics, but have very different probabilities of receiving the aid

(22)

IMPLEMENTATION OF RD

(23)

RDD: Graphic analysis

1. Plot treatment vs. forcing variable: There should be a jump in the probability of treatment at the threshold of the forcing variable.

• In sharp design, the treatment probability jumps from 0 to 1 at the threshold (there is nothing to check, so we do not need to plot this).

• In fuzzy design, we need to check that the treatment probability indeed changes discontinuously at the threshold (e.g. it should jump from 0 to 0.6).

(24)

RDD: Graphic analysis

2. Plot outcome vs. forcing variable: There should be a jump in the outcome at the threshold of the forcing variable.

– This is the most important graph.

– If there is a jump in the outcome at the threshold, this suggests there is indeed an effect!
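The plots in steps 1 and 2 are usually drawn from binned averages rather than raw points. The helper below is a hypothetical sketch of that binning (the function name, bin width, and toy data are assumptions): it averages a variable, either the treatment indicator or the outcome, within bins of the forcing variable, with bin edges aligned so that no bin straddles the threshold.

```python
import math

def binned_means(x, v, threshold, bin_width, n_bins):
    """Return (bin_center, mean of v) pairs for n_bins bins on each side of the threshold."""
    sums, counts = {}, {}
    for xi, vi in zip(x, v):
        # k < 0 for bins below the threshold, k >= 0 for bins at or above it
        k = math.floor((xi - threshold) / bin_width)
        if -n_bins <= k < n_bins:
            sums[k] = sums.get(k, 0.0) + vi
            counts[k] = counts.get(k, 0) + 1
    return [(threshold + (k + 0.5) * bin_width, sums[k] / counts[k]) for k in sorted(sums)]

# Toy data: the variable is 0 below the threshold and 1 above it
x = [-0.15, -0.05, -0.02, 0.03, 0.12]
v = [0, 0, 0, 1, 1]
points = binned_means(x, v, threshold=0.0, bin_width=0.1, n_bins=2)
for center, avg in points:
    print(f"bin centered at {center:+.2f}: mean = {avg:.2f}")
```

Plotting these points with any charting tool, with a vertical line at the threshold, gives the graph described above; a visible gap between the last bin below and the first bin above is the jump to look for.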

(25)

Graphic analysis example

2. Outcome vs. forcing variable

Lee (2008): relationship between vote share margin at time t (forcing variable) and probability of winning at time t+1 (outcome) -> the probability of winning is indeed very different for those just above and just below the threshold -> evidence of the effect!

(26)

RDD: Graphic analysis

3. Plot characteristics vs. forcing variable: There should be no jump in characteristics at the threshold of the forcing variable.

– There should be no jumps in other variables - people just below and just above the threshold should be similar in characteristics (observable and unobservable).
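Alongside the plot, the same check can be done numerically for observable characteristics. The sketch below is an illustration under assumptions (the function name, window, and simulated data are not from the lecture): it compares the mean of a characteristic just above and just below the threshold and reports a rough t-statistic for the difference.

```python
import math
import random
from statistics import fmean, variance

def balance_check(x, covariate, threshold, h):
    """Difference in mean covariate (above minus below) within h of the threshold, and its t-statistic."""
    below = [c for xi, c in zip(x, covariate) if threshold - h <= xi < threshold]
    above = [c for xi, c in zip(x, covariate) if threshold <= xi <= threshold + h]
    diff = fmean(above) - fmean(below)
    se = math.sqrt(variance(above) / len(above) + variance(below) / len(below))
    return diff, diff / se

rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
# A characteristic unrelated to the threshold, and one that (problematically) jumps at it
smooth_cov = [rng.gauss(3.0, 1.0) for _ in x]
jumping_cov = [rng.gauss(3.0, 1.0) + (3.0 if xi >= 0.0 else 0.0) for xi in x]

diff_ok, t_ok = balance_check(x, smooth_cov, threshold=0.0, h=0.2)
diff_bad, t_bad = balance_check(x, jumping_cov, threshold=0.0, h=0.2)
print(f"smooth characteristic:  diff = {diff_ok:.2f}, t = {t_ok:.1f}")
print(f"jumping characteristic: diff = {diff_bad:.2f}, t = {t_bad:.1f}")
```

A large t-statistic for a pre-determined characteristic would be a warning sign that the two groups differ in more than treatment status. Unobservable characteristics cannot be checked this way.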

(27)

Graphic analysis example

3. Characteristics by forcing variable

Lee (2008): relationship between vote share margin at time t (forcing variable) and number of past victories (characteristic) -> shows that those just below and just above the threshold are similar in their past victories

(28)

RDD: Graphic analysis

4. Plot the density of forcing variable (histogram of forcing variable): There should be no clustering of people around the threshold.

– Clustering would indicate manipulation -> do not want that!

– If people are clustered just below the threshold (and the program is only for people with forcing variable below the threshold), it suggests they were able to manipulate their position to become eligible (e.g. to hide their income to become eligible for welfare benefit)
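A crude numerical version of this histogram check is to count observations in the bins immediately on either side of the threshold. The sketch below uses made-up data and a hypothetical function name; it is an informal check, not a formal density test.

```python
def counts_around_threshold(x, threshold, bin_width):
    """Number of observations in the bin just below and the bin just above the threshold."""
    below = sum(1 for xi in x if threshold - bin_width <= xi < threshold)
    above = sum(1 for xi in x if threshold <= xi < threshold + bin_width)
    return below, above

# Evenly spread forcing variable: no clustering
smooth_x = [i / 1000 for i in range(-500, 500)]

# Manipulated version: everyone who would land within 0.02 above the threshold
# shifts to just below it (e.g. under-reporting income to stay eligible)
manipulated_x = [xi - 0.02 if 0.0 <= xi < 0.02 else xi for xi in smooth_x]

smooth_counts = counts_around_threshold(smooth_x, threshold=0.0, bin_width=0.05)
manipulated_counts = counts_around_threshold(manipulated_x, threshold=0.0, bin_width=0.05)
print("no manipulation (below, above):", smooth_counts)
print("manipulation    (below, above):", manipulated_counts)
```

Roughly equal counts on both sides are consistent with no manipulation; a pile-up on the eligible side paired with a gap on the other side is the pattern the slide warns about.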

(29)

Graphic analysis example

4. Density of forcing variable

Lee and Lemieux (2010), based on Lee (2008): density of vote share margin – it is smooth around the threshold -> politicians cannot precisely manipulate election results

(30)

RDD: Estimation

• Choose a window width h around the threshold (x*)

• Calculate average outcome:

– for those just below the threshold (those with x below x* and above x* - h/2)

– for those just above the threshold (those with x above x* and below x* + h/2)

• Calculate the difference in average outcomes between those just below and just above the threshold -> this is the effect of the program

• Alternatively, use regression analysis (Kernel regression):

– Run regression of outcome on the dummy for being above the threshold (x>x*) and other control variables for those around the threshold
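The two estimators above can be sketched as follows. The first function is the difference in average outcomes within the window of width h. The second is one common regression variant: a separate straight line fitted on each side of the threshold within the window, with the effect read off as the gap between the two lines at x*. Using a linear term in the forcing variable as the control is an assumption on my part (the slide only says "other control variables"), and all names and data are illustrative.

```python
from statistics import fmean

def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def rd_window_means(x, y, threshold, h):
    """Difference in mean outcomes: (x*, x* + h/2] side minus [x* - h/2, x*) side."""
    below = [yi for xi, yi in zip(x, y) if threshold - h / 2 <= xi < threshold]
    above = [yi for xi, yi in zip(x, y) if threshold <= xi <= threshold + h / 2]
    return fmean(above) - fmean(below)

def rd_local_linear(x, y, threshold, h):
    """Gap at the threshold between lines fitted separately on each side of it."""
    below = [(xi - threshold, yi) for xi, yi in zip(x, y) if threshold - h / 2 <= xi < threshold]
    above = [(xi - threshold, yi) for xi, yi in zip(x, y) if threshold <= xi <= threshold + h / 2]
    intercept_below, _ = ols_line([p[0] for p in below], [p[1] for p in below])
    intercept_above, _ = ols_line([p[0] for p in above], [p[1] for p in above])
    return intercept_above - intercept_below

# Noise-free toy data: outcome rises smoothly with x and jumps by 3 at the threshold
x = [i / 100 for i in range(-50, 51)]
y = [1.0 + 2.0 * xi + (3.0 if xi >= 0.0 else 0.0) for xi in x]

simple = rd_window_means(x, y, threshold=0.0, h=0.4)
linear = rd_local_linear(x, y, threshold=0.0, h=0.4)
print(f"difference in means: {simple:.2f}")
print(f"local linear gap:    {linear:.2f}")
```

On this toy data the difference in means picks up part of the smooth trend in x as well as the jump, while the regression version separates the two. That is the main reason for controlling for the forcing variable.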

(31)

RDD: Estimation issues

1. Choice of bandwidth (h)

– How far away from the threshold should we look?

– The larger the bandwidth, the more data we have around the threshold.

– But with a larger bandwidth, individuals in the sample are less similar (they differ more in the value of the forcing variable, but likely also in other characteristics)

2. Needs a lot of data in the neighborhood of x*

– If we have a lot of observations, we can choose a small bandwidth (with more similar individuals).
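The trade-off can be illustrated by re-running the window comparison for several bandwidths. The sketch below uses noise-free toy data with a true jump of 3 (an assumption chosen so that the distortion from the trend is visible on its own); function and variable names are hypothetical.

```python
from statistics import fmean

def window_estimate(x, y, threshold, h):
    """Difference in mean outcomes within h/2 of the threshold, plus the number of observations used."""
    below = [yi for xi, yi in zip(x, y) if threshold - h / 2 <= xi < threshold]
    above = [yi for xi, yi in zip(x, y) if threshold <= xi <= threshold + h / 2]
    return fmean(above) - fmean(below), len(below) + len(above)

# Outcome rises smoothly with x and jumps by 3 at the threshold x* = 0
x = [i / 1000 for i in range(-1000, 1001)]
y = [1.0 + 2.0 * xi + (3.0 if xi >= 0.0 else 0.0) for xi in x]

bandwidths = (0.1, 0.2, 0.4, 0.8)
results = {h: window_estimate(x, y, 0.0, h) for h in bandwidths}
for h, (estimate, n_obs) in results.items():
    print(f"h = {h}: estimate = {estimate:.3f}, observations = {n_obs}")
```

Wider windows use more observations, which would reduce noise in real data, but the estimate drifts further from the true jump because the compared groups differ more in x. Reporting results for several bandwidths is a simple way to show how sensitive the conclusion is to this choice.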

(32)

RD: Summary of issues

• Identifies only a local effect, restricted to those close to the threshold

• When the design is fuzzy

– The discontinuity estimates apply only to compliers

– Unobserved factors can drive the decision to participate, so we often do not know who the compliers are

• If individuals can manipulate which side of the threshold they fall on, this is a problem!

(33)

EXAMPLE OF RD ANALYSIS

(34)

Example - Angrist and Lavy (1999)

Motivation

• Estimate effect of class size on test scores

– Do children in smaller classes learn more? By how much?

• Are students in small and big classes comparable? If we look at test scores of children in small classes and compare them to those in large classes, do we get the impact of class size only?

– Parents with a better socioeconomic background push for smaller classes and also have smarter children

– Weaker students might be put in smaller classes

• We cannot simply compare test scores of students in small and big classes and assume that the difference is the impact of class size!

(36)

Example - Angrist and Lavy (1999)

Empirical strategy

➢ Need to find more comparable groups of students who “by accident” attend classes of different sizes

Maimonides' rule (Israel):

– Classes should have no more than 40 students

– A school with 40 students in a cohort can have just one class of 40 students

– A school with 41 students in a cohort has to have two classes, of 20 and 21 students

➢ Compare students in big classes just below the threshold for Maimonides' rule with students in small classes just above the threshold

There are many discontinuities they can use – 40/41, 80/81, … students in a cohort
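The rule above implies a predicted average class size for every cohort size: split the cohort into the smallest number of classes such that none exceeds 40 students. A short sketch (the function name is mine; the cap of 40 is from the slide):

```python
def predicted_class_size(enrollment, max_size=40):
    """Average class size if the cohort is split into the fewest classes of at most max_size."""
    n_classes = (enrollment - 1) // max_size + 1
    return enrollment / n_classes

for cohort in (40, 41, 80, 81, 120, 121):
    print(f"cohort of {cohort}: predicted class size = {predicted_class_size(cohort)}")
```

Predicted class size drops sharply each time the cohort crosses a multiple of 40 (from 40 to 20.5 at 41 students, from 40 to 27 at 81), which produces the repeated discontinuities mentioned above.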

(37)

Example - Angrist and Lavy (1999)

Comparison of predicted and actual class size

• Rule is not followed strictly => fuzzy design

(38)

Example - Angrist and Lavy (1999)

Graphical analysis

Average test scores provide partly a mirror image of predicted class size -> evidence for an effect of class size on test scores, but other factors might also play a role

(39)

Example - Angrist and Lavy (1999)

RDD results

• Estimation using RDD (comparison of students in small classes just above the threshold for Maimonides rule and students in large classes just below the threshold) confirms negative effect of class size on test scores
