I am trying to replicate some of the results in Krueger (1999) to get better at econometrics. Treatment means being assigned to a small class, and not being treated means being assigned to a regular class. (However, there is a third type of class, regular size with a teaching assistant, and I am not sure whether I should treat it the same as not being treated.)

To study the impact of class size on learning outcomes, Krueger first shows the differences in sample means for several variables: free-lunch status (a measure of socioeconomic status), White/Asian, age in 1985, attrition rate, class size, and percentile score. He reports the sample mean of each variable by class type, followed by a joint p-value. For instance, one row gives the percentage of students receiving free lunch in small classes, the next column gives the same percentage for regular classes, and the next gives it for regular classes with a teaching assistant. The joint p-value then tests whether the share of students receiving free lunch differs significantly across the three class types. My question is: where does that p-value come from?
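The descriptive part of such a table (the means by class type, without the p-value) can be reproduced with a single command. This is a minimal sketch assuming the variable names from the data example in this post; it only computes means and does not test anything.

```stata
* Sketch: sample mean of each characteristic, one column per class type
* (small, regular, regular + aide). Observations with missing cltypek
* are left out of the by() groups by default; "save" keeps the results
* in r(Stat1), r(Stat2), r(Stat3) for later use.
tabstat freelunch_k white_or_asian age1985 attrition_k csk avg_perc_k, by(cltypek) statistics(mean) nototal save
```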

I figured that, since we are interested in the effect of treatment, I have to create a treatment variable, regress that variable on the variables I am interested in, and then test the joint hypothesis that the free-lunch share does not differ across small, regular, and regular-with-TA classes. But I do not know how to finish it. My main confusion is that class type is recorded in a single variable, cltypek, which takes three different values. Should I interact each variable with cltypek? Is this a t test or an F test? Can I obtain the p-values for all six rows with just one regression, or do I have to run one regression per variable? The treatment variable I created is treat_k, and here is what I did:

Hopefully you can help me to understand


Code:

```stata
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(treat_k freelunch_k white_or_asian age1985 attrition_k avg_perc_k csk) byte cltypek
. . 0 6 . . 5273 .
1 0 1 5 0 57.8278 15 1
1 0 0 6 0 80.9352 17 1
. . 1 6 . . 5273 .
0 1 0 5 1 48.92318 22 3
. . 1 6 . . 5273 .
. . 0 6 . . 5273 .
. . 1 6 . . 5273 .
. . 1 6 . . 5273 .
. . 1 6 . . 5273 .
0 0 1 5 0 82.97334 22 2
1 1 1 5 0 65.025406 15 1
0 1 1 5 0 38.99766 24 2
0 1 0 5 1 6.663901 25 3
. . 1 5 . . 5273 .
. . 1 5 . . 5273 .
. . 0 5 . . 5273 .
. . 0 6 . . 5273 .
. . 0 5 . . 5273 .
. . 0 5 . . 5273 .
0 0 1 5 0 71.85199 21 2
. . 1 6 . . 5273 .
1 0 0 5 0 54.26017 14 1
. . 1 6 . . 5273 .
1 0 1 5 0 81.48058 14 1
. . 1 6 . . 5273 .
. . 1 5 . . 5273 .
0 0 1 5 0 46.6673 17 2
0 0 1 5 0 49.86032 23 3
0 1 0 5 1 . 24 2
. . 1 6 . . 5273 .
1 0 1 5 0 92.33137 14 1
1 0 1 5 0 51.51086 17 1
0 0 1 6 0 89.32625 22 3
0 0 1 5 1 58.13467 21 2
0 1 1 6 1 35.491154 23 3
0 1 0 5 1 71.690445 23 2
. . 1 6 . . 5273 .
1 0 1 5 0 43.22603 17 1
0 1 1 6 0 66.37458 22 2
. . 1 6 . . 5273 .
0 0 1 5 1 . 20 2
. . 1 6 . . 5273 .
0 0 1 5 1 38.99766 25 2
. . 0 5 . . 5273 .
. . 0 . . . 5273 .
0 0 1 5 1 60.37717 24 3
1 1 1 5 0 89.28534 14 1
. . 1 6 . . 5273 .
. . 1 5 . . 5273 .
0 1 0 5 1 33.207203 19 3
0 1 0 5 0 90.76884 25 3
0 0 0 5 1 60.87349 24 3
0 0 1 6 0 95.92397 22 2
0 0 1 5 0 27.713375 19 3
0 0 1 5 1 52.00714 23 2
. . 0 5 . . 5273 .
. . 0 5 . . 5273 .
. . 1 6 . . 5273 .
0 1 0 5 1 20.12069 26 2
0 0 1 5 1 57.15902 24 3
. . 1 7 . . 5273 .
. . 0 6 . . 5273 .
. . 1 6 . . 5273 .
1 0 1 5 0 22.93918 16 1
0 1 0 5 1 24.96125 24 2
. . 0 5 . . 5273 .
1 1 0 5 0 19.405693 16 1
. . 0 5 . . 5273 .
. . 1 5 . . 5273 .
0 1 0 6 1 40.08408 27 3
. . 1 6 . . 5273 .
. . 0 5 . . 5273 .
. . 0 5 . . 5273 .
. . 0 6 . . 5273 .
0 0 1 5 1 64.27664 24 2
1 1 1 5 0 87.63383 14 1
1 0 1 5 0 50.46646 13 1
. . 1 6 . . 5273 .
1 0 1 5 1 66.91418 13 1
1 0 1 6 0 94.0528 13 1
. . 0 6 . . 5273 .
. . 1 5 . . 5273 .
0 0 1 5 0 71.02356 21 2
. . 1 6 . . 5273 .
. . 1 5 . . 5273 .
0 1 0 6 0 20.411865 23 3
1 0 1 5 1 88.51144 14 1
. . 1 6 . . 5273 .
. . 1 6 . . 5273 .
0 0 1 5 1 54.41739 22 3
. . 0 7 . . 5273 .
1 0 1 5 0 92.31004 14 1
. . 1 6 . . 5273 .
. . 0 6 . . 5273 .
. . 0 5 . . 5273 .
1 1 1 6 1 15.655228 14 1
0 0 0 5 0 92.45034 27 2
0 0 1 5 1 50.44516 22 3
1 0 1 5 0 69.302605 15 1
end
label values cltypek cltypek
label def cltypek 1 "small class", modify
label def cltypek 2 "regular class", modify
label def cltypek 3 "regular + aide class", modify
```

Code:

```
. gen treat_k=.
(11,598 missing values generated)

. replace treat_k=0 if cltypek==2 | cltypek==3
(4,425 real changes made)

. replace treat_k=1 if cltypek==1
(1,900 real changes made)

. *Regress
. reg treat_k freelunch_k white_or_asian age1985 attrition_k avg_perc_k csk

      Source |       SS           df       MS      Number of obs   =     5,853
-------------+----------------------------------   F(6, 5846)      =   1027.79
       Model |  630.994112         6  105.165685   Prob > F        =    0.0000
    Residual |  598.175886     5,846  .102322252   R-squared       =    0.5133
-------------+----------------------------------   Adj R-squared   =    0.5129
       Total |     1229.17     5,852   .21004272   Root MSE        =    .31988

------------------------------------------------------------------------------
     treat_k |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 freelunch_k |   .0342283   .0096581     3.54   0.000     .0152948    .0531617
white_or_a~n |  -.0261966   .0100654    -2.60   0.009    -.0459285   -.0064647
     age1985 |   .0030928   .0094153     0.33   0.743    -.0153648    .0215503
 attrition_k |   .0230936   .0088482     2.61   0.009     .0057479    .0404393
  avg_perc_k |   .0006209   .0001673     3.71   0.000     .0002928    .0009489
         csk |  -.0678052   .0008694   -77.99   0.000    -.0695094   -.0661009
       _cons |   1.643794   .0539098    30.49   0.000     1.538111    1.749477
------------------------------------------------------------------------------
```
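A minimal sketch of one standard way to obtain a joint equality-of-means p-value, assuming the variable names from the data example in this post: run one regression per characteristic (one per table row), regressing that characteristic on class-type dummies, and F-test that the dummies are jointly zero. With three class types, the null "all three means are equal" imposes two restrictions, so this is an F test rather than a t test. It is not the same hypothesis as the overall F test in the treat_k regression above, which asks whether the characteristics jointly predict assignment to a small class. I have not checked this against the numbers in Krueger's table, and the paper's notes may describe adjustments (for example, for school effects) that this sketch leaves out.

```stata
* Sketch: for each characteristic, test H0: its mean is the same in
* small, regular, and regular + aide classes.
foreach v of varlist freelunch_k white_or_asian age1985 attrition_k csk avg_perc_k {
    * Characteristic on class-type dummies; factor-variable notation
    * makes small class (cltypek == 1) the omitted base category.
    quietly regress `v' i.cltypek
    * Joint F test that both class-type coefficients are zero,
    * i.e. that all three group means are equal.
    quietly testparm i.cltypek
    display as text %-16s "`v'" "  F = " %7.2f r(F) "   p = " %6.4f r(p)
}
```

The same F statistic should come out of a one-way ANOVA such as `oneway freelunch_k cltypek`. If regular and regular + aide classes are instead pooled as the control group, the analogous test is the t test on `treat_k` in `regress freelunch_k treat_k`; with a single restriction, its square equals the F statistic from `test treat_k`.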

## Comment