I am facing a peculiar issue in Stata (18.1): the number of observations reported in the regression output is much larger than both the number of observations in the dataset and the number actually used in the estimation. I am using a member-level household survey dataset for India. When I use -describe-, Stata reports 513,366 observations and 170 variables, and the data viewer shows the same. Basic summary statistics also report the same number of observations.
However, when I run the regression command shown in the code block further below on my dataset, the output reports 274,642,225 observations. It does, however, correctly report that I have 83,939 clusters in hhid. I tried cutting down the number of variables and running a very simple version of the model, but I am still not sure where this number is coming from. Any ideas on what might be causing this?
I am pasting the data excerpt below of the key variables in the estimation:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str9 hhid byte(per_serialno relation_head) int age float female long total_expenditure_edu_amt float lang_discrepancy byte whether_same_grade
"100001201" 1 1 44 0 . 0 .
"100001201" 2 2 42 1 . 0 .
"100001201" 3 5 20 1 29460 1 0
"100001201" 4 5 18 1 51750 1 0
"100001202" 1 1 43 0 . 0 .
"100001202" 2 2 38 1 . 0 .
"100001202" 3 5 19 0 26940 1 0
"100001202" 4 7 65 0 . 0 .
"100001202" 5 7 61 1 . 0 .
"100001301" 1 1 32 0 . 0 .
"100001301" 2 2 30 1 . 0 .
"100001301" 3 5 5 0 6780 1 0
"100001302" 1 1 46 0 . 0 .
"100001302" 2 2 40 1 . 0 .
"100001302" 3 5 16 0 3060 1 0
"100001302" 4 5 14 1 2130 1 0
"100001303" 1 1 46 0 . 0 .
"100001303" 2 2 40 1 . 0 .
"100001303" 3 5 16 0 3060 1 0
"100001303" 4 5 14 1 2130 1 0
"100001304" 1 1 82 1 . 0 .
"100001304" 2 3 48 0 . 0 .
"100001304" 3 4 45 1 . 0 .
"100001304" 4 6 10 0 9190 1 0
"100001304" 5 4 42 1 . 0 .
"100001401" 1 1 58 0 . 0 .
"100001401" 2 2 52 1 . 0 .
"100001401" 3 5 29 0 . 0 .
"100001402" 1 1 38 0 . 0 .
"100001402" 2 2 34 1 . 0 .
"100001402" 3 8 46 1 . 0 .
"100011101" 1 1 65 0 . 0 .
"100011101" 2 2 58 1 . 0 .
"100011101" 3 5 25 0 . 0 .
"100011101" 4 5 23 0 . 0 .
"100011101" 5 5 21 0 . 0 .
"100011101" 6 5 22 1 . 0 .
"100011101" 7 5 19 1 7400 0 0
"100011201" 1 1 56 0 . 0 .
"100011201" 2 2 54 1 . 0 .
"100011201" 3 5 23 0 . 0 .
"100011201" 4 5 21 0 13000 1 0
"100011201" 5 5 19 0 4470 0 0
"100011201" 6 5 16 0 3270 0 0
"100011201" 7 5 20 1 6370 0 0
"100011201" 8 5 12 1 580 0 0
"100011301" 1 1 36 0 . 0 .
"100011301" 2 2 30 1 . 0 .
"100011301" 3 5 12 0 3650 0 0
"100011301" 4 5 10 0 630 0 0
"100011301" 5 5 7 0 610 0 0
"100011302" 1 1 45 0 . 0 .
"100011302" 2 2 40 1 . 0 .
"100011302" 3 5 15 0 2190 0 0
"100011302" 4 5 13 1 460 0 0
"100011302" 5 8 40 0 . 0 .
"100011303" 1 1 65 0 . 0 .
"100011303" 2 2 60 1 . 0 .
"100011303" 3 5 27 0 . 0 .
"100011303" 4 5 22 0 . 0 .
"100011303" 5 5 18 0 9550 1 0
"100011303" 6 5 12 0 660 0 0
"100011304" 1 1 45 0 . 0 .
"100011304" 2 2 40 1 . 0 .
"100011304" 3 5 20 0 8970 1 0
"100011304" 4 5 18 1 7460 0 0
"100011304" 5 5 14 1 680 0 0
"100011401" 1 1 45 0 . 0 .
"100011401" 2 2 39 1 . 0 .
"100011402" 1 1 60 0 . 0 .
"100011402" 2 2 51 1 . 0 .
"100011402" 3 5 21 1 . 0 .
"100021201" 1 1 50 0 . 0 .
"100021201" 2 2 45 1 . 0 .
"100021201" 3 5 20 0 93000 1 0
"100021201" 4 7 95 0 . 0 .
"100021202" 1 1 45 0 . 0 .
"100021202" 2 2 42 1 . 0 .
"100021202" 3 5 19 0 74000 1 0
"100021301" 1 1 42 0 . 0 .
"100021301" 2 2 38 1 . 0 .
"100021301" 3 5 12 0 32300 1 0
"100021301" 4 5 8 0 28800 1 0
"100021302" 1 1 39 0 . 0 .
"100021302" 2 2 36 1 . 0 .
"100021302" 3 5 18 0 . 0 .
"100021302" 4 5 14 1 18700 0 0
"100021303" 1 1 42 0 . 0 .
"100021303" 2 2 38 1 . 0 .
"100021303" 3 5 8 1 26500 1 0
"100021303" 4 5 4 1 24000 1 0
"100021303" 5 5 2 0 . 0 .
"100021304" 1 1 40 0 . 0 .
"100021304" 2 2 38 1 . 0 .
"100021304" 3 5 13 1 37000 1 0
"100021304" 4 5 9 1 33500 1 0
"100021304" 5 5 5 0 19000 1 0
"100021401" 1 1 60 1 . 0 .
"100021401" 2 5 30 0 . 0 .
"100021402" 1 1 40 0 . 0 .
end
label values lang_discrepancy disc
label def disc 0 "No", modify
label def disc 1 "Yes", modify
label values whether_same_grade grade
Code:
reg whether_same_grade i.lang_discrepancy i.female age [fweight=rounded_weight], cluster(hhid)
Thanks in advance!
Anirudh