Synthetic Panel problem

Raza Jafri

Join Date: May 2017
Posts: 31

06 Jul 2017, 10:08

Hi everyone, I have built a synthetic panel from repeated cross-section data. The data consist of seven rounds conducted every two years from 2004 to 2016. I construct the relevant variables for hourly wage (hour_wage), education groups, cohort bin (cbin) and consumption, then collapse the data and run my program.

The problem arises when I plot the data: those who attained University appear below the regression line, which I read as a negative change in income and consumption for them, while the people from middle school are above the regression line, which is the other way round from what I expect. When I checked the data, however, the mean hour_wage for University cohorts is higher than for the Intermediate and Middle education categories. I have checked the code from every aspect and am unable to solve this riddle, so I need your suggestions.

Here is my initial code, which I apply to every round to build the required variables.

Note: This code is just to give you an idea of how I constructed the variables in all the rounds before appending them.

Code:

**********************************************
*********Concerned Variables******************
**********************************************

***Household Characteristics***

rename sbq01 sex
rename sbq04 age
drop if age ==0

rename sbq02 rstatus //living in hh or temporarily moved out
rename sbq03 rwhead  //relationship with head of hh
rename sbq05 mstatus
rename seq10 type_work


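//constructing number of children, number of adults and the adult equivalence scale (OECD-modified weights: 1 for the first adult, 0.5 per additional adult, 0.3 per child under 14)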

bysort hhcode: egen hhsize=count(hhcode)

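gen child = 1 if age < 14

replace child = 0 if age >= 14 & !missing(age)   // ">=" must be written without a space; missing age stays missing

bys hhcode: egen num_childern = total(child)   // total() is the current name of egen sum()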


bysort hhcode: gen num_adult = hhsize - num_childern 

gen aes= 1 + (num_adult - 1) * 0.5 + num_childern * 0.3 




//cleaning

replace mstatus=1 if mstatus==3   //1 means not married, 2 means married 
replace mstatus=1 if mstatus==4
replace mstatus=2 if mstatus==5


//replace rwhead=. if sex==2 & rwhead==1 | sex==2 & mstatus==1 // considering only male head hh, married Couples are included
//drop if rwhead==.

replace rwhead=11 if rwhead==0 
replace rwhead=12 if rwhead==9 

replace rwhead=9 if rwhead==8 //mother/father in law
replace rwhead=8 if rwhead==7 // daughter/son in law

keep if rwhead==1 & sex==1 & mstatus==2 //86,960  observations dropped here 



******Creating Age-Cohorts*********






drop if age > 50
drop if age <= 25

gen year= 2004
gen cohort= year - age
summarize cohort, d


//I have used the same definition of cohort across the waves, so the coding below for cohort and cbin is the same for all rounds.

recode cohort (1986/1990=1) (1981/1985=2) (1976/1980=3) (1971/1975=4) (1966/1970=5) (1961/1965=6) (1956/1960=7) (1951/1955=8) (else=.), gen(cbin)   // (else=.) keeps birth years outside the bins from leaking into cbin as raw years

gen     c_age= 28 if cbin==1 
replace c_age=33 if cbin==2
replace c_age=38 if cbin==3 
replace c_age=43 if cbin==4
replace c_age=48 if cbin==5
replace c_age=53 if cbin==6
replace c_age=58 if cbin==7
replace c_age=63 if cbin==8




********Creating Education Groups*********

***Education***

rename scq04 maxedu
rename scq05 ifstudent //if currently studying
drop if maxedu==19 //(dropped other education: only 22 observations deleted)



// "Junior Middle = 1" "Intermediate=2" "University=3" 


recode maxedu (min/8=1) (9/11=2) (12/max=3) if ifstudent==2, gen(edu_group)


drop if edu_group==.





********Labor Supply and Income********

keep if type_work > 1 //849 observations dropped here
   
rename seq02    selfempl  //if the respondents didn't work in last week, they are asked if they have any business, trade etc 

rename seq11    ifworked_money_m //if worked in the last month
rename seq12    days_worked
rename seq13    salary_monthly
rename seq14    working_months //last year
rename seq15    ifworked_money_y  //if worked last year
rename seq16    salary_yearly 


//drop if selfempl==1  //(only 12 observations deleted)


//Some of the people reported monthly income, whereas others reported yearly income. So we have to construct the measure for wage from both types.


keep if working_months >= 10 & days_worked > 20 // considering those who worked at least ten months in a year and more than 20 days in a month ***226 observations deleted here

gen wage_m = salary_monthly / days_worked if ifworked_money_m ==1

gen days_worked_y = days_worked * working_months


gen wage_y = salary_yearly / days_worked_y //if ifworked_money_y ==1

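egen wage = rowmean(wage_m wage_y)   // averages whichever daily-wage measures are available; a row sum would double the wage of anyone reporting both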
replace wage=. if wage == 0


//gen annual_wage= days_worked*working_months*wage


gen hour_wage = wage /8     //assuming working day means working 8 hours a day
replace hour_wage=. if hour_wage== 0

drop if wage==.
drop if hour_wage==.

summarize hour_wage, d
replace hour_wage=. if hour_wage>r(p99) | hour_wage<r(p1) //59 values set to missing here 

******Expenditures******

//constructing adult equivalent consumption 

gen ae_consumption = consumption / aes 
summarize ae_consumption, d
replace ae_consumption=. if ae_consumption > r(p99) | ae_consumption <r(p1) // 68 values set to missing here


//egen id = group(cbin edu_group)

//tabstat ae_consumption wage_m if year == 2004, by(id) st(mean min p5 p25 p50 p75 p95 max)




************************************************************
*************************************
*************************
//Here we need to collapse the data in order to make the synthetic panel 



collapse (mean) ae_consumption hour_wage c_age, by(cbin edu_group)

gen year= 2004
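The daily wage in the round-level code combines a measure built from monthly salary (wage_m) and one built from yearly salary (wage_y). If some workers report both, adding the two doubles their wage and inflates hour_wage for the groups where that happens. A minimal sketch on hypothetical toy numbers (not survey data) contrasting `egen rowtotal()` (rsum() in older Stata) with `egen rowmean()`:

```stata
* Hypothetical toy data (not from the survey): the third worker
* reports both a monthly and a yearly salary
clear
input float(salary_monthly days_worked salary_yearly working_months)
6000 25     . 12
   . 24 72000 12
5000 25 60000 12
end

* Daily wage from each reporting period, as in the round-level code
gen wage_m = salary_monthly / days_worked
gen wage_y = salary_yearly / (days_worked * working_months)

* rowtotal() treats missing as 0 and adds the two measures
egen wage_sum  = rowtotal(wage_m wage_y)

* rowmean() averages whichever measures are non-missing
egen wage_mean = rowmean(wage_m wage_y)

* Workers reporting both periods are the ones at risk of double counting
count if !missing(wage_m) & !missing(wage_y)
list wage_m wage_y wage_sum wage_mean, noobs
```

Running the same `count` on each real round shows whether double counting matters for this data; if it returns 0, the two approaches give identical wages.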

The code above is used for all the rounds. Here is my data after appending all 7 rounds.


Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float cbin byte edu_group float(ae_consumption hour_wage c_age year)
3 1 1005.2929 20.011213 38 2004
3 2 1082.5514  29.47328 38 2004
3 3  1159.966  29.82299 38 2004
4 1  958.5524   24.2419 43 2004
4 2 1175.9354  29.41757 43 2004
4 3 1787.5577  47.94204 43 2004
5 1 1000.7888 25.253265 48 2004
5 2  1224.527 32.823265 48 2004
5 3 1624.9316  43.67646 48 2004
6 1  998.0583 28.098703 53 2004
6 2  1294.746 37.223385 53 2004
6 3  1704.804  52.70139 53 2004
7 1  984.4781  28.42537 58 2004
7 2 1279.9153   39.7994 58 2004
7 3  1801.728  57.02699 58 2004
8 1  1021.486  26.16772 63 2004
8 2 1284.8254  39.21839 63 2004
8 3 2012.6095 71.293304 63 2004
3 1  969.6302 19.816727 38 2006
3 2 1155.6779 25.147636 38 2006
3 3  1731.774  41.67512 38 2006
4 1  989.2782 24.987286 43 2006
4 2  1205.359  31.23556 43 2006
4 3 1902.7225  48.40184 43 2006
5 1 1012.4293  25.99362 48 2006
5 2 1260.4014 33.815296 48 2006
5 3 1813.4307  55.81765 48 2006
6 1  1052.937  27.97255 53 2006
6 2   1390.88  40.35061 53 2006
6 3 2079.8054    66.982 53 2006
7 1  1084.501  28.63389 58 2006
7 2 1427.3075  40.12785 58 2006
7 3 2112.4878  72.61096 58 2006
2 1  1427.588  24.52204 33 2008
2 2 1506.9424 32.635303 33 2008
2 3  2903.329  72.82051 33 2008
3 1 1281.8406  19.87822 38 2008
3 2  1626.253  31.83381 38 2008
3 3 2561.5964  46.18956 38 2008
4 1 1332.3922 25.203506 43 2008
4 2  1631.312   31.1539 43 2008
4 3 2836.3516  55.58306 43 2008
5 1 1393.9075  28.29908 48 2008
5 2  1824.202  42.16266 48 2008
5 3  2685.155  63.60912 48 2008
6 1 1492.4818  28.83083 53 2008
6 2 1802.4243  42.72336 53 2008
6 3  3258.335   67.1168 53 2008
7 1 1508.2616  29.27627 58 2008
7 2 1642.3346  35.59089 58 2008
7 3  3077.336  75.02588 58 2008
2 1 1963.9072  35.48345 33 2010
2 2 2196.9714  41.82792 33 2010
2 3  3176.809    58.639 33 2010
3 1 2040.6433  37.93051 38 2010
3 2  2369.518  49.28765 38 2010
3 3  3388.298  81.11602 38 2010
4 1 2143.0024  44.93072 43 2010
4 2 2681.7434  55.55556 43 2010
4 3  3898.507   92.3561 43 2010
5 1  2092.359  45.02077 48 2010
5 2  2871.833  66.70649 48 2010
5 3  3653.634   93.2951 48 2010
6 1 2165.3635  43.30079 53 2010
6 2  2725.326  63.10066 53 2010
6 3 4340.7153 120.56422 53 2010
7 1 2217.2205  39.75362 58 2010
7 2   2891.94  68.46322 58 2010
7 3  5330.741 140.80374 58 2010
1 1  2444.769  48.36358 28 2012
1 2 2476.9495   43.3796 28 2012
1 3  3232.415   96.3141 28 2012
2 1 2171.4912  42.25338 33 2012
2 2  2687.919  59.96129 33 2012
2 3  3611.756  94.42162 33 2012
3 1 2335.1243  51.99603 38 2012
3 2  2863.485   66.5342 38 2012
3 3 4216.1333  104.8941 38 2012
4 1 2424.7256  58.05062 43 2012
4 2  2990.466  73.66813 43 2012
4 3  4066.801  116.4934 43 2012
5 1 2514.4436  61.18449 48 2012
5 2 3121.2656  87.70039 48 2012
5 3 4224.4253 130.04353 48 2012
6 1  2548.613  63.65802 53 2012
6 2   3121.12  91.64032 53 2012
6 3 4410.4614 154.17195 53 2012
1 1  2502.627  50.04775 28 2014
1 2 3062.7764   60.1854 28 2014
1 3 4874.5405  99.27192 28 2014
2 1   2729.51  58.52513 33 2014
2 2  3297.106 71.947495 33 2014
2 3   4670.37 110.41965 33 2014
3 1 2804.0916  61.95866 38 2014
3 2  3333.801  80.93292 38 2014
3 3 4724.3315 127.44823 38 2014
4 1  2926.054  68.65212 43 2014
4 2  3665.392  96.27921 43 2014
4 3  4568.759 137.72145 43 2014
5 1 2979.3774 75.497215 48 2014
end

Here is the master code, which can be run on the data above to produce the graphs.

Code:

*************************************************
  *** CHOOSE THE INCOME AND CONSUMPTION MEASURE ***
  *************************************************
  
  
  local income_measure           hour_wage
  *local income_measure      hour_wage
  
  *local consumption_measure consumption
  local consumption_measure  ae_consumption
  
  
  keep if cbin ~=.  //cbin means Cohort bin (Age cohorts)
  
  capture program drop residualcy        
  program residualcy, eclass   //eclass stores the results of regression
        
        egen subgroup = group(cbin `1')    //(argument 1 is the grouping variable, edu_group, coded 1 to 3)
        keep if year == `2' | year == `3'   //(2 and 3 are also argument e-g year 2004 , 2006 etc)
        bys year subgroup: egen m_c_group = mean(`4')  // (in order to make synthetic panel from repeated cross sections we need to generate subgroups in terms of means. Here it is pertaining to consumption)
        bys year subgroup: egen m_y_group = mean(`5')    //same as above but pertaining to income)
        keep year c_age m_c_group m_y_group subgroup cbin `1'
        duplicates drop
        gen lnm_c_group = ln(m_c_group)            //taking logs
        gen lnm_y_group = ln(m_y_group)        
        
        
        bys subgroup (year): gen d_c = lnm_c_group[2]-lnm_c_group[1] //difference in log group means between the two years for the same subgroup, e.g. 2006 minus 2004, for consumption
        bys subgroup (year): gen d_y = lnm_y_group[2]-lnm_y_group[1] //same as above but for income
        gen c_age2 = c_age*c_age  //age squared: with c_age and c_age3 this cubic controls for life-cycle (age) differences in growth across cohorts
        gen c_age3 = c_age2*c_age //age cube
        //keep if year == `3' 
        drop year m_c_group m_y_group lnm_c_group lnm_y_group
        duplicates drop
        reg d_c c_age c_age2 c_age3  //change in consumption on age (the residual is the risk affecting consumption)
        predict eps_c, resid            
        reg d_y c_age c_age2 c_age3  //change in income on age (residual here is the risk means income shock)
        predict eps_y, resid 
        reg eps_c eps_y  // income shock is independent here and consumption shock is dependent here. So we can check the consumption insurance hypothesis.
  end 
        
 
 
 
 capture program drop adgraph2
  program adgraph2
        twoway  (scatter eps_c eps_y if `1' == 1 , mcolor(dknavy) msymbol(O))  /// 
                (scatter eps_c eps_y if `1' == 2 , mcolor(green) msymbol(o))  ///
                (scatter eps_c eps_y if `1' == 3 , mcolor(blue) msymbol(O))  ///
                (lfit eps_c eps_y, lpattern(solid) lcolor(black))  ///
             , ylabel(`10'(0.1)`11', labsize(small)) xlabel(`10'(0.1)`11',labsize(small) angle(vertical)) scheme(s1mono) xtitle("change in log disposable income",size(small)) ///
            ytitle("change in log consumption",size(small) angle(vertical))   ///
            legend(nobox symxsize(3) size(small) pos(12) row(3) region(fcolor(none))  ///
            order(1 "`2'" 2 "`3'" 3 "`4'" 4  "Slope `:di %4.3f _b[eps_y]' with s.e. `:di %4.3f _se[eps_y]'"))
        graph save "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view1.gph", replace
           
        twoway  (scatter eps_c eps_y if cbin == 1, mcolor(blue)     msymbol(O))  ///
                (scatter eps_c eps_y if cbin == 2, mcolor(green)     msymbol(D))  ///
                (scatter eps_c eps_y if cbin == 3, mcolor(purple)     msymbol(T))  ///
                (scatter eps_c eps_y if cbin == 4, mcolor(magenta)     msymbol(S))  ///
                (scatter eps_c eps_y if cbin == 5, mcolor(red)         msymbol(+))  ///
                (scatter eps_c eps_y if cbin == 6, mcolor(brown)     msymbol(dh))  ///
                (scatter eps_c eps_y if cbin == 7, mcolor(gold)     msymbol(th))  ///
                (scatter eps_c eps_y if cbin == 8, mcolor(lavender) msymbol(sh))  ///
                (lfit eps_c eps_y, lpattern(solid) lcolor(black))  ///
            , ylabel(`10'(0.1)`11', labsize(small)) xlabel(`10'(0.1)`11',labsize(small) angle(vertical)) scheme(s1mono) xtitle("change in log disposable income",size(small)) ///
            ytitle("change in log consumption",size(small) angle(vertical))   ///
            legend(nobox symxsize(3) size(small) pos(12) row(3) region(fcolor(none)) order(1 "26-30" 2 "31-35" 3 "36-40" 4 "41-45" 5 "46-50" 6 "51-55" 7 "56-60" 8 "61-65" 9  "Slope `:di %4.3f _b[eps_y]' with s.e. `:di %4.3f _se[eps_y]'"))
        graph save "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view2.gph", replace
        graph combine   "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view1.gph"  ///
                        "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view2.gph", col(2) scheme(s1mono) title("HIES `5' to `6' by `7'", size(small))
        graph save    "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'.gph",replace
        graph export  "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'.png",replace
        erase "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'.gph"
        erase "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view1.gph"
        erase "HIES Figures\ad-by-`1'-`5'-`6'-`8'-`9'-view2.gph"
  end  
 
  forvalues y1 = 2004(2)2014 {
      local y2 = `y1' + 2
      preserve
      keep if edu_group ~=.
      residualcy edu_group `y1' `y2' `consumption_measure' `income_measure'
      adgraph2     edu_group "Junior Middle" "Intermediate" "University" `y1' `y2' "Full Education Category" `consumption_measure' `income_measure' -0.2 0.2
      restore
  }

If we look at the attached figure, the problem is visible: a negative change in income and consumption for University people. Why, when in the data their mean hourly wage is higher before taking logs?

As you can see in the figure, I have a similar problem for 2010 to 2012 as well.


Jesse Wursten

Join Date: Jan 2016

Posts: 915
#2

07 Jul 2017, 07:21

The problem arises when I plot the data: those who attained University appear below the regression line, which means there is a negative change in income and consumption for them, and the people from middle school are above the regression line, which is the other way round.

This doesn't sound correct. Appearing below the regression line means their change in consumption is lower than you'd expect given their change in income, but is not related at all to the actual level of the change.
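To make the distinction concrete, here is a sketch of what each point in the graphs measures, following the steps of the residualcy program. The notation (g for a cohort-by-education cell, a_g for its c_age, \hat f_y and \hat f_c for the fitted cubics in age) is introduced here for illustration only:

```latex
\begin{aligned}
\Delta \ln \bar y_g &= \ln \bar y_{g,t+2} - \ln \bar y_{g,t},
  \qquad \Delta \ln \bar c_g = \ln \bar c_{g,t+2} - \ln \bar c_{g,t} \\
\hat\varepsilon^y_g &= \Delta \ln \bar y_g - \hat f_y(a_g),
  \qquad \hat\varepsilon^c_g = \Delta \ln \bar c_g - \hat f_c(a_g) \\
\hat\varepsilon^c_g &= \hat\alpha + \hat\beta\,\hat\varepsilon^y_g + \hat u_g
\end{aligned}
```

A point lies below the fitted line exactly when \hat u_g < 0, whatever the sign of \hat\varepsilon^y_g. A negative \hat\varepsilon^y_g only says that the cell's log income grew less than the age polynomial predicts; it does not say that income fell, and it says nothing about the income level, because \ln \bar y_{g,t} enters only through the difference.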
Raza Jafri

Join Date: May 2017

Posts: 31
#3

08 Jul 2017, 16:41

Sir, can you please elaborate in detail? For the rest of the years, the University group is above the regression line, higher than Intermediate, and the Middle school people are below it. In my understanding, that means a positive change in income and consumption for people with University education. But in this graph we see a negative change in income and consumption for University people, whereas people with Middle education are above the regression line. Why do the returns to education seem higher for people with less education? I checked that the data are correct: in the data I see higher income for people with University education in all the years. I don't know why I am getting these results in this graph.
Raza Jafri

Join Date: May 2017

Posts: 31
#4

08 Jul 2017, 16:45

Originally posted by Jesse Wursten

This doesn't sound correct. Appearing below the regression line means their change in consumption is lower than you'd expect given their change in income, but is not related at all to the actual level of the change.

I understand the lower change in consumption given their income, but why is the change in income negative as well, especially when the data show that people with university education are actually earning more and consuming more?
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#5

10 Jul 2017, 05:50

You need to be a lot more precise in your "more" statements. For example, only two out of five university observations have a negative change in income. Likewise, junior middle and intermediate both have three out of five with a negative change in income. This is the opposite of what you claim in #3.
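This can be checked directly on the data posted in #1. The sketch below takes the 2010 and 2012 hour_wage means for the cohort bins observed in both rounds (cbin 2 to 6, copied from the dataex listing), builds d_y and eps_y the same way residualcy does, and adds rel_growth, a helper variable introduced here for illustration only (log growth relative to Junior Middle in the same cohort):

```stata
* Mean hour_wage by cbin and edu_group in 2010 and 2012,
* copied from the dataex listing in #1 (cells observed in both rounds)
clear
input byte(cbin edu_group c_age) float(hw2010 hw2012)
2 1 33  35.48345  42.25338
2 2 33  41.82792  59.96129
2 3 33    58.639  94.42162
3 1 38  37.93051  51.99603
3 2 38  49.28765   66.5342
3 3 38  81.11602  104.8941
4 1 43  44.93072  58.05062
4 2 43  55.55556  73.66813
4 3 43   92.3561  116.4934
5 1 48  45.02077  61.18449
5 2 48  66.70649  87.70039
5 3 48   93.2951 130.04353
6 1 53  43.30079  63.65802
6 2 53  63.10066  91.64032
6 3 53 120.56422 154.17195
end

* d_y: change in log group means, as built in residualcy
gen d_y = ln(hw2012) - ln(hw2010)

* Age cubic and residual income growth, as in residualcy
gen c_age2 = c_age*c_age
gen c_age3 = c_age2*c_age
regress d_y c_age c_age2 c_age3
predict eps_y, resid

* Hypothetical helper: growth relative to Junior Middle in the same cohort
bysort cbin (edu_group): gen rel_growth = d_y - d_y[1]

list cbin edu_group hw2010 d_y eps_y rel_growth, sepby(cbin) noobs
```

In this pair of rounds d_y is positive for every cell, so hourly wages rose for all groups, yet eps_y takes both signs because it is a regression residual. Since all groups in a cohort share the same c_age, the age cubic shifts them by the same amount, so within a cohort eps_y ranks the groups exactly as d_y does. In cohort bins 3, 4 and 6 the University group has the highest wage level but lower log growth than Junior Middle, which is what moves it to the left in the graph.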
SANOUSSI Yacobou

Join Date: Nov 2019

Posts: 5
#6

14 Nov 2019, 10:15

Dear all
I am working on the dynamics of health spending with a focus on individual catastrophic health spending. I intend to use household survey data to build a synthetic panel. For my work, I have practical difficulties in building the synthetic panel. If possible, I would like to have your technical support to be able to carry out this research work.
The data I want to use come from household surveys based on the Unified Questionnaire for Basic Indicators of Well-being (QUIBB). The data are collected in sections: there is a section for households and one for individuals. I want to use these surveys for two years to build the synthetic panel and do a dynamic analysis of catastrophic health expenditures at the individual level, not at the cohort level.
Thank you
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#7

15 Nov 2019, 03:22

I recommend you start a new thread for this question, as there is no direct link to the topic at hand.
