Mixed or reg i.country i.year for repeated cross-section data

Beata Berke

Join Date: Apr 2018

Posts: 2
#1

31 May 2019, 02:27

Dear all,

I am analysing time-series cross-sectional data from 4 waves and around 25 countries, using Stata 14. The dataset is the International Social Survey Programme, years 1988, 1994, 2002 and 2012. My main variable of interest is female hours worked per week (originally WRKHRS; for the analysis I generated work hours for females only, 0 otherwise) and how it is affected by the amount or presence of benefits in the country. At first I had these benefits as expenditure in percent of GDP, but my supervisor told me to generate dummies (0 for no benefit, 1 for benefit) for all the different types I had. I now have them both ways. The research has two parts: the first is a regression of female hours worked per week on the different types of benefits; the second analyses attitudes (support for traditional gender roles of men), comparing between countries.

I want to do an individual-level analysis (within respondents) of the effects of education##benefit, marital status, attendance of religious services and presence of a child. At the country level I have the benefit variables, unemployment rates and labor force participation for men and women, the total fertility rate, and types of expenditure: public total, in-kind (% of GDP), in cash (% of GDP), and real GDP forecast. I know it's too much; I won't be using all of them, I am just letting you know what I have.

I was planning to use the mixed command, starting with a basic mixed femworkhours || countryid: and building on that by adding level-1 and then level-2 predictors. However, I cannot declare the data a panel because time values repeat within the data set, so I set it with xtset countryid (as I read somewhere on this forum, this is an option for repeated cross-section data). Since this is my thesis, I asked my supervisor whether I should use mixed or a simple reg with i.countryid i.wave, and he suggested reg with i.countryid i.year. Nevertheless, when I run the regression there does not seem to be a significant (even if small) country effect, and it looks as if the first part of the analysis ignores country and year effects. Could the problem be that, when running a basic regression with country and year fixed effects, I should use mean hours worked by country rather than individual-level hours? I browsed this forum and the internet and unfortunately could not find the answers I was looking for.
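For concreteness, the two specifications being compared could be sketched roughly as follows. The variable names come from the data excerpt further down; the covariate list is purely illustrative and not the final model, and vce(robust) is an assumption rather than something my supervisor specified.

```stata
* Sketch only: covariates are illustrative, taken from the -dataex- excerpt

* Option A: multilevel model with a random intercept for country
mixed femworkhours i.married i.educ i.attend1 i.wave || countryid:
estat icc                      // share of residual variance at the country level

* Option B: OLS with country and wave fixed effects (dummies)
regress femworkhours i.married i.educ i.attend1 i.countryid i.wave, vce(robust)
testparm i.countryid           // joint test of the country dummies
testparm i.wave                // joint test of the wave dummies
```

In Option B, any country-level variable that does not change across waves within a country would be collinear with the country dummies and dropped, which is worth keeping in mind for the benefit dummies.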

Hence the question: what would you suggest doing with this data? In the excerpt below, female work hours looks as if most observations are missing, but that is not the case in the full data: I ran the mdesc command, and 33% of the total sample is missing (the values range from 0 to 80 hours worked per week). I hope the question is clear enough; if not, please let me know where I can elaborate.

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(femworkhours married incgroup fulltime parttime attend1) byte educ float(dbgrant drealfam dincmaint ddaycare dpleave dchildall wave countryid)
0 0 1 0 0 1 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 0 1 0 0 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 0 1 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 2 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 0 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 1 0 0 1 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
0 0 3 0 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 5 1 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 0 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 3 0 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
0 1 5 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 0 3 0 1 0 2 0 1 0 0 0 0 2 1
. 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
0 1 1 0 0 0 1 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 1 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 1 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
end
label values incgroup incgroup
label def incgroup 1 "10%", modify
label def incgroup 2 "25%", modify
label def incgroup 3 "50%", modify
label def incgroup 4 "75%", modify
label def incgroup 5 "90%", modify
label values fulltime employed
label def employed 0 "not fulltime", modify
label def employed 1 "fulltime", modify
label values educ educ
label def educ 0 "no education", modify
label def educ 1 "primary/lower secondary", modify
label def educ 2 "upper/post secondary", modify
label def educ 3 "lower/upper tertiary", modify
label values wave wave
label def wave 2 "1994", modify
label values countryid countryid
label def countryid 1 "AU", modify
[/CODE]

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17700

31 May 2019, 04:14

Beata,
welcome to this forum.
One aside: when I inspect your regressand, I get:

Code:

. codebook femworkhours

---------------------------------------------------------------------------------------------------------------------
femworkhours                                                                                              (unlabeled)
---------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [0,0]                        units:  1
         unique values:  1                        missing .:  87/100

            tabulation:  Freq.  Value
                            13  0
                            87  .

The proportion of missing values (87%) is more than a bit worrisome.
Hence, if this finding holds in your full dataset, you should deal with the missing values first.
As you may already be aware, Stata's estimation commands omit every observation that has a missing value in any variable used in the model.
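
A minimal sketch of how you could check how many observations actually enter a model, assuming the variable names from your -dataex- excerpt (the covariate list is only illustrative, not a recommended specification):

```stata
* Sketch only: fit an illustrative model, then inspect the estimation sample
regress femworkhours i.married i.educ i.countryid i.wave

display e(N)                   // number of observations used in the model
count if !e(sample)            // observations dropped from the estimation

* Missing-value counts for each variable in the model
misstable summarize femworkhours married educ countryid wave
```

Comparing e(N) with the total number of observations shows how much of the sample is lost to missing values.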

Kind regards,
Carlo
(Stata 19.0)


Beata Berke

Join Date: Apr 2018

Posts: 2
#3

31 May 2019, 06:09

Dear Carlo,

Thank you for your response. You're right, it is worrying that the variable has so many missing values. However, when I run mdesc on the full data, this is what Stata shows me:

mdesc femworkhours WRKHRS

    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
   femworkhours |      40,020        118,003          33.91
         WRKHRS |      55,263        118,003          46.83
----------------+-----------------------------------------------

Now I am wondering why there is such a large gap, and why female work hours has fewer missing values than the general work hours of the sample. Thank you for bringing this to my attention; I will definitely go through the data again and see what is going on.
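
One way I could start looking into this is sketched below. It assumes femworkhours was derived from WRKHRS as described in my first post; miss_fem and miss_wrk are hypothetical helper variables created only for this check.

```stata
* Sketch only: compare the missingness patterns of the two variables
generate byte miss_fem = missing(femworkhours)   // 1 if femworkhours is missing
generate byte miss_wrk = missing(WRKHRS)         // 1 if WRKHRS is missing
tabulate miss_fem miss_wrk

* Cases where the source variable is missing but the derived one is not
count if missing(WRKHRS) & !missing(femworkhours)

* Values femworkhours takes when WRKHRS is missing
tabulate femworkhours if missing(WRKHRS), missing
```

If most of the non-missing femworkhours values for observations with missing WRKHRS turn out to be 0, the gap would probably come from how I coded the "0 otherwise" cases.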
