
  • Mixed or reg i.country i.year for repeated cross-section data

    Dear all,

    I am analyzing time-series cross-sectional data from 4 waves and around 25 countries, using Stata 14. The dataset is the International Social Survey Programme for the years 1988, 1994, 2002 and 2012. My main variable of interest is female hours worked per week (originally WRKHRS; for the analysis I generated work hours for females only, 0 otherwise) and how it is affected by the amount or presence of benefits in the country. At first I had these benefits as expenditure as a percentage of GDP, but my supervisor told me to generate dummies, 0 for no benefit and 1 for benefit, for all the different types I had. I now have them both ways. My research has two parts: the first is a regression of female hours worked per week on the different types of benefits; the second analyzes attitudes (support for traditional gender roles of men), comparing between countries.

    I want to do an individual-level analysis (within respondents) of the effect of education##benefit, marital status, attendance at religious services and presence of a child. At the country level I have the benefits, unemployment rates and labor force participation for men and women, the total fertility rate, and types of expenditure: public total, in-kind as % of GDP, in cash as % of GDP, and real GDP forecast. I know it's too much; I won't be using all of them, I am just letting you know what I have.

    I was planning to use the mixed command, starting with a basic mixed femworkhours || countryid: , and to build on that by adding level-1 predictors and then level-2 predictors. However, I cannot declare the data as a panel because time values repeat within the dataset, so I used xtset countryid (as I read somewhere on this forum, this is an option for repeated cross-section data). Since this is my thesis, I asked my supervisor whether I should use mixed or a simple reg with i.countryid i.wave, and he suggested reg with i.countryid i.year. Nevertheless, when I run the regression, there does not seem to be even a small significant country effect, and it looks as if the first part of the analysis ignores country and year effects. Could the problem be that, if I run a basic regression with country and year fixed effects, I should use mean hours worked by country rather than individual-level hours? I browsed this forum and the internet but unfortunately could not find the answers I was looking for.
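
For reference, the two specifications discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the variable names come from the dataex example in this post, the covariates are picked arbitrarily, and the commands are meant for the full dataset (the 100-row example contains a single country and wave, so nothing can be estimated from it).

```stata
* Sketch only: names follow the dataex example; the covariate list is
* illustrative and not a recommended specification.

* Option A: random intercept for country, wave entered as indicators
mixed femworkhours i.educ i.married i.wave || countryid:

* Option B: OLS with country and wave indicators (fixed effects)
regress femworkhours i.educ i.married i.countryid i.wave

* Joint tests of the country and wave indicators after Option B
testparm i.countryid
testparm i.wave
```

In both versions the outcome stays at the individual level; the country and wave terms absorb differences in average hours rather than requiring the outcome to be averaged by country first.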

    Hence the question: what would you suggest doing with this data? In the example below, the variable female work hours looks as though many observations are missing, but that is not the case overall: I ran the mdesc command, and 33% of the total sample is missing (the values range from 0 to 80 hours worked per week). I hope this question is clear enough to understand; if not, please let me know where I can elaborate.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(femworkhours married incgroup fulltime parttime attend1) byte educ float(dbgrant drealfam dincmaint ddaycare dpleave dchildall wave countryid)
    0 0 1 0 0 1 3 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 2 0 1 0 0 0 0 2 1
    . 0 3 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 3 0 1 0 0 0 0 2 1
    . 0 5 1 0 0 3 0 1 0 0 0 0 2 1
    . 1 1 0 1 0 0 0 1 0 0 0 0 2 1
    . 1 4 0 1 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 1 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 3 0 1 1 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 2 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 3 0 1 0 0 0 0 2 1
    . 1 1 1 0 0 1 0 1 0 0 0 0 2 1
    . 0 3 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 0 5 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 1 0 0 1 1 0 1 0 0 0 0 2 1
    . 0 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 1 2 0 1 0 0 0 0 2 1
    0 0 3 0 0 0 1 0 1 0 0 0 0 2 1
    . 0 5 1 0 0 3 0 1 0 0 0 0 2 1
    . 0 4 1 0 1 2 0 1 0 0 0 0 2 1
    . 1 5 1 0 1 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 1 2 0 1 0 0 0 0 2 1
    . 0 4 1 0 0 0 0 1 0 0 0 0 2 1
    . 1 4 1 0 1 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    . 0 4 1 0 0 2 0 1 0 0 0 0 2 1
    0 1 4 0 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    0 0 3 0 0 0 2 0 1 0 0 0 0 2 1
    . 0 4 1 0 1 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 3 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 3 0 1 0 0 0 0 2 1
    . 1 4 1 0 1 1 0 1 0 0 0 0 2 1
    . 1 3 1 0 0 1 0 1 0 0 0 0 2 1
    0 1 5 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 2 0 1 0 0 0 0 2 1
    0 1 4 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 1 3 0 1 0 0 0 0 2 1
    . 0 3 0 1 0 2 0 1 0 0 0 0 2 1
    . 1 3 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 1 1 0 1 0 0 0 0 2 1
    0 1 1 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 1 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 2 0 1 0 0 0 0 2 1
    0 1 4 0 0 0 3 0 1 0 0 0 0 2 1
    . 1 5 1 0 1 3 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 1 1 0 0 2 0 1 0 0 0 0 2 1
    . 0 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 0 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 0 1 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    0 1 3 0 0 1 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 3 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 1 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 3 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 0 5 1 0 0 3 0 1 0 0 0 0 2 1
    . 0 4 1 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 1 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    . 0 4 1 0 0 3 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 3 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 0 1 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 2 0 1 0 0 0 0 2 1
    . 0 1 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 3 1 0 0 2 0 1 0 0 0 0 2 1
    . 1 4 0 1 0 1 0 1 0 0 0 0 2 1
    . 1 5 1 0 0 1 0 1 0 0 0 0 2 1
    0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
    . 1 4 1 0 0 1 0 1 0 0 0 0 2 1
    end
    label values incgroup incgroup
    label def incgroup 1 "10%", modify
    label def incgroup 2 "25%", modify
    label def incgroup 3 "50%", modify
    label def incgroup 4 "75%", modify
    label def incgroup 5 "90%", modify
    label values fulltime employed
    label def employed 0 "not fulltime", modify
    label def employed 1 "fulltime", modify
    label values educ educ
    label def educ 0 "no education", modify
    label def educ 1 "primary/lower secondary", modify
    label def educ 2 "upper/post secondary", modify
    label def educ 3 "lower/upper tertiary", modify
    label values wave wave
    label def wave 2 "1994", modify
    label values countryid countryid
    label def countryid 1 "AU", modify




  • #2
    Beata:
    welcome to this forum.
    Just one aside: when I inspect your regressand:
    Code:
    . codebook femworkhours
    
    ---------------------------------------------------------------------------------------------------------------------
    femworkhours                                                                                              (unlabeled)
    ---------------------------------------------------------------------------------------------------------------------
    
                      type:  numeric (float)
    
                     range:  [0,0]                        units:  1
             unique values:  1                        missing .:  87/100
    
                tabulation:  Freq.  Value
                                13  0
                                87  .
    The proportion of missing values (87%) is more than a bit worrisome.
    Hence, if this finding also holds in your full dataset, you should deal with the missing values first.
    As you may already be aware, Stata's estimation commands omit every observation with a missing value in any variable of the model.
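
A minimal sketch of how one might check how many observations would survive this listwise deletion, assuming the variable names from the dataex example; nmiss is a hypothetical helper variable, and the variable list should be replaced by whatever actually enters the model:

```stata
* Sketch only: variable names taken from the dataex example above.

* Missing-value summary for the candidate model variables
misstable summarize femworkhours married educ

* nmiss (hypothetical name): number of model variables missing per observation
egen byte nmiss = rowmiss(femworkhours married educ)
tabulate nmiss

* Observations with complete data, i.e. the ones an estimation command would keep
count if nmiss == 0
```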
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo,

      Thank you for your response. You're right, it is worrying that there are so many missing values. However, when I run mdesc, this is what Stata shows me:

      mdesc femworkhours WRKHRS

          Variable    |     Missing          Total     Percent Missing
      ----------------+-----------------------------------------------
         femworkhours |      40,020        118,003          33.91
               WRKHRS |      55,263        118,003          46.83
      ----------------+-----------------------------------------------


      Now I am worried about, and curious why, there is such a large gap, and why female work hours has fewer missing values than the general work hours of the sample. Thank you for bringing this to my attention; I will definitely go through the data again and see what is going on.
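
One way to trace the gap, sketched under the assumption that both variables sit in the same dataset: since femworkhours was described in #1 as work hours for females and 0 otherwise, respondents who are not female may have received a nonmissing 0 even where WRKHRS is missing, which would lower the missing count. The names miss_fem and miss_wrk are hypothetical helper variables.

```stata
* Sketch only: compare the missingness patterns of the two variables.
generate byte miss_fem = missing(femworkhours)
generate byte miss_wrk = missing(WRKHRS)
tabulate miss_wrk miss_fem

* Cases where WRKHRS is missing but femworkhours is not
count if missing(WRKHRS) & !missing(femworkhours)

* Values femworkhours takes in those cases (all zeros would support the guess above)
tabulate femworkhours if missing(WRKHRS), missing
```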
