  • Fixed-effects analysis with spell data using sequences?

    I have a panel data set. My dependent variable (wage) is reported every wave. However, participants also report on their professional activities since the last interview, so I have several spells per person per wave. My independent variable (participation in educational programs) is reported there. (The spells of course include more information than that, but it is of no importance to my analysis.) I want to conduct a fixed-effects analysis, regressing wage on participation.

    I have two problems:

    First, how can I find out how often someone participated in the program during one wave? Because it would matter if someone participated once or twice.

    Second, how can I construct a time indicator? Right now, I have repeated observations per unit per wave, and of course the treatment might change from spell to spell, but I don't have wage data for every spell, so I cannot use the spell indicator as a time variable.

    For the second problem I have thought of converting the spell data into sequences (probably using months, because the interviews are conducted annually), indicating with 0 and 1 whether people did or did not participate in each month of every wave, then using an if-statement to construct a variable "participated yes/no" for every wave.

    Would that be reasonable or is there maybe a better way?
    Last edited by sladmin; 06 Feb 2018, 10:11. Reason: anonymize poster

  • #2
    On the basis of your description alone, I can't infer what your data structure looks like, nor do I quite understand what you want to do. I think it would help if you posted a small representative sample of the data you have. Please use the -dataex- command (which you can get by running -ssc install dataex-, then -help dataex- for instructions on how to use it) for that purpose. It would probably also be useful if you supplemented that with a hand-worked example of what you would like the data set to look like so it will be ready for analysis.


    • #3
      Dear Clyde,

      thank you very much for your response. Due to data restrictions I do not have direct access to the data itself and can only work with it through a virtual Stata machine. It also does not let me run either the list or the dataex command (and I am pretty sure I am not allowed to post anything). However, I have a good idea of how it looks and I will try to recreate it here.
      id  spell  wage  participation (1=yes)  wave
      1   1      2400  0                      1
      1   2      2400  1                      1
      1   3      2500  0                      2
      2   1      1800  0                      1
      2   2      1900  1                      2
      2   3      1900  1                      2
      2   4      1900  0                      3
      And this is what I would like it to look like:
      id  wage  participation (1=yes)  wave  multiple participation (1=yes)
      1   2400  1                      1     0
      1   2500  0                      2     0
      2   1800  0                      1     0
      2   1900  1                      2     1
      2   1900  0                      3     0
      Last edited by sladmin; 06 Feb 2018, 10:12. Reason: anonymize user


      • #4
        Ok. It looks like you want to reduce to one observation per wave for each person (id), with participation set to 1 if any of the observations for that wave for that person was 1, and multiple participation set to 1 if more than one of those observations had participation = 1. In the example you posted, the wage variable appears to be constant across all observations for a given id within a given wave. The code below requires this to be true and verifies the assumption in its first command. If wage can vary within wave, then you need to clarify in what way the different values of wage can be combined (average, highest, lowest, something else?). This code also relies on the participation variable in the original data to always be 0 or 1--and this is also verified in the second command. So I think you want this:

        Code:
        // VERIFY WAGE IS CONSTANT WITHIN WAVE FOR EACH ID
        by id wave (wage), sort: assert wage[1] == wage[_N]
        // VERIFY PARTICIPATION IS ALWAYS 0/1
        assert inlist(participation, 0, 1)
        
        // AGGREGATE DATA TO LEVEL OF WAVE WITHIN ID
        collapse (first) wage (max) participation (sum) multiple_participation = participation, by(id wave)
        
        // PATCH THE multiple_participation VARIABLE
        replace multiple_participation = (multiple_participation > 1)
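
A minimal end-to-end sketch of the same approach, run on the toy data from #3 and continued through to the fixed-effects regression the original question asked about. The variable name n_participation is an assumption introduced here for illustration: it keeps the per-wave count of participation spells (the "how often" part of the question) instead of overwriting it with a 0/1 flag. The xtset and xtreg steps are one plausible way to set up the analysis, not necessarily the model that was finally estimated.

```stata
// Recreate the example data posted in #3
clear
input id spell wage participation wave
1 1 2400 0 1
1 2 2400 1 1
1 3 2500 0 2
2 1 1800 0 1
2 2 1900 1 2
2 3 1900 1 2
2 4 1900 0 3
end

// Same checks as above: wage constant within id-wave, participation is 0/1
by id wave (wage), sort: assert wage[1] == wage[_N]
assert inlist(participation, 0, 1)

// One observation per id-wave; n_participation (hypothetical name) counts
// the spells with participation == 1 in that wave
collapse (first) wage (max) participation ///
    (sum) n_participation = participation, by(id wave)

// Flag waves with more than one participation spell
generate byte multiple_participation = n_participation > 1

list, sepby(id) noobs

// With one row per id-wave, wave can serve as the panel time variable
xtset id wave
xtreg wage participation, fe
```

With only seven spells the regression output is not meaningful; the point is that once the data are collapsed to one row per id and wave, wave is a valid time variable for -xtset-, and either participation, n_participation, or multiple_participation can enter the model as the treatment measure.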


        • #5
          Thank you very much! It worked perfectly.
