Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating new observations for each value in id-variable

    Hi,
    I need some help expanding my data set.

    I hope this introduction is not too long.

    I have a dataset with monthly observation of people who receive different benefits and their labor market status between Jan2011 to Feb2018. The issue is that I only have observations for the months the person actually received a benefit or has a registered status. The number of monthly observations per individual is therefore not "balanced". I want to "balance" the data set so that I have the same number of observations per individual, and for the months there is no data on the individual, it should be missing. I want the data set to look something like this:

    For example for individual 1 I only have their status in jan-feb 2011 and april-july2012:
    ID Monthyear benefit workstatus
    1 2011m1 x y
    1 2011m2 x .
    1 2011m3 . .
    1 2011m4 . .
    1 2011m5 . .
    1 and so on
    1 2012m4 z q
    1 2012m5 z q
    1 2012m6 x q
    1 2012m7 z y
    1 2012m8 . .
    1 2012m9 . .
    1 and so on until
    1 2018m1 . .
    1 2018m2 . .
    2 and so on
    While now it looks like this:
    id monthyear benefit workstatus
    1 2011m1 x y
    1 2011m2 x .
    1 2012m4 z q
    1 2012m5 z q
    1 2012m6 x q
    1 2012m7 z y
    2

    How do I do this? (Its a large dataset with 240 000 individuals)

    Saliha
    Last edited by Saliha Amrani; 24 Sep 2018, 05:04.

  • #2
    The short answer is:
    Code:
    tsfill, full
    See also the guide for tsfill, including an example with monthly data: https://www.stata.com/manuals13/tstsfill.pdf

    It's important that your dates are proper Stata dates, though. In the future, please use dataex to post example data, so tat readers here immediately understand what your data is like. See more on how and why of dataex in the FAQ, section12: https://www.statalist.org/forums/help#stata

    Further: you might also think about what you would do with all those observations with missing data. Can you interpolate? Running a regression with an unbalanced panel (no observations for e.g., 2012m3) is the same as running the regression with that observation but with all missing values.

    Comment


    • #3
      I fully support Jorrit's advice. Posting data examples using -dataex- is crucial for getting quick an correct replies.

      I just want to add: The short answer, in my opinion, is
      Code:
      tsset id monthyear
      tsfill , full
      And this is the long answer, including a minimal working example:
      Code:
      clear
      input id monthyear str1(benefit) str1(workstatus)
      1     624     "x"     "y"
      1     625    "x"     ""
      1     626     "z"     "q"
      1     629     "z"     "q"
      2     625    "x"    ""        
      2     628    ""    "z"
      end
      format monthyear %tm
      list , sepby(id)
      
      tsset id monthyear
      tsfill , full
      list , sepby(id)
      Regards
      Bela

      Comment


      • #4
        thank you both! I used tsfill and it worked. There are two more issues I don't know the solution to. I also have som fixed characteristics like gender and municipality that become missing in the new observations. How do I "duplicate" or keep them when I use tsfill? And does it matter whether gender and municipality are string or numeric variables?

        AND: Is there an easy way to check whether an individual has moved, i.e. if one individual has different values in "municipality"?

        Thank you in advance!

        Regards,
        Saliha

        Comment


        • #5
          The missing values will be considered "breaks" in benefits or time away from labor force that i will calculate later.

          Comment


          • #6
            Please do follow the advice on dataex, also for follow-up questions. With a data example, your questions is a whole lot easier to answer.

            Some fake example data:
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(id monthyear) str1(benefit workstatus) str9 municip str6 sex
            1 624 "x" "y" "" ""
            1 625 "x" ""  ""         ""    
            1 626 "z" "q" "sometown"         "male"    
            1 629 "z" "q" ""         ""    
            2 625 "x" ""  ""         ""    
            2 628 ""  "z" "othertown"         "female"   
            3 628 "x" ""  ""         ""    
            3 629 ""  "z" ""         "female"    
            end
            code to run:
            Code:
            format monthyear %tm
            tsset id monthyear
            tsfill , full
            bys id (municip): replace municip=municip[_N]
            bys id (sex): replace sex=sex[_N]
            Yields:
            Code:
            . list , sepby(id)
            
                 +---------------------------------------------------------+
                 | id   monthy~r   benefit   workst~s     municip      sex |
                 |---------------------------------------------------------|
              1. |  1     2012m2         x               sometown     male |
              2. |  1     2012m6         z          q    sometown     male |
              3. |  1     2012m4                         sometown     male |
              4. |  1     2012m1         x          y    sometown     male |
              5. |  1     2012m5                         sometown     male |
              6. |  1     2012m3         z          q    sometown     male |
                 |---------------------------------------------------------|
              7. |  2     2012m3                        othertown   female |
              8. |  2     2012m4                        othertown   female |
              9. |  2     2012m2         x              othertown   female |
             10. |  2     2012m1                        othertown   female |
             11. |  2     2012m6                        othertown   female |
             12. |  2     2012m5                    z   othertown   female |
                 |---------------------------------------------------------|
             13. |  3     2012m2                                    female |
             14. |  3     2012m4                                    female |
             15. |  3     2012m1                                    female |
             16. |  3     2012m5         x                          female |
             17. |  3     2012m3                                    female |
             18. |  3     2012m6                    z               female |
                 +---------------------------------------------------------+
            
            .

            Comment


            • #7
              Thank you for your response!

              Comment

              Working...
              X