  • Questions about recoding in panel data

    Hi there! I have a problem recoding panel data.
    As you can see below, this panel data describes the interaction between women's employment history and their fertility. employ_status is a categorical variable that records the work and fertility status at each of four time points. It takes three values: 1 "before", 2 "across", and 3 "after". For status 1 (before), the amount was collected in the variable "bef". For status 2 (across), it was collected in "across". For status 3 (after), it was collected in "af".


    [Image attachment: 5555.png]

    I want to recode this complicated data into a few simple variables, as you can see in the table. The rules are:
    a. Generate the variables "employ_final", "employ_bef", "employ_af", and "employ_across".
    1. If all the valid data across the four time points are "1 before", then "employ_final" is "1 before", and "employ_bef" is filled with the minimum of all valid values of "bef" (see pid 2).
    2. If all the valid data are "3 after", then "employ_final" is "3 after", and "employ_af" is filled with the minimum of all valid values of "af" (see pid 1).
    3. If there is one "2 across" among the valid data across the four time points, then "employ_final" is "2 across", and "employ_across" is filled with the value of "across" from the "across" row (see pid 5).
    4. If, among the valid data across the four time points, the previous observation (_n-1) is "after" and the next observation (_n) is "before", then "employ_final" is "4 gap". In that case, "employ_af" is filled with the value of "af" from the "after" row, and "employ_bef" is filled with the value of "bef" from the "before" row (e.g., pid 4).

    b. I also want to generate a variable called "employ_used" that records the number of the time point whose data was used. For example, for the first woman all the statuses are "after" and the minimum of "af" is 10, at the fourth time point, so "employ_used" is 4. In the same way, the second, third, and fifth women should get 2, 1, and 2 respectively. For the fourth woman, 2, corresponding to the "after" row, is fine.

    Sorry for the complicated question. Any help would be appreciated.

    Regards,
    Catherine
    Last edited by Catherine Li; 09 Dec 2022, 19:59.

  • #2
    Try this:
    Code:
    //  ISOLATE THE OBSERVATIONS WHERE EMPLOY_STATUS IS NOT MISSING
    frame put pid employ_num employ_status af bef across ///
        if !missing(employ_status), into(working)
    
    frame change working
    //  RULES 1 & 2
    by pid (employ_status), sort: gen byte all_same = (employ_status[1] == employ_status[_N])
    gen employ_final = employ_status if all_same
    by pid (af), sort: gen employ_af = af[1] if all_same & employ_final == 3
    by pid (bef), sort: gen employ_bef = bef[1] if all_same & employ_final == 1
    
    //  RULE 3
    by pid: egen across_count = total(employ_status == 2)
    replace employ_status = 2 if across_count == 1
    by pid: egen employ_across = min(cond(employ_status == 2, across, .))
    
    //  RULE 4
    xtset pid employ_num
    by pid: egen byte gaps = total(employ_status == 3 & F1.employ_status == 1)
    replace employ_final = 4 if gaps == 1
    by pid (employ_num), sort: replace employ_af = ///
        max(cond(employ_status == 3, af, .)) if gaps == 1
    by pid (employ_num), sort: replace employ_bef = ///
        max(cond(employ_status == 1, bef, .)) if gaps == 1
    
    //  CREATE EMPLOY_USED
    by pid: gen employ_used = _N
    by pid: egen after_count = total(employ_status == 3)
    replace employ__used = after_count if gaps == 1
    
    //  REDUCE TO ONE OBSERVATION PER PID
    by pid: keep if _n == 1
    keep pid employ_final employ_af employ_bef employ_across employ_used
    
    frame change default
    frlink m:1 pid, frame(working)
    frget employ_*, from(working)
    This code is untested because the example data was provided as a screenshot, which is an unusable format. As such it may contain typos or other errors. Even the logic may be incorrect, because I had to try to visualize the changing data from beginning to end while writing the code. Even if it is wrong, it should point you in the right direction, and you probably can fix whatever problems it contains.

    If you need additional help with this, however, you must provide the example data by using the -dataex- command when you post back. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data. And in all your future posts, always use -dataex- to show example data when asking for help with code.
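
    As a rough sketch of what that might look like for this thread, assuming the dataset is already in memory and uses the variable names from the code above (the count() value is only an illustrative choice), something like the following could be run and its output pasted into the reply:

    ```stata
    * Install only if -dataex- is not already part of the installation
    * ssc install dataex

    * Assumes the data in question is already loaded in memory.
    * Export the first 20 observations of the variables used in the code above.
    dataex pid employ_num employ_status af bef across, count(20)
    ```

    The output starts with a comment line and a -clear- and -input- block, so anyone can paste it into a do-file to recreate the example.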



    • #3
      Thank you, Schechter! I tried your code, but something seems to be wrong in the first command: Stata reports "if not allowed". Please see my -dataex- output below. What is going wrong?

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float pid byte employ_num float(af bef across employ_status)
      1 1 50  .   . 3
      1 2 30  .   . 3
      1 3 20  .   . 3
      1 4 10  .   . 3
      2 1  . 60   . 1
      2 2  .  5   . 1
      2 3  .  .   . .
      2 4  .  .   . .
      3 1  .  . 100 2
      3 2  .  .   . .
      3 3  .  .   . .
      3 4  .  .   . .
      4 1 10  .   . 3
      4 2  6  .   . 3
      4 3  .  2   . 1
      4 4  .  .   . .
      5 1  2  .   . 3
      5 2  .  .  50 2
      5 3  .  .   . .
      5 4  .  .   . .
      end
      label values employ_status employ2
      label def employ2 1 "before", modify
      label def employ2 2 "across", modify
      label def employ2 3 "after", modify



      • #4
        Did you run Clyde Schechter's code from a do-file? If not, you will always get an error message, because it must be run from a do-file (rather than interactively from the command line).
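
        For background, the likely reason is that the code relies on // comments and the /// line-continuation marker, which Stata recognizes in do-files and ado-files but not in the Command window. A minimal sketch of how the first command would have to be typed interactively, on a single line:

        ```stata
        * First command of #2 on one line, so no /// continuation is needed.
        * (A line starting with * is a comment that also works interactively.)
        frame put pid employ_num employ_status af bef across if !missing(employ_status), into(working)
        ```

        Running the whole block from a do-file remains the simpler route, since every later multi-line command uses the same continuation marker.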



        • #5
          Rich Goldstein Hi, thank you for your reply! Yes, I ran it from a do-file, but it still shows "if not allowed".



          • #6
            I cannot reproduce your problem using Clyde's code with your example data.
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float pid byte employ_num float(af bef across employ_status)
            1 1 50  .   . 3
            1 2 30  .   . 3
            1 3 20  .   . 3
            1 4 10  .   . 3
            2 1  . 60   . 1
            2 2  .  5   . 1
            2 3  .  .   . .
            2 4  .  .   . .
            3 1  .  . 100 2
            3 2  .  .   . .
            3 3  .  .   . .
            3 4  .  .   . .
            4 1 10  .   . 3
            4 2  6  .   . 3
            4 3  .  2   . 1
            4 4  .  .   . .
            5 1  2  .   . 3
            5 2  .  .  50 2
            5 3  .  .   . .
            5 4  .  .   . .
            end
            label values employ_status employ2
            label def employ2 1 "before", modify
            label def employ2 2 "across", modify
            label def employ2 3 "after", modify
            
            //  ISOLATE THE OBSERVATIONS WHERE EMPLOY_STATUS IS NOT MISSING
            frame put pid employ_num employ_status af bef across ///
                if !missing(employ_status), into(working)
            Code:
            . do "/var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//SD26186.000000"
            
            . * Example generated by -dataex-. To install: ssc install dataex
            . clear
            
            . input float pid byte employ_num float(af bef across employ_status)
            
                       pid  employ~m         af        bef     across  employ_~s
              1. 1 1 50  .   . 3
              2. 1 2 30  .   . 3
              3. 1 3 20  .   . 3
              4. 1 4 10  .   . 3
              5. 2 1  . 60   . 1
              6. 2 2  .  5   . 1
              7. 2 3  .  .   . .
              8. 2 4  .  .   . .
              9. 3 1  .  . 100 2
             10. 3 2  .  .   . .
             11. 3 3  .  .   . .
             12. 3 4  .  .   . .
             13. 4 1 10  .   . 3
             14. 4 2  6  .   . 3
             15. 4 3  .  2   . 1
             16. 4 4  .  .   . .
             17. 5 1  2  .   . 3
             18. 5 2  .  .  50 2
             19. 5 3  .  .   . .
             20. 5 4  .  .   . .
             21. end
            
            . label values employ_status employ2
            
            . label def employ2 1 "before", modify
            
            . label def employ2 2 "across", modify
            
            . label def employ2 3 "after", modify
            
            . 
            . //  ISOLATE THE OBSERVATIONS WHERE EMPLOY_STATUS IS NOT MISSING
            . frame put pid employ_num employ_status af bef across ///
            >     if !missing(employ_status), into(working)
            
            . 
            end of do-file
            
            .
            Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.
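
            The thread so far does not establish why the command fails on the original poster's machine. Purely as an untested sketch, and assuming (this is only a guess) that the difficulty lies in combining a variable list with an if qualifier in -frame put-, the same working frame could be built in separate steps. It also assumes the data sits in the frame named default and that no frame named working exists yet:

            ```stata
            * Copy the whole dataset into a new frame named working
            frame copy default working

            * Within that frame, keep only observations with a valid status
            frame working: keep if !missing(employ_status)

            * Then keep only the variables the later code uses
            frame working: keep pid employ_num employ_status af bef across
            ```

            After these three commands, the rest of the code in #2 (from -frame change working- onward) would apply unchanged.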



            • #7
              Let me add some details. The following is the final data I want to achieve; the example data (the preliminary version) is shown above.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float pid byte employ_num float(af bef across employ_status employ_final employ_af employ_bef employ_used employ_across)
              1 1 50  .   . 3 3 10 . 4   .
              1 2 30  .   . 3 3 10 . 4   .
              1 3 20  .   . 3 3 10 . 4   .
              1 4 10  .   . 3 3 10 . 4   .
              2 1  . 60   . 1 1  . 5 2   .
              2 2  .  5   . 1 1  . 5 2   .
              2 3  .  .   . . 1  . 5 2   .
              2 4  .  .   . . 1  . 5 2   .
              3 1  .  . 100 2 2  . . 1 100
              3 2  .  .   . . 2  . . 1 100
              3 3  .  .   . . 2  . . 1 100
              3 4  .  .   . . 2  . . 1 100
              4 1 10  .   . 3 4  6 2 2   .
              4 2  6  .   . 3 4  6 2 2   .
              4 3  .  2   . 1 4  6 2 2   .
              4 4  .  .   . . 4  6 2 2   .
              5 1  2  .   . 3 2  . . 2  50
              5 2  .  .  50 2 2  . . 2  50
              5 3  .  .   . . 2  . . 2  50
              5 4  .  .   . . 2  . . 2  50
              end
              label values employ_status employ2
              label def employ2 1 "before", modify
              label def employ2 2 "across", modify
              label def employ2 3 "after", modify
              label values employ_final employ_final_w
              label def employ_final_w 1 "before", modify
              label def employ_final_w 2 "across", modify
              label def employ_final_w 3 "after", modify
              label def employ_final_w 4 "gap", modify



              • #8
                Thank you for the -dataex- example. The following code corrects errors found in #2 and produces results matching what you show in #7. It runs from a do-file without error messages.
                Code:
                //  ISOLATE THE OBSERVATIONS WHERE EMPLOY_STATUS IS NOT MISSING
                frame put pid employ_num employ_status af bef across ///
                    if !missing(employ_status), into(working)
                
                frame change working
                //  RULES 1 & 2
                by pid (employ_status), sort: gen byte all_same = (employ_status[1] == employ_status[_N])
                gen employ_final = employ_status if all_same
                by pid (af), sort: gen employ_af = af[1] if all_same & employ_final == 3
                by pid (bef), sort: gen employ_bef = bef[1] if all_same & employ_final == 1
                
                //  RULE 3
                by pid: egen across_count = total(employ_status == 2)
                replace employ_final = 2 if across_count == 1
                by pid: egen employ_across = min(cond(employ_status == 2, across, .))
                
                //  RULE 4
                xtset pid employ_num
                by pid (employ_num), sort: gen byte gaps = sum(employ_status == 3 & F1.employ_status == 1)
                by pid (gaps), sort: replace gaps = gaps[_N]
                replace employ_final = 4 if gaps == 1
                by pid (employ_num), sort: egen temp = ///
                    min(cond(employ_status == 3 & gaps == 1, af, .))
                replace employ_af = temp if gaps == 1
                drop temp
                by pid (employ_num), sort: egen temp = ///
                    min(cond(employ_status == 1 & gaps == 1, bef, .))
                replace employ_bef = temp if gaps == 1
                drop temp
                
                //  CREATE EMPLOY_USED
                by pid: gen employ_used = _N
                by pid: egen after_count = total(employ_status == 3)
                replace employ_used = after_count if gaps == 1
                
                //  REDUCE TO ONE OBSERVATION PER PID
                by pid: keep if _n == 1
                keep pid employ_final employ_af employ_bef employ_across employ_used
                
                frame change default
                frlink m:1 pid, frame(working)
                frget employ_*, from(working)
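
                One way to check the result against the target in #7 is sketched below. It assumes the code above has just been run in the default frame on the example data from #3 (without the target variables from #7 already present). The values in the -assert- lines are copied from #7; each -assert- stops with an error if the condition fails for any observation.

                ```stata
                * Display the merged result, one block per person
                list pid employ_num employ_status employ_final employ_af employ_bef ///
                    employ_across employ_used, sepby(pid) noobs

                * Spot checks against the values posted in #7
                assert employ_final == 3 & employ_af == 10 & employ_used == 4 if pid == 1
                assert employ_final == 1 & employ_bef == 5 & employ_used == 2 if pid == 2
                assert employ_final == 2 & employ_across == 100 & employ_used == 1 if pid == 3
                assert employ_final == 4 & employ_af == 6 & employ_bef == 2 & employ_used == 2 if pid == 4
                assert employ_final == 2 & employ_across == 50 & employ_used == 2 if pid == 5
                ```

                Like the main code, this uses the /// continuation marker, so it should be run from a do-file.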



                • #9
                  Clyde Schechter Now it works! Thank you very much for your help!

