Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pick the first nonmissing (and non-zero) value and create new variable

    Hello all,

    I am using Stata/IC 12.1.

    I have a panel data set and I need to pick the first nonmissing (and non-zero) value of the leverage variable of firms. This value is to used as a new variable "initial leverage" for each firm.
    My data looks like:

    id year leverage
    1 1980 .
    1 1981 .
    1 1982 2
    2 1980 5
    2 1982 6
    2 1983 1
    3 1980 4
    3 1983 .
    3 1984 2

    The desired result would be:
    id inilev
    1 2
    1 2
    1 2
    2 5
    2 5
    2 5
    3 4
    3 4
    3 4


    Comments:
    - years may be different within group of firms, but they do not repeat
    - regardless of the magnitude, I need to get always the first nonmissing value for each firm
    - the series spams from 1950-200 for over 400 firms

    Previous attemps:
    Searching the archives I found similar codes, but was not able to adapt to my issue.
    - one case picked the minimum value
    Code:
     by id, sort: egen inilev = min(cond(!missing(leverage), leverage, .))
    Thank you so much,
    Clarice

  • #2
    I'm a relative newbie to Stata, so one of the tenured members will probably come up with a more elegant solution, but this seems to work for the data you provided.

    Code:
    // Generate some data to illustrate.
    // Include a case with leverage = 0.
    // Include one id with leverage missing completely.
    
    clear
    input id year leverage
    1 1980 .
    1 1981 .
    1 1982 2
    1 1982 0
    2 1980 5
    2 1982 6
    2 1983 1
    3 1980 4
    3 1983 .
    3 1984 2
    4 1980 .
    4 1981 .
    4 1982 .
    end
    
    save "C:\Temp\Original data.dta", replace
    recode leverage (0 = .), gen(inilev) // Copy of leverage with 0 as missing
    collapse (firstnm) inilev, by(id) // 1st non-missing value of inilev by ID
    merge 1:m id using "C:\Temp\Original data.dta"
    sort id year
    list
    OUTPUT:

    Code:
         +---------------------------------------------+
         | id   inilev   year   leverage        _merge |
         |---------------------------------------------|
      1. |  1        2   1980          .   matched (3) |
      2. |  1        2   1981          .   matched (3) |
      3. |  1        2   1982          0   matched (3) |
      4. |  1        2   1982          2   matched (3) |
      5. |  2        5   1980          5   matched (3) |
         |---------------------------------------------|
      6. |  2        5   1982          6   matched (3) |
      7. |  2        5   1983          1   matched (3) |
      8. |  3        4   1980          4   matched (3) |
      9. |  3        4   1983          .   matched (3) |
     10. |  3        4   1984          2   matched (3) |
         |---------------------------------------------|
     11. |  4        .   1980          .   matched (3) |
     12. |  4        .   1981          .   matched (3) |
     13. |  4        .   1982          .   matched (3) |
         +---------------------------------------------+
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)

    Comment


    • #3
      See also a -first()- function for -egen- in -egenmore- (SSC).

      Comment


      • #4
        Thanks for the tip, Nick--I was hoping there was a method that did not require merging files. I've got it working, more or less, but not quite as efficiently as I hoped. Please see below, and thanks for any advice you can offer.

        Code:
        // Uncomment next line to install egenmore if necessary.
        // ssc inst egenmore, replace
        
        clear
        input id year leverage
        1 1980 .
        1 1981 .
        1 1982 2
        1 1982 0
        2 1980 5
        2 1982 6
        2 1983 1
        3 1980 4
        3 1983 .
        3 1984 2
        4 1980 .
        4 1981 .
        4 1982 .
        end
        
        recode leverage (0 = .), gen(levcopy) // Copy of leverage with 0 as missing
        egen inilev = first(levcopy), by(id)
        list
        
        // Why wasn't inilev written to all cases within ID?
        // Is there an option to make that happen?
        // Or do I have to something like the following?
        
        sort id inilev
        replace inilev = inilev[_n-1] if missing(inilev) & id == id[_n-1]
        sort id year
        list
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)

        Comment


        • #5
          Avoid save, collapse, and merge, which slow you down and/or result in unnecessary I/O operations.

          Code:
          version 12.0
          
          clear
          input id year leverage
          1 1980 .
          1 1981 .
          1 1982 2
          2 1980 5
          2 1982 6
          2 1983 1
          3 1980 4
          3 1983 .
          3 1984 2
          4 2000 .
          4 2002 5
          4 2001 0
          end
          
          isid id year
          sort id year
          
          generate inilev=.
          tempvar x y
          by id: generate `x'=_n if (leverage>0 & !missing(leverage))
          by id: egen `y'=min(`x')
          by id: replace inilev=leverage[`y']
          
          list, sepby(id)
          Produces:
          Code:
               +-------------------------------+
               | id   year   leverage   inilev |
               |-------------------------------|
            1. |  1   1980          .        2 |
            2. |  1   1981          .        2 |
            3. |  1   1982          2        2 |
               |-------------------------------|
            4. |  2   1980          5        5 |
            5. |  2   1982          6        5 |
            6. |  2   1983          1        5 |
               |-------------------------------|
            7. |  3   1980          4        4 |
            8. |  3   1983          .        4 |
            9. |  3   1984          2        4 |
               |-------------------------------|
           10. |  4   2000          .        5 |
           11. |  4   2001          0        5 |
           12. |  4   2002          5        5 |
               +-------------------------------+

          Comment


          • #6
            Yet another way to get there:

            Code:
            clear
            input id year leverage
            1 1980 .
            1 1981 .
            1 1982 2
            1 1983 0
            2 1980 5
            2 1982 6
            2 1983 1
            3 1980 4
            3 1983 .
            3 1984 2
            4 1980 .
            4 1981 .
            4 1982 .
            end
            
            * the following should be true; make sure the data is sorted
            isid id year, sort
            
            * use a running sum to number observations with non-missing values
            by id: gen order = sum(!mi(leverage)) if !mi(leverage)
            
            * reorder and use the first non-missing
            sort id order
            by id: gen inilev = leverage[1]
            
            * return to original order
            sort id year

            Comment


            • #7
              Hello, everyone!

              Many thanks!

              Just some comments, in order to document results here:
              • Bruce, as a newbie to Stata, you are doing very well. Your first option worked.
              • the
                -first()- function for -egen-
                not avaliable for me (maybe b/c I am using Stata 12)
              • response #5 - works perfectly
              • response #6 - works perfectly as well
              Thanks to all!!

              I always learn so much with this group!!

              Comment


              • #8
                Originally posted by Sergiy Radyakin View Post
                Avoid save, collapse, and merge, which slow you down and/or result in unnecessary I/O operations.

                Code:
                version 12.0
                
                tempvar x y
                by id: generate `x'=_n if (leverage>0 & !missing(leverage))
                by id: egen `y'=min(`x')
                by id: replace inilev=leverage[`y']
                Ah, that's more like it. Thank you, Sergiy.
                --
                Bruce Weaver
                Email: [email protected]
                Version: Stata/MP 18.5 (Windows)

                Comment


                • #9
                  Clarice re: #7 -- egenmore is a user-written package that you need to download and install; in general, you do not need to know where it is from (use "search" or "findit") but in this case you do; you can install via
                  Code:
                  ssc install egenmore

                  Comment


                  • #10
                    By the way, this same thing can be done quite easily with SPSS, in case anyone is interested. For example:

                    Code:
                    MISSING VALUES leverage(0).
                    AGGREGATE
                      /OUTFILE=* MODE=ADDVARIABLES
                      /BREAK=id
                      /inilev=FIRST(leverage).
                    LIST.
                    
                    Output from LIST:
                    
                      id year leverage inilev
                     
                       1 1980      .       2
                       1 1981      .       2
                       1 1982      2       2
                       1 1982      0       2
                       2 1980      5       5
                       2 1982      6       5
                       2 1983      1       5
                       3 1980      4       4
                       3 1983      .       4
                       3 1984      2       4
                       4 1980      .       .
                       4 1981      .       .
                       4 1982      .       .
                     
                    Number of cases read:  13    Number of cases listed:  13
                    AGGREGATE is similar to collapse, but allows one to write the aggregated variables into the active dataset, as you might have inferred from the MODE=ADDVARIABLES. (One can also tack on OVERWRITE=YES if necessary.)

                    --
                    Bruce Weaver
                    Email: [email protected]
                    Version: Stata/MP 18.5 (Windows)

                    Comment

                    Working...
                    X