Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop to fill missing values

    Hello,

    I am using a panel data set that has a unique firm id called 'firm_id'. However, this firm_id has some missing values. These missing values are all unique firms, so among the missing values no firm occurs twice. I want to fill those missing values by continuing to count from the current highest available firm_id.
    (max_firm_id = the max of firm_id)

    I wrote this loop:

    global i = max_firm_id + 1

    foreach x of varlist firm_id {
    replace `x' = $i if (`x' == .)
    global i = $i + 1
    }

    However, when I run this, all the missing values are replaced by the value of max_firm_id + 1. What I want is that only the first missing value gets the value of max_firm_id + 1, the second missing value is one higher, the third missing value is again one higher, and so on.

  • #2
    Try something like this.
    Code:
    sort firm_id // missing-valued firm IDs will go to the end
    replace firm_id = firm_id[_n-1] + 1 if missing(firm_id)

    Comment


    • #3
      Worked perfectly. Thank you Joseph!

      Comment


      • #4
        Here is an example:

        Code:
        . clear
        
        . input firm_id
        
               firm_id
          1. 10
          2. 12
          3. .
          4. 15
          5. 14
          6. 13
          7. .
          8. .
          9. end
        
        .
        . gen mis = missing(firm_id)
        
        . sum firm_id, meanonly
        
        . bys mis: replace firm_id  = r(max) + _n if mis
        (3 real changes made)
        
        . drop mis
        
        . list
        
             +---------+
             | firm_id |
             |---------|
          1. |      15 |
          2. |      10 |
          3. |      13 |
          4. |      12 |
          5. |      14 |
             |---------|
          6. |      16 |
          7. |      17 |
          8. |      18 |
             +---------+
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          What is max_firm_id here? Is it a scalar? If it is a variable, then

          Code:
          global i = max_firm_id + 1
          will be evaluated using the value in the first observation (row, record, case)

          Code:
          global i = max_firm_id[1]  + 1
          which perhaps you don't mind as I guess (a) it is a variable (b) it holds a constant.

          Your loop seems based on a misunderstanding that

          Code:
          foreach x of varlist firm_id {
          
          
          }
          is a loop over observations; but not so, it is -- as the syntax makes explicit -- a loop over the variable (name) list that follows, which here contains just one variable name. That's perfectly legal. So, as you report your loop boils down to


          Code:
          replace firm_id = $i if firm_id == .
          and bumping up the global has no effect because the loop is executed only once.

          There is no data example here, but your need is clear.

          I would do this

          Code:
          su firm_id, meanonly
          local max = r(max)
          gsort -firm_id, mfirst
          replace firm_id = `max' + _n if missing(firm_id)
          which carries out your intended loop. The point is to sort missing values to the start of the dataset and then the observation number is available to do the work for you.

          EDIT: #3 to answer, but the explanation of why the loop did not work slowed me down. I like Joseph Coveney's answer more than my own!
          Last edited by Nick Cox; 29 Mar 2023, 02:19.

          Comment

          Working...
          X