  • Filling in missing data with egen

    Hi,

    I calculated means (Y) for originator B. However, I would like to create a variable Z that fills in this mean (from originator B) for A and C as well within each month. My data look like this:
    Originator  Date    Y    Z
    A           2005m1  .    1.5
    B           2005m1  1.5  1.5
    C           2005m1  .    1.5
    A           2006m1  .    1
    B           2006m1  1    1
    C           2006m1  .    1
    I think I need to use the egen command, but I can't figure out how to fill in the missing value.

    Any help is appreciated!


  • #2
    After 24 posts you should be feeling familiar with basic practices, which means (please) giving reproducible data examples using dataex (SSC) and CODE delimiters, as in most replies to you. Here are two ways to do it. mipolate would need to be installed before you can use it.

    Code:
    clear
    input str1 Originator   Date    Y
    A       2005    .       
    B       2005    1.5     
    C       2005    .       
    A       2006    .       
    B       2006    1       
    C       2006    .       
    end 
    
    egen mean = mean(Y), by(Date) 
    gen Y2 = Y 
    replace Y2 = mean if missing(Y) 
    
    * ssc desc mipolate 
    * ssc inst mipolate 
    mipolate Y Date, by(Date) groupwise gen(Y3) 
    
    l, sepby(Date)
    
         +------------------------------------------+
         | Origin~r   Date     Y   mean    Y2    Y3 |
         |------------------------------------------|
      1. |        A   2005     .    1.5   1.5   1.5 |
      2. |        B   2005   1.5    1.5   1.5   1.5 |
      3. |        C   2005     .    1.5   1.5   1.5 |
         |------------------------------------------|
      4. |        A   2006     .      1     1     1 |
      5. |        B   2006     1      1     1     1 |
      6. |        C   2006     .      1     1     1 |
         +------------------------------------------+
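
    Both approaches above rely on B being the only originator with non-missing Y in each month. If other originators could also carry values, one possible variant picks out B's value explicitly. This is only a sketch, assuming the variable names of the example above; Z is the name used in the question.

    Code:
    clear
    input str1 Originator   Date    Y
    A       2005    .
    B       2005    1.5
    C       2005    .
    A       2006    .
    B       2006    1
    C       2006    .
    end

    * cond() returns Y for originator B and missing otherwise;
    * egen's max() ignores missings, so B's value is spread to every
    * observation with the same Date
    egen Z = max(cond(Originator == "B", Y, .)), by(Date)

    list, sepby(Date)

    For the example data this gives Z equal to 1.5 in 2005 and 1 in 2006, matching the Z column in the question. If B has no value in some month, Z stays missing there.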


    • #3
      Thank you, Nick. I will use dataex in my future questions.

      mipolate works well, but can you explain what the seby command does?


      • #4
        Code:
        seby


        • #5
          sepby() is an option of the list command (which can be abbreviated to l), not a command in its own right.

          Code:
          help l
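
          To see what the option does, here is a small sketch reusing the example data from #2. By default list draws a separator line after every 5 observations; sepby() draws one whenever the named variable changes instead.

          Code:
          clear
          input str1 Originator Date Y
          A 2005 .
          B 2005 1.5
          C 2005 .
          A 2006 .
          B 2006 1
          C 2006 .
          end

          * default: a separator line after every 5 observations
          list

          * a separator line whenever the value of Date changes
          list, sepby(Date)

          * no separator lines at all
          list, separator(0)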


          • #6
            Originally posted by Nick Cox
            Is there a workaround in mipolate (with the groupwise option) so that groups containing different non-missing values of a variable are ignored, while interpolation still takes place for groups with just one distinct non-missing value or several identical non-missing values?


            • #7
              You'd need to identify beforehand which groups to use. Code might be something like

              Code:
              bysort group (x) : gen OK = (x == x[1]) | missing(x) 
              by group: egen allOK = min(OK)
              and then

              Code:
              mipolate .... if allOK
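
              A minimal end-to-end sketch of this idea, using made-up data: the variable names group, t and x and all the values are hypothetical, and mipolate (SSC) must be installed. Group 1 holds repeated identical values, group 2 holds two different values, and group 3 holds a single value.

              Code:
              clear
              input group t x
              1 1 2
              1 2 .
              1 3 2
              2 1 1
              2 2 .
              2 3 3
              3 1 5
              3 2 .
              3 3 .
              end

              * within each group, sort on x so that any non-missing value comes first;
              * OK is 1 if x equals that first value or is missing
              bysort group (x) : gen OK = (x == x[1]) | missing(x)

              * allOK is 1 only if every observation in the group is OK
              by group: egen allOK = min(OK)

              * pass only groups with one distinct non-missing value to mipolate
              mipolate x t if allOK, by(group) groupwise gen(x2)

              sort group t
              list, sepby(group)

              With these data allOK is 1 for groups 1 and 3 and 0 for group 2, so only groups 1 and 3 are passed to mipolate. Sorting on x within group matters because missing values sort last in Stata, which makes x[1] a non-missing value whenever the group has one.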
