Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • For a set of dummies: if a dummy has value 1 for an observation of an ID, all observations for that ID should have value 1 for that dummy

    Hi all,

    I am using Stata 14.2 on windows. My problem should not be very hard but I cannot seem to grasp what command exactly I should use.
    My data has a random amount of observations for every ID and for every observation it has a value 1 for one of the function dummies.
    What I would like is for every observation of an ID to have a value 1 for a dummy if any other observation within that ID group has a 1 for that dummy.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID float(woon bijeenkomst gemengd kantoor industrie winkel overig onderwijs sport)
    2010 0 0 1 0 0 0 0 0 0
    2012 0 0 1 0 0 0 0 0 0
    2012 1 0 0 0 0 0 0 0 0
    2012 1 0 0 0 0 0 0 0 0
    2017 0 1 0 0 0 0 0 0 0
    2017 1 0 0 0 0 0 0 0 0
    2017 1 0 0 0 0 0 0 0 0
    2017 1 0 0 0 0 0 0 0 0
    2017 1 0 0 0 0 0 0 0 0
    2033 0 0 1 0 0 0 0 0 0
    2037 0 0 1 0 0 0 0 0 0
    2047 0 0 1 0 0 0 0 0 0
    2052 0 0 1 0 0 0 0 0 0
    2062 0 0 1 0 0 0 0 0 0
    2062 0 0 1 0 0 0 0 0 0
    2062 0 0 1 0 0 0 0 0 0
    2062 0 0 0 0 0 1 0 0 0
    2062 1 0 0 0 0 0 0 0 0
    2079 0 0 1 0 0 0 0 0 0
    2080 0 0 1 0 0 0 0 0 0
    2081 0 0 0 0 1 0 0 0 0
    2081 1 0 0 0 0 0 0 0 0
    2084 0 0 1 0 0 0 0 0 0
    2084 1 0 0 0 0 0 0 0 0
    2088 0 0 1 0 0 0 0 0 0
    2088 1 0 0 0 0 0 0 0 0
    2091 0 0 1 0 0 0 0 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 0 0 0 0 0 0 1 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2096 1 0 0 0 0 0 0 0 0
    2097 0 0 1 0 0 0 0 0 0
    2097 0 0 1 0 0 0 0 0 0
    2101 0 0 0 0 0 0 1 0 0
    2101 0 0 0 0 0 0 1 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2101 1 0 0 0 0 0 0 0 0
    2110 0 0 1 0 0 0 0 0 0
    2146 0 0 1 0 0 0 0 0 0
    2146 1 0 0 0 0 0 0 0 0
    2150 0 0 1 0 0 0 0 0 0
    2150 1 0 0 0 0 0 0 0 0
    2150 1 0 0 0 0 0 0 0 0
    2150 1 0 0 0 0 0 0 0 0
    2150 1 0 0 0 0 0 0 0 0
    2150 1 0 0 0 0 0 0 0 0
    2150 1 0 0 0 0 0 0 0 0
    2160 0 0 1 0 0 0 0 0 0
    2163 0 0 1 0 0 0 0 0 0
    2165 0 0 1 0 0 0 0 0 0
    2169 0 0 1 0 0 0 0 0 0
    2190 0 0 1 0 0 0 0 0 0
    2197 0 0 1 0 0 0 0 0 0
    2197 1 0 0 0 0 0 0 0 0
    2197 1 0 0 0 0 0 0 0 0
    2203 0 0 1 0 0 0 0 0 0
    end
    So for example it should look like this for ID 2062:

    Code:
    2062 1 0 1 0 0 1 0 0 0
    2062 1 0 1 0 0 1 0 0 0
    2062 1 0 1 0 0 1 0 0 0
    2062 1 0 1 0 0 1 0 0 0
    2062 1 0 1 0 0 1 0 0 0
    I tried
    Code:
     bysort ID: replace woon = 1 if woon[_N] == 1
    and that copied a value of 1 to each woon dummy in the ID group, but when I tried to do that for the other functions it did not replace the values correctly.
    I have to admit I don't really understand how to apply [_n] and [_N], I thought that if I used the above command, it would read 'if woon[_N] == 1' as 'if for any observation for this ID the value of woon == 1 give all observations within this ID group a value of 1 for woon'.

    Thank you for reading,
    Lonneke

  • #2
    The subscript "_n" means the current observation, and the subscript "_N" means the last observation. With the "by" prefix these are taken within the by group - within each ID in your case. What you wrote is "for the last observation" rather than "for any observation".

    So what you need is to get any nonzero values to the end and then look at the last value.
    Code:
    bysort ID (woon): replace woon = 1 if woon[_N] == 1
    Putting it all together with your example data (thanks!) we have a general solution.
    Code:
    // need a variable that gives the current observation sequence
    generate seq = _n
    
    foreach v of varlist woon bijeenkomst gemengd kantoor industrie winkel overig onderwijs sport {
        quietly bysort ID (`v'): replace `v' = 1 if `v'[_N] == 1
        }
    
    // put in order by ID and then by original order within ID
    sort ID seq
    drop seq
    Code:
    . list if ID==2062, noobs
    
      +--------------------------------------------------------------------------------------------+
      |   ID   woon   bijeen~t   gemengd   kantoor   indust~e   winkel   overig   onderw~s   sport |
      |--------------------------------------------------------------------------------------------|
      | 2062      1          0         1         0          0        1        0          0       0 |
      | 2062      1          0         1         0          0        1        0          0       0 |
      | 2062      1          0         1         0          0        1        0          0       0 |
      | 2062      1          0         1         0          0        1        0          0       0 |
      | 2062      1          0         1         0          0        1        0          0       0 |
      +--------------------------------------------------------------------------------------------+

    Comment


    • #3
      Amazing, thank you so much.
      Maybe you could answer one more question: say, if I would want to do the same thing but for a numeric variable and than sum the values up, so now the situation would be like this:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long ID float(sqrmeterwoon sqrmeterbijeen sqrmetergemengd sqrmeterkantoor sqrmeterindus sqrmeterwinkel sqrmeteroverig sqrmeteronderwijs sqrmetersports)
      2010 0 0 18 0 0 0 0 0 0
      2012 0 0 90 0 0 0 0 0 0
      2012 80 0 0 0 0 0 0 0 0
      2012 80 0 0 0 0 0 0 0 0
      2017 0 365 0 0 0 0 0 0 0
      2017 114 0 0 0 0 0 0 0 0
      2017 113 0 0 0 0 0 0 0 0
      2017 63 0 0 0 0 0 0 0 0
      2017 87 0 0 0 0 0 0 0 0
      2033 0 0 120 0 0 0 0 0 0
      2037 0 0 680 0 0 0 0 0 0
      2047 0 0 65 0 0 0 0 0 0
      2052 0 0 3427 0 0 0 0 0 0
      2062 0 0 230 0 0 0 0 0 0
      2062 0 0 135 0 0 0 0 0 0
      2062 0 0 85 0 0 0 0 0 0
      2062 0 0 0 0 0 170 0 0 0
      2062 170 0 0 0 0 0 0 0 0
      2079 0 0 15296 0 0 0 0 0 0
      2080 0 0 1975 0 0 0 0 0 0
      2081 0 0 0 0 54 0 0 0 0
      2081 90 0 0 0 0 0 0 0 0
      2084 0 0 105 0 0 0 0 0 0
      2084 106 0 0 0 0 0 0 0 0
      2088 0 0 40 0 0 0 0 0 0
      2088 50 0 0 0 0 0 0 0 0
      2091 0 0 136 0 0 0 0 0 0
      2096 0 0 0 0 0 0 23 0 0
      2096 0 0 0 0 0 0 18 0 0
      2096 0 0 0 0 0 0 18 0 0
      2096 0 0 0 0 0 0 18 0 0
      2096 0 0 0 0 0 0 18 0 0
      2096 0 0 0 0 0 0 18 0 0
      2096 0 0 0 0 0 0 18 0 0
      2096 0 0 0 0 0 0 18 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2096 74 0 0 0 0 0 0 0 0
      2096 86 0 0 0 0 0 0 0 0
      2097 0 0 115 0 0 0 0 0 0
      2097 0 0 447 0 0 0 0 0 0
      2101 0 0 0 0 0 0 18 0 0
      2101 0 0 0 0 0 0 18 0 0
      2101 55 0 0 0 0 0 0 0 0
      2101 83 0 0 0 0 0 0 0 0
      2101 83 0 0 0 0 0 0 0 0
      2101 83 0 0 0 0 0 0 0 0
      2101 83 0 0 0 0 0 0 0 0
      2101 83 0 0 0 0 0 0 0 0
      2101 55 0 0 0 0 0 0 0 0
      2101 83 0 0 0 0 0 0 0 0
      2101 55 0 0 0 0 0 0 0 0
      2101 83 0 0 0 0 0 0 0 0
      2110 0 0 79 0 0 0 0 0 0
      2146 0 0 187 0 0 0 0 0 0
      2146 210 0 0 0 0 0 0 0 0
      2150 0 0 210 0 0 0 0 0 0
      2150 29 0 0 0 0 0 0 0 0
      2150 30 0 0 0 0 0 0 0 0
      2150 28 0 0 0 0 0 0 0 0
      2150 42 0 0 0 0 0 0 0 0
      2150 31 0 0 0 0 0 0 0 0
      2150 23 0 0 0 0 0 0 0 0
      2160 0 0 1413 0 0 0 0 0 0
      2163 0 0 4839 0 0 0 0 0 0
      2165 0 0 1063 0 0 0 0 0 0
      2169 0 0 13627 0 0 0 0 0 0
      2190 0 0 18 0 0 0 0 0 0
      2197 0 0 105 0 0 0 0 0 0
      2197 97 0 0 0 0 0 0 0 0
      2197 97 0 0 0 0 0 0 0 0
      2203 0 0 76 0 0 0 0 0 0
      end

      so to take out ID 2062 again as an example, this is the original:

      Code:
      2062   0   0   230 0  0 0  0 0 0
      2062   0   0   135 0  0 0  0 0 0
      2062   0   0    85 0  0 0  0 0 0
      2062  0   0     0 0  0 170  0 0 0
      2062 170   0     0 0  0 0  0 0 0
      and this is what I would want it to look like if that is possible:

      Code:
      2062   170   0   450 0  0 170  0 0 0
      2062   170   0   450 0  0 170  0 0 0
      2062   170   0    450 0  0 170  0 0 0
      2062  170   0     450 0  0 170  0 0 0
      2062 170   0    450 0  0 170  0 0 0
      Would you use the same type of command you showed me, and could it be altered to sum up instead of assign values of 1?

      Comment


      • #4
        This approach will do what you want.
        Code:
        // need a variable that gives the current observation sequence
        generate seq = _n
        
        foreach v of varlist sqrmeterwoon sqrmeterbijeen sqrmetergemengd sqrmeterkantoor sqrmeterindus sqrmeterwinkel sqrmeteroverig sqrmeteronderwijs sqrmetersports {
            bysort ID: egen temp = total(`v')
            quietly replace `v' = temp
            drop temp
            }
        
        // put in order by ID and then by original order within ID
        sort ID seq
        drop seq
        Code:
        . list if ID==2062, noobs clean
        
              ID   sqrme~on   sqrme~en   sqrmet~d   sqrmet~r   sqrme~us   sqrmet~l   sqrmet~g   sqrme~js   sqrme~ts  
            2062        170          0        450          0          0        170          0          0          0  
            2062        170          0        450          0          0        170          0          0          0  
            2062        170          0        450          0          0        170          0          0          0  
            2062        170          0        450          0          0        170          0          0          0  
            2062        170          0        450          0          0        170          0          0          0
        The technique is slightly different than the technique in post #2. I would have used it in post #2 but I wanted to instead to build on the approach you had. But the following would work equally well as a solution in post #2.
        Code:
        // need a variable that gives the current observation sequence
        generate seq = _n
        
        foreach v of varlist woon bijeenkomst gemengd kantoor industrie winkel overig onderwijs sport {
            bysort ID: egen temp = max(`v')
            quietly replace `v' = temp
            drop temp
            }
        
        // put in order by ID and then by original order within ID
        sort ID seq
        drop seq
        Code:
        . list if ID==2062, noobs clean
        
              ID   woon   bijeen~t   gemengd   kantoor   indust~e   winkel   overig   onderw~s   sport  
            2062      1          0         1         0          0        1        0          0       0  
            2062      1          0         1         0          0        1        0          0       0  
            2062      1          0         1         0          0        1        0          0       0  
            2062      1          0         1         0          0        1        0          0       0  
            2062      1          0         1         0          0        1        0          0       0
        The good thing about using egen is that it is a more general approach than the code in post #2, even if it does take 3 commands instead of 1.
        Last edited by William Lisowski; 20 Jun 2019, 14:38.

        Comment


        • #5
          Thank you so much, it worked perfectly.

          Comment

          Working...
          X