  • Create a variable based on how a value appears in another variable in panel data

    Hi Experts:

    I have panel data that look like this:

    ID employment New Variable
    1 1 1
    1 1 1
    1 1 1
    1 2 0
    1 2 0
    1 1 1
    1 2 0
    2 1 1
    2 1 1
    2 2 2
    2 2 2
    2 2 2
    2 2 2

    I want to create a new variable that equals 2 if the value 2 in the variable employment appears continuously from the first time it is observed through the last observation. If the value 2 appears only sporadically and does not last through the last observation, the new variable should record it as 0.

    Is there anyone who knows how to code this?

    Thank you in advance!

    Connie

  • #2
    Your example calculation of the new variable is inconsistent with your explanation of what you want. Your explanation calls for a variable that will take on the values 0 and 2, but your example includes many observations where it is 1. Moreover, you say that you want it to be 2 if employment remains 2 throughout once the first observation with 2 occurs. This is the case for ID 2 in your data, yet you have it as 1 in some of that person's observations.

    Since I don't understand your example, I'll just show you how to get a variable that does what you asked for in words. Perhaps you can take it from there.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id employment wanted_cg)
    1 1 1
    1 1 1
    1 1 1
    1 2 0
    1 2 0
    1 1 1
    1 2 0
    2 1 1
    2 1 1
    2 2 2
    2 2 2
    2 2 2
    2 2 2
    end
    
    gen long obs_no = _n
    by id (obs_no), sort: gen byte two_in_two_out = sum((employment==2) != (employment[_n-1] ==2))
    by id (obs_no): gen byte wanted_cs = two_in_two_out[_N] == 1
    replace wanted_cs = 2*wanted_cs
    Note: The variable wanted_cs which I have calculated above does what you asked for in your explanation. The variable wanted_cg is what you showed as "new variable" in your example. I leave it to you to reconcile the difference between them.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
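
    As a rough illustration of the -dataex- workflow described above, the sketch below enters a small made-up data set (the four observations are hypothetical and only stand in for your real data) and then asks -dataex- to print the two variables as a ready-to-paste -input- block.

    ```stata
    * Hypothetical miniature data set, for illustration only
    clear
    input byte(id employment)
    1 1
    1 2
    2 1
    2 2
    end

    * Print the data in memory as a block that others can copy and run
    dataex id employment
    ```

    The output can be pasted into a forum post between code delimiters, so that anyone answering can recreate the example exactly, including storage types.
    
    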



    • #3
      Presumably you have some kind of time variable too; if not that is surprising if not alarming.

      You don't appear to state all the rules, which seem to be:

      2 where employment is 2, if there was just one change to 2 before the end of the panel

      0 where employment is 2 otherwise

      1 otherwise.

      Code:
      clear
      input id employment new
      1 1 1
      1 1 1
      1 1 1
      1 2 0
      1 2 0
      1 1 1
      1 2 0
      2 1 1
      2 1 1
      2 2 2
      2 2 2
      2 2 2
      2 2 2
      end
      
      sort id, stable
      by id : gen time = _n
      
      by id : gen changeto2 = sum(employ == 2 & employ[_n-1] != 2)
      
      by id : gen wanted = cond(employ == 2 & changeto2[_N] == 1, 2, employ == 1)
      
      list, sepby(id)
      
           +------------------------------------------------+
           | id   employ~t   new   time   change~2   wanted |
           |------------------------------------------------|
        1. |  1          1     1      1          0        1 |
        2. |  1          1     1      2          0        1 |
        3. |  1          1     1      3          0        1 |
        4. |  1          2     0      4          1        0 |
        5. |  1          2     0      5          1        0 |
        6. |  1          1     1      6          1        1 |
        7. |  1          2     0      7          2        0 |
           |------------------------------------------------|
        8. |  2          1     1      1          0        1 |
        9. |  2          1     1      2          0        1 |
       10. |  2          2     2      3          1        2 |
       11. |  2          2     2      4          1        2 |
       12. |  2          2     2      5          1        2 |
       13. |  2          2     2      6          1        2 |
           +------------------------------------------------+
      
      EDIT: Clyde makes very similar comments. Teachers' t test may be applied at your discretion.
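
      One case the example data do not exercise is a panel with a single spell of 2 that ends before the last observation (for example 1, 2, 2, 1). Assuming the rule in #1 is read literally, so that the 2s must last through the final observation, a minimal sketch is to add a check on the last value to the condition. The data and the variable expected below are made up for illustration.

      ```stata
      * Hypothetical data: id 1 has one spell of 2 that does not reach the end
      clear
      input byte(id employment expected)
      1 1 1
      1 2 0
      1 2 0
      1 1 1
      2 1 1
      2 2 2
      2 2 2
      end

      sort id, stable

      * Running count of switches into 2 within each panel
      by id: gen byte changeto2 = sum(employment == 2 & employment[_n-1] != 2)

      * Code 2 only if there was exactly one switch into 2 and the panel ends in 2
      by id: gen byte wanted = cond(employment == 2 & changeto2[_N] == 1 & employment[_N] == 2, 2, employment == 1)

      * Check against the hand-coded expectation
      assert wanted == expected
      list, sepby(id)
      ```

      Without the extra condition on employment[_N], the 2s for id 1 in this sketch would be coded 2 rather than 0, because that panel has only one change to 2.
      
      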



      • #4
        Clyde and Nick: thank you so much for the fabulous code! It is just what I wanted. I will remember to use -dataex- next time.



        • #5
          Hi Professor Clyde and Nick: regarding my post above, I am wondering how to handle missing values in this situation.

          Suppose I have the data below. By my definition, if employment = 2 continues to the last wave, it is still coded as 2 even if there is a missing value in between. This is the case for ID 1. If the last wave is missing but the previous waves show employment = 2 continuously, it is still coded as 2. This is the case for ID 2. However, if there is a missing value in some wave and the last wave has employment = 1, then the earlier employment = 2 should be coded as 0. This is the case for ID 3.

          Code:
          clear
          input byte(id employment wanted)
          1 1 1
          1 2 2
          1 2 2
          1 . .
          1 2 2
          2 1 1
          2 2 2
          2 2 2
          2 2 2
          2 . .
          3 1 1
          3 2 0
          3 2 0
          3 . .
          3 1 1
          end

          I am not sure how to revise the code you provided so that it handles missing values.

          Thank you,

          Connie



          • #6
            I'm confused. You start by talking about missing values, but in your example, you never change the missing values: you leave them missing in wanted. So I'll just ignore that part of the post.

            It seems you are concerned with recoding employment = 2 as employment = 0 if the final observation for a given id has employment = 1:

            Code:
            clear
            input byte(id employt wanted)
            1 1 1
            1 2 2
            1 2 2
            1 . .
            1 2 2
            2 1 1
            2 2 2
            2 2 2
            2 2 2
            2 . .
            3 1 1
            3 2 0
            3 2 0
            3 . .
            3 1 1
            end
            
            gen long obs_no = _n
            
            by id (obs_no), sort: replace employt = 0 if employt == 2 & employt[_N] == 1
            
            // VERIFY RESULTS ARE AS DESIRED
            assert employt == wanted
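
            If the sporadic case from #1 also needs to be covered when there are missing values, one possible approach is to do the counting among the non-missing observations only, so that a gap neither starts nor ends a spell of 2. The sketch below is only that, a sketch: it reuses the example data above, and the names nonmiss, spell_start, n_starts and result are made up for illustration.

            ```stata
            clear
            input byte(id employt wanted)
            1 1 1
            1 2 2
            1 2 2
            1 . .
            1 2 2
            2 1 1
            2 2 2
            2 2 2
            2 2 2
            2 . .
            3 1 1
            3 2 0
            3 2 0
            3 . .
            3 1 1
            end

            gen long obs_no = _n
            gen byte nonmiss = !missing(employt)

            * Within each id, look only at non-missing observations in their original order
            bysort id nonmiss (obs_no): gen byte spell_start = (employt == 2 & employt[_n-1] != 2) if nonmiss
            by id nonmiss (obs_no): gen byte n_starts = sum(spell_start) if nonmiss

            * 2s stay 2 only with a single spell of 2 that reaches the last non-missing observation
            by id nonmiss (obs_no): gen byte result = cond(employt == 2, 2 * (n_starts[_N] == 1 & employt[_N] == 2), employt) if nonmiss

            sort obs_no
            assert result == wanted
            ```

            Observations with missing employt are left missing in result, which matches the wanted column in the example.
            
            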

