
  • Long format: delete repeated observations per id

    Hi,

    I have a long dataset and I want to tabulate and analyse outcomes by sex, pmqreg and age. At the moment, sex is repeated on every row for the same id, so when I tabulate I get a false number of males and females instead of a count by id. I cannot reshape wide, as I need the data in long format for the kind of analysis I want to do. How can I delete all but the first value of sex per id and leave the others as missing?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id sex pmqreg age ayear y outcome)
    1 1 2 1 2010  4 1
    1 1 2 1 2011  5 1
    1 1 2 1 2012  6 0
    2 0 1 1 2013  6 1
    2 0 1 1 2014  7 1
    3 0 3 1 2017  8 1
    3 0 3 1 2018  9 1
    3 0 3 1    .  . .
    3 0 3 1    .  . .
    3 0 3 1    .  . .
    4 1 1 0 2010  4 1
    4 1 1 0 2011  5 1
    5 0 1 1 2012  5 1
    5 0 1 1 2013  6 1
    5 0 1 1 2014  7 0
    5 0 1 1    .  . .
    6 1 2 0 2014 10 0
    end


    Thanks,
    Carla

  • #2
    You do not need to delete anything; you can just tag one observation per id like this:

    Code:
    egen tag = tag(id)
    and then you can restrict your estimations or tabulations to one observation per id by adding the

    Code:
    if tag
    qualifier to your commands.
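
    If a copy of sex that is filled in on only one row per id is still wanted, as the original question asks, the same tag can produce it. This is only a sketch on the example data from #1: the variable name sex_once is made up for illustration, and the approach assumes sex never changes within an id, which the first command checks.

    ```stata
    * Example data copied from #1
    clear
    input float(id sex pmqreg age ayear y outcome)
    1 1 2 1 2010  4 1
    1 1 2 1 2011  5 1
    1 1 2 1 2012  6 0
    2 0 1 1 2013  6 1
    2 0 1 1 2014  7 1
    3 0 3 1 2017  8 1
    3 0 3 1 2018  9 1
    3 0 3 1    .  . .
    3 0 3 1    .  . .
    3 0 3 1    .  . .
    4 1 1 0 2010  4 1
    4 1 1 0 2011  5 1
    5 0 1 1 2012  5 1
    5 0 1 1 2013  6 1
    5 0 1 1 2014  7 0
    5 0 1 1    .  . .
    6 1 2 0 2014 10 0
    end

    * Assumption check: sex is constant within id (this re-sorts the data by id and sex)
    bysort id (sex): assert sex[1] == sex[_N]

    * Flag exactly one observation per id
    egen tag = tag(id)

    * sex_once (hypothetical name) keeps sex on the tagged row and is missing elsewhere
    generate sex_once = sex if tag

    * Both commands count each id once
    tab sex if tag
    tab sex_once
    ```

    The original sex variable stays intact, so commands that need it on every row still work.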



    • #3
      Actually, with sex it just generates a '1' per id, whether they are male or female, so I still can't tabulate it accurately.

      input float(id sex pmqreg age ayear y outcome) byte tagsex
      1 1 2 1 2010 4 1 1
      1 1 2 1 2011 5 1 0
      1 1 2 1 2012 6 0 0
      2 2 1 1 2013 6 1 1
      2 2 1 1 2014 7 1 0
      3 2 3 1 2017 8 1 0
      3 2 3 1 2018 9 1 0
      3 2 3 1 . . . 0
      3 2 3 1 . . . 0
      3 2 3 1 . . . 0
      4 1 1 0 2010 4 1 0
      4 1 1 0 2011 5 1 0
      5 2 1 1 2012 5 1 0
      5 2 1 1 2013 6 1 0
      5 2 1 1 2014 7 0 0
      5 2 1 1 . . . 0
      6 1 2 0 2014 10 0 0
      end


      So I suppose I only want the first value of sex per id, with the remaining sex observations per id set to missing.



      • #4
        Joro Kolev was bang on in #2: why not try what he suggested?

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(id sex pmqreg age ayear y outcome)
        1 1 2 1 2010  4 1
        1 1 2 1 2011  5 1
        1 1 2 1 2012  6 0
        2 0 1 1 2013  6 1
        2 0 1 1 2014  7 1
        3 0 3 1 2017  8 1
        3 0 3 1 2018  9 1
        3 0 3 1    .  . .
        3 0 3 1    .  . .
        3 0 3 1    .  . .
        4 1 1 0 2010  4 1
        4 1 1 0 2011  5 1
        5 0 1 1 2012  5 1
        5 0 1 1 2013  6 1
        5 0 1 1 2014  7 0
        5 0 1 1    .  . .
        6 1 2 0 2014 10 0
        end
        
        
        egen tag = tag(id) 
        
        tab sex if tag 
        
        
                sex |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |          3       50.00       50.00
                  1 |          3       50.00      100.00
        ------------+-----------------------------------
              Total |          6      100.00



        • #5
          Ah ok, I misunderstood the 'if tag' part, thanks



          • #6
            Originally posted by Carla Hope
            Ah ok, I misunderstood the 'if tag' part, thanks
            No Carla, you misunderstood the tagging part.

            What you want is achieved by tagging one observation per id, so the command that correctly generates the tag is

            Code:
            egen tag = tag(id)
            What you did instead is
            Code:
            egen wrongtag = tag(sex)
            which tags one observation per value of sex. (Look at the data you shared in #3: one observation in the group where sex==1 is tagged with 1, one observation in the group where sex==2 is tagged with 1, and for all other observations the tag is 0.)
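
            The difference is easy to see by counting the tagged observations. The following is a sketch on the example data from #1 (where sex is coded 0/1), using the variable names tag and wrongtag from above:

            ```stata
            * Example data copied from #1
            clear
            input float(id sex pmqreg age ayear y outcome)
            1 1 2 1 2010  4 1
            1 1 2 1 2011  5 1
            1 1 2 1 2012  6 0
            2 0 1 1 2013  6 1
            2 0 1 1 2014  7 1
            3 0 3 1 2017  8 1
            3 0 3 1 2018  9 1
            3 0 3 1    .  . .
            3 0 3 1    .  . .
            3 0 3 1    .  . .
            4 1 1 0 2010  4 1
            4 1 1 0 2011  5 1
            5 0 1 1 2012  5 1
            5 0 1 1 2013  6 1
            5 0 1 1 2014  7 0
            5 0 1 1    .  . .
            6 1 2 0 2014 10 0
            end

            * One observation flagged per distinct id
            egen tag = tag(id)

            * One observation flagged per distinct value of sex
            egen wrongtag = tag(sex)

            * 6 tagged observations: one for each of the 6 ids
            count if tag

            * 2 tagged observations: one for sex==0 and one for sex==1
            count if wrongtag
            ```

            Only the first tag gives one row per person, which is why tab sex if tag returns the counts shown in #4.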

