
  • Struggling to count the number of months of unemployment per individual in panel data

    Hello,

    I am using panel data from a survey of the same individuals every month. One of the variables I would like to construct is the number of months in which each individual is unemployed. The code I am using is the following:
    Code:
    bysort prim_key: gen n_ump = sum(unemploy)
    But I don't know whether this code gives what I want.

    I also tried the following to see whether there is a difference:
    Code:
     bysort prim_key: egen n_ump = count(unemploy)
    but I get this error message:
    Code:
    variable _000000 already defined
    variable _000002 already defined
    r(110)
    Last edited by Wenhan Yan; 07 Jan 2023, 06:50.

  • #2
    You need a new variable name if n_ump is already in use.

    Also, the Stata function sum() calculates cumulative or running sums, so the value you want is in the last observation for each person.

    Further, the egen function count() is wrong here: it counts non-missing values, so 0s and 1s alike are included.

    What you want is, I think,

    Code:
    egen wanted = total(unemp), by(prim_key)
    or

    Code:
    bysort prim_key : egen wanted = total(unemp)
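    A small sketch of the difference between the three functions, on made-up data in the layout used later in this thread: prim_key, a monthly date mdate (an assumed name for the time variable) and a 0/1 indicator unemp. The running sum from sum() reaches the person's total only in the last observation, which can then be copied to every row; egen, total() gives that total directly, while egen, count() returns the number of non-missing values. The block starts with clear, so try it in a fresh session rather than on your own data.

    ```stata
    * made-up toy data: two people observed for three months each
    clear
    input prim_key mdate unemp
    1 800 0
    1 801 0
    1 802 1
    2 800 0
    2 801 1
    2 802 1
    end

    * sum() gives a running (cumulative) count within each person, in mdate order
    bysort prim_key (mdate): generate running = sum(unemp)

    * the person's total is the running sum in the last observation; copy it to all rows
    by prim_key: generate from_sum = running[_N]

    * egen total() gives the same per-person total directly
    egen from_total = total(unemp), by(prim_key)

    * egen count() counts non-missing values, so the 0s are counted as well as the 1s
    egen n_nonmissing = count(unemp), by(prim_key)

    list, sepby(prim_key)
    ```

    Here from_sum and from_total are 1 for person 1 and 2 for person 2, while n_nonmissing is 3 for both people, because count() also counts the 0s.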



    • #3
      Hello Nick,

      Thanks for the reply, but something is still going wrong with egen. I tried egen with total() and got a similar error message. I also tried dropping the variable first and using a different name, but I still got the same error:
      Code:
      variable _000000 already defined
      variable _000001 already defined
      r(110)
      Also, unemploy is a dummy variable indicating whether the individual is unemployed in that month.

      Thanks
      Last edited by Wenhan Yan; 08 Jan 2023, 01:50.



      • #4
        You don't give a data example. If you try this example

        Code:
        clear 
        input prim_key mdate unemp
        1   800  0
        1   801  0 
        1   802  1 
        2   800  0
        2   801  1
        2   802  1
        end 
        
        egen wanted = total(unemp), by(prim_key)
        
        list, sepby(prim_key)
        
             +-----------------------------------+
             | prim_key   mdate   unemp   wanted |
             |-----------------------------------|
          1. |        1     800       0        1 |
          2. |        1     801       0        1 |
          3. |        1     802       1        1 |
             |-----------------------------------|
          4. |        2     800       0        2 |
          5. |        2     801       1        2 |
          6. |        2     802       1        2 |
             +-----------------------------------+
        and don't get the same results, then my only guess is that your Stata files are corrupted. Contact Stata technical services. https://www.stata.com/support/faqs/t...-tech-support/



        • #5
          Hello Nick,

          Thank you for the reply. I am sorry that I forgot to provide a data sample; the example you provide is close to what I have and what I want.

          As for the error, it may result from some part of my cleaning code, since when I use the code with the original dataset, it works well.



          • #6
            If you get the same error messages again, follow up with


            Code:
            summarize __000011
            summarize __000012
            to see whether you somehow already have those variables in the dataset, although that shouldn't happen.
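            A sketch that checks several candidate names in one go without stopping a do-file when a name is absent. The names in the loop are illustrative only; substitute whatever names your own error messages report. The exact option of confirm variable makes Stata match the full name rather than accept an abbreviation.

            ```stata
            * illustrative names only: replace with the names from your error messages
            foreach v in __000000 __000001 __000002 {
                * capture suppresses the error that confirm raises if the name is absent
                capture confirm variable `v', exact
                if _rc == 0 {
                    display as text "`v' exists in the dataset"
                    summarize `v'
                }
                else {
                    display as text "`v' not found"
                }
            }
            ```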



            • #7
              Hello Nick,

              I did find that the variables exist in my dataset, but I don't know why they are there. Maybe something went wrong when I used a loop to clean the data?

              Also, I don't understand why having these variables stops me from using egen.



              • #8
                Perhaps someone created them on purpose or Stata crashed for some reason and garbage was left in your dataset.

                What is easier to explain is that Stata feels free to create temporary variables while a command is running -- and should remove them when a command has finished.

                The names Stata uses for temporary variables start at __000000 and continue upwards. Nothing stops anyone deliberately creating such a variable, but it is a bad idea because then the problem you mentioned will bite you.

                Code:
                * never do this even if you understand what you're doing 
                gen __000000 = 42
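                To see where names like __000000 come from, here is a sketch of a do-file requesting a temporary variable with tempvar. Stata chooses the name itself (of the __000000 form; the exact number depends on what has already run in the session), and a variable created this way is dropped automatically when the do-file or program that created it ends. The block clears memory first, so run it in a fresh session.

                ```stata
                clear
                set obs 3

                * ask Stata for a temporary variable name, stored in local macro t
                tempvar t
                generate `t' = _n

                * shows the name Stata picked
                display "temporary variable name: `t'"
                summarize `t'
                ```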

                You need to look at those variables, and rename them if you can work out what they are and they are worth keeping -- or otherwise drop them from the dataset.
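                A sketch of that inspect-then-clean-up step, assuming the stray variables all start with __0 as Stata's temporary names do. The rename line is commented out and both names in it are hypothetical; adapt it only for variables you have identified and want to keep. capture keeps the do-file running if no such variables exist.

                ```stata
                * inspect every variable whose name starts with __0, the pattern Stata uses
                * for temporary names; capture noisily shows the listing (or a "not found"
                * message) without halting the do-file
                capture noisily describe __0*, fullnames

                * rename any you recognize and want to keep (hypothetical names, so the
                * line is commented out)
                * rename __000000 kept_var

                * drop the rest; capture again avoids an error if none are left
                capture drop __0*
                ```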


                This isn't specific to egen. My guess in #4 that your files are corrupted seems less likely on this evidence.



                • #9
                  Hello Nick,

                  When I finished cleaning my data and tried egen, it worked, and the variable _000000 was not in my dataset. But when I opened the saved data in a separate analysis do-file, _000000 reappeared and egen failed again. Here is my code to save and open the data:
                  Code:
                  *cleaning
                  ...
                  ...
                  save $data/unemploy_health, replace
                  log close
                  Code:
                  *************Different File
                  *analysis
                  clear all
                  set more off
                  set seed 2021218
                  set maxvar 30000
                  macro drop _all
                  *change directory
                  cd "..."
                  cap log close
                  global data "data"
                  global log "log"
                  global graph "graph"
                  global table "table"
                  global do "do"
                  
                  log using $log/eventstudy.log, replace text
                  *import data
                  use $data/unemploy_health, clear
                  The last line is where the problem comes in. I have no idea why: when I save the data they are fine, but when I close or clear everything and open them again, the problem pops up.
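                  One way to narrow this down is to check which file the analysis do-file actually opens and whether that saved file already contains the stray variables, without loading it. This is a sketch that reuses the global and filename from the code above (adjust them to your setup): confirm file reports whether the path exists relative to the current working directory, and describe using lists the variables stored in a .dta file on disk.

                  ```stata
                  * global and filename as in the do-files above
                  global data "data"

                  * show the exact path that the global expands to
                  display "${data}/unemploy_health.dta"

                  * show the current working directory, then check the file exists there
                  pwd
                  capture noisily confirm file "${data}/unemploy_health.dta"

                  * if it does, list the variables stored in it without loading the data;
                  * look for names such as __000000
                  if _rc == 0 {
                      describe using "${data}/unemploy_health.dta"
                  }
                  ```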
                  Last edited by Wenhan Yan; 14 Jan 2023, 02:19.



                  • #10
                    It's the same question and I don't have another answer. Naturally I can't see your datasets. Nor, a little more positively, can I see anything in your code that would generate a variable with a name like __000000.

                    Where do your datasets come from? Your own work? A fellow researcher? Some official source?




                    • #11
                      Hello Nick,

                      When I stopped using the global macro in the path when saving my dataset, the problem went away, i.e. using
                      Code:
                      save data/unemploy_health, replace
                      instead of
                      Code:
                      save $data/unemployment_health, replace
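                      The thread does not establish what caused the problem, but a common defensive habit when building file paths from globals is to delimit the macro name with braces and quote the whole path. After $, Stata reads the longest run of characters that can form a name, so a following underscore or letter becomes part of the macro name, and paths containing spaces need quotes. The names data and unemploy_health come from the code above; data_health is a hypothetical global, emptied first so the demonstration is predictable.

                      ```stata
                      global data "data"
                      * make sure the hypothetical global data_health is empty
                      global data_health

                      * the global's name ends at "/", so both lines show data/unemploy_health
                      display "$data/unemploy_health"
                      display "${data}/unemploy_health"

                      * here Stata looks for a global named data_health, so this shows an empty line
                      display "$data_health"

                      * braces end the name after "data", so this shows data_health
                      display "${data}_health"
                      ```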
                      As for the data, I am using the Singapore Life Panel, collected by The Centre For Research On Successful Ageing. Here is the link to the website: https://rosa.smu.edu.sg/

                      The data come from a monthly survey asking people aged 50 and above about their demographics, consumption behavior, employment, and other individual-level characteristics.

