Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • using conditions to create new variables

    P_ID Firm_ID year trainee_or_employed wage Average_wage_of_training_years Firm_fixed_effect Firm_fixed_effect_last_year_of_training Firm_size Firm_size_last_year_of_training
    1 1111 2011 1(trainee) 800 850 -.34 -.34 50 54
    1 1111 2012 1 900 850 -.34 -.34 54 54
    1 1111 2013 2 (employed) 2500 850 -.34 -.34 52 54
    1 1111 2014 2 2600 850 -.34 -.34 57 54
    2 3333 2011 1 600 716.6 .523 .822 123 161
    2 4444 2012 1 700 716.6 .822 .822 152 161
    2 4444 2013 1 850 716.6 .822 .822 161 161
    2 5555 2014 2 2700 716.6 1.04 .822 236 161
    2 5555 2015 2 2800 716.6 1.04 .822 254 161
    2 5555 2016 2 3000 716.6 1.04 .822 250 161
    Hi,
    I have such a dataset and need to create some new variables conditioned on some other variables and got confused.
    1. Creating variable "Average_wage_of_training_years" from the person's trainee years? Such a similar thing I should write?
    bys year: gen Average_wage_of_training_years=mean(wage) if trainee_or_employed==1

    2. Creating the variable "Firm_size_last_year_of_training" by using the person's firm in the last training year?

    Would you please help me with these two commands?

    Best,
    Alp

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(p_id firm_id year trainee_or_employed wage firm_size)
    1 1111 2011 1  800  50
    1 1111 2012 1  900  54
    1 1111 2013 2 2500  52
    1 1111 2014 2 2600  57
    2 3333 2011 1  600 123
    2 4444 2012 1  700 152
    2 4444 2013 1  850 161
    2 5555 2014 2 2700 236
    2 5555 2015 2 2800 254
    2 5555 2016 2 3000 250
    end
    sort p_id year
    by p_id: egen awty = mean(cond(trainee_or_employed==1,wage,.))
    by p_id: egen lty = max(cond(trainee_or_employed==1,year,.))
    by p_id: egen fslyt = max(cond(year==lty, firm_size, .))
    list, sepby(p_id) noobs
    Code:
    . list, sepby(p_id) noobs
    
      +------------------------------------------------------------------------------+
      | p_id   firm_id   year   traine~d   wage   firm_s~e       awty    lty   fslyt |
      |------------------------------------------------------------------------------|
      |    1      1111   2011          1    800         50        850   2012      54 |
      |    1      1111   2012          1    900         54        850   2012      54 |
      |    1      1111   2013          2   2500         52        850   2012      54 |
      |    1      1111   2014          2   2600         57        850   2012      54 |
      |------------------------------------------------------------------------------|
      |    2      3333   2011          1    600        123   716.6667   2013     161 |
      |    2      4444   2012          1    700        152   716.6667   2013     161 |
      |    2      4444   2013          1    850        161   716.6667   2013     161 |
      |    2      5555   2014          2   2700        236   716.6667   2013     161 |
      |    2      5555   2015          2   2800        254   716.6667   2013     161 |
      |    2      5555   2016          2   3000        250   716.6667   2013     161 |
      +------------------------------------------------------------------------------+
    In the future, please use the dataex command to present example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use the dataex command.

    Comment


    • #3
      Hi William,

      Thank you for the answer and your recommendation. Sure I learn about dataex.
      Would please tell me under which specific topic can I learn basically about these sorts of commands with conditions? A video tutorial or article, that I can learn the logic behind it.
      I would really appreciate it.

      Best,
      Alp

      Comment


      • #4
        I don't know anything about Stata video tutorials, other than that they exist. The three important techniques in post #2 are documented in
        Code:
        help by
        help egen
        help cond()
        and in the full PDF documentation included with your Stata installation and linked to from the beginning of the output of each of these help commands.

        The way I learned about this technique is that I saw code similar to what I presented here and I used the help command to read the documentation and understand the code that was presented, and I tested my knowledge by applying the commands I had studied to other data.

        I note that your previous Statalist topic was answered with very similar code using by, egen, and cond(). Perhaps you took the code as given, without examining how it worked?

        More generally, when I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

        The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

        Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

        Comment


        • #5
          William Lisowski gives excellent advice.

          The main trick in #2 is also written up in Section 9 of https://www.stata-journal.com/articl...article=dm0055

          Whether there is a video tutorial I wouldn't know. There are people who prepare video tutorials (Chuck Huber of StataCorp is #1 in this field) and people who write papers, and not much overlap between the two.

          Comment


          • #6
            Thank you William and Nick for your recommendations! I will go through them.

            Comment


            • #7
              I want to create a variable called TENURE, which shows the days a skilled person (erwastat==101) has worked in a company, but not when he was a trainee (erwastat==102) in the company. The days in the company (Tage_im_Betrieb) are accumulated if the person stays in the same company (Mover==0). To solve this, I create Tenure_trainee (by ID_t: egen Tenure_trainee = max(cond(jahr==lty, Tage_im_Betrieb, .)). So if I subtract this from the Tage_im_Betrieb (days in the company) when the person is a skilled worker (erwastat==101), the desired TENURE variable is created for the first person (10001) with this command "gen TENURE=Tage_im_Betrieb - Tenure_trainee if erwstat==101 & Mover==0".
              But then the TENURE for the second person becomes problematic; he has changed the company after training (Mover==1). I want to avoid performing this subtraction command for those who change their training company when they become skilled worker (erwastat==101); instead of the red days in data below, the blue ones remain for person 10002.

              I thought as it is relevant to my previous question here, I post it here not a new topic.
              I appreciate your possible help.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input int ID_t long betnr int(jahr erwstat) float lty byte Mover float(Tenure_trainee Tage_im_Betrieb TENURE)
              10001 11111 2012 102 2014 0 640  210  210
              10001 11111 2013 102 2014 0 640  420  420
              10001 11111 2014 102 2014 0 640  640  640
              10001 11111 2015 101 2014 0 640  850  210
              10001 11111 2016 101 2014 0 640  950  310
              10001 11111 2017 101 2014 0 640 1100  460
              10001 11111 2018 101 2014 0 640 1400  760
              10001 11111 2019 101 2014 0 640 1700 1060
              10002 22222 2012 102 2014 0 300  210  210
              10002 33333 2013 102 2014 1 300  100  100
              10002 33333 2014 102 2014 0 300  300  300
              10002 44444 2015 101 2014 1 300  200  200
              10002 44444 2016 101 2014 0 300  400  100
              10002 44444 2017 101 2014 0 300  600  300
              10002 44444 2018 101 2014 0 300  800  500
              10002 44444 2019 101 2014 0 300 1000 700
              end
              Last edited by Alp Marta; 15 Dec 2022, 06:32.

              Comment

              Working...
              X