
  • Generating new variables

    Hi everyone.
    I am fairly new to Stata and wondered whether any of you more experienced users could give me a tip.

    I am looking at the progression of academic careers by following a number of researchers from 1995 to 2015. A “start” year is defined for each observation, and I want to create a range of pre and post measures of their publication activity before and after that year.

    Say I want to make a pre measure for observations with the start year 2001:
    gen newvar = (var1_95+var1_96+var1_97+var1_98+var1_99+var1_00)/(var2_95+var2_96+var2_97+var2_98+var2_99+var2_00) if start_year==2001


    I have to do this for numerous start years and several other variables. Is there an easier way, for instance with loops? Can I somehow generate these new variables automatically based on the start year?

    Hope you can help. Thanks in advance.

  • #2
    I would recommend reshaping your dataset from wide to long. A kind of mantra on this forum is that Stata makes complex computations much easier with a long layout than with a wide layout of the same data. Type help reshape for more details.



    • #3
      Sure, you can use a loop, but we need more precise information before we can give you exact code:

      Could you specify what var1 and var2 stand for? Are they the number of papers published by the author over the total number of papers published?

      Could you describe your data (or post an excerpt using dataex)? It seems you have wide data, because your variables are called var1_95, var1_96, etc.
      Both the condition on years and a forvalues loop would be easier with long data (see help reshape), or at least with the variables renamed to four-digit years (var1_1995, var1_1996, ...).
      In addition, doing what you ask, you'll end up with a lot of variables (22 new variables: 11 years * 2 dimensions, pre and post publications). Are you sure you need all of them?

      Anyway, take a look at help foreach and help forvalues; they describe how to set up a loop in Stata.
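      For reference, here is a minimal sketch of the forvalues approach that keeps the wide layout. It assumes the variable names from the question (var1_95 ... var1_15 and var2_95 ... var2_15), a numeric start_year between 2001 and 2015 so that the six-year window always exists, and invented toy values. It renames the two-digit suffixes to four-digit years, then builds the six-year pre sum for each start year found in the data. Note that + returns missing if any year in the window is missing; egen's rowtotal() would treat missing values as zero instead.

```stata
* Toy wide data (hypothetical values): id, start_year, and two-digit
* suffixes var1_95 ... var1_15 and var2_95 ... var2_15
clear
set obs 3
gen id = _n
gen start_year = 2000 + _n                // 2001, 2002, 2003
forvalues y = 1995/2015 {
    local yy = substr("`y'", 3, 2)        // "95", ..., "00", ..., "15"
    gen var1_`yy' = `y' - 1994            // citation impact: 1, 2, ..., 21
    gen var2_`yy' = 1                     // one publication per year
}

* Rename the two-digit suffixes to four-digit years so they order naturally
forvalues y = 1995/2015 {
    local yy = substr("`y'", 3, 2)
    rename var1_`yy' var1_`y'
    rename var2_`yy' var2_`y'
}

* Pre measure: sums over the six years before each observation's start year
gen pre_measure = .
levelsof start_year, local(starts)
foreach s of local starts {
    local num 0
    local den 0
    forvalues y = `=`s' - 6'/`=`s' - 1' {
        local num `num' + var1_`y'
        local den `den' + var2_`y'
    }
    replace pre_measure = (`num')/(`den') if start_year == `s'
}
```

      A post measure would follow the same pattern with the window shifted to after the start year; whether that window should include the start year itself is a choice you would have to make.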


      Best,
      Charlie



      • #4
        var1_* is the citation impact for each year and var2_* is the number of publications in each year.
        I am ultimately looking to create an average citation impact over the six years before they started their postdoc and the six years after (there are four other research performance variables just like this). I cannot reshape the dataset, as it would substantially complicate some of the other variables in the dataset.

        The challenge is that for the next start year, 2002, the code would be:
        replace newvar = (var1_96+var1_97+var1_98+var1_99+var1_00+var1_01)/(var2_96+var2_97+var2_98+var2_99+var2_00+var2_01) if start_year==2002


        Can I use a loop (for example, foreach) in this case, or must I compute these variables manually?
        Last edited by Malene Christensen; 12 Sep 2016, 07:42.



        • #5
          Well, I won't address the problem of wanting to do this with several sets of variables. But here's how I would approach it for multiple start years and just var1 and var2. The key is, as Oded points out, to put the data into long layout. You will struggle endlessly to do this in wide layout, and even if you do succeed, you will only encounter more obstacles as you move on to other aspects of data management.

          There is a slight complication here: the sequence 96, 97, 98, 99, 00, 01, 02 is not a natural sequence without the understanding that it refers to 1996 through 2002. Stata won't know that's what you mean, so we will have to deal with that explicitly. I assume you have a variable called id which identifies the individual researchers in your data. You don't say which start years you want to do this for, but for illustration, I'll assume 1998, 2000, and 2002.

          Code:
          local startyears 1998 2000 2002 // OR WHATEVER YEARS YOU ARE ACTUALLY INTERESTED IN
          reshape long var1_ var2_, i(id) j(year) // PROBABLY NEED TO INCLUDE OTHER VARIABLES IN YOUR VARLIST HERE
          // convert two-digit suffixes in one step, so that years already
          // turned into 20xx are not shifted a second time by a 1900 + rule
          replace year = cond(year <= 16, 2000 + year, 1900 + year)
          foreach s of local startyears {
              // total() treats missing as zero, so only years before the start year s contribute
              by id (year), sort: egen numerator = total(cond(year < `s', var1_, .))
              by id (year): egen denominator = total(cond(year < `s', var2_, .))
              gen newvar_for_`s' = numerator/denominator
              drop numerator denominator
          }
          Notes:
          1. Not tested. Beware of typos.
          2. Do read the [D] manual entry for the -egen- command, which contains a large array of extremely useful data-management functions.
          3. Also, -foreach- is a basic looping command that is a must-know for any serious Stata user. You will find it explained and illustrated in the [P] manual.

          Doing this with multiple sets of variables will entail embedding the above -foreach s of local startyears- loop inside a loop over the other variable pairs. The details of that depend on how those variables are named.
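          A minimal sketch of that nesting, under several assumptions: the data are already in long layout with one row per id and year; var3_ and var4_ are hypothetical stand-ins for another numerator/denominator pair; the windows are taken relative to each researcher's own start_year rather than a fixed list of years; and the post window is assumed to cover start_year + 1 through start_year + 6 (adjust it if the start year itself should count). The toy values are invented.

```stata
* Toy long data (hypothetical values): one row per id and year, 1995-2015
clear
set obs 2
gen id = _n
gen start_year = 2000 + _n                 // 2001 and 2002
expand 21
bysort id: gen year = 1994 + _n            // 1995, ..., 2015
sort id year
gen var1_ = year - 1994                    // citation impact
gen var2_ = 1                              // publications
gen var3_ = 10                             // hypothetical second measure
gen var4_ = 2                              // hypothetical second denominator

* Windows relative to each researcher's own start year
gen byte in_pre  = inrange(year, start_year - 6, start_year - 1)
gen byte in_post = inrange(year, start_year + 1, start_year + 6)  // assumed

* Parallel lists of numerator and denominator stubs, looped over in pairs
local nums var1_ var3_
local dens var2_ var4_
local npairs : word count `nums'
forvalues i = 1/`npairs' {
    local a : word `i' of `nums'
    local b : word `i' of `dens'
    foreach w in pre post {
        by id: egen num = total(cond(in_`w', `a', .))
        by id: egen den = total(cond(in_`w', `b', .))
        gen `w'_`a'ratio = num/den
        drop num den
    }
}
```

          With start_year stored as a variable, no loop over a list of start years is needed: each researcher's window is picked up by inrange(), and the result (for example pre_var1_ratio) is constant within id.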

          If, when all of this is done, you have some compelling reason to go back to wide layout, you can, of course, do so. But it is likely that whatever you need to do after this will also be easier in long layout, so, in general, I would advise sticking with the long layout throughout your analyses. (Wide layout is more convenient for certain types of graphs and for a small number of analysis commands, but long layout is better for most things in Stata.)

          Added: I don't know what circumstances you face that lead you to say that it would complicate your life to go to long layout, and I'm skeptical. But if that is really true, you can start by saving your data, reducing the data set to just id, var1_* and var2_*, and then using the code above to calculate your new variables. Then you can -reshape- the results back to wide and -merge- back to your original data.
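          A minimal sketch of that round trip, using a tempfile for the saved copy and invented toy data, where other_var is a hypothetical stand-in for the variables you want to keep in wide layout. Instead of a full reshape back to wide, it keeps one row per id, which is a simplification that works here only because the new measure is constant within id. Renaming the suffixes to four-digit years before the reshape sidesteps the two-digit year issue altogether.

```stata
* Toy wide data (hypothetical values); other_var stands in for the
* variables that should stay in wide layout
clear
set obs 2
gen id = _n
gen start_year = 2000 + _n                   // 2001 and 2002
gen other_var = 100 * _n
forvalues y = 1995/2015 {
    local yy = substr("`y'", 3, 2)
    gen var1_`yy' = `y' - 1994
    gen var2_`yy' = 1
}

* 1. Save the full data set
tempfile original
save `original'

* 2. Reduce to what the calculation needs; rename suffixes to 4-digit years
keep id start_year var1_* var2_*
forvalues y = 1995/2015 {
    local yy = substr("`y'", 3, 2)
    rename var1_`yy' var1_`y'
    rename var2_`yy' var2_`y'
}

* 3. Compute the measure in long layout (six years before start_year)
reshape long var1_ var2_, i(id) j(year)
gen byte in_pre = inrange(year, start_year - 6, start_year - 1)
by id (year), sort: egen num = total(cond(in_pre, var1_, .))
by id: egen den = total(cond(in_pre, var2_, .))
gen pre_measure = num/den

* 4. One row per id (the measure is constant within id), then merge back
keep id pre_measure
bysort id: keep if _n == 1
merge 1:1 id using `original', nogenerate
```

          After the merge, the original wide variables (including var1_95 and other_var) are back alongside the new pre_measure.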
          Last edited by Clyde Schechter; 12 Sep 2016, 09:03.
