
  • #16
    Actually, you need to change the code in a few places.
    Code:
    drop if year < 1990
    summ year, meanonly
    local last = r(max)
    by country isic (year), sort: keep if year[1] == 1990 & year[_N] == `last' ///
        & _N == `last'-1990+1



    • #17
      Originally posted by Clyde Schechter
      Actually, you need to change the code in a few places.
      Code:
      drop if year < 1990
      summ year, meanonly
      local last = r(max)
      by country isic (year), sort: keep if year[1] == 1990 & year[_N] == `last' ///
      & _N == `last'-1990+1
      And from there, can I construct the standard deviation with bysort isic year: egen sd = sd(ly)? Thank you!



      • #18
        Well, that is how you can construct the standard deviation within each industry-year combination. But in the beginning of this thread you said you wanted the standard deviations within industry-tech intensity pairs.



        • #19
          Originally posted by Clyde Schechter
          Well, that is how you can construct the standard deviation within each industry-year combination. But in the beginning of this thread you said you wanted the standard deviations within industry-tech intensity pairs.
          Right, but if I compute it by industry (isic), I can just use an if condition to get it in terms of tech_intensity, right?



          • #20
            Right, but if I get by industry isic, I can just use an if statement to get it in terms of tech_intensity.
            I don't know what you mean by this. If you want it by all triplets of industry, isic, and tech_intensity, then it would be
            Code:
            by industry isic tech_intensity, sort: egen sdly = sd(ly)



            • #21
              Originally posted by Clyde Schechter
              I don't know what you mean by this. If you want it by all triplets of industry, isic, and tech_intensity, then it would be
              Code:
              by industry isic tech_intensity, sort: egen sdly = sd(ly)
              Let me try this. Thank you very much!



              • #22
                Originally posted by Clyde Schechter
                I'm interpreting this as meaning that you want the sample to consist of all and only those country#industry combinations that have an observation in every year that appears in the data set.

                Code:
                summ year, meanonly
                local first = r(min)
                local last = r(max)
                by country isic (year), sort: keep if year[1] == `first' & year[_N] == `last' ///
                & _N == `last'-`first'+1
                should do it. Now, the example data in #1 contains only one year (1963), so it results in everything being kept. But based on your statement that the full data set is unbalanced, I believe this code will retain those and only those country#isic combinations that have an observation in every year. Just be aware that because of the special nature of the example data in #1, this code is not fully tested.
                I think the best way to phrase it (and the fault is entirely mine) is: I want the group of countries that contain all industries in every year from 2000 onward.



                • #23
                  Something like this:
                  Code:
                  keep if year >= 2000
                  levelsof isic, local(levels)
                  local n_isic: word count `levels'
                  
                  summ year, meanonly
                  local n_years = r(max) - 2000 + 1
                  
                  local complete_obs_no = `n_years' * `n_isic'
                  
                  isid country isic year, sort
                  by country (isic year): keep if _N == `complete_obs_no'
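
                  To see what this filter does, here is a minimal sketch on a made-up toy data set (the values are invented for illustration and are not from the data in this thread). Country 2 has no observation for isic "16" in 2001, so it should be dropped, and the final assert checks that every remaining country#isic pair has one observation per year.
                  Code:
                  * Toy data, invented for illustration only
                  clear
                  input float(country year) str2 isic
                  1 2000 "15"
                  1 2001 "15"
                  1 2000 "16"
                  1 2001 "16"
                  2 2000 "15"
                  2 2001 "15"
                  2 2000 "16"
                  end
                  
                  * Same steps as the code above
                  keep if year >= 2000
                  levelsof isic, local(levels)
                  local n_isic: word count `levels'
                  summ year, meanonly
                  local n_years = r(max) - 2000 + 1
                  local complete_obs_no = `n_years' * `n_isic'
                  isid country isic year, sort
                  by country (isic year): keep if _N == `complete_obs_no'
                  
                  * Check: each surviving country#isic pair has one observation per year
                  by country isic (year), sort: assert _N == `n_years'
                  Note that this counts observations, not nonmissing values of any particular variable, so a country can pass the filter while still having missing values in some years.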



                  • #24
                    Thank you, Professor Schechter! That's exactly what I did!



                    • #25
                      Originally posted by Clyde Schechter
                      Something like this:
                      Code:
                      keep if year >= 2000
                      levelsof isic, local(levels)
                      local n_isic: word count `levels'
                      
                      summ year, meanonly
                      local n_years = r(max) - 2000 + 1
                      
                      local complete_obs_no = `n_years' * `n_isic'
                      
                      isid country isic year, sort
                      by country (isic year): keep if _N == `complete_obs_no'
                      Thank you very much! On a different post, I constructed the following variables after running the code:

                      Code:
                       tabstat  ly0 ly if year==2005 & ly0!=. & ly!=., by(tech_intensity) statistics(sd)
                      
                      Summary statistics: sd
                        by categories of: tech_intensity 
                      
                      tech_intensity |       ly0        ly
                      ---------------+--------------------
                                   0 |  1.155845  1.075978
                                   1 |  1.238137  1.199836
                                   2 |  1.307299  1.326163
                                   3 |  1.166958   .965853
                      ---------------+--------------------
                               Total |  1.245615  1.194803
                      Code:
                       tabstat  ly0 ly if year==2015 & ly0!=. & ly!=., by(tech_intensity) statistics(sd)
                      
                      Summary statistics: sd
                        by categories of: tech_intensity 
                      
                      tech_intensity |       ly0        ly
                      ---------------+--------------------
                                   0 |  1.075115  .8698506
                                   1 |   1.23723  1.023555
                                   2 |  1.344891  1.050074
                                   3 |  1.112831  .9552974
                      ---------------+--------------------
                               Total |  1.259314  1.036577

                      Shouldn't ly0 for year==2015 equal ly for year==2005?

                      My hypothesis is that some industries (isic), particularly those with tech_intensity==3, start having fewer observations over time (even though I chose the group of countries (country) that contain all industries in my data (isic) in all years from 2000 until 2020). One example is isic=="33".

                      Code:
                       tab ly0 ly if year==2015 & ly0!=. & ly!=.&isic=="33"
                      
                                 |     ly
                             ly0 |  9.514457 |     Total
                      -----------+-----------+----------
                        9.541218 |         1 |         1 
                      -----------+-----------+----------
                           Total |         1 |         1
                      Do you have a suggestion on how I can address this issue? Thank you, thank you so much!


                      Last edited by Hugo Rocha; 16 Jun 2022, 08:45.



                      • #26
                        Please post back with an example of the data that reproduces this problem. There are several possibilities, and I can't distinguish them without seeing the data as it looks just before you run these commands.



                        • #27

                          Sure. Using your code, I constructed the group of countries that contain all industries in every year from 2000 onward. Then I computed the log of value added per worker (ly) and its ten-year lag (ly0 = L10.ly). My purpose in using the tabstat tables is to compare the standard deviation of log value added per worker in one year (for instance, ly in 2015) with the standard deviation of the same variable 10 years earlier (ly0 in that year), and then to compare these standard deviations across decades, as shown. The inconsistency I find is that the standard deviation of ly in 2005 should equal the standard deviation of ly0 (= L10.ly) in 2015, but it does not (the data are yearly, so L10 should be ten years before). I group all industries (isic) into categories called tech_intensity (0, 1, 2, and 3). My hypothesis is that some industries start having fewer observations over time (even though I chose the group of countries (country) that contain all industries in my data (isic) in all years from 2000 until 2020). One example is isic=="33".

                          Code:
                           tab ly0 ly if year==2015 & ly0!=. & ly!=.&isic=="33"
                          
                                     |     ly
                                 ly0 |  9.514457 |     Total
                          -----------+-----------+----------
                            9.541218 |         1 |         1 
                          -----------+-----------+----------
                               Total |         1 |         1
                          When I run
                          Code:
                           tab ly0 ly if year==2005 & ly0!=. & ly!=.&isic=="33"
                          I find a much higher number of observations (I could not include the output here due to character limits).



                          Why is the number of observations so different in 2005 and 2015? Could it be responsible for this "anomaly" in the tables? The dataex output for the current data is

                          dataex country year isic tech_intensity ly ly0 if inrange(year, 1995, 2015) //Limit the range for the purpose of simplicity

                          ----------------------- copy starting from the next line -----------------------
                          Code:
                           * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float(country year) str2 isic float(tech_intensity ly ly0)
                          8 1995 "15" 1        .        .
                          8 1996 "15" 1        .        .
                          8 1997 "15" 1        .        .
                          8 1998 "15" 1        . 9.058169
                          8 1999 "15" 1        . 8.877411
                          8 2000 "15" 1  8.17525 8.155766
                          8 2001 "15" 1 8.462609        .
                          8 2002 "15" 1 8.465451        .
                          8 2003 "15" 1  8.82891        .
                          8 2004 "15" 1 8.980627        .
                          8 2005 "15" 1 9.174892        .
                          8 2006 "15" 1 9.137696        .
                          8 2007 "15" 1  9.21706        .
                          8 2008 "15" 1 9.405575        .
                          8 2009 "15" 1  9.23012        .
                          8 2010 "15" 1  9.27365  8.17525
                          8 2011 "15" 1 9.130654 8.462609
                          8 2012 "15" 1 9.127211 8.465451
                          8 2013 "15" 1 9.197433  8.82891
                          8 2014 "15" 1 9.201413 8.980627
                          8 2015 "15" 1 8.940754 9.174892
                          8 1995 "16" 1        .        .
                          8 1996 "16" 1        .        .
                          8 1997 "16" 1        .        .
                          8 1998 "16" 1        .        .
                          8 1999 "16" 1        .        .
                          8 2000 "16" 1 8.056465        .
                          8 2001 "16" 1 8.272923        .
                          8 2002 "16" 1 7.955277        .
                          8 2003 "16" 1 8.055888        .
                          8 2004 "16" 1        .        .
                          8 2005 "16" 1 8.674198        .
                          8 2006 "16" 1 8.447207        .
                          8 2007 "16" 1 8.640269        .
                          8 2008 "16" 1 8.172178        .
                          8 2009 "16" 1 8.525334        .
                          8 2010 "16" 1 8.784348 8.056465
                          8 2011 "16" 1 9.530767 8.272923
                          8 2012 "16" 1 8.830056 7.955277
                          8 2013 "16" 1        . 8.055888
                          8 2014 "16" 1 8.489489        .
                          8 2015 "16" 1        . 8.674198
                          8 1995 "17" 1        .        .
                          8 1996 "17" 1        .        .
                          8 1997 "17" 1        .        .
                          8 1998 "17" 1        . 8.116125
                          8 1999 "17" 1        . 8.157984
                          8 2000 "17" 1  7.44716 7.723116
                          8 2001 "17" 1 7.608407        .
                          8 2002 "17" 1 7.697477        .
                          8 2003 "17" 1 7.895647        .
                          8 2004 "17" 1 7.959837        .
                          8 2005 "17" 1 8.107668        .
                          8 2006 "17" 1  8.23846        .
                          8 2007 "17" 1 8.409211        .
                          8 2008 "17" 1 8.583404        .
                          8 2009 "17" 1 8.543821        .
                          8 2010 "17" 1 8.452183  7.44716
                          8 2011 "17" 1 8.549864 7.608407
                          8 2012 "17" 1 8.562744 7.697477
                          8 2013 "17" 1 8.634156 7.895647
                          8 2014 "17" 1 8.663963 7.959837
                          8 2015 "17" 1 8.786993 8.107668
                          8 1995 "18" 1        .        .
                          8 1996 "18" 1        .        .
                          8 1997 "18" 1        .        .
                          8 1998 "18" 1        .        .
                          8 1999 "18" 1        .        .
                          8 2000 "18" 1        .        .
                          8 2001 "18" 1        .        .
                          8 2002 "18" 1        .        .
                          8 2003 "18" 1        .        .
                          8 2004 "18" 1        .        .
                          8 2005 "18" 1        .        .
                          8 2006 "18" 1        .        .
                          8 2007 "18" 1        .        .
                          8 2008 "18" 1        .        .
                          8 2009 "18" 1        .        .
                          8 2010 "18" 1        .        .
                          8 2011 "18" 1        .        .
                          8 2012 "18" 1        .        .
                          8 2013 "18" 1        .        .
                          8 2014 "18" 1        .        .
                          8 2015 "18" 1 8.292338        .
                          8 1995 "19" 1        .        .
                          8 1996 "19" 1        .        .
                          8 1997 "19" 1        .        .
                          8 1998 "19" 1        .        .
                          8 1999 "19" 1        .        .
                          8 2000 "19" 1 7.383794        .
                          8 2001 "19" 1 7.702435        .
                          8 2002 "19" 1 7.840409        .
                          8 2003 "19" 1 8.008558        .
                          8 2004 "19" 1  8.27576        .
                          8 2005 "19" 1 8.277926        .
                          8 2006 "19" 1 8.241632        .
                          8 2007 "19" 1 8.370808        .
                          8 2008 "19" 1 8.513551        .
                          8 2009 "19" 1 8.618648        .
                          8 2010 "19" 1 8.686357 7.383794
                          end




                          • #28
                            Well, in your new example data, the industries all have the same number of observations (21), except for "19" which is a bit short. But that's because -dataex- cuts off after 100 observations unless you override it, so I'm going to ignore "19." The rest of the industries all have complete data from 1995 through 2015. And in your example data, the sd of ly in 2005 does agree with the sd of ly0 in 2015:

                            Code:
                            . drop if isic == "19"
                            (16 observations deleted)
                            
                            . version 16: table year if inlist(year, 2005, 2015), c(N ly N ly0 sd ly sd ly0)
                            
                            ----------------------------------------------------------
                                 year |      N(ly)      N(ly0)      sd(ly)     sd(ly0)
                            ----------+-----------------------------------------------
                                 2005 |          3           0    .5339506            
                                 2015 |          3           3    .3388137    .5339506
                            ----------------------------------------------------------
                            And, although I am not showing the results, the same is true for all pairs of years 10 years apart except when one or other of the standard deviations is missing.

                            In short, I cannot replicate your problem in the example data.

                            You seem to have a particular concern about isic "33," but I can't say anything about that because it was not included in your example data.
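
                            The comparison above can also be run without the version-16 table syntax. The sketch below copies only the 2005 and 2015 rows for isic 15 to 18 from the example data in #27 and uses tabstat with statistics(n sd). The second command adds the restriction that both ly and ly0 be nonmissing, as in the tabstat commands earlier in the thread, to show how that restriction changes which observations enter each cell. Whether this explains the discrepancy in the full data is an assumption that cannot be confirmed from the example alone.
                            Code:
                            * Rows for 2005 and 2015 copied from the example data in #27
                            clear
                            input float(country year) str2 isic float(tech_intensity ly ly0)
                            8 2005 "15" 1 9.174892        .
                            8 2015 "15" 1 8.940754 9.174892
                            8 2005 "16" 1 8.674198        .
                            8 2015 "16" 1        . 8.674198
                            8 2005 "17" 1 8.107668        .
                            8 2015 "17" 1 8.786993 8.107668
                            8 2005 "18" 1        .        .
                            8 2015 "18" 1 8.292338        .
                            end
                            
                            * N and sd of each variable separately, by year
                            tabstat ly ly0, by(year) statistics(n sd)
                            
                            * Same, restricted to observations where both are nonmissing
                            tabstat ly ly0 if !missing(ly, ly0), by(year) statistics(n sd)
                            In the first table, N and sd are computed from whatever values of each variable are nonmissing. In the second, the 2005 rows drop out entirely because ly0 is missing for all of them in this extract, and only isic 15 and 17 remain for 2015.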
                            Last edited by Clyde Schechter; 16 Jun 2022, 13:51.



                            • #29
                              Originally posted by Clyde Schechter
                              Well, in your new example data, the industries all have the same number of observations (21), except for "19" which is a bit short. But that's because -dataex- cuts off after 100 observations unless you override it, so I'm going to ignore "19." The rest of the industries all have complete data from 1995 through 2015. And in your example data, the sd of ly in 2005 does agree with the sd of ly0 in 2015:

                              Code:
                              . drop if isic == "19"
                              (16 observations deleted)
                              
                              . version 16: table year if inlist(year, 2005, 2015), c(N ly N ly0 sd ly sd ly0)
                              
                              ----------------------------------------------------------
                                   year |      N(ly)      N(ly0)      sd(ly)     sd(ly0)
                              ----------+-----------------------------------------------
                                   2005 |          3           0    .5339506            
                                   2015 |          3           3    .3388137    .5339506
                              ----------------------------------------------------------
                              And, although I am not showing the results, the same is true for all pairs of years 10 years apart except when one or other of the standard deviations is missing.

                              In short, I cannot replicate your problem in the example data.

                              You seem to have a particular concern about isic "33," but I can't say anything about that because it was not included in your example data.
                              But once I group my industries (by the categories of tech_intensity), the standard deviations are not the same, right? This is my concern, because in the final paper I want to group them by tech_intensity.

                              Code:
                               tabstat  ly0 ly if year==2005 & ly0!=. & ly!=., by(tech_intensity) statistics(sd)
                              
                              Summary statistics: sd
                                by categories of: tech_intensity
                              
                              tech_intensity |       ly0        ly
                              ---------------+--------------------
                                           0 |  1.155845  1.075978
                                           1 |  1.238137  1.199836
                                           2 |  1.307299  1.326163
                                           3 |  1.166958   .965853
                              ---------------+--------------------
                                       Total |  1.245615  1.194803
                              Here ly should equal ly0 for 2015, but that is not the case:

                              Code:
                               tabstat  ly0 ly if year==2015 & ly0!=. & ly!=., by(tech_intensity) statistics(sd)
                              
                              Summary statistics: sd
                                by categories of: tech_intensity
                              
                              tech_intensity |       ly0        ly
                              ---------------+--------------------
                                           0 |  1.075115  .8698506
                                           1 |   1.23723  1.023555
                                           2 |  1.344891  1.050074
                                           3 |  1.112831  .9552974
                              ---------------+--------------------
                                       Total |  1.259314  1.036577
                              Here ly0 does not equal ly in 2005.
                              Last edited by Hugo Rocha; 16 Jun 2022, 14:00.



                              • #30
                                Well, again, with the example data you show, I can't say anything because only tech intensity 1 was included.

                                I can speculate on one possibility here: did any of the industries change their tech_intensity during the time covered by your data?
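
                                One way to check this possibility is sketched below on a made-up toy data set (the values are invented for illustration, and the variable name changed is hypothetical). It flags every isic whose tech_intensity is not constant across its observations.
                                Code:
                                * Toy data, invented for illustration only
                                clear
                                input float(country year) str2 isic float tech_intensity
                                1 2005 "15" 1
                                1 2015 "15" 1
                                1 2005 "33" 3
                                1 2015 "33" 2
                                end
                                
                                * Within each isic, sort by tech_intensity and compare the
                                * smallest and largest values; they differ only if it changed.
                                * Missing values sort last, so an isic with some missing
                                * tech_intensity would also be flagged.
                                by isic (tech_intensity), sort: gen byte changed = tech_intensity[1] != tech_intensity[_N]
                                
                                * List the industries that were flagged
                                tab isic if changed
                                In this toy data only isic "33" is flagged. If the classification can also differ across countries, adding country to the by-list would check constancy within each country#isic pair instead.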

