
  • Time Series Line Graphs (including adding up values for each year) and Geometric Trend for given data

    Hello!
    I am a student who has just started using Stata, and I have run into two issues. I hope someone can help.
    I have data on different companies, which I have classified as R&D intensive or not (based on the SIC code) through a new variable. So each company is either R&D intensive or non-R&D intensive. The data are incomplete in two ways: xrd is sometimes missing (.), and not every company has observations for all the years from 1987 to 1997. It looks like this:

    (please see Data Stata.jpg)

    I want to draw (i) a graph of total R&D spending summed over all companies for each year (say, from 1987 to 1997) and (ii) a graph of the same yearly sum for only the companies classified as non-R&D intensive. Missing values can be treated as 0. It also does not matter that some companies appear only in some years (e.g. only 1992-1995). The gvkey and company name do not matter either.
    As simple as that sounds, no matter what I try I cannot create such a figure. It should look like this (just an example plotted in Excel):

    (please see Graph Stata.jpg)

    I tried to use:

    *create new var for plotting graph just non-R&D intensive companies*

    egen totalrd_nrd = total (industry=="Non R&D Intensive"), by(year)

    This puts the sum of R&D spending for the year into every observation of that year. When I then use

    graph twoway line totalrd_nrd year

    the result is a huge number of connected points, whereas I want just a few: the sum of spending for each year from 1987 to 1997, i.e. eleven connected points. The second graph should likewise connect eleven points (the sum for each year).


    My second issue is fitting a geometric trend to the actual R&D spending data from 1982 to 1995 for the companies classified as R&D intensive. It is the same dataset, restricted to "R&D intensive" companies and the years 1982 to 1995. But no matter what I try, I cannot calculate and display a trend. Any ideas would be really helpful!

    Thanks to all of you!

  • #2
    Code:
    egen totalrd_nrd = total (industry=="Non R&D Intensive"), by(year)
    will count observations for which the true or false expression is true, as

    Code:
    industry=="Non R&D Intensive"
    is evaluated as 1 or 0 depending on whether the comparison is true or false. So it does not produce the sum of R&D spending that you describe.
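To see the difference between counting and summing concretely, here is a minimal sketch on made-up data. The values and the variable names count_nrd, sum_nrd and oneperyear are illustrative only and are not taken from the original dataset. It also shows one possible way to plot a single point per year without reducing the dataset, using egen's tag() function.

```stata
* toy data: hypothetical values, not the poster's dataset
clear
input int year str17 industry double xrd
1987 "Non R&D Intensive" 10
1987 "Non R&D Intensive" .
1987 "R&D Intensive"     50
1988 "Non R&D Intensive" 20
1988 "R&D Intensive"     60
end

* total() of a true/false expression counts the observations where it is true
egen count_nrd = total(industry == "Non R&D Intensive"), by(year)

* multiplying by the 0/1 indicator keeps xrd only for the group of interest;
* total() treats missing values as zero
egen sum_nrd = total(xrd * (industry == "Non R&D Intensive")), by(year)

* tag one observation per year so the line plot uses one point per year
egen byte oneperyear = tag(year)
list year count_nrd sum_nrd if oneperyear
line sum_nrd year if oneperyear, sort
```

In this toy example count_nrd is the number of non-R&D-intensive rows in each year, while sum_nrd is their summed xrd, which is the quantity asked for in #1.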

    Here's a way towards what you want.

    Code:
    label def intensive 0 "Non R&D Intensive" 1 "R&D intensive" 
    decode industry, label(intensive) gen(intensive) 
    save alldata 
    
    collapse (total) xrd, by(intensive year) 
    line xrd year if intensive == 0, sort || line xrd year if intensive == 1, sort 
    save reduceddata


    • #3
      Thank you for your help! I get the idea, but when I try the code, Stata reports the following for the second line:
      decode industry, label(intensive) gen(intensive): "not possible with string variable"
      So attaching the label "intensive" to industry does not seem to work. But when I use destring on industry, the code no longer works. I also tried the second line with encode, thinking I could turn industry into a numeric variable that I can work with, but that does not work either.

      Any ideas how to solve this?


      • #4
        Your own #1 says that industry is a str13 variable.

        The egen statement you cite makes no sense otherwise.

        Something is wrong somewhere. Perhaps you are working with two or more different versions of your data. Either way, if you give false information, code suggestions may not work.

        Perhaps you should start again and use FAQ Advice #12 for guidance on telling us about your data.


        • #5
          Hi Nick!

          Yes, you are right, industry is a str13 variable.

          When I tried using your suggestion (decode), it gave me the error code r(108):

          not possible with string variable;
          You have requested something that is logically impossible
          with a string variable, such as decoding it. Perhaps you meant
          another variable or typed decode when you meant encode.

          So I used encode to convert the variable "industry" to a numeric variable ("intensive") instead of decode.
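As a small illustration of what encode does, here is a sketch on made-up data (the three rows are hypothetical). It assumes no value label named intensive exists beforehand, in which case encode assigns integer codes starting at 1 in alphabetical order of the strings. Any later if condition, such as intensive == 0 or intensive == 1, has to match the codes actually assigned, so it is worth checking them with label list.

```stata
* toy data: hypothetical values, not the poster's dataset
clear
input str17 industry
"R&D Intensive"
"Non R&D Intensive"
"R&D Intensive"
end

* encode creates a new numeric variable with value labels;
* without a predefined label, codes start at 1 in alphabetical order
encode industry, gen(intensive)

* show which integer code was assigned to which string
label list intensive
tabulate intensive, nolabel
```

If a value label is defined first and passed through the label() option, encode uses that mapping for strings that match it exactly.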

          I am using version 13 and looked up the help for collapse, because after running the first part of your code Stata told me that (total) is an invalid statistic (I thought sum had been renamed to total, but perhaps not for collapse?).

          So I replaced it with (sum), and now it draws the two graphs. I will double-check my whole code tomorrow, but most importantly, I get the idea.

          Thank you so much for your help and suggestions, Nick! You helped me a lot.




          • #6
            In fact, I made two mistakes in a rush: decode should have been encode, and (total) should have been (sum). Sorry about that.

            Code:
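            * define the value label first so that matching strings get codes 0 and 1
            label def intensive 0 "Non R&D Intensive" 1 "R&D intensive"
            * encode (not decode) maps the string variable to a labelled numeric one
            encode industry, label(intensive) gen(intensive)
            save alldata
            * (sum) is the collapse statistic; the result has one row per group and year
            collapse (sum) xrd, by(intensive year)
            line xrd year if intensive == 0, sort || line xrd year if intensive == 1, sort
            save reduceddata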
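The second question in #1, the geometric trend, is not taken up in the thread. One common reading of "geometric trend" is a constant growth rate, xrd = a * g^year, which is linear in logarithms and so can be fitted by regressing ln(xrd) on year. The following is only a sketch under that assumption, on made-up yearly totals for a single group coded intensive == 1; the data and the variable names ln_xrd, ln_trend and trend are illustrative. Note that ln() of zero is missing, so years with a zero total would drop out of such a fit.

```stata
* toy data: hypothetical yearly totals growing by 10% a year
clear
set obs 14
gen int year = 1981 + _n
gen byte intensive = 1
gen double xrd = 100 * 1.1^(year - 1982)

* restrict to the group and period of interest
keep if intensive == 1 & inrange(year, 1982, 1995)

* a geometric trend is linear in logs: ln(xrd) = ln(a) + year * ln(g)
gen double ln_xrd = ln(xrd)
regress ln_xrd year

* back-transform the fitted values to the original scale
predict double ln_trend, xb
gen double trend = exp(ln_trend)

* implied growth rate per year, g - 1
display "growth rate per year: " exp(_b[year]) - 1

* observed totals and fitted geometric trend on one graph
line xrd trend year, sort
```

With real data the same commands would be run on the collapsed dataset, after checking which code of intensive stands for the R&D-intensive companies.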
