Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating different times series variables from one times series variable

    Hi All
    I have a string variable representing industry of the firm. There are 5 different industries. My dataset is quarterly cross section time series. I am interested in the aggregate profits of firms in each industry in a given quarter.
    I measured aggregate profits in each industry in a given quarter with weights based on lagged market capitalization as following:

    egen aggregate= wtmean(profit), by (fqdate industry) weight (lag_market)
    duplicates drop fqdate, force

    This now gives me a new variable “aggregate” that takes values representing the aggregate profit of each industry in a given quarter. I want now to create 5 time series where each time series represent each industry. In other words, I want to have 5 times series industry variables instead one industry variables.

    Thanks for your help in advance
    Lisa

  • #2
    Hi again
    Thinking about it again, I believe that the duplicate drop fqdate here is not correct because it will keep only an aggregate representing one industry each quarter. I may use this after creating the mulyiple time series....
    Any help?
    Lisa

    Comment


    • #3
      In #1 you use but do not explain the community-contributed egen function wtmean() from the package _gwtmean on SSC. (Please note FAQ Advice #12 on explaining references to community-contributed software.)

      It would seem that aggregate profit is the sum not the mean (???) suggesting that

      Code:
      collapse (sum) profit [w=lag_market], by(industry qdate)
      egen panelid = group(industry), label
      is a more direct route for what you want, although collapse does support means directly if that is indeed what you want. Then your data are in standard panel data structure, but if there is a good reason for different variables for different industries, then

      Code:
      separate profit, by(industry) veryshortlabel
      is one possibility.

      Comment


      • #4
        Thanks Nick
        In fact, my industry aggregate should be value weighted cross sectional average rather than sum.
        I changed
        HTML Code:
        sum
        in the first line of your code to
        HTML Code:
        mean
        or
        Code:
        wtmean
        ?!

        The problem now is that the resulting dataset runs from fqdate say 1970q1 to 2015q4 and shows the industry and the aggregate profit in each quarter whereas all other industry series show missing values, then the time variable starts again from 1970q1 to 2015q4 and show the second industry and the aggregate profit in each quarter whereas all other industry series show missing values and so on...

        I want the dataset to show time and series without missing values each time a series show values. Is this clear? Should I send a sample of my dataset to clarify this problem?
        Thanks
        Lisa
        Last edited by Lisa Wilson; 07 Aug 2018, 02:30.

        Comment


        • #5
          The help for collapse answers your question. It supports weighted means by allowing weights. Means are the default in collapse, but you're at liberty to specify

          Code:
          collapse (mean)
          wtmean is just David Kantor's name for his egen function; it's not generic Stata syntax.

          Comment


          • #6
            Thanks Nick.
            I have actually revised my post after doing that and encountered another problem. Do you have any suggestion to my revised post #4?

            Comment


            • #7
              I don't understand the difficulty without seeing enough original data to reproduce it. Usually missing data after collapse mean missing data in the original.

              Comment


              • #8
                I use dataex to upload the data as following:
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input long gvkey int(datadate fqdate) float(lag_market industry profit)
                 8252 7760 84 122.05431 1 3.9592724
                 5968 7760 84 587.43774 1  3.910057
                 7754 7760 84   72.1735 1 1.8557103
                 5073 7760 84  9954.955 1  3.363194
                 7675 7760 84  859.8544 1 1.5741892
                 2479 7760 84         . 1 2.9935496
                10540 7760 84   393.312 1  .8415915
                 5476 7760 84 164.28525 1   .575923
                 7373 7760 84  281.1525 1 2.0901177
                 6639 7760 84         . 1  2.313471
                 1663 7760 84 1234.9847 1 1.5138514
                11284 7760 84 2126.9133 1  .8250414
                 1246 7760 84         . 1  2.528295
                 8935 7760 84  692.3658 1  3.013576
                 1919 7760 84   785.757 1  1.110183
                 9985 7760 84         . 1   1.53541
                 4062 7760 84   950.136 1  .4701737
                 2090 7760 84    14.172 1    4.4822
                 1108 7760 84   6.56425 1  5.216558
                 4769 7760 84  164.9545 1  1.481566
                 9388 7760 84  62.60475 1  1.484355
                 6845 7760 84 10.660687 1 4.4851575
                 2290 7760 84  908.6122 1  2.582276
                 9504 7760 84         . 1  .9077477
                 3746 7760 84 2154.0784 1 2.3640664
                11017 7760 84  270.8359 1 1.6063673
                 1120 7760 84  12.94875 1  4.463491
                 6573 7760 84  60.89762 1  2.561683
                 4535 7760 84         . 1  6.250216
                 8253 7760 84   585.395 1 2.0204425
                 8730 7760 84   194.523 1 1.4483584
                 9284 7760 84 26.022125 1  5.037015
                 8005 7760 84  752.8281 1  2.723059
                 6475 7760 84  888.2648 1 1.3669474
                 2919 7760 84    33.867 1  2.303609
                 7154 7760 84 1950.9956 1 1.0388031
                 5055 7760 84 1504.0603 1 2.1720443
                10443 7760 84  6062.633 1 1.9180068
                 5764 7760 84   62.1335 1   7.55559
                10952 7760 84  193.0906 1  4.959033
                13990 7760 84  61.26563 1  2.610246
                 7230 7760 84 33.838375 1  1.395119
                11266 7760 84         . 1 2.0601277
                 3167 7760 84    103.76 1 2.2195196
                 3239 7760 84       270 1  .3576266
                 1344 7760 84         . 1 2.2038078
                 6326 7760 84  60.42738 1   2.05855
                 5256 7760 84  487.7955 1 1.1592128
                11453 7760 84 113.92545 1 1.6651073
                 5453 7760 84         . 1 2.9128954
                 2157 7760 84  1353.401 1  2.691121
                10131 7760 84   160.208 1 2.1311116
                 5731 7760 84   21.4285 1  1.723308
                 1498 7760 84  52.69012 1 4.6999025
                 9463 7760 84 239.76974 1  2.163188
                 8510 7760 84  1336.715 1  .7057932
                10884 7760 84 184.17175 1  8.812179
                 1935 7760 84     11.75 1  3.270246
                 3657 7760 84    17.829 1 3.2670064
                 3125 7760 84   84.2745 1  4.236088
                 7241 7760 84  768.1357 1 1.1646166
                 8543 7760 84  1253.523 1  4.291421
                 2043 7760 84         . 1   3.02263
                 9113 7760 84    3661.1 1   2.40946
                 7325 7760 84  38.95975 1 2.6517644
                 3642 7760 84 24.578314 1  6.568151
                 3612 7760 84   128.439 1  .8942488
                 1278 7760 84  304.4275 1  2.745142
                 3869 7760 84   165.813 1 1.5869882
                 1643 7760 84 262.41187 1 4.0497856
                10915 7760 84         . 1  4.382927
                 5726 7760 84     81.34 1  1.779362
                 5268 7760 84  50.91275 1 4.0629444
                10160 7760 84  249.2065 1 3.0018775
                 9416 7760 84   47.7445 1  2.409673
                 8875 7760 84  83.86088 1  4.166289
                 5722 7760 84   98.0515 1  1.761272
                 5772 7760 84    76.969 1  1.777035
                 2033 7760 84 35.855637 1  2.632339
                 7139 7760 84  346.3628 1 1.0844092
                 1408 7760 84  983.5838 1   4.69546
                 7891 7760 84  39.24375 1  2.792379
                 5185 7760 84    44.084 1  2.815101
                21504 7760 84   218.994 1 3.5999286
                 2489 7760 84   505.494 1  4.109499
                 7938 7760 84   40.3144 1  4.629065
                 4945 7760 84 218.21213 1   3.34945
                 4608 7760 84   110.448 1  .8892611
                 3733 7760 84  83.41125 1 4.1231203
                 9293 7760 84   133.055 1 1.4808687
                 1599 7760 84  365.7795 1  2.812956
                 6753 7760 84   50.8743 1 1.7619166
                 6670 7760 84 143.19463 1 1.4494872
                 6179 7760 84  22.65375 1 4.0960655
                 6288 7760 84 217.22313 1  1.081877
                 5669 7760 84  44.80766 1  5.685085
                 9656 7760 84         . 1  5.221913
                 8852 7760 84  572.1066 1 2.3024843
                 1863 7760 84     8.577 1 2.3949826
                 3863 7760 84   470.296 1  .8693243
                end
                format %d datadate
                format %tq fqdate
                label values industry industry
                label def industry 1 "Consumer Durables, NonDurables, Wholesale, Retail, and Some Services (Laundries, Repair Shops)", modify
                When I use now:
                HTML Code:
                collapse (mean) profit[w=lag_market], by (industry fqdate)
                
                egen panelid=group(industry), label
                
                separate profit, by (industry) veryshortlabel
                I encounter the problem I was trying to describe (which is having missing values for other series each time a series has values)

                Comment


                • #9
                  Yes; if an observation is in one industry, necessarily it cannot be in any other. You've not said why you want the separate variables, but it may be that for some analyses you want a reshape wide. For most Stata purposes, your data are better off in long layout.

                  Comment


                  • #10
                    I want to separate the variables because I need to measure the value weighted profit in each industry separately. I will then combine the resulting time series with another time series data that includes aggregaate investment in each quarter. I want to run a time series regression of aggregate investment on aggregate profit in each industry in order to understand which industry in contributing more to aggregate investment.

                    All what I need is a series of aggregate profit for each industry (as if each series representing a different variable).

                    If reshape is the way to go, how can this be implemented?

                    Comment


                    • #11

                      I need to measure the value weighted profit in each industry separately.
                      Your present data layout (structure, format) qualifies.

                      I will then combine the resulting time series with another time series dataset that includes aggregate investment in each quarter.
                      We need to know the layout of that other dataset to advise how to do it.

                      I want to run a time series regression of aggregate investment on aggregate profit in each industry
                      We need to know what that means precisely but you can run regressions separately for each industry with your present data layout using if to select.

                      Notes: the word format is so overloaded that I try to avoid it here.

                      If reshape is the way to go, how can this be implemented?
                      Well documented. Start with the help.

                      Comment


                      • #12
                        The data set I will merge with is simply a time series of aggregate investment and the same time variable fqdate.The merge will be 1:1 based on fqdate.

                        I am struggling with how to add reshape to Nick's code. Shall I use reshape instead of Nick's code or I reshape the variables after collapse?

                        To clarify, I need a data set similar to this (where the values in ind1 to ind4 represent the value weighted average profit in each industry)
                        fqdate ind1 ind2 ind3 ind4
                        1980q1 0.2 0.3 2.1 2.1
                        1980q2 0.4 0.1 0.3 1.1
                        1980q3 0.1 0.8 2.3 0.4
                        Therefore after merging this dataset with aggregate investment, I will run a regression of investment on each of these series (i.e. each series is a separate variable). For example reg inv ind1 ind2 ind3 ind4, r

                        Comment


                        • #13
                          This really is the easiest kind of reshape wide. If you're totally dependent on people telling you the simplest things, you won't progress with Stata at all. What's the struggle? What code did you try? What happened?

                          Comment


                          • #14
                            I tried the following now:

                            HTML Code:
                            collapse (mean) profit[w=lag_market], by (industry fqdate)
                            
                            egen panelid=group(industry), label
                            
                            reshape wide Tbtm, i(panelid) j(fqdate)
                            However, it did not produce the dataset I am looking for.

                            Do I implement reshape incorrectly?

                            Comment


                            • #15
                              I took your data example from #8 and tried your code from #14, but there is no variable Tbtm. Sorry to seem awkward but as yet I can't reproduce your results and "did not produce the dataset I am looking for" doesn't explain why the results you do get but we can't see are not as desired.

                              Old story this, but understanding how your code produces your results with real(istic) data is still elusive.

                              Comment

                              Working...
                              X