Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a revenue-weighted sum variable from site-level hightech use

    Hello,

    I am trying to create a weighted sum score variable – i.e., revenue-weighted sum of site-level hightech use (firm_hightech_use). Here, sites refer to a firm’s subsidiaries. site_hightech_use is a binary variable, which refers to whether a site is using hightech (1: yes, 0: no). I want to create a new firm-level variable that aggregates each site’s value of site_hightech_use. For example, for firm_id E01, the firm_hightech_use = (1564/(1564+837+706)*1) + (837/(1564+837+706))*0+(706/(/(1564+837+706))*1 = 0.731

    Some sites have missing values on site_hightech_use or site_revenue. Some firms have more than 5 sites and some may have just one site. In total, the sample includes two years and more than 1,000 sites.

    I have a hard time creating this new variable firm_hightech_use. I was wondering if anyone can help me? Below is my sample dataset.

    Code:
    clear
    
    input firm_id site_id year site_revenue site_employees site_hightech_use firm_hightech_use
    E01    1030    2015    1564    200    1    
    E01    1031    2015    837    77    0    
    E01    1032    2015    706    73    1    
    E02    1040    2015    3108    750    1    
    E02    1041    2015    2976    650    1    
    E03    1050    2015    23    19    1    
    E04    1060    2015    28593    1500    1    
    E01    1030    2014    1309    200    1    
    E01    1031    2014    910    77    0    
    E01    1032    2014    688    73    0    
    E02    1040    2014    3303    750    1    
    E02    1041    2014    2711    650    1    
    E03    1050    2014    7    19    1    
    E04    1060    2014    13    1500    0    
    end
    Many thanks,
    David

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 firm_id int(site_id year site_revenue site_employees) byte site_hightech_use
    "E01" 1030 2015  1564  200 1
    "E01" 1031 2015   837   77 0
    "E01" 1032 2015   706   73 1
    "E02" 1040 2015  3108  750 1
    "E02" 1041 2015  2976  650 1
    "E03" 1050 2015    23   19 1
    "E04" 1060 2015 28593 1500 1
    "E01" 1030 2014  1309  200 1
    "E01" 1031 2014   910   77 0
    "E01" 1032 2014   688   73 0
    "E02" 1040 2014  3303  750 1
    "E02" 1041 2014  2711  650 1
    "E03" 1050 2014     7   19 1
    "E04" 1060 2014    13 1500 0
    end
    
    by firm_id year, sort: egen denominator = total(site_revenue)
    by firm_id year: egen firm_hightech_use = total(site_revenue*site_hightech_use/denominator)
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thank you very much, Clyde! I tried the codes you provided, but I kept getting "type mismatch r(109);". I double-checked the codes a few times to ensure there are no typos there. Is that because some sites have missing values on firm_id? Another question I have is that: is the ", sort" required? I am not sure why the first statement has ", sort" and the second does not? I really get stuck on the questions.

      Thank you, Clyde.


      Best,
      David

      Comment


      • #4
        The type mismatch message means that Stata is finding a string where something numeric is expected, or vice versa. The code ran correctly with my example data. My example data, which was not a direct copy of yours (because yours could not be copied directly into Stata -- you did not use -dataex-) has site_revenue and site_hightech_use as numeric variables. I'm' inferring that your actual data has one or both of those as string. Run -des site_revenue site_hightech_use- and I'm sure you will one or both is stored as a string. (Look in the storage type column: if it begins with str, it's a string.) The solution is to convert the variable(s) to numeric. See -help destring- for instructions on how to do that. Had you posted your example using -dataex-, the data would have come to me with that variable (those variables) as string, and I would have included conversion code in my solution. This is why it is so crucial to always use -dataex- to show example data.

        The -, sort- is required in the first command. Whenever you run a command prefixed with -by-, the data must be sorted in the order prescribed in the -by- prefix. If the data are not already in that sort order, you accomplish that by using the -sort- option in the -by- prefix. The second command does not require another -,sort- because it was already sorted by the first command and nothing in between has broken the sort order.

        Comment


        • #5
          Thank you very much, Clyde! Yes, you are right. Now, I get it.

          Best,
          David

          Comment


          • #6
            Hi Clyde, the variable turned out to be a string rather than numeric value. After converting them to numeric values, your commands work perfectly. Thanks, again!

            Best,
            David

            Comment

            Working...
            X