Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape long format

    Hi everyone,

    The whole file is attached below
    How can I calculate the average value for DM Age colume of each company, the proportion of females and reshape wide to long format?
    I tried to attach id for each company by using the command: egen company_id = group (BvD ID number) but the order of companies then changed -> is it ok?
    I use the command bysort companyname: egen average_age = mean(Age) after encode the variable DMAge but it doesn't work.

    Thank you
    Company name BvD ID number Operating revenue (Turnover)
    th EUR
    2015
    Operating revenue (Turnover)
    th EUR
    2016
    Operating revenue (Turnover)
    th EUR
    2017
    Number of directors & managers DM
    Age
    DM
    Gender
    1. NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 77 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 67 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 67 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 67 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 67 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 68 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 68 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 68 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 68 M
    NESTLÉ S.A. CHCHE105909036 82,665,557 84,259,512 77,466,693 163 67 M
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 63 M
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 63 M
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 63 M
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 63 M
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 60 F
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 63 M
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 M
    ROCHE HOLDING AG CHCHE101602521 46,665,159 49,148,191 47,659,316 80 61 M
    3. NOVARTIS AG CHCHE103867266 48,163,868 48,726,896 43,445,359 140 66 M
    NOVARTIS AG CHCHE103867266 48,163,868 48,726,896 43,445,359 140 66 M
    NOVARTIS AG CHCHE103867266 48,163,868 48,726,896 43,445,359 140 70 M
    NOVARTIS AG CHCHE103867266 48,163,868 48,726,896 43,445,359 140 64 F
    NOVARTIS AG CHCHE103867266 48,163,868 48,726,896 43,445,359 140 71 F
    4. SOCAR TRADING SA CHCHE113990112 n.a. n.a. n.a. 13 56 M
    SOCAR TRADING SA CHCHE113990112 n.a. n.a. n.a. 13 44 M
    SOCAR TRADING SA CHCHE113990112 n.a. n.a. n.a. 13 44 F
    SOCAR TRADING SA CHCHE113990112 n.a. n.a. n.a. 13 62 M
    SOCAR TRADING SA CHCHE113990112 n.a. n.a. n.a. 13 62 M
    5. COOP-GRUPPE GENOSSENSCHAFT CHCHE109029938 24,934,747 26,445,305 24,970,144 71 66 M
    COOP-GRUPPE GENOSSENSCHAFT CHCHE109029938 24,934,747 26,445,305 24,970,144 71 57 M
    COOP-GRUPPE GENOSSENSCHAFT CHCHE109029938 24,934,747 26,445,305 24,970,144 71 57 M
    COOP-GRUPPE GENOSSENSCHAFT CHCHE109029938 24,934,747 26,445,305 24,970,144 71 57 M
    Attached Files

  • #2
    I use the command bysort companyname: egen average_age = mean(Age) after encode the variable DMAge but it doesn't work.
    -encoding- this variable is definitely the wrong thing to do. Either it will give you an error message (if DMAge is already numeric within Stata) or it will give you incorrect results (if it was for some reason a string variable in Stata). For future reference, never use -encode- on a variable that, to human eyes, looks like a number. Such a variable, if Stata has it as a string variable, should be converted to numeric using -destring-, not -encode-. The purpose of -encode- is entirely different; I won't elaborate on that here.

    However, assuming that age variable is really numeric, -bysort companyname: egen average_age = mean(DMAge)- will give you the mean age. If it is actually a string variable, then run -destring DMAge, replace- first.

    To get the proportion of females: -bysort companyname: egen proportion_female = mean(DM_Gender == "F")-

    If you need more specific advice you need to post back. Do not attach an Excel file. Many of us won't download any attachments from people we don't know, especially not Microsoft Office documents, which can contain active malware. Do not, in fact, attach anything. First import your data into Stata. Then use the -dataex- command to show example data. It is the Stata data that is critical to providing you with code and advice that will actually work. How the data looks in Excel does not provide important information, such as whether a variable like DMAge is a string or numeric.

    If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Hello Clyde,

      Thank you for your help. My stata version is 13 and I run -help dataex- to find instructions, but can't take an example data.
      I ran this command -egen firm_id = group (BvD ID number)- but then the firm_ id started with 1866 instead of 1 -> Is it ok ?




      Comment


      • #4
        -dataex- is not part of Stata version 13. But you can get it by running -ssc install dataex-. Then, open your Stata data file (or import your Excel file into Stata) and then run -dataex-. Copy the output you get from the Results window into the Forum editor. That will give the first 100 observations in your data set as an example.

        I don't know what to make of what you are reporting about using -egen firm_id = group(BVD ID number)-. That cannot be the exact code you ran, unless your data set has three separate variables named BVD, ID, and number. You seem to have a single variable corresponding, I think, to column B of your spreadsheet. When imported into Stata its name would be run together as BvDIDnumber.

        Also, when responding, please don't say that something "didn't work." Explain exactly what went wrong as clearly as you can. Better still, show the actual exact results you got and explain how that differs from what you wanted to get.

        Comment


        • #5
          Below is the example of my data. Could you show me to attach id to companies and convert wide to long format (my data has some missing values for colume Turnover 2015,2016,2017) ? The firm_id starts with 452 instead of 1.

          input str5 A str97 Companyname str14 BvDIDnumber str17(Turnover2015 Turnover2016 Turnover2017) int Numberofdirectorsmanagers str3 DMAge str1 DMGender float firm_id
          "1." "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "77" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "68" "M" 452
          "" "NESTLÉ S.A." "CHCHE105909036" "82665557.48352841" "84259511.5793985" "77466693.2520148" 163 "67" "M" 452
          "2." "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "61" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "61" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "63" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "63" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "63" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "60" "F" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "60" "F" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "67" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "67" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "65" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "65" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "64" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "64" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "38" "M" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "55" "F" 134
          "" "ROCHE HOLDING AG" "CHCHE101602521" "46665159.4727371" "49148191.4858677" "47659315.9994572" 80 "55" "F" 134

          Comment


          • #6
            Your data show two values of BvDIDnumber with two values of firm_id
            Code:
            CHCHE105909036  452
            CHCHE101602521  134
            You think that the group numbering should start at 1 with the first observation, but in fact Stata first sorts your data by BvDIDnumber and then assigns group numbers. The fact that firm_id 134 has a lower BvDIDnumber than firm_id 452, even though it appears later in your data, supports this explanation.

            The reason Stata sorts the data is to get every observation with the same BvDIDnumber together; otherwise, for each observation it would potentially have to look at all preceeding observations to see if that BvDIDnumber had appeared earlier.

            Comment


            • #7

              You think that the group numbering should start at 1 with the first observation, but in fact Stata first sorts your data by BvDIDnumber and then assigns group numbers. The fact that firm_id 134 has a lower BvDIDnumber than firm_id 452, even though it appears later in your data, supports this explanation.

              The reason Stata sorts the data is to get every observation with the same BvDIDnumber together; otherwise, for each observation it would potentially have to look at all preceeding observations to see if that BvDIDnumber had appeared earlier.

              It doesn'taffect my results when I convert wide to long data and linear regression,right ? thank you

              Comment


              • #8
                It does not affect your results.

                Comment


                • #9
                  I believe what you want is:
                  Code:
                  egen full_id = concat(firm_id BvDIDnumber), punct(" ")
                  drop A
                  gen long obs_no = _n
                  reshape long Turnover, i(obs_no) j(year)
                  Added: I notice that most of your observations in the example data are exact duplicates of each other. Perhaps in the real data they disagree on other variables not shown? If not, what is the rationale for having so many replicates of the same information?

                  Comment

                  Working...
                  X