Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshape

    Dear members,
    I have data that i wish to reshape into the long format. This is the code i have run but does seem to give me the desired result. I would wish to have country firm year and then the variables be the columns. Looking for a panel data format.

    // load data
    import excel using Combined.xlsx, firstrow clear

    duplicates tag country_id firm_id var_id, gen(dupl)
    duplicates drop country_id firm_id var_id, force
    egen sid = group(country_id firm_id var_id)
    drop if sid == .

    *clean variable name
    replace VARIABLE = subinstr(VARIABLE," ","",.)
    replace VARIABLE = subinstr(VARIABLE,"-","_",.)
    replace VARIABLE = substr(VARIABLE,1,20)
    rename VARIABLE variable

    *convert all variables in yr2000 to yr2022 to similar formats (strings. Use destring to convert to numeric)
    forvalues i = 2005/2021 {
    tostring yr`i', replace force
    }
    reshape long yr, i(sid) j(year)
    drop sid

    egen xid = group(country_id firm_id year)
    *duplicates drop xid, force
    reshape wide variable, i(firm_id) j(variable) string
    Attached Files

  • #2
    In the future, please do not post attachments; post an extract of the data using dataex instead. See the Statalist FAQ for this and other advice on how best to participate in this forum.

    That said, here is some code that might work for you:

    Code:
    import excel using Combined.xlsx, firstrow clear
    reshape long yr, i(country_id Country firm_id CompanyName var_id VARIABLE Code) j(year)
    rename yr v_
    
    drop Code var_id
    reshape wide v_, i(country_id Country firm_id CompanyName year) j(VARIABLE) string
    rename v_* *
    We do end up losing the information in var_id and Code. Are either of these critical in some form?

    Comment


    • #3
      In running the reshape long it returns error: variable id does not uniquely identify the observations

      Comment


      • #4
        There is no variable id in my code. My code was tested to run on the excel dataset you provided. There seems to be some difference caused by the difference in datasets and/or your adaptation of my code.

        Comment


        • #5
          Thanks Kumar it worked. Although now my columns are red- i guess because during reshape wide we put string. How can i change to numbers that i can work with?

          Comment


          • #6
            See
            Code:
            help destring

            Comment


            • #7
              Rogers Ochenge again, you may need to look carefully for differences between the dataset you supplied and the one you are using, and/or your adaptation of my code. When I use my code on the dataset you provided, all variables are numeric, other than Country and CompanyName, which were strings to start with. The issue is not with the reshape command -- I used the option string in that command only because VARIABLE is a string. It does not cause the data originally stored in the yr* columns to become string (in the data you provided, these are all read in by Stata as numeric).

              So there are two approaches to solving the problem: (i) look for and fix differences in your data and/or your adaptation of my code; and/or (ii) use the command destring to convert variables that have numbers stored as strings, as suggested in #6 above.

              Comment

              Working...
              X