Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to expand the columns of a dataset based on values a third column.

    Dear statalists,



    I am having trouble trying to expand my dataset, which contains some financial information as per the data below. The original dataset spans the years 2015-2020 and contains more than 15 values in financial_id. Below is a mock dataset with 2 firms over 3 months and 7 financial_id values.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 firmid int year byte month str14 financial_id long financial_amount
    "CE284001" 2015 1 "revenue"        150000
    "CE284001" 2015 1 "asset"           35000
    "CE284001" 2015 1 "wages"           29000
    "CE284001" 2015 1 "inputs"          15000
    "CE284001" 2015 1 "purchases"       53000
    "CE284001" 2015 1 "taxes"           53000
    "CE284001" 2015 1 "other_expenses"  19000
    "CE284001" 2015 2 "revenue"         90000
    "CE284001" 2015 2 "asset"           35000
    "CE284001" 2015 2 "wages"           29000
    "CE284001" 2015 2 "inputs"          10000
    "CE284001" 2015 2 "purchases"       35000
    "CE284001" 2015 2 "taxes"           18000
    "CE284001" 2015 2 "other_expenses"   5000
    "CE284001" 2015 3 "revenue"        175000
    "CE284001" 2015 3 "asset"           67000
    "CE284001" 2015 3 "wages"           29000
    "CE284001" 2015 3 "inputs"          17000
    "CE284001" 2015 3 "purchases"       72000
    "CE284001" 2015 3 "taxes"           31000
    "CE284001" 2015 3 "other_expenses"  22000
    "CE284002" 2015 1 "revenue"        230000
    "CE284002" 2015 1 "asset"               0
    "CE284002" 2015 1 "wages"            5000
    "CE284002" 2015 1 "inputs"          48000
    "CE284002" 2015 1 "purchases"       72000
    "CE284002" 2015 1 "taxes"           75000
    "CE284002" 2015 1 "other_expenses"   9000
    "CE284002" 2015 2 "revenue"        110000
    "CE284002" 2015 2 "asset"               0
    "CE284002" 2015 2 "wages"            3000
    "CE284002" 2015 2 "inputs"          25000
    "CE284002" 2015 2 "purchases"       33000
    "CE284002" 2015 2 "taxes"           21000
    "CE284002" 2015 2 "other_expenses"   2000
    "CE284002" 2015 3 "revenue"        250000
    "CE284002" 2015 3 "asset"               0
    "CE284002" 2015 3 "wages"            4000
    "CE284002" 2015 3 "inputs"          35000
    "CE284002" 2015 3 "purchases"       35000
    "CE284002" 2015 3 "taxes"           53000
    "CE284002" 2015 3 "other_expenses"  22000
    end

    I would like to use the financial indicators (revenue, asset, wages, taxes, inputs, ...) in financial_id as outcomes in my analysis. So, I should put them into columns as per the table below. Can anyone help me on this?


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 firmid int year byte month long(revenue asset) int(wages inputs) long(purchases taxes) int other_expenses
    "CE284001" 2015 1 150000 35000 29000 15000 53000 53000 19000
    "CE284001" 2015 2      .     .     .     .     .     .     .
    "CE284001" 2015 3      .     .     .     .     .     .     .
    "CE284008" 2015 1      .     .     .     .     .     .     .
    "CE284008" 2015 2      .     .     .     .     .     .     .
    "CE284008" 2015 3      .     .     .     .     .     .     .
    end

    I have tried to save multiple sub-datasets for each value of the financial_id using the following code:
    Code:
    import delimited  "mock_data.csv"
    keep if financial_id == "revenue"
    save "mock_data_revenue.dta", replace
     
    import delimited  "mock_data.csv"
    keep if financial_id == "asset"
    save "mock_data_asset.dta", replace
    
    import delimited  "mock_data.csv"
    keep if financial_id == "wages"
    save "mock_data_wages.dta", replace
    
    import delimited  "mock_data.csv"
    keep if financial_id == "inputs"
    save "mock_data_inputs.dta", replace
    
    import delimited  "mock_data.csv"
    keep if financial_id == "purchases"
    save "mock_data_purchases.dta", replace
    
    import delimited  "mock_data.csv"
    keep if financial_id == "taxes"
    save "mock_data_taxes.dta", replace
    
    import delimited  "mock_data.csv"
    keep if financial_id == "other_expenses"
    save "mock_data_other_expenses.dta", replace
    and then merge them back altogether. But it seems inefficient since I have multiple years to work with.
    ​​​​​​​

    I really appreciate your time,

  • #2
    Code:
    reshape wide financial_amount, i(firmid year month) j(financial_id) string
    rename financial_amount* *
    Res.:

    Code:
    . l, sepby(firmid)
    
         +------------------------------------------------------------------------------------------+
         |   firmid   year   month   asset   inputs   other_~s   purcha~s   revenue   taxes   wages |
         |------------------------------------------------------------------------------------------|
      1. | CE284001   2015       1   35000    15000      19000      53000    150000   53000   29000 |
      2. | CE284001   2015       2   35000    10000       5000      35000     90000   18000   29000 |
      3. | CE284001   2015       3   67000    17000      22000      72000    175000   31000   29000 |
         |------------------------------------------------------------------------------------------|
      4. | CE284002   2015       1       0    48000       9000      72000    230000   75000    5000 |
      5. | CE284002   2015       2       0    25000       2000      33000    110000   21000    3000 |
      6. | CE284002   2015       3       0    35000      22000      35000    250000   53000    4000 |
         +------------------------------------------------------------------------------------------+

    Comment


    • #3
      Thank you very much Andrew. It worked indeed. I dealt with duplicates issues beforehand. Everything works now. Many thanks!

      Comment

      Working...
      X