How to expand the columns of a dataset based on values a third column.

guyle Smith

Join Date: Feb 2023
Posts: 14

How to expand the columns of a dataset based on values a third column.

18 Sep 2023, 15:34

Dear statalists,

I am having trouble trying to expand my dataset, which contains some financial information as per the data below. The original dataset spans the years 2015-2020 and contains more than 15 values in financial_id. Below is a mock dataset with 2 firms over 3 months and 7 financial_id values.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str8 firmid int year byte month str14 financial_id long financial_amount
"CE284001" 2015 1 "revenue"        150000
"CE284001" 2015 1 "asset"           35000
"CE284001" 2015 1 "wages"           29000
"CE284001" 2015 1 "inputs"          15000
"CE284001" 2015 1 "purchases"       53000
"CE284001" 2015 1 "taxes"           53000
"CE284001" 2015 1 "other_expenses"  19000
"CE284001" 2015 2 "revenue"         90000
"CE284001" 2015 2 "asset"           35000
"CE284001" 2015 2 "wages"           29000
"CE284001" 2015 2 "inputs"          10000
"CE284001" 2015 2 "purchases"       35000
"CE284001" 2015 2 "taxes"           18000
"CE284001" 2015 2 "other_expenses"   5000
"CE284001" 2015 3 "revenue"        175000
"CE284001" 2015 3 "asset"           67000
"CE284001" 2015 3 "wages"           29000
"CE284001" 2015 3 "inputs"          17000
"CE284001" 2015 3 "purchases"       72000
"CE284001" 2015 3 "taxes"           31000
"CE284001" 2015 3 "other_expenses"  22000
"CE284002" 2015 1 "revenue"        230000
"CE284002" 2015 1 "asset"               0
"CE284002" 2015 1 "wages"            5000
"CE284002" 2015 1 "inputs"          48000
"CE284002" 2015 1 "purchases"       72000
"CE284002" 2015 1 "taxes"           75000
"CE284002" 2015 1 "other_expenses"   9000
"CE284002" 2015 2 "revenue"        110000
"CE284002" 2015 2 "asset"               0
"CE284002" 2015 2 "wages"            3000
"CE284002" 2015 2 "inputs"          25000
"CE284002" 2015 2 "purchases"       33000
"CE284002" 2015 2 "taxes"           21000
"CE284002" 2015 2 "other_expenses"   2000
"CE284002" 2015 3 "revenue"        250000
"CE284002" 2015 3 "asset"               0
"CE284002" 2015 3 "wages"            4000
"CE284002" 2015 3 "inputs"          35000
"CE284002" 2015 3 "purchases"       35000
"CE284002" 2015 3 "taxes"           53000
"CE284002" 2015 3 "other_expenses"  22000
end

I would like to use the financial indicators (revenue, asset, wages, taxes, inputs, ...) in financial_id as outcomes in my analysis. So, I should put them into columns as per the table below. Can anyone help me on this?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str8 firmid int year byte month long(revenue asset) int(wages inputs) long(purchases taxes) int other_expenses
"CE284001" 2015 1 150000 35000 29000 15000 53000 53000 19000
"CE284001" 2015 2      .     .     .     .     .     .     .
"CE284001" 2015 3      .     .     .     .     .     .     .
"CE284008" 2015 1      .     .     .     .     .     .     .
"CE284008" 2015 2      .     .     .     .     .     .     .
"CE284008" 2015 3      .     .     .     .     .     .     .
end

I have tried to save multiple sub-datasets for each value of the financial_id using the following code:

Code:

import delimited  "mock_data.csv"
keep if financial_id == "revenue"
save "mock_data_revenue.dta", replace
 
import delimited  "mock_data.csv"
keep if financial_id == "asset"
save "mock_data_asset.dta", replace

import delimited  "mock_data.csv"
keep if financial_id == "wages"
save "mock_data_wages.dta", replace

import delimited  "mock_data.csv"
keep if financial_id == "inputs"
save "mock_data_inputs.dta", replace

import delimited  "mock_data.csv"
keep if financial_id == "purchases"
save "mock_data_purchases.dta", replace

import delimited  "mock_data.csv"
keep if financial_id == "taxes"
save "mock_data_taxes.dta", replace

import delimited  "mock_data.csv"
keep if financial_id == "other_expenses"
save "mock_data_other_expenses.dta", replace

and then merge them back altogether. But it seems inefficient since I have multiple years to work with.

I really appreciate your time,

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10214

18 Sep 2023, 17:14

Code:

reshape wide financial_amount, i(firmid year month) j(financial_id) string
rename financial_amount* *

Res.:

Code:

. l, sepby(firmid)

     +------------------------------------------------------------------------------------------+
     |   firmid   year   month   asset   inputs   other_~s   purcha~s   revenue   taxes   wages |
     |------------------------------------------------------------------------------------------|
  1. | CE284001   2015       1   35000    15000      19000      53000    150000   53000   29000 |
  2. | CE284001   2015       2   35000    10000       5000      35000     90000   18000   29000 |
  3. | CE284001   2015       3   67000    17000      22000      72000    175000   31000   29000 |
     |------------------------------------------------------------------------------------------|
  4. | CE284002   2015       1       0    48000       9000      72000    230000   75000    5000 |
  5. | CE284002   2015       2       0    25000       2000      33000    110000   21000    3000 |
  6. | CE284002   2015       3       0    35000      22000      35000    250000   53000    4000 |
     +------------------------------------------------------------------------------------------+

Comment

guyle Smith

Join Date: Feb 2023

Posts: 14
#3

20 Sep 2023, 03:53

Thank you very much Andrew. It worked indeed. I dealt with duplicates issues beforehand. Everything works now. Many thanks!
Comment

Announcement

How to expand the columns of a dataset based on values a third column.

Comment

Comment