Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with reshaping and assigning column names and labels

    For the below sample dataset, I would like to reshape the data to wide format such that there are three columns: date, C010000001 and C010010001. The last two columns will take values from the "value" column during the reshaping process. I further need to assign labels for C010000001 and C010010001 columns using the corresponding description values. So, the column "C010000001" should have the label "U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)" and "C010010001" the label "East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)". Any help is appreciated.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str107(description series) double value float date
    "U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)"                "C010000001"   111901 746
    "East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)" "C010010001"  42033.9 746
    "U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)"                "C010000001" 111079.1 745
    "East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)" "C010010001"  42331.7 745
    "East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)" "C010010001"  39600.2 744
    "U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)"                "C010000001"   104018 744
    end
    format %tm date


  • #2
    Variable labels are up to 80 characters. Here is a way considering this restriction:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str107(description series) double value float date
    "U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)"                "C010000001"   111901 746
    "East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)" "C010010001"  42033.9 746
    "U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)"                "C010000001" 111079.1 745
    "East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)" "C010010001"  42331.7 745
    "East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)" "C010010001"  39600.2 744
    "U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gallons per Day)"                "C010000001"   104018 744
    end
    format %tm date
    
    frame put description series, into(labels)
    drop description
    reshape wide value, i(date) j(series) string
    rename value* *
    gen lab=_n
    frame labels{
        contract desc series, nomiss
        gen lab=_n
    }
    frlink 1:1 lab, frame(labels)
    frget desc series, from(label)
    levelsof label, local(labs)
    foreach lab of local labs{
        lab var `=series[`lab']' "`=description[`lab']'"
    }
    drop lab labels description series
    frame drop labels
    Res.:

    Code:
    . l, sep(0)
    
         +------------------------------+
         |   date   C01000~1   C01001~1 |
         |------------------------------|
      1. | 2022m1     104018    39600.2 |
      2. | 2022m2   111079.1    42331.7 |
      3. | 2022m3     111901    42033.9 |
         +------------------------------+
    
    . desc
    
    Contains data
     Observations:             3                  
        Variables:             3                  
    ------------------------------------------------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    ------------------------------------------------------------------------------------------------------------------------------
    date            float   %tm                   
    C010000001      double  %10.0g                U.S. Reformulated Gasoline All Sales/Deliveries by Prime Supplier (Thousand Gall
    C010010001      double  %10.0g                East Coast (PADD 1) Reformulated Gasoline All Sales/Deliveries by Prime Supplier
    ------------------------------------------------------------------------------------------------------------------------------
    Sorted by: date
         Note: Dataset has changed since last saved.

    Comment


    • #3
      Thank you, Andrew! Frames are new to me but it works well.

      Comment

      Working...
      X