Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping data from wide to long panel-data

    Good day,

    I recently reshaped my dataset from wide to long into balanced panel dataset. Clyde Schechter has been so gracious in helping me and I have a follow up on my post on Creating a year identifier for pre-post analysis to use for diff-in-diff but focused on the reshaping aspect of the response.

    I am using panel-data (balanced data) with unit of analysis is the county level. Variables have observations in years from 2008-2018 but my period of interest is 2011-2017. I reshaped the data from wide too long. Some of the variables include observations reporting data for each year (example # of FQHCs reported for each year 2011-2018) and some with data for 5-year estimates (for example: veteran and non-veteran education level 2012-2016, population by race and gender 2011-2017). I also have variables that reflect count/percentage/total# of observations over a period of time (example number of black females).

    Here is the dataset before the reshaped:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long povertystat12 byte ruralclinic10 long vet_hs12 int vet_nohs12
     54598 0  4778 254
    196640 0 18324 942
     23641 1  1532 208
     20603 2  1111 159
     57099 2  3933 527
     10154 1   351  55
     19977 2  1172 160
    112690 0  9421 814
     33503 0  2028 304
     25465 1  1923 303
     43301 0  2682 288
     13091 1   819 107
     24448 5  1125 150
     13106 0   890 197
     14726 0   602 136
     50255 0  5786 398
     53910 0  3740 429
     12622 0   728  92
     10497 1   810  71
     37016 2  2729 308
     13662 1   786 111
     80126 1  4900 604
     48338 0  5918 309
     40895 0  1930 196
     70108 2  3308 399
    end
    label var povertystat12 "# Pers w/Pov Status Determined 2012-16" 
    label var ruralclinic10 "# Rural Health Clinics 2010" 
    label var vet_hs12 "Veterans 25+ w/HS Dipl or more 2012-16" 
    label var vet_nohs12 "Veterans 25+ w/< HS Diploma 2012-16"
    When running a tabstat of the data looking at the total number of persons with veteran education

    Code:
     tabstat vet_hs12, stat(sum) format(%14.0fc) c(v)
    
       stats |  vet_hs12
    ---------+----------
         sum |    18,018,157
    --------------------
    After reshaping the data, the data looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte year int vetedu_hs byte ruralclinic long vetedu_hsplus
    12  132  0   981
    12   59  0   585
    12  197  0  1608
    12    1  1   159
    12   16  2   314
    12    7  0    52
    12  311  0  2752
    12  840  0 12952
    12  157  0  1208
    12  143  0  1498
    12    2  0    48
    12   87  3  1200
    12 2140  0 29503
    12   78  0   341
    end
    label var year "Reshaped Year variable panel data" 
    label var vetedu_hs "Veterans 25+ w/< HS Diploma 2012-16" 
    label var ruralclinic "# of rural health clinics 2010-17" 
    label var vetedu_hsplus "Veterans 25+ w/HS Diploma+ 2012-16"
    Here is the results of sum of variable vetedu_hs by year - which reflects the number of veteran persons with high school education by year. What I know is wrong is that the original variable noted that the vet_hs12 was the number of veterans with high school diploma 2012-2016 but when reshaped it only populated in year 2012

    Code:
    . tabstat vetedu_hs, by (year) stat(sum)
    Summary for variables: vetedu_hs
         by categories of: year (Reshaped Year variable panel data)
    
        year |       sum
    ---------+----------
          11 |         0
          12 |   1328412
          13 |         0
          14 |         0
          15 |         0
          16 |         0
          17 |         0
          18 |         0
    ---------+----------
       Total |   1328412
    --------------------

    It is clear something is missing. I am befuddled. I consider I need to do some loops of the reshaped and populate across a range of time. I talked about this Creating a categorical variable from multiple numeric variables in post #4, however I do not think this is correct approach. Should I have not included the -vet_hs12- variable in the reshaped but I am not sure how I would have been able to build my model if i did not capture the fact that in my panel data I have 5-year estimate data?

    Hope this is clear.

    Thanks

    Rene
    Stata 12 on MAC OS (but also have access to Stata 15 on Windows)

  • #2
    Any additional information I can provide that may help clarify my question?

    Comment

    Working...
    X