Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate dummies with start and end dates, and convert data to long format

    Hi all,

    I have a panel dataset, monthly between 1998m12-2018m12 for 28 countries. Here is an example of the dataset:

    Code:
    input Country startdate enddate 
    "Austria" 1999m2 1999m4 
    "Austria" 2012m7 2012m9 
    "Austria" 2015m1 2015m3 
    "Austria" 2016m8 2016m9 
    "Belgium" 2005m8 2005m10

    I want to create a crisis dummy which takes a value of 1 for all the dates between 'startdate' and 'enddate'. I achieved that, but I struggle to convert these data into a balanced panel in long format. This is what I want to achieve:

    Code:
    Country     date        crisis
    Austria    1998m12      0
    Austria    1999m1       0
    Austria    1999m2       1
    Austria    1999m3       1
    Austria    1999m4       1
    Austria    1999m5       0
    ...          ...        ...
    Austria    2018m12      0
    Belgium    1998m1       0
    Belgium    1999m2       0
    ...          ...       ...
    I need the full period for each country in order to merge these data with others dataset.
    I found useful examples here and there, but it's not exactly what I want to do. I believe that the process involve expand and tsfill, but I don't know precisely what to do with these commands.

    Thanks for your help,

    Pierre



  • #2
    There are several ways to do this. Here is one that is, I think, simplest:

    Code:
    clear
    input str7 country float(startdate enddate)
    "Austria" 469 471
    "Austria" 630 632
    "Austria" 660 662
    "Austria" 679 680
    "Belgium" 547 549
    end
    format %tm startdate
    format %tm enddate
    encode country, gen(n_country)
    
    tempfile crisis_periods
    save `crisis_periods'
    
    //  CREATE A FULL FRAME OF COUNTRY-DATE PAIRS
    //  STARTING FROM ITS "BORDERS"\
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 country float date
    "Austria" 456
    "Austria" 551
    "Belgium" 456
    "Belgium" 551
    end
    format date %tm
    
    encode country, gen(n_country)
    xtset n_country date, monthly
    tsfill
    
    joinby n_country using `crisis_periods'
    gen byte crisis = inrange(date, startdate, enddate)
    collapse (max) crisis, by(n_country date)
    So this code starts with a simple "frame" of country-date pairs: be sure to include each country, and pick the earliest and latest dates you need. Note that to use -tsfill- (and frankly for almost anything you will do with this data later), you need a numeric variable for country, hence the -encode- command. Then it is a few lines of code to match up the two data sets and mark the crisis months.

    In the future please use real -dataex- output to show examples, as I have done in this response. What you show is either destructively edited, or perhaps an attempt to make something that looks like -dataex- output. Either way, what you showed as example data could not be rapidly imported to Stata the way -dataex- output could. In fact, it took me far longer to wrestle your data into Stata than to write and test the code that solves your problem.

    If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Please always use dataex to post data examples (as I do below). See more in the FAQ in how and why: https://www.statalist.org/forums/help#stata
      The code suggested assumes your monthly variables startdate and enddate are proper Stata dates:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str8 Country float(startdate enddate)
      "Austria" 469 471
      "Austria" 630 632
      "Austria" 660 662
      "Austria" 679 680
      "Belgium" 547 549
      end
      format %tm startdate
      format %tm enddate
      
      * Code
      bysort Country: gen n=_n
      reshape wide startdate enddate, i(Country) j(n)
      
      expand 241
      bysort Country: gen month = monthly("1998m11", "YM")+_n
      format %tm month
      gen crisis = 0
      order Country month crisis
      
      foreach m of numlist 1(1)4{
      replace crisis =1 if month>=startdate`m' & month<=enddate`m'
      }

      Comment


      • #4
        Thank you Clyde and Jorrit for your answers, it works.

        Sorry for not using the -dataex- command, I should have known.

        Comment

        Working...
        X