Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Expanding the observations as much as the number of months between start month and last month

    No the Start and End might be in different years. Then simply adding 1 will not work.

    Company Start End
    A 2018m12 2019m02
    B 2016m12 2017m03

    Using stata, from the above I want to make the below data.

    Company Month
    A 2018m12
    A 2019m01
    A 2019m02
    B 2016m12
    B 2017m01
    B 2017m02
    B 2017m03


    Code:
    clear
    input str6 Company Start End
    "A" "2018m12" "2019m02"
    "B" "2016m12" "2017m03"
    end

    ChatGPT's solution is this, but it seems it doesn't use forval correctly, but maybe this provides a hint.


    Code:
    clear
    
    * Create your original dataset
    input str1 Company str7 Start str7 End
    A 2018m12 2019m02
    end
    
    * Extract the year and month components from Start and End
    gen start_year = year(monthly(Start, "Ym"))
    gen start_month = month(monthly(Start, "Ym"))
    gen end_year = year(monthly(End, "Ym"))
    gen end_month = month(monthly(End, "Ym"))
    
    * Initialize a counter variable
    gen counter = 1
    
    * Create an empty dataset to hold the results
    tempfile expanded_data
    save "`expanded_data'"
    
    * Loop through the years and months, and generate Month variable
    forval year = start_year(1)end_year {
        local end_month = 12
        if `year' == `end_year' {
            local end_month = `end_month'
        }
        forval month = start_month(1)`end_month' {
            gen Month = "`year'm0`month'"
            save "`expanded_data'", append
            local counter = `counter' + 1
        }
    }
    
    * Merge the expanded data with the original dataset to get the Company variable
    use "`expanded_data'", clear
    sort Month
    gen counter = _n
    sort Company counter
    drop counter
    merge 1:1 counter Company using original_dataset
    drop counter
    
    * Display the result
    list Company Month

  • #2
    Hi James,

    ChatGPT rarely has anything useful except for running a naive regression or summing some variables together. Some sort of STATA co-pilot would be awesome but alas we shall have to wait.

    From your example, it looks like you're essentially converting your data from wide to long format. Three lines of code is enough:

    Code:
    *** Generate data ***
    clear
    input str1 Company str7 Start str7 End
    "A" "2018m12" "2019m02"
    "B" "2016m12" "2017m03"
    end
    
    *** Generate new variables ***
    encode Company, gen(Company2) // encode to numeric
    gen date1 = monthly(Start,"YM") // gen date variable
    gen date2 = monthly(End,"YM")
        format date1 date2 %tm
    
    reshape long date, i(Company2) j(tmp) // wide to long
    
    tsset Company2 date // declare as timeseries with Company2 as id
    tsfill // fill gaps in timeseries
    
    drop Company Start End tmp
    First is just converting your string variables to something Stata can interpret. I named the date variables date1 and date2 so reshape can easily transform these to new rows, tsset as timeseries with "date" as your time variables, fill gaps with tsfill.

    Alternatively, but messier, generate your own long format with expand:

    Code:
    gen datediff = date2 - date1 // difference in months
    expand = datediff
    sort Company date1
    Then bysort Company (Start): Start = Start +1 or something, but haven't thought this one through as I think my first solution should be sufficient.

    Best,
    Wei

    Comment


    • #3
      The problem here seems to start with your data example, which won't work as desired: your input statement must flag that the dates are string variables and further that they are at least str7.

      Once that is fixed, the task is standard and can be achieved more directly.

      Code:
      clear
      input str6 Company str7 (Start End) 
      "A" "2018m12" "2019m02"
      "B" "2016m12" "2017m03"
      end
      
      foreach v in Start End {
          
          gen `v'_Date = monthly(`v', "YM")
          
      }
      
      expand End_Date-Start_Date+1 
      
      bysort Company : gen Date = Start_Date + _n - 1 
      format *Date %tm 
      
      list, sepby(Company)
          +-------------------------------------------------------------+
           | Company     Start       End   Start_~e   End_Date      Date |
           |-------------------------------------------------------------|
        1. |       A   2018m12   2019m02    2018m12     2019m2   2018m12 |
        2. |       A   2018m12   2019m02    2018m12     2019m2    2019m1 |
        3. |       A   2018m12   2019m02    2018m12     2019m2    2019m2 |
           |-------------------------------------------------------------|
        4. |       B   2016m12   2017m03    2016m12     2017m3   2016m12 |
        5. |       B   2016m12   2017m03    2016m12     2017m3    2017m1 |
        6. |       B   2016m12   2017m03    2016m12     2017m3    2017m2 |
        7. |       B   2016m12   2017m03    2016m12     2017m3    2017m3 |
           +-------------------------------------------------------------+

      Comment


      • #4
        #2 wasn’t visible to me when I was writing.

        Comment


        • #5
          Seems we wrote our answers at the same time. Interestingly, you following through with the expand/bysort. Thank you Nick for teaching me about including expressions straight in expand. Guess it depends on what your goal is James. expand duplicates your observations for _N observations, tsfill creates blanks for all variables except Company and date.

          Comment

          Working...
          X