Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Interpolate to end of month instead of mid month data?

    Hello, all my data are for the end of month. But I have one variable where the data is listed mid month instead. 15.xx. I need to format this into end of month data.

    So that if I have for example
    ...
    15.05.1990 4
    15.06.1990 5
    15.07.1990 6
    .....
    i need to make a new series with estimated data such as this:
    31.05.1990 4.5
    30.06.1990 ...
    30.06.1900 ...
    ......

    I know of the ipolate function, but not how to implement it for this. Or should I use a different function?
    Last edited by Basil Levi; 26 Jan 2021, 04:47.

  • #2
    This is a little tricksy but soluble. Note that strictly ipolate is a command, not a function.


    Code:
    clear
    input str10 strdate whatever
    "15.05.1990" 4
    "15.06.1990" 5
    "15.07.1990" 6
    end
    
    gen numdate = daily(strdate, "DMY")
    format numdate %td
    expand 2
    bysort numdate : replace whatever = . if _n == 2
    by numdate : replace numdate = dofm(mofd(numdate) + 1) - 1  if _n == 2
     
    ipolate whatever numdate, gen(wanted) epolate
    
    list
    
         +-----------------------------------------------+
         |    strdate   whatever     numdate      wanted |
         |-----------------------------------------------|
      1. | 15.05.1990          4   15may1990           4 |
      2. | 15.05.1990          .   31may1990    4.516129 |
      3. | 15.06.1990          5   15jun1990           5 |
      4. | 15.06.1990          .   30jun1990         5.5 |
      5. | 15.07.1990          6   15jul1990           6 |
         |-----------------------------------------------|
      6. | 15.07.1990          .   31jul1990   6.5333333 |
         +-----------------------------------------------+
    See also https://www.stata-journal.com/articl...article=dm0100 for getting the last day of the month, although the code above is the nub of the matter namely


    last day of the current month IS first day of the next month MINUS 1

    so extract the current monthly date, add 1, get the first daily date of that next month, and then subtract 1

    meaning that you need never bother with trying to code for different month lengths, leap years etc. Stata knows the calendar well enough for this purpose.

    Comment


    • #3
      You can expand the observations, interpolate and then keep the wanted dates. For a technique on determining end of month dates (implemented below), see https://journals.sagepub.com/doi/ful...36867X19874247

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(date value)
      11092 4
      11123 5
      11153 6
      end
      format date %td
      expand 2, g(new)
      bys date: replace date= cond(new, dofm(ym(year(date), month(date)) +1)-1, date)
      replace value=. if new
      ipolate value date, gen(value2) epolate
      keep if new
      Res.:

      Code:
      . l
      
           +-------------------------------------+
           |      date   value   new      value2 |
           |-------------------------------------|
        1. | 31may1990       .     1    4.516129 |
        2. | 30jun1990       .     1         5.5 |
        3. | 31jul1990       .     1   6.5333333 |
           +-------------------------------------+

      Comment


      • #4
        Thank you, this was very helpful!


        What would be the best way to use the replace value=. command if I have many variables, let's say 10 which I have to do it for? I get "too many variables specified" if I add more in the same command

        Comment


        • #5
          You can loop:

          Code:
          foreach var in var1 var2 var3 ... var10{
              replace `var'=. if new
          }
          foreach var in var1 var2 var3 ... var10{
              ipolate `var' date, gen(`var'2) epolate
          }
          where you replace what is highlighted in blue with the names of your variables.

          Comment


          • #6
            The loops in #5 can be combined.

            Comment


            • #7
              That worked great

              Okey so now I'm working with a lot of variables now and sometimes I need to delete a lot of newly created variables.
              Do stata have some kind of groupvar I can put a set of variables created through loop with?

              lets say I have
              foreach var in var1 var2 (...) var100 {

              gen T_`var' =
              sum T_`var'
              ...
              }
              Now I might want to delete these vars and do some changes(and might not, so I don't want vars to disapear in the same loop)

              I guess I could just painfully list all the variables I want to drop with a foreach loop, but I want to simplify this with grouped var or something ?
              Do I instead need a var to keep track, like the gen(new2) thing and then drop if new?

              Comment


              • #8
                You can hold such information in a local. Here is an example:

                Code:
                sysuse auto, clear
                local tempvars
                foreach var of varlist price-turn{
                    gen T_`var'= 3*`var'
                    local tempvars "`tempvars' T_`var'"
                    sum T_`var'
                }
                di "`tempvars'"
                drop `tempvars'
                Res.:

                Code:
                . di "`tempvars'"
                 T_price T_mpg T_rep78 T_headroom T_trunk T_weight T_length T_turn
                
                .
                . drop `tempvars'


                But if you do not have variables named T_varname in your dataset, apart from the temporary variables, you could simply drop these using prefix+ the wildcard "*".

                Code:
                sysuse auto, clear
                foreach var of varlist price-turn{
                    gen T_`var'= 3*`var'
                    sum T_`var'
                }
                drop T_*
                A third way is to use preserve and restore

                Code:
                sysuse auto, clear
                preserve
                foreach var of varlist price-turn{
                    gen T_`var'= 3*`var'
                    sum T_`var'
                }
                restore
                Last edited by Andrew Musau; 03 Feb 2021, 07:56.

                Comment

                Working...
                X