
  • Time Trend Variable

    Hello All,

    I am preparing my data to run an interrupted time series.
    I have to create a time trend variable.

    My data has a variable called month in the format 201401 (January 2014).

    Every household ID (hh_id) has 4 entries for each month (hence there are four 201401 entries).

    I attempted to use tsset month; however, I get the "repeated time values in sample" message.

    I was wondering how I can create a time trend variable that has just one unique entry for each month?

    Please let me know if there is any other information you might need.

    Thanks

  • #2
    Well, your month variable is not going to be usable for your purposes. You will need to convert it to a Stata internal format monthly date variable. The exact code for doing this will depend on whether month is stored as a string or as a numeric variable.
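A minimal sketch of both cases, in case it helps. The variable names month_num and month_str are hypothetical stand-ins for a numeric and a string version of the YYYYMM value, and the two example rows are only there to make the snippet runnable on its own.

```stata
* Sketch: build a Stata internal format monthly date from a YYYYMM value.
* month_num and month_str are hypothetical names for the two storage types.
clear
input long month_num str6 month_str
201401 "201401"
201510 "201510"
end

* Numeric case: split the value into year and month arithmetically.
gen mdate_from_num = ym(floor(month_num/100), mod(month_num, 100))

* String case: pull out the pieces with substr() and convert them with real().
gen mdate_from_str = ym(real(substr(month_str, 1, 4)), real(substr(month_str, 5, 2)))

* Display both as monthly dates (2014m1, 2015m10).
format mdate_from_num mdate_from_str %tm
list
```

Either route should give the same monthly date; which one applies depends on what -describe month- reports for the storage type.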

    More generally, without understanding why there are four observations for each month, it isn't possible to advise you on how to proceed. What do the four observations represent?

    In the end, to get a concrete and helpful response, I think you need to post an example of your data, using the -dataex- command to do so, and you need to explain what the four entries per month are about.

    If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.




    • #3
      Hi Clyde,

      Thanks for the advice. I'm fairly new to the forum.

      My data comprise many households, each of which has recorded the monthly volume of food purchased from four different food groups.

      Each household has 4 observations (volumes) recorded per month, one for each of the 4 food groups, and each of these observations has a corresponding monthly date (in my sample, it is in the format 201401 etc.).

      Each household has recorded the volume purchased for 4 years or more, so there is quite a large amount of data per household. It is an unbalanced panel, as some households were recruited after the start of the study or left prior to the end.

      I want to conduct an ITSA looking at an intervention which occurred approximately midway through the study (month 201510).

      So I am a bit confused about how to create the appropriate time trend variable, given that there are multiple entries per month for the same household.

      Below is an example of the data; apologies for the huge number of decimal places in volume.

      So my question is how to create a time variable that can be used for this panel.

      When I tried the tsset option, Stata told me that I had repeated time values.




      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float hh_id long month float foodgr double volume
      1 201401 4 1.999442135903211
      1 201401 2 .7829436004855133
      1 201401 1 0
      1 201401 3 .22998967434827636
      1 201402 2 1.0236987562129212
      1 201402 4 2.1374358996784872
      1 201402 3 .24466986694852635
      1 201402 1 2.665922847870948
      1 201403 4 .9992317345177634
      1 201403 3 .24466986694852635
      end

      Thank you



      • #4
        So first there is the matter of getting a useful month variable:

        Code:
        gen byte mm = mod(month, 100)
        gen int yy = floor(month/100)
        gen sif_month = ym(yy, mm)
        format sif_month %tm
        assert missing(month) == missing(sif_month)

        Next there is the question of how to avail yourself of time-series operators and commands with this data. I see two ways you can proceed, and it's not clear to me which is more appropriate for you.

        The first approach is to convert the data to one observation per hh_id per month, with separate variables for the volumes of the four food groups.

        Code:
        // APPROACH 1
        reshape wide volume, i(hh_id sif_month) j(foodgr)
        tsset hh_id sif_month

        The other approach is to treat each food group as representing a separate time series:

        Code:
        //  APPROACH 2
        egen series = group(hh_id foodgr)
        tsset series sif_month

        Which of these is better really depends on what other analyses you have in mind going forward, so I cannot really advise you.
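For the time trend variable asked about in the original question, here is a rough sketch run on the example data posted above, using the second approach. It assumes a segmented-regression style setup for the interrupted time series, which the thread does not confirm; the names trend, post, and post_trend are hypothetical, and starting post_trend at 0 in the intervention month (201510) is just one common convention.

```stata
* Sketch only: time trend variables for an ITSA, built on the posted example data.
clear
input float hh_id long month float foodgr double volume
1 201401 4 1.999442135903211
1 201401 2 .7829436004855133
1 201401 1 0
1 201401 3 .22998967434827636
1 201402 2 1.0236987562129212
1 201402 4 2.1374358996784872
1 201402 3 .24466986694852635
1 201402 1 2.665922847870948
1 201403 4 .9992317345177634
1 201403 3 .24466986694852635
end

* Convert YYYYMM to a Stata monthly date.
gen sif_month = ym(floor(month/100), mod(month, 100))
format sif_month %tm

* One series per household and food group, so each month is unique within a series.
egen series = group(hh_id foodgr)
tsset series sif_month

* trend: 1 in the first month observed in the data, rising by 1 per calendar month.
summarize sif_month, meanonly
gen trend = sif_month - r(min) + 1

* post: 1 from the intervention month onward (assumed here to be 2015m10).
gen byte post = sif_month >= ym(2015, 10) if !missing(sif_month)

* post_trend: months elapsed since the intervention, 0 before it.
gen post_trend = post * (sif_month - ym(2015, 10))
```

In this small extract every month falls before the intervention, so post and post_trend are 0 throughout; with the full data they would switch on from 2015m10. Because trend is computed from the calendar month rather than from the observation number, it stays consistent across households that join late or leave early.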




        • #5
          Thank you, Clyde. I'll try these approaches.
