
  • Lag string variables

    Hello,

    I lagged (almost) all my variables, and I noticed that the lagged version of my one string variable is entirely missing.

    Is there a way to lag string variables?

    Many thanks,
    Alyssa

  • #2
    To answer this correctly requires understanding the command you tried that did not work. As the Statalist FAQ linked to from the top of the page recommends,

    12.1 What to say about your commands and your problem

    Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
    So we know what Stata did - created missing values - but we don't know the command you gave. I can offhand think of at least three different commands you might have used, depending on whether you have cross sectional data or a single time series, and whether you used the time series variable list notation or not.



    • #3
      Hi William Lisowski

      I did the following:

      Code:
      #delimit ;
      local vars "Garden_Active_ GardenZip_ r_number_adults_ r_number_children_ GardenType__ Site_Visit_Curr_or_Prior_ Cold_Crop_ Fall_Crop_ Hot_Crop_ Seeds_ Sold_GID_ Size__orig__ Pickups_ MGTP_Curr_Yr_or_Prior_ Total_Classes_ Social_ Volunteer_ UR_Curr_Yr_or_Prior_ SOD_Curr_or_Prior_ KGD_Curr_Or_Prior_ Community_Garden_ Family_Garden_ School_Garden_ Market_Garden_ Yr_Act_Prior_ r2_number_adults_ r2_number_children_ r_Size_in_Acres_ Soil_Test_Curr_or_Prior_ r_Do_you_own_ Size_Cat_";
      xtset Garden_ID Year;
      // create L<name> holding the previous year's value of each variable
      foreach var in `vars' {;
           gen L`var' = L.`var';
      };
      #delimit cr
      All of the variables were lagged except Size__orig__. The code did create a lagged variable for it, but all of its values are missing. I have read elsewhere that Stata has issues with lagging string variables; is there a workaround?



      • #4
        So, why is Size__orig__ string at all? This is the real question.

        Code:
        tab Size__orig__
        is a first step here.



        • #5
          Thank you. Now that I see you have cross-sectional data, I think the following will start you on your way to what you want.
          Code:
          bysort Garden_ID (Year): generate str16 LSize__orig__ = Size__orig__[_n-1] if Year[_n-1]==Year-1
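To see what the command above does, here is a small sketch on invented data (the garden IDs, years and size strings are made up, and str20 is simply wide enough for them). Garden 1 has no 2017 record, so the condition Year[_n-1] == Year - 1 leaves its 2018 lag empty instead of carrying the 2016 value forward, mirroring how L. treats gaps for a numeric variable.

```stata
clear
// Hypothetical toy panel: garden 1 has no 2017 record
input Garden_ID Year str20 Size__orig__
1 2015 "small"
1 2016 "10 x 20 ft"
1 2018 "half acre"
2 2015 "large"
2 2016 "medium"
end

// Previous year's string within each garden; "" when that year is absent
bysort Garden_ID (Year): generate str20 LSize__orig__ = Size__orig__[_n-1] if Year[_n-1] == Year - 1

list, sepby(Garden_ID) noobs
```

The first year of each garden gets an empty string because Year[0] is missing within each by-group, so the condition is false there.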



          • #6
            Hi,

            If you don't find another solution, one approach is to add a variable period_f1 holding the following period, save the key variable, the variable to be lagged and period_f1 as a separate file (say fwd_data), and then merge that file back into the original data on the key and the period variable to obtain the lagged variable.

            Original data:

            Id period expenses
            A 2000m12 red
            A 2001m1 yellow
            A 2001m2 blue
            A 2001m3 green
            A 2001m4 violet
            A 2001m5 red
            A 2001m6 green
            A 2001m7 blue

            Adding the variable period_f1 (one month after period):


            Id period expenses period_f1
            A 2000m12 red 2001m1
            A 2001m1 yellow 2001m2
            A 2001m2 blue 2001m3
            A 2001m3 green 2001m4
            A 2001m4 violet 2001m5
            A 2001m5 red 2001m6
            A 2001m6 green 2001m7
            A 2001m7 blue 2001m8


            Then save a copy keeping only Id, expenses and period_f1:


            Id expenses period_f1
            A red 2001m1
            A yellow 2001m2
            A blue 2001m3
            A green 2001m4
            A violet 2001m5
            A red 2001m6
            A green 2001m7
            A blue 2001m8
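            Then rename period_f1 to period, rename expenses as well (for example, to L_expenses) so that merge does not simply keep the original values, and merge this dataset into the original one on Id and period. Each row then carries the previous month's expenses.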
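A minimal Stata sketch of this merge approach, using a shortened version of the example data. The year/month input layout and the names fwd_data and L_expenses are illustrative choices, not part of the post above, and a tempfile stands in for the saved fwd_data file.

```stata
clear
// Hypothetical monthly data mirroring the first rows of the table above
input str1 Id year month str6 expenses
"A" 2000 12 "red"
"A" 2001 1 "yellow"
"A" 2001 2 "blue"
"A" 2001 3 "green"
end
generate period = ym(year, month)
format period %tm
drop year month

// Step 1: shift each row forward one month and keep only what is needed
preserve
generate period_f1 = period + 1
keep Id expenses period_f1
rename (period_f1 expenses) (period L_expenses)
tempfile fwd_data
save `fwd_data'
restore

// Step 2: attach the previous month's value to each row
merge 1:1 Id period using `fwd_data', keep(master match) nogenerate
sort Id period
list, noobs
```

Because the match is on period + 1, a missing month yields an empty lag automatically; keep(master match) drops the shifted row that falls after the last observed month.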

            I hope this will help you!




            • #7
              If a lagged string variable makes sense, you really don't need the approach in #6. #5 gets you there directly.



              • #8
                Hi Nick Cox

                Size__orig__ was a free text option and I am recoding it later in my do-file.

                Thanks,
                Alyssa



                • #9
                  So what sense does the previous value make? Is it for a categorical data analysis?



                  • #10
                    Hi Nick Cox,

                    I am not sure I understand your question. I am looking at how variables in one year influence the outcome variable in the following year, using multilevel longitudinal logistic regression. William Lisowski, I see that you said I have cross-sectional data, but I actually have longitudinal data.

                    Thanks,
                    Alyssa



                    • #11
                      How are you going to use lagged string variables in your analyses?



                      • #12
                        Hi Nick Cox,

                        Thanks for the clarification. Yes, I am recoding into categorical variables.

                        Many thanks
                        Alyssa



                        • #13
                          Alyssa Beavers, I was too hasty; I always mix up longitudinal and cross-sectional. Despite my incorrect comment, the command is right (but keep reading: I think it's inappropriate).

                          With that said, I do not understand why you don't recode your string variable before lagging it.
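A minimal sketch of recoding before lagging, on invented data: encode maps each distinct string to an integer code and stores the strings in a value label, L. then lags the codes (respecting gaps, as always), and decode turns the lagged codes back into text, so the string and coded versions sit side by side. The names size_code, Lsize_code and LSize__orig__ are illustrative. With genuinely free text, variant spellings of the same size get separate codes, so some cleaning would still come first.

```stata
clear
// Hypothetical toy panel with a free-text size variable
input Garden_ID Year str12 Size__orig__
1 2015 "small"
1 2016 "medium"
2 2015 "medium"
2 2016 "large"
end

// One integer code per distinct string; the value label (named size_code
// by default) keeps the original text
encode Size__orig__, generate(size_code)

xtset Garden_ID Year
generate Lsize_code = L.size_code

// Reuse the label so the lagged codes display as the original strings,
// then turn them back into a string variable if one is wanted
label values Lsize_code size_code
decode Lsize_code, generate(LSize__orig__)

list Garden_ID Year Size__orig__ size_code LSize__orig__, noobs
```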

                          I also don't understand why you are creating lagged variables, when most Stata modeling commands will accept (and prefer) time-series variable list notation for including them in the model. That is, this code
                          Code:
                          xtset Garden_ID Year
                          xtreg y L.X
                          is preferred to this code
                          Code:
                          xtset Garden_ID Year
                          generate LX = L.X
                          xtreg y LX
                          because, for example, using the L. notation in your model lets postestimation commands know that L.X and X are related, while LX and X will be treated as two unrelated variables.

                          I also note that your variables all seem to end in at least one "_" character, which suggests you reshaped your data from a wide layout to a long layout. You can get rid of the unwanted trailing underscore characters with
                          Code:
                          rename (*_) (*)
                          as described in the output of help rename group.
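A quick sketch with made-up variable names: rename (*_) (*) strips one trailing underscore from each name that ends in one, so names such as Size__orig__ that end in two need a second pass. Names without a trailing underscore, like Garden_ID, are not matched and stay as they are.

```stata
clear
set obs 1
// Hypothetical variables with zero, one or two trailing underscores
generate Garden_Active_ = 1
generate Size__orig__ = 1
generate Garden_ID = 1

// First pass: Garden_Active_ -> Garden_Active, Size__orig__ -> Size__orig_
rename (*_) (*)
// Second pass: only Size__orig_ still ends in "_" -> Size__orig
rename (*_) (*)

describe, simple
```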



                          • #14
                            Hi William,

                            Thanks for your response. Regarding your question on why I don't use time-series notation: the command I am using for the analysis (gllamm) does not accept time-series operators.
                            Yes, you are right that I could recode before lagging, but it is nice for me to have both the string and categorized data together in the final, lagged dataset so I can see how I categorized each string entry. Also, thanks for letting me know how to get rid of the underscores after my variable names!

                            Many thanks,
                            Alyssa

