Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data with different variables for each year

    Hi everyone,

    I am currently working with panel data and can’t figure out how to relate variables from different years to each other.

    The data comes from surveys in which the same participants got asked different questions each year. So this isn’t a typical time series where you have the same kind of data for each year. Each participant has a unique ID and each observation in the data set represents a single survey. If a question wasn’t asked that year, the observations has a missing for that variable.

    So my Data looks somewhat like this:
    id year voted opinion1 opinion2
    1 2018 1 . .
    1 2019 . 7 .
    1 2020 . . 2
    2 2018 0 . .
    2 2019 . 7 .
    ….

    What I would like to do is to relate the data from different years to each other. For example to show the results of opinion1 (“how strongly do you agree with …”) only for people who voted in the year prior (voted == 1).
    Ideally that would work like this: tab opinion1 if voted == 1
    But that of course doesn’t work because within the observation that holds the data for voted there is no data on opinion1.

    How can I work around this? Do I have to somehow combine observations via their id or is there an easier way?
    thanks in advance!
    Last edited by Peter Grabow; 18 Sep 2022, 13:21.

  • #2
    For your example, using xtset and lagged values will work:

    Code:
    xtset id date
    tab opinion1 if l.voted
    
       opinion1 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              7 |          1      100.00      100.00
    ------------+-----------------------------------
          Total |          1      100.00

    Comment


    • #3
      Thanks for your fast reply. Would that only work for the dummy variable voted? I would also like to compare different opinions to each other, e.g. sum opinion1 if opinion2 > 3

      Comment


      • #4
        Here are two alternative approaches:

        1) If there is just one observation per variable across the entire time series for each id, you can flatten the data, like so:

        Code:
        collapse (firstnm) voted opinion*, by(id)
        This results in:

        Code:
             +----------------------------------+
             | id   voted   opinion1   opinion2 |
             |----------------------------------|
          1. |  1       1          7          2 |
          2. |  2       0          7          . |
             +----------------------------------+
        Then you can do:

        Code:
        tab opinion1 if voted
        2) If you don't have that many year values, but there is a possibility that a variable could have more than one non-missing observation per id, you could reshape wide:

        Code:
        tostring year, replace
        replace year = "_"+year
        reshape wide voted opinion*, i(id) j(year) string
        This results in:

        Code:
             +-------------------------------------------------------------------------------------------------------------------------------------------+
             | id   voted_2018   opinion1_2018   opinion2_2018   voted_2019   opinion1_2019   opinion2_2019   voted_2020   opinion1_2020   opinion2_2020 |
             |-------------------------------------------------------------------------------------------------------------------------------------------|
          1. |  1            1               .               .            .               7               .            .               .               2 |
          2. |  2            0               .               .            .               7               .            .               .               . |
             +-------------------------------------------------------------------------------------------------------------------------------------------+
        Then you can do:

        Code:
        tab opinion1_2019 if voted_2018
        Last edited by Ali Atia; 18 Sep 2022, 14:07.

        Comment


        • #5
          I only have 3 different years and the condition was true, so I went with option 2. Typed in exactly what you wrote, but got an error message after the reshape line: variable year not found.
          Also I checked the Data Editor and all that it had done was transforming the numeric values for the year variable into a string of underscore and the year.
          I’m a bit confused now.

          Comment


          • #6
            Please paste the output of the commands -describe- and -dataex- into a reply to this post.

            Comment


            • #7
              There are 165 variables in my dataset the output for describe is therefore really long. But non of the variable names have changed. Only thing changed is the format of the year data.

              Comment


              • #8
                Please post the full output: it's very difficult to diagnose the problem without additional information about your data.

                I find it hard to believe that the command "replace year = "_"+year" worked, but that "reshape wide voted opinion*, i(id) j(year) string" returned an error stating that the variable year does not exist immediately afterward.

                Comment


                • #9
                  Exact output was:

                  . tostring year, replace
                  year was int now str4

                  . replace year = “_”+year
                  variable year was str4 now str5
                  (75.571 real changes made)

                  . reshape wide $allvars, i(id) j(year) string
                  (j = _2018 _2019 _2020)
                  variable year not found

                  EDIT: I really don’t know why it says the variable wasn’t found. It’s definitively still there under the same year. Only thing that changed is that it’s data is now called _2018, _2019, _2020.
                  Last edited by Peter Grabow; 18 Sep 2022, 15:34.

                  Comment


                  • #10
                    When you said "Typed in exactly what you wrote, but got an error message after the reshape line: variable year not found," that was incorrect. Your modification to the reshape command likely resulted in the error. I am not sure what is in the global $allvars, but I assume it contains a list of all the variables in the dataset. If that is true, you are including the id and year variables as stub variables in addition to i and j variables, which will cause an error of the type you are seeing.

                    Instead, use the following code after 'replace year = “_”+year':

                    Code:
                    ds id year, not
                    reshape wide `r(varlist)', i(id) j(year) string
                    In future, please post the exact code you used and the output at the outset if you modified anything from the original solution. Often, what seems like a minor alteration can have significant consequences for the code's ability to run without error.
                    Last edited by Ali Atia; 18 Sep 2022, 16:09.

                    Comment


                    • #11
                      That worked now. Thanks a lot!
                      just one more question: is there a way to do this while keeping the labels of the variables? Because the labels that included the questions have now been converted into _year + name of the variable.

                      Comment


                      • #12
                        Code:
                        tostring year, replace
                        levelsof year, local(years)
                        replace year = "_"+year
                        ds id year, not
                        local vlist = r(varlist)
                        foreach v of local vlist{
                            local `v': variable label `v'
                        }
                        reshape wide `vlist', i(id) j(year) string
                        foreach y of local years{
                            foreach v of local vlist{
                                label variable `v'_`y' "``v'' (`y')"
                            }
                        }
                        Output (with made up labels):

                        Code:
                        Contains data
                         Observations:             2                  
                            Variables:            10                  
                        --------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        Variable      Storage   Display    Value
                            name         type    format    label      Variable label
                        --------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        id              byte    %8.0g                 
                        voted_2018      byte    %8.0g                 Voted? (2018)
                        opinion1_2018   byte    %8.0g                 First Opinion (2018)
                        opinion2_2018   byte    %8.0g                 Second Opinion (2018)
                        voted_2019      byte    %8.0g                 Voted? (2019)
                        opinion1_2019   byte    %8.0g                 First Opinion (2019)
                        opinion2_2019   byte    %8.0g                 Second Opinion (2019)
                        voted_2020      byte    %8.0g                 Voted? (2020)
                        opinion1_2020   byte    %8.0g                 First Opinion (2020)
                        opinion2_2020   byte    %8.0g                 Second Opinion (2020)
                        --------------------------------------------------------------------------------------------------------------------------------------------------------------------
                        Sorted by: id
                             Note: Dataset has changed since last saved.
                        Last edited by Ali Atia; 19 Sep 2022, 08:20.

                        Comment


                        • #13
                          This is far more advanced than anything I ever had to do with Stata. But it did everything it was supposed to do. You were very helpful, thank you for taking the time

                          Comment

                          Working...
                          X