Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • RESHAPE with different number of variables per "year"

    Dear all,

    I looked for similar queries and couldn't find any, but of course, if you know of other "threads" let me know.
    It's been months that I've been working on the following, I'd be very very grateful if you could help me with this. So here's a brief description of my dataset:
    • It's in wide format: time runs horizontally.
    • KEY: the number of variables for each year CHANGE.
    Now, the goal is to make this dataset LONG as in a panel format: individual // all years // all variables.
    The dataset is renamed decently in the format (example):

    FAMID_1968 INDIVID_1968 // INCOME_1968 NCHILD_1968 EMPSTAT_1968 // INCOME_1969 NCHILD_1969 EMPSTAT_1969 [+N: PENSION_1969...] // ...

    Now there's my problem: the number of columns per year is not constant, and I don't think reshape is built for dealing with this situation. I wouldn't know how to implement (1) with variables emerging (for instance) the first time in 2013.

    If you need more details or the actual scripts just let me know and I'll promptly get back to you.
    Many many thanks in advance,

  • #2
    Have you already tried it? Reshape should work fine, it will just create missing values if a variable is not present for a certain year.

    Edit - example:
    Code:
    clear
    input id a2000 a2001 b2001
    1 1 2 3
    end
    
    reshape long a b, i(id) j(year)
    list, noobs
    
      +-------------------+
      | id   year   a   b |
      |-------------------|
      |  1   2000   1   . |
      |  1   2001   2   3 |
      +-------------------+
    Last edited by Wouter Wakker; 03 Dec 2019, 08:15.

    Comment


    • #3
      Hi Wouter, thanks for your reply

      I have tried right now with the following command, by including with ds all the variables except the id in the command.

      Code:
      ds, not(varl *reshape*)
      reshape long `r(varlist)', i(id) j(year)
      And it presents the usual error, attached:

      Click image for larger version

Name:	dele.JPG
Views:	1
Size:	40.9 KB
ID:	1527333



      I just realised that probably I should use input those variables after "reshape long" but without the suffix #### (year). Am I right?


      Many thanks

      Comment


      • #4
        I just realised that probably I should use input those variables after "reshape long" but without the suffix #### (year). Am I right?
        Correct. That should solve it.

        Comment


        • #5
          It's actually less obvious than I thought. I need to input all my variables without suffix, so let's make a list of them without number at the end.
          I tried to solve my problem with the following trick:

          Code:
          ds, not(varl *reshape*)
          foreach v of var `r(varlist)' {
                  local years = substr(`"`v'"',-4,4)
                  local stubs = subinstr(`"`v'"',"`years'","",1)        
          }
          However, even if it runs, the local "stubs" will contain all my variables (without suffix) N times (as many times as they are present throughout the years). So if I display "stubs", it says:
          Code:
          EDUCOMP ambiguous abbreviation
          That I believe is exactly because of the fact it appears more times (not sure why this specific variable though, there are some other before it).
          Edit: I have also tried macrolists as in:

          Code:
          local try : list uniq stubs
          But when I display try (with d `try') it gives me the same error

          Do you have suggestions?
          Last edited by Francesco Amodeo; 03 Dec 2019, 10:01.

          Comment


          • #6
            I don't work often with list manipulation in Stata, so maybe there is a more efficient way, but this seems to work:
            Code:
            rename *_* ** // remove underscore from variables
            ds var1 *reshape*, not
            local stubs
            foreach var of varlist `r(varlist)' {
                local stubs `stubs' `=substr("`var'", 1, strlen("`var'")-4)'  // remove years of each variable and append to list
            }
            local uniquestubs: list uniq stubs // list of unique stubs
            di "`uniquestubs'"

            Comment


            • #7
              It was though but I engineered a way to retrieve un-suffixed, unique stubs to perform RESHAPE. I felt like Nick for three seconds.
              Now another minor problem: now when I run the reshape, it says the variables for 1968 are too long.

              Click image for larger version

Name:	dele.JPG
Views:	2
Size:	27.6 KB
ID:	1527358


              However, here's my list of variables for '68:
              Code:
              ER1968 RELNUM1968 PERSID1968 REL1968 FAMINC1968 EMPH1968 RSHD1968 NCHILD1968 REL2HD1968 EDU1968
              How can these names be too long? Theoretically, the suffix _1968 should be also taken away in the RESHAPE LONG process, so the names would be even shorter, no?
              Thanks in advance.

              Comment


              • #8
                With regard to the technique in post #5, you should
                Code:
                display "`stubs'"  // note the quotation marks
                or better yet
                Code:
                macro list _stubs
                The result of
                Code:
                display `stubs'  // note no quotation marks
                after substitution for the local macro is something like
                Code:
                display EMPSTAT INCOME NCHILD
                and since the stubs are not surrounded in quotation marks, display attempts to treat them as expressions and evaluate them, looking for the first observation of EMPSTAT, which does not exist as a variable and is not constitute an unambiguous variable abbreviation.

                Comment


                • #9
                  With regard to the code in post #7, if you apply the advice of post #8 to the local macro uniqstubs, you will find that among the variable stubs it contains a period as a stub.

                  Comment


                  • #10
                    Originally posted by William Lisowski View Post
                    With regard to the code in post #7, if you apply the advice of post #8 to the local macro uniqstubs, you will find that among the variable stubs it contains a period as a stub.
                    I employed a slightly different approach that seems to work in creating a list of UNIQUE, un-suffixed variables that I will need for RESHAPE. Here's the short code:

                    Code:
                    gen x = .
                    tostring x, replace
                    local i = 1
                    label variable x "reshape"
                    ds, not(varl *reshape*)
                    foreach v of var `r(varlist)' {
                            local years = substr(`"`v'"',-4,4)
                            local stubs = subinstr(`"`v'"',`"`years'"',"",1)
                            replace x = "`stubs'" in `i++'
                    }
                    
                    forval i= 1/`=_N'{
                    local stubs `stubs' `=x[`i']'
                    }
                    local uniqstubs : list uniq stubs
                    drop x

                    Basically I create a variable, add in its observations the variable names, trime them, remove duplicates and store in a local.
                    Last step (which creates the error) is:

                    Code:
                    reshape long "`uniqstubs'", i(id) j(year)

                    However, as noted in my previous post, I still don't understand why the variables for the 1968 vintage seem to be too long.

                    Comment


                    • #11
                      However, as noted in my previous post, I still don't understand why the variables for the 1968 vintage seem to be too long.
                      Run
                      Code:
                      macro list _uniqstubs
                      and notice the period at the end of the list of stubs. The period does not belong there.

                      There are many problems with your code. I would take the following approach.
                      Code:
                      ds
                      foreach v of var `r(varlist)' {
                          local lv : strlen local v
                          local vv = substr("`v'",1,`lv'-4)
                          local stubs `stubs' `vv'
                      }
                      local uniqstubs : list uniq stubs
                      macro list _uniqstubs
                      which on some made-up data yields
                      Code:
                      . ds
                      price1999   mpg1999     weight1999  weight1998
                      
                      . foreach v of var `r(varlist)' {
                        2.     local lv : strlen local v
                        3.     local vv = substr("`v'",1,`lv'-4)
                        4.         local stubs `stubs' `vv'
                        5. }
                      
                      . local uniqstubs : list uniq stubs
                      
                      . macro list _uniqstubs
                      _uniqstubs:     price mpg weight

                      Comment


                      • #12
                        Hi WIlliam, thanks again.

                        I run what you said, and no, the periods are not there. because I remove them with the loop.

                        Click image for larger version

Name:	dele.JPG
Views:	1
Size:	39.2 KB
ID:	1527440


                        It's as it should be, no?
                        Thanks.

                        Comment


                        • #13
                          It's as it should be, no?
                          No. I wrote

                          notice the period at the end of the list os stubs
                          The next-to-last stub in uniqstubs is OTHERDEBT and it is followed by the last stub, a single period, which is not the stub of a variable name.
                          Last edited by William Lisowski; 04 Dec 2019, 05:13.

                          Comment


                          • #14
                            Perhaps less elegant but more easy to understand, consider:
                            Code:
                            * some example data
                            clear
                            set obs 5
                            gen FAMID_1968 = _n
                            expand 3
                            bys FAMID_1968 : gen INDIVID_1968= _n
                            
                            foreach stub in INCOME_ NCHILD_ EMPSTAT_PENSION_ {
                                foreach year in 1968 1969 1970 {
                                    gen `stub'`year'= runiform()
                                }
                            }
                            Followed by:
                            Code:
                            * reshaping  doubly long
                            ren * x*
                            ren xFAMID_1968 FAMID
                            ren xINDIVID_1968 INDIVID
                            reshape long x, i(FAMID INDIVID) j(varname) string
                            gen year = real(substr(varname, -4, 4))
                            replace varname = substr(varname,1, strlen(varname)-5)
                            * reshaping long
                            reshape wide x, i(FAMID INDIVID year) j(varname) string
                            ren x* *
                            And please do provide data examples with dataex (see the FAQ: https://www.statalist.org/forums/help#stata) and dont copy output as a screenshot. This is all to make it easier for people to copy your exact problem into their Stata.

                            Comment

                            Working...
                            X