  • Reshape long again such that whole rows of observations do not get dropped

    Dear Statalist,

    I have data on individuals' allocation of a budget across four categories. Each individual completes four rounds, and each round consists of four decisions. At present, I have four rows per individual, each holding the four decisions of one round.

    So far, so good. Now my advisor has suggested that I drop every allocation of 0 or 100, for example when an individual allocates all 100 to one category in a round and leaves the other three empty.

    The problem with drop is that the whole row (round) gets deleted even though the other three decisions in that round (row) have non-zero and non-100 data.

    Without dropping, I tried regress with the qualifier if !(alloc==0 | alloc==100). Judging from the number of observations, however, the whole row is again excluded, not just the 0 or 100 entry, even though it is no longer dropped.

    Now, I'm thinking of using reshape long again, in essence, to have 16 rows per individual - with only one decision per row, such that if a 0 or 100 occurs, only that row is dropped without affecting the other decisions in that row.

    Is this the best way to tackle my problem?

    If so, how should I rename 1alloc, 2alloc, 3alloc, and 4alloc such that each allocation has its own row but is still part of round 1? I was thinking of 1alloc1, 2alloc2, 3alloc3, and 4alloc4 but I have a feeling that this is incorrect.

    I am most grateful for any advice and insights!

    Kind regards,
    Mary

    EDIT: reshape long (again) throws errors as the id variable does not uniquely identify the observations
    Last edited by Mary Burckhette; 09 Aug 2023, 07:21.

  • #2
    It is almost inconceivable that anybody can answer this question without example data. Please post back, and use -dataex- to show a representative example of your Stata data set.

    What can be said with certainty, however, is that you cannot have variables named 1alloc or 2alloc3, because Stata variable names must begin with either a letter or an underscore (_) character; digits are not allowed at the start. But beyond that, your question as posed leaves too much to the imagination to provide useful advice.
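
    To make the naming rule concrete, here is a minimal sketch on made-up data. The variables id, round, and alloc1-alloc4 are hypothetical and are not taken from the poster's data set. It also illustrates one plausible cause of the "does not uniquely identify the observations" error: reshape long needs the i() variables to identify each wide-format row, so with one row per individual and round, i() would have to list both.

```stata
* Hypothetical toy data: one observation per individual and round
clear
input byte(id round alloc1 alloc2 alloc3 alloc4)
1 1 100  0  0  0
1 2  40 30 20 10
2 1  25 25 25 25
2 2   0 50 50  0
end

* The digit goes at the end of the name: alloc1 is legal, 1alloc is not.
* id alone does not identify a row here (each id has two rounds),
* so i() lists both id and round.
reshape long alloc, i(id round) j(category)

* Confirm that each decision now sits in its own, uniquely identified row
isid id round category
```

    Whether this matches the real layout depends on how the actual data are organized, which only example data can show.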

    • #3
      Thank you, Clyde. Here is my dataex code:

      All the required data would not fit into one dataex command, so I am using two.

      CASE uniquely identifies the participants. risk1-risk4 record the risk category of each round: every round consists of one particular risk category, so the four decisions in a round refer to the same risk category. In total, there are four rounds. Each of the four decisions per round is associated with a particular return; since there are 16 decisions, 16 return figures are provided. _sa, _ak, _ua, and _g are the short names of the four categories across which the budget of 100 needs to be allocated.

      Code:
       dataex CASE risk1 risk2 risk3 risk4 r_sa1 r_ak2 r_ua3 r_g4 r_sa5 r_ak6 r_ua7 r_g8 r_sa9 r_ak10 r_ua11 r_g12 r_sa13 r_ak14 r_ua15 r_g16
      ----------------------- copy starting from the next line -----------------------

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte CASE str4(risk1 risk2 risk3 risk4) str7 r_sa1 str8 r_ak2 str5 r_ua3 str7(r_g4 r_sa5) str6 r_ak6 str5 r_ua7 str8 r_g8 str7 r_sa9 str6 r_ak10 str5 r_ua11 str8 r_g12 str5 r_sa13 str8 r_ak14 str5 r_ua15 str7 r_g16
      17 "TB02" "TB03" "TB01" "TB04" " -0.31%" "15.79%"   "0.90%" "3.60%"   "1.24%"   "2.65%"  "3%"    "11.60%"   " -0.47%" "3.55%"  "1.70%" "14.30%"   "0.54%" "9.56%"    "2.40%" " -0.20%"
      18 "TB01" "TB03" "TB04" "TB02" "0.37%"   "12.51%"   "1.70%" " -0.90%" " -0.21%" "25.48%" "2.50%" "21.00%"   "1.63%"   "25.48%" "3.40%" " -30.90%" "0.46%" " -18.26%" "2.50%" "3.00%"  
      20 "TB04" "TB03" "TB02" "TB01" "1.19%"   " -12.35%" "3.30%" "5.79%"   "1.24%"   "2.65%"  "3%"    "11.60%"   " -0.31%" "15.79%" "0.90%" "3.60%"    "0.14%" "6.87%"    "2.10%" "12.10%"
      47 "TB01" "TB04" "TB02" "TB03" "0.46%"   " -18.26%" "2.50%" "3.00%"   " -0.31%" "15.79%" "0.90%" "3.60%"    "0.37%"   "12.51%" "1.70%" " -0.90%"  "0.54%" "9.56%"    "2.40%" " -0.20%"
      73 "TB03" "TB04" "TB01" "TB02" "0.54%"   "9.56%"    "2.40%" " -0.20%" "1.63%"   "25.48%" "3.40%" " -30.90%" "0.14%"   "6.87%"  "2.10%" "12.10%"   "0.37%" "12.51%"   "1.70%" " -0.90%"
      end
      Code:
       dataex T_C gender original_sa1 original_ak2 original_ua3 original_sa5 original_g4 original_ak6 original_ua7 original_g8 original_sa9 original_ak10 original_ua11 original_g12 original_sa13 original_ak14 original_ua15 original_g16
      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(T_C gender original_sa1 original_ak2 original_ua3 original_sa5 original_g4 original_ak6 original_ua7 original_g8 original_sa9 original_ak10 original_ua11 original_g12 original_sa13 original_ak14 original_ua15 original_g16)
      1 1 50 30 20 20  0  80  . . 70  10 20  .  0  70 30  .
      1 2 70 20  5 20  5  70  5 5  0 100  0  0 30  40 20 10
      2 2 10 80  5 10  5  80 10 . 30  40 20 10 30  40 20 10
      2 2 90  .  .  . 10 100  . . 40  25 20 15  . 100  .  .
      2 1  . 90  5  .  5  90  5 5  .  80 15  5  .  80 15  5
      end
      T_C records whether the participant was assigned to the treatment or the control group, and original_*1 to original_*16 record the initial allocations: four allocations per round over four rounds, giving 16 decisions.

      Then I used the reshape command
      Code:
       reshape long risk r_sa r_ak r_ua r_g original_sa original_ak original_ua original_g, i(CASE) j(decision)
      I understand that I need to add risk5-16 and duplicate the entries from risk1-4 to reflect that four rounds make 16 decisions.

      The goal of this modification is that every decision has its own row, so that when I delete allocations of 0 or 100, the rest of the round (i.e., the other three decisions in that round) is not deleted with them. Having used the reshape command, I see missing values for the other three categories (which makes sense, as the reshape command has worked). However, I am uncertain whether my regressions will still work.


      I would be most grateful for any advice.

      • #4
        As I do not know what regressions or other analyses you plan to do with the data, it is not possible for me to advise what specific data organization is well-suited to the purpose. What you have done may be quite appropriate. Or you might be better off with a fully long layout, where each observation is a single decision:
        Code:
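        // Category suffixes, in the order they repeat within each round
        local budget_categories sa ak ua g
        display "`budget_categories'"
        display `"`=word("`budget_categories'", 1)'"'
        
        // Remove the category suffix so the stubs become r_# and original_#
        rename (r_*#) r_#, renumber
        rename (original_*#) original_#, renumber
        
        // One observation per CASE and decision
        reshape long risk r_ original_, i(CASE) j(decision)
        // Position of the decision within its round (1-4) gives the category
        gen byte four_cycle = mod(decision, 4)
        replace four_cycle = 4 if four_cycle == 0
        gen budget_category = word( "`budget_categories'", four_cycle), after(decision)
        drop four_cycle
        // Convert the percentage strings to numeric values
        destring r_, ignore("%") replace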
        From that point it is easy to drop any observation where the allocation (original_) is missing, 0, or 100 without disturbing anything else.
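
        A minimal sketch of that drop step on made-up long-format data. It assumes the allocations are the values held in original_ (r_ holds the returns); the numbers below are invented for illustration only.

```stata
* Hypothetical long-format data: one observation per CASE and decision
clear
input byte(CASE decision original_)
17 1  50
17 2  30
17 3  20
17 4   .
18 1   0
18 2 100
18 3   0
18 4   0
end

* Remove only the single decisions that are missing, 0, or 100;
* every other decision of the same participant is left in place
drop if missing(original_) | inlist(original_, 0, 100)
```

        The same condition, negated, could instead serve as an if qualifier on an estimation command, which would keep the observations in the data set.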

        But not knowing what you will be doing with the data from this point, I can't really say which is better.

        Added:
        By the way, here's another way to organize the data that might be what you need. Here each observation is a single round.
        Code:
        local budget_categories sa ak ua g
        display "`budget_categories'"
        display `"`=word("`budget_categories'", 1)'"'
        
        rename (r_*#) r_#, renumber
        rename (original_*#) original_#, renumber
        
        reshape long risk r_ original_, i(CASE) j(decision)
        gen byte four_cycle = mod(decision, 4)
        replace four_cycle = 4 if four_cycle == 0
        gen budget_category = word( "`budget_categories'", four_cycle), after(decision)
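        // Decisions 1-4 form round 1, decisions 5-8 round 2, and so on
        gen int round = ceil(decision/4)
        drop four_cycle decision
        destring r_, ignore("%") replace
        // Trailing underscore so the wide names read risk_sa, risk_ak, ...
        rename risk risk_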
        
        // IF YOU WANT TO REMOVE 0 AND 100% ALLOCATIONS, DO SO HERE.
        
        reshape wide risk_ r_ original_ , i(CASE round) j(budget_category) string
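
        The index arithmetic used above can be checked in isolation. The sketch below is an illustration rather than part of the suggested code: it builds the 16 decision numbers on their own and derives the round and category for each, on the assumption that the categories always cycle in the order sa, ak, ua, g.

```stata
* Stand-alone check of the decision -> (round, category) mapping
clear
set obs 16
gen byte decision = _n

local budget_categories sa ak ua g

* Position within the round: 1, 2, 3, 4, 1, 2, ...
* (equivalent to mod(decision - 1, 4) + 1)
gen byte four_cycle = mod(decision, 4)
replace four_cycle = 4 if four_cycle == 0

* Round number: 1 for decisions 1-4, 2 for 5-8, and so on
gen int round = ceil(decision/4)

* Category name picked by position from the list
gen budget_category = word("`budget_categories'", four_cycle)

list decision round four_cycle budget_category, sepby(round)
```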
        Last edited by Clyde Schechter; 10 Aug 2023, 09:42.

        • #5
          Dear Clyde, Thank you so very much for your suggested code. This is exactly what I was looking for!
