  • Creating a loop to count across different groups of variables

    I am currently working with a wide data set and am trying to count the number of certain values across different groups of 40. The data looks like this:

    #  S   F   w1  w2  w3  w4  w5
    1  w1  w5   1  10  13   9  10
    2  w3  w5   0   0   0   0  10
    3  w2  w4  13   0  11   0   0



    I am attempting to count the number of variables greater than 10 for each observation, but between different start and finish points. For example, for person 1, I need to know how many variables are exactly 10 between w1 and w5. For person 2, I need to know how many variables are greater than 10 between w3 and w5. Each person has the same number of variables to count (40) but different starting and ending points. For each individual person, I am using this:

    gen employedweeks = 0
    qui foreach v of var W0067000-W0106800 {
    replace employedweeks = employedweeks + (`v' > 10) if birth2 == 73
    }

    But I am unsure whether there is a way to avoid doing this separately for each person. Thanks for any help!






    Last edited by Anna Sillers; 20 Feb 2018, 21:22.

  • #2
    There are a few things about your question that are unclear. First, when you say "between" S and F, are the starting and finishing points included or excluded? In the code shown below, I assume they are included. Second, in the first observation you want the number of values that are exactly 10, but in the second one you want the number that are > 10. How are we supposed to know which observations get the exactly 10 treatment and which get the > 10 treatment? There is nothing obvious in the information given to answer this. I will assume that "exactly 10" is just a mistake and that you mean > 10 throughout. If that's wrong and if you can't figure out how to modify the code to accommodate it, post back with a response to the question and I'll see what I can do.

    Next, I guarantee you that your data do not look like what you've shown in #1. They can't. # is not a legal variable name. If your data are not yet in Stata, it is premature to ask for assistance with coding before you import them. If they are, then you should show them as they really are. It is possible that what I imagine your data to look like (and base my code on) is different from reality. That may mean that the code I show you doesn't actually work in your data, and we have both wasted our time. So the helpful way to show data is to start with a real Stata data set, and then use the -dataex- command. Please read and follow the advice in FAQ #12 about how to get (if you don't have it) and use -dataex-.

    With all of that, here is code that may work for you:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte seq str2(start finish) byte(w1 w2 w3 w4 w5)
    1 "w1" "w5"  1 10 13 9 10
    2 "w3" "w5"  0  0  0 0 10
    3 "w2" "w4" 13  0 11 0  0
    end
    
    //    GO TO LONG LAYOUT, WHERE LIFE IS EASY
    reshape long w, i(seq) j(_j)
    
    //    CONVERT START & FINISH TO NUMERIC
    destring start finish, replace ignore("w")
    
    //    THE ACTUAL COMPUTATION
    by seq (_j), sort: egen wanted = total(w > 10 & inrange(_j, start, finish))
    
    //    AND IF THERE IS SOME COMPELLING REASON TO GO BACK TO WIDE LAYOUT
    //    WHERE MOST THINGS ARE DIFFICULT OR IMPOSSIBLE
    reshape wide

    Note: The major obstacle to what you are trying to do is that you have the data in wide layout. There are very few analyses in Stata that are easily done with wide data, and many are entirely impossible that way. So the first step is to go to long layout. From there, the next issue is that, once we are in long layout, having start and finish (your S and F) contain w is a hindrance rather than a help. So we strip the w and make both variables numeric. Now that the data are prepared, the calculation is a one-liner.

    I don't know what you plan to do next with these data, but the odds are strong that it, too, will be better done with the data in long layout than wide. There aren't very many things in Stata that are best done wide. So I suggest that you skip the final -reshape wide- command unless you have some compelling reason to go back to wide layout. Otherwise, whatever your next post to Statalist is will likely have a response that begins with -reshape long-!
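
    Because the answer rests on assumptions (endpoints included, > 10 throughout), here is a minimal sketch of how the one-liner might be adapted if those assumptions are wrong. It rebuilds the same three example observations so it can be run on its own. The variable names n_incl, n_excl and n_eq10 are made up for illustration, and the !missing(w) guard is an addition of mine: Stata treats numeric missing values as larger than any number, so w > 10 alone would count a missing week as employed.

    ```stata
    * Sketch only: same three example observations as in the code above
    clear
    input byte seq str2(start finish) byte(w1 w2 w3 w4 w5)
    1 "w1" "w5"  1 10 13 9 10
    2 "w3" "w5"  0  0  0 0 10
    3 "w2" "w4" 13  0 11 0  0
    end

    reshape long w, i(seq) j(_j)
    destring start finish, replace ignore("w")

    * Endpoints included, values > 10; missing values are excluded explicitly
    bysort seq (_j): egen n_incl = total(w > 10 & !missing(w) & inrange(_j, start, finish))

    * Endpoints excluded (assumed variant, if "between" is meant strictly)
    bysort seq (_j): egen n_excl = total(w > 10 & !missing(w) & _j > start & _j < finish)

    * Values exactly equal to 10, endpoints included
    bysort seq (_j): egen n_eq10 = total(w == 10 & inrange(_j, start, finish))

    * Checks against counts worked out by hand from the example data
    assert n_incl == cond(seq == 2, 0, 1)
    assert n_excl == cond(seq == 2, 0, 1)
    assert n_eq10 == cond(seq == 1, 2, cond(seq == 2, 1, 0))
    ```

    If the rule really does differ between observations (exactly 10 for some, > 10 for others), there would need to be a variable in the data identifying which rule applies; the example data contain nothing of the kind, so that case is not sketched here.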
