Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Store inverse of mean for dummy variable without recode

    I am trying to just get some simple means of indicator variables and store them to create a new data frame:

    Code:
    sysuse nlsw88, clear
    mean south
    eststo
    mean smsa
    eststo
    mean union
    eststo
    
    esttab using example.csv
    How would I get the inverse of that? So for -union-, the percent of non-missing who are in a union is 20.53, and percent not in a union are 63.09. With the code I have above, it will store and produce the 20.53%. Is there a way to get the 63.09 figure without creating a new reverse-coded variable?

  • #2
    "Create a new data frame"? Sounds like you need collapse.



    Anyways, no. At least, if I understand you well, no. You can only get the proportion by counting the number of non missing who are in one and the number of non missing who aren't in one. If the proportion is .5, that could be 4 and 8, 10 and 20... that is, if I understand you correctly.


    I guess what you'd really want then, now that I think of it, is to simply count the number of yes' and divide by the number of no's. Just collapse by it, and create a new data frame.

    Comment


    • #3
      To your last idea--I actually think what I might do is just create a second variable that is '1 - mean', and then do some wrangling on the back end to make the data frame I want. I generally don't love to collapse when I'm working with a lot of variables and breaking things down by various groups (which is what I have in my real data). Collapsing over and over and figuring out how to put it all in one big new data frame has never felt natural to me.

      Comment


      • #4
        Can you give your example data? I'll look at it before I go to bed

        Comment


        • #5
          The data are on a restricted server, but I can use the nlsw88 to easily come up with a similar example, will send some code in a moment...

          Comment


          • #6
            Okay, this is roughly what I'm doing. You'll see the one thing left blank at the end is % nonunion. I think I'll probably just destring and do 1 - mean to get the nonunion percent. The only problem is if I eventually want to incorporate SEs, for example, that won't cut it...


            Code:
            ssc install estout
            
            sysuse nlsw88, clear
            eststo clear
            // Get means of vars for union and non-union
            
            mean union
            eststo
            preserve
            keep if union == 1
            mean age
            eststo
            mean married
            eststo
            mean south
            eststo
            mean smsa
            eststo
            restore
            
            preserve
            keep if union == 0
            mean age
            eststo
            mean married
            eststo
            mean south
            eststo
            mean smsa
            eststo
            restore
            
            // Save and then import in csv
            esttab using "example.csv", mtitles replace cell("b(f(2))")
            
            import delimited "example.csv", clear stripquote(yes)
            
            foreach var of varlist _all {
                replace `var' = subinstr(`var', "=", "",.)
            }
            
            foreach var of varlist v3-v6 {
                replace v2 = `var' if `var' != ""
            }
            foreach var of varlist v7-v10 {
                replace v3 = `var' if `var' != ""
            }
            
            rename (v2 v3) (union nonunion)
            drop if v1 == ""
            drop v4-v10
            Last edited by Taylor Walter; 02 Sep 2022, 20:57.

            Comment


            • #7
              Inverse has a precise meaning in mathematics, the inverse of a is 1/a.

              I do not quite follow what problem you perceive, and it became complicated between your #1 and #6. But
              1. Check out the new -table- command, it might be able to do what you want.
              2. Why do you not want to creates new "not" variables? It seems to me that this would be the easiest thing in your procedure, if you do something on union and you want to esttab it, and then you want to do the same for not union, the easiest thing here seems to create a notunion variable and just follow your procedure.

              Comment


              • #8
                Here is what your data look like at the end of the code in #6:
                Code:
                . list
                
                     +----------------------------+
                     |      v1   union   nonunion |
                     |----------------------------|
                  1. |   union    0.25            |
                  2. |     age   39.28      39.21 |
                  3. | married    0.61       0.66 |
                  4. |   south    0.30       0.47 |
                  5. |    smsa    0.77       0.69 |
                     |----------------------------|
                  6. |       N     461       1417 |
                     +----------------------------+
                Here is a different way to produce the same result. The main bits here are
                • use -collapse- to obtain the requisite statistics, as mentioned in #2,
                • transpose the data using -xpose-
                • fix display formats. Notice your last row has a different display format from the others. To achieve this, we first convert numbers to strings.
                Code:
                sysuse nlsw88, clear
                
                collapse (percent) perc = idcode (mean) age married south smsa (count) idcode if !missing(union), by(union)
                
                replace perc = perc/100
                
                xpose, clear varname
                
                drop in 1
                rename v1 nonunion
                rename v2 union
                rename _varname v1
                order v1 union nonunion
                replace v1 = "union" if v1 == "perc"
                replace v1 = "N" if v1 == "idcode"
                
                foreach var in union nonunion {
                    gen `var'_str = strofreal(`var',"%4.2f"), after(`var')
                    replace `var'_str = strofreal(`var',"%9.0g") if v1 == "N"
                    drop `var'
                    rename `var'_str `var'
                }
                This gives us:
                Code:
                .list
                     +----------------------------+
                     |      v1   union   nonunion |
                     |----------------------------|
                  1. |   union    0.25       0.75 |
                  2. |     age   39.28      39.21 |
                  3. | married    0.61       0.66 |
                  4. |   south    0.30       0.47 |
                  5. |    smsa    0.77       0.69 |
                     |----------------------------|
                  6. |       N     461       1417 |
                     +----------------------------+
                Last edited by Hemanshu Kumar; 03 Sep 2022, 08:03. Reason: Edited to use simpler code

                Comment


                • #9
                  Hemanshu Kumar thanks! This is useful code. I generally avoid collapse because it's hard to work with after the collapse, but xpose works nicely here. Joro Kolev I don't want to create new variables because then it just adds more steps in wrangling the resulting data frame.

                  Comment


                  • #10
                    Hemanshu Kumar how would I also add the Standard errors to this code? I tried:

                    Code:
                     
                     collapse (percent) perc = idcode (mean) age married south smsa (semean) age married south smsa (count) idcode if !missing(union), by(union)
                    But get an error for 'name conflict', I think I may have to add some type of 'SEvar' = to the beginning of the (semean) part?

                    Comment


                    • #11
                      I think I may have to add some type of 'SEvar' = to the beginning of the (semean) part?
                      Actually, you will have to do a bit more than that. You will have to add SEvar = to all of the variables for which you are getting (semean), as all of them have already been mentioned in the (percent) varlists.

                      So:
                      Code:
                      collapse (percent) perc = idcode (mean) age married south smsa (semean) se_age = age se_married = married se_south = south se_smsa = smsa (count) idcode if !missing(union), by(union)
                      If you had a substantially larger number of variables to do this with I would suggest writing a loop to create these names. But for just four the amount of typing involved isn't all that terrible, and the direct coding shown here feels more transparent than a loop would be.

                      Comment


                      • #12
                        Yes, that happens when you ask collapse to give you multiple stats for the same variable, because by default collapse stores the outcome in a variable that has the same name as the original. You need to specify different names for the outcome variables that will store the different stats. You could do something like this instead of the current collapse command:

                        Code:
                        #delimit ;
                        collapse 
                            (percent) perc = idcode 
                            (mean) m_age = age m_married = married m_south = south m_smsa = smsa 
                            (semean) se_age = age se_married = married se_south = south se_smsa = smsa 
                            (count) idcode 
                                if !missing(union), by(union)
                            ;
                        #delimit cr

                        Comment


                        • #13
                          Oh I see now, that makes sense. Thanks to both of you!

                          Comment

                          Working...
                          X