
  • Combining the e(sample) from two different regressions

    I run the same model separately for two groups (poor and less poor). I want descriptive statistics for each subsample, and also for the two subsamples combined.

    I am aware that I could use sum var_list if e(sample) after each regression, but I have no idea how to combine the two e(sample) results and get descriptive statistics for the combined sample.

    Simply using sum var_list does not work either, perhaps because I also include age and district fixed effects, so observations from particular districts or ages may be omitted from the estimation.


    I know this should be easy, but I just don't know how to do it.




    Thank You

  • #2
    After running a model you could do something like

    gen sample1 = e(sample)

    Having done that, I suppose you could create a variable coded sample 1 only, sample 2 only, and (if not mutually exclusive) sample 1 and 2, whatever.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam



    • #3
      I doubt the usefulness of descriptive statistics for a combined sample if there is no multivariate analysis for this combined sample, but you can just create permanent variables to mark the two subsamples, then add them. Here is an example

      Code:
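      sysuse nlsw88 , clear

      * fit the model on the first group and mark its estimation sample
      regress wage hours i.union if collgrad
      generate byte subsample1 = e(sample)

      * same model on the second group
      regress wage hours i.union if !collgrad
      generate byte subsample2 = e(sample)

      * the two groups are mutually exclusive, so the sum is a 0/1 indicator
      generate byte mysample = subsample1 + subsample2

      tabulate mysample

      * full data versus combined estimation sample
      summarize hours
      summarize hours if mysample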
      Best
      Daniel



      • #4
        Thank you very much, Richard and Daniel. I had no idea that I could combine e(sample) with gen. I really appreciate your help :D
        Last edited by Dono Iskandar; 12 Aug 2015, 09:03.



        • #5
          Hello All,

          Hope everyone is safe.

          I want descriptive statistics from my regression sample, separated by the industries included in the estimation. I have an unbalanced panel data set covering various industries over several years. I know how to use e(sample) and tabstat separately, but I am unable to combine them to get summary statistics by industry for the regression sample only. Can someone please suggest how to proceed?

          thanks and regards,
          Mohina



          • #6
            Code:
            preserve
            keep if e(sample)
            tabstat ...
            restore
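
            A possible alternative that avoids dropping observations is to save e(sample) as a variable and use it in an if qualifier, with by() doing the split by industry. This is only a minimal sketch: it assumes the nlsw88 example dataset as a stand-in for the panel data, and the model and the variable name insample are hypothetical choices for illustration.

            ```stata
            sysuse nlsw88, clear

            * hypothetical model; replace with your own estimation command
            regress wage hours tenure i.union

            * mark the observations actually used in the estimation
            generate byte insample = e(sample)

            * summary statistics by industry, estimation sample only
            tabstat wage hours tenure if insample, by(industry) statistics(n mean sd min max)
            ```

            Saving the marker as a variable means it stays available after later estimation commands overwrite e(sample).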



            • #7
              Let's imagine that we have two samples, which might overlap. So separately after each model fit we go

              Code:
              * after fitting the first model
              gen byte sample1 = e(sample)

              * after fitting the second model
              gen byte sample2 = e(sample)
              Now we could go

              Code:
              gen byte sample = sample1 + 2 * sample2
              so that sample is 0 if in neither sample, 1 if in sample 1 only, 2 if in sample 2 only, 3 if in both.

              With three such samples,

              Code:
              gen byte sample = sample1 + 2 * sample2 + 4 * sample3
              which gives 0 if in no sample, all the way up to 7 if in all three samples.

              But hang on: these are just binary numbers in decimal. It's more direct to go

              Code:
              egen sample = concat(sample1 sample2)
              or

              Code:
              egen sample = concat(sample1 sample2 sample3)
              and so on, so that (in the last example) string values could be

              Code:
              000
              001
              010
              011
              100
              101
              110
              111
              and easy extensions give us the 2^k distinct subsets for k samples. It's not at all necessary that all the subsets occur in practice.
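
              The steps above can be put together in a short worked sketch. It assumes the nlsw88 example dataset and two hypothetical models whose estimation samples overlap; the variable name samplestr is also just an illustration. Note that with concat(sample1 sample2) the first digit refers to sample 1, so "10" means sample 1 only, which corresponds to 1 (not 2) in the arithmetic coding.

              ```stata
              sysuse nlsw88, clear

              * first hypothetical model: college graduates
              regress wage hours i.union if collgrad
              generate byte sample1 = e(sample)

              * second hypothetical model: people living in the south
              regress wage hours tenure if south
              generate byte sample2 = e(sample)

              * 0 = neither, 1 = sample 1 only, 2 = sample 2 only, 3 = both
              generate byte sample = sample1 + 2 * sample2

              * string version: "00", "10", "01", "11"
              egen samplestr = concat(sample1 sample2)

              * cross-check the two codings
              tabulate sample samplestr

              * descriptive statistics for each subset used in at least one model
              tabstat wage hours if sample > 0, by(samplestr) statistics(n mean sd)
              ```

              The condition sample > 0 restricts the summary to observations used in at least one of the two models, which is the combined sample.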



              • #8
                Many Thanks Andrew, it got resolved.

                stay safe and regards,
                Mohina

