  • Indicator, missing-value problem, neutral value

    Good evening to everyone,
    I am working with firm-level data (I do not have worker-level microdata) and I have the following variables:

    1. hours of training per capita for women = h_procap_ben_F = h_training_F / N_participant_F
    2. the same for males

    I would like to create an indicator which is the ratio between h_procap_ben_F and h_procap_ben_M.

    The main problem is that when I divide by zero I get missing values. At first, I thought of replacing the ratio with 1 when the two values are the same (e.g. 0 and 0), but I am still not completely sure that makes sense (especially for the interpretation).
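
    For reference, a minimal sketch of why the ratio comes out missing: Stata returns missing (.) for any division by zero, and a missing value propagates through whatever is computed from it. The three toy rows are illustrative and are not taken from the real data.

    ```stata
    clear
    input float(h_training_F N_participant_F h_training_M N_participant_M)
      0 0   0  0
      0 0 100 14
    128 4   0  0
    end

    * Division by zero yields missing (.), so a firm with no participants
    * of a given gender has no per-capita value for that gender
    generate h_procap_ben_F = h_training_F / N_participant_F
    generate h_procap_ben_M = h_training_M / N_participant_M

    * Anything computed from a missing value is itself missing
    generate ratio = h_procap_ben_F / h_procap_ben_M

    list, noobs
    ```

    In all three rows at least one per-capita value is missing, so ratio is missing in every row.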

    I would like to create an indicator that takes the value zero when there is gender balance in training hours. How can I construct an indicator from these data that captures gender differences in training hours?

    Below are examples where the problem arises.

    Many thanks in advance for your time.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(h_procap_ben_F h_training_F N_participant_F h_procap_ben_M h_training_M N_participant_M ratio)
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0  61.73585 3272 53 .
           .   0 0  6.473684  492 76 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0  7.142857  100 14 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0        54  108  2 .
           3   3 1         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0       3.4   68 20 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
          32 128 4         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0       100 6000 60 .
           .   0 0  62.22222 2800 45 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
          80  80 1         .    0  0 .
           .   0 0         .    0  0 .
         120 120 1         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0 11.809524  248 21 .
           .   0 0         .    0  0 .
    6.666667  20 3         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
          40  40 1         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0 2.2222223   20  9 .
           .   0 0        12  744 62 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0  22.04348 1014 46 .
           .   0 0     10.56  264 25 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
           .   0 0         .    0  0 .
    end
    Summary statistics
    Code:
    Variable            Obs       Mean        Std. dev.    Min          Max
    -----------------------------------------------------------------------------
    h_procap_ben_F      17.620    29,01955    137,3599     ,0000839     14025,39
    h_training_F        26.247    1910,504    92061,99     0            1,46e+07
    N_participant_F     26.247    55,25988    406,3644     0            34680
    h_procap_ben_M      18.814    29,31631    300,6491     -27597,11    22482,89
    h_training_M        26.247    1678,723    75299,81     -1,15e+07    1844383
    -----------------------------------------------------------------------------
    N_participant_M     26.247    77,70225    492,0451     0            36309
    ratio               17.133    1,525592    19,58319     -79,33334    1469,125
    Last edited by Chiara Tasselli; 16 Nov 2023, 11:13.

  • #2
    Well, before you try creating derived variables, you need to clean up the original data. According to your summary statistics, some of the males have negative hours of training. Surely that must be an error.
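
    A minimal sketch of that first cleaning step, using the variable names from the -dataex- example and three toy rows (one with a deliberately negative value). Recoding negative hours to missing is an assumption made here for illustration only; whether to correct, drop, or recode such values depends on how the errors arose.

    ```stata
    clear
    input float(h_training_F N_participant_F h_training_M N_participant_M)
    128 4    0  0
      0 0  -50 10
     80 1  492 76
    end

    * How many firms report impossible (negative) hours or head counts?
    count if h_training_F < 0 | h_training_M < 0
    count if N_participant_F < 0 | N_participant_M < 0

    * Positive hours with zero participants would also be inconsistent
    count if h_training_M > 0 & !missing(h_training_M) & N_participant_M == 0
    count if h_training_F > 0 & !missing(h_training_F) & N_participant_F == 0

    * Inspect the offending observations before deciding what to do with them
    list if h_training_M < 0 | h_training_F < 0, noobs

    * ASSUMPTION: recode negative hours to missing rather than guess a correction
    replace h_training_M = . if h_training_M < 0
    replace h_training_F = . if h_training_F < 0
    ```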

    If you had the ability to calculate a M:F training ratio, you could easily create an index that is zero when there is balance: just take the logarithm.
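
    As a hedged illustration of the logarithm idea, the sketch below uses three toy firms in which both per-capita values are strictly positive, which is exactly the condition the real data do not meet. The variable name log_ratio is hypothetical.

    ```stata
    clear
    input float(h_procap_ben_F h_procap_ben_M)
    10 10
    10 20
    20 10
    end

    * ln(M/F) is 0 at balance, positive when men receive more hours per capita,
    * negative when women do; it is only computed where both values are positive
    generate log_ratio = ln(h_procap_ben_M / h_procap_ben_F) ///
        if h_procap_ben_M > 0 & h_procap_ben_F > 0 ///
        & !missing(h_procap_ben_M, h_procap_ben_F)

    list, noobs
    ```

    The index is symmetric: a twofold advantage in either direction gives ln(2) or -ln(2).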

    But with zero per-capita training hours in some of the firms, you simply can't do a ratio variable. Why not use the male-female difference instead? That also has the nice property that it is zero when there is balance.
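
    A minimal sketch of the difference approach follows. It assumes, purely for illustration, that a firm with no participants of a given gender is assigned zero per-capita hours for that gender; that recoding is a substantive choice, and the names h_pc_F, h_pc_M and diff_MF are hypothetical.

    ```stata
    clear
    input float(h_training_F N_participant_F h_training_M N_participant_M)
      0 0   0  0
      0 0 100 14
    128 4   0  0
     80 4 160  8
    end

    * ASSUMPTION: no participants means zero per-capita hours (instead of missing)
    generate h_pc_F = cond(N_participant_F > 0, h_training_F / N_participant_F, 0)
    generate h_pc_M = cond(N_participant_M > 0, h_training_M / N_participant_M, 0)

    * Male minus female per-capita hours: 0 at balance, and defined even when
    * one or both sides are zero
    generate diff_MF = h_pc_M - h_pc_F

    list N_participant_F N_participant_M h_pc_F h_pc_M diff_MF, noobs
    ```

    Under this assumption a firm that trains nobody and a firm that trains both genders equally both score 0, so it may be worth deciding whether those two cases should be pooled.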

    • #3
      Dear Clyde,
      many thanks for your suggestion. Yes, you're right, the dataset has some errors. Can you explain in more detail where you would apply the log()? Do you suggest using the male-female difference as it is, with nothing to normalize or standardize it? If so, how can I interpret the coefficient of this indicator if I use it in a regression? Many thanks for your time and suggestions!

      • #4
        As we all learned early, dividing by zero is something you shouldn't want to do.

        The small theme that differences can be helpful when ratios aren't helpful (or aren't even defined) is expanded on at https://www.stata.com/support/faqs/d...lity-measures/

        • #5
          I said that you could log-transform the ratio and get an index that takes on the value zero when there is balance if you could calculate the ratio. But I then went on to point out that you cannot do a ratio with this data because of the zeroes. I'm sorry if raising that hypothetical solution that is not applicable to your data confused you.

          If you were to use the difference as a regressor, the interpretation of its coefficient would be straightforward: if the coefficient is b, then the interpretation is that each hour per capita of training for males in excess of females (or the other way around, as the case may be) is associated with an expected difference of b in the outcome variable.
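
          To make that interpretation concrete, here is a small simulated sketch. The data are generated, not real; the outcome variable y, the simulated slope of 0.5, and the name diff_MF are all hypothetical.

          ```stata
          clear
          set seed 12345
          set obs 500

          * Simulated male-minus-female difference in per-capita training hours
          generate diff_MF = rnormal(0, 10)

          * Simulated outcome: each extra hour of male-over-female training
          * is associated with a 0.5 higher expected value of y
          generate y = 2 + 0.5 * diff_MF + rnormal(0, 1)

          regress y diff_MF

          * The coefficient on diff_MF estimates the expected difference in y
          * per one-hour increase in the male-female gap
          display _b[diff_MF]
          ```

          The estimated coefficient should land close to the simulated value of 0.5; a negative value of diff_MF simply means women received more hours per capita than men.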

          • #6
            Thank you both; your assistance has been very helpful, and I'm beginning to grasp the point. Can I add a further detail that I omitted to simplify things? The data I have are as follows:

            -Total Number of Women (omitted in the previous explanation)
            -Number of women participating in training activities
            -Total number of training hours dedicated to women

            -The same for men, and of course, the totals (M+F)

            The suggestion provided here [https://www.stata.com/support/faqs/d...lity-measures/] seems very useful for the difference between the numbers of men and women benefiting from training. However, how do I work with the number of hours per capita? Should I follow this formula?

            gen balance = 100 * ((h_percapita_f/(h_percapita_f + h_percapita_m)) - (h_percapita_m/(h_percapita_f + h_percapita_m)))

            Thank you both again for your very helpful advice.

            • #7
              Your algebra simplifies to

              Code:
              100 * (h_percapita_f - h_percapita_m) / (h_percapita_f + h_percapita_m)
              but whether that measures what you want to measure is a different question. My post was just riffing on a small theme, not making a direct suggestion for what you should do.
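
              A quick numerical check of that simplification, as a sketch on toy values. The input names follow post #6; balance_long and balance_short are hypothetical names. The toy rows also show two properties worth knowing before using the measure: with non-negative inputs it runs from -100 to 100, and it is still missing when both inputs are zero.

              ```stata
              clear
              input float(h_percapita_f h_percapita_m)
              10 10
               0 20
              30  0
               0  0
               5 15
              end

              * Formula as written in post #6
              generate balance_long = 100 * ((h_percapita_f / (h_percapita_f + h_percapita_m)) ///
                  - (h_percapita_m / (h_percapita_f + h_percapita_m)))

              * Simplified form
              generate balance_short = 100 * (h_percapita_f - h_percapita_m) ///
                  / (h_percapita_f + h_percapita_m)

              * The two agree wherever they are defined
              assert reldif(balance_long, balance_short) < 1e-6 if !missing(balance_short)

              * 0/0 leaves both versions missing
              assert missing(balance_long) & missing(balance_short) ///
                  if h_percapita_f + h_percapita_m == 0

              list, noobs
              ```

              So the normalization does not by itself resolve the original problem: firms with zero hours per capita for both genders still get a missing value.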

              • #8
                Thank you both for the useful advice! Wishing you a great weekend ahead!
