
  • Converting Variables to a dummy variable

    Hello, I have three variables measured as percentages (formal credit, semi-formal credit, and informal credit), each between 0 and 100%.
    Code:
    sum Formal_credit informal_credit Semi_formal_credit
    
              Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------------+---------------------------------------------------------
         Formal_credit |     81,342    15.10992    25.61789          0        100
       informal_credit |     82,974    62.47527    34.42988          0        100
    Semi_formal_credit |     78,439    1.368165    7.610706          0        100
    Now I am trying to create a categorical variable (e.g. formal credit = 1, informal credit = 2, semi-formal credit = 3). The problem is that a firm can use formal and informal credit at the same time. How can I avoid this problem?

    Thanks a lot

  • #2
    The categorical variable you want to create assumes that these are mutually exclusive categories, and that does not seem to be the case. So that is that: you cannot create the variable you want. You can create three indicator variables: One for formal credit or not, one for informal credit or not, and one for semi-formal credit or not, and firms can score 1s on multiple variables. You could create a categorical variable with more categories: only formal, formal and informal, formal and semi-formal, etc.
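
    As a minimal sketch of the last suggestion, assuming three 0/1 indicators already exist: the toy data and the names has_formal, has_informal, has_semi and credit_mix are hypothetical, chosen only for illustration. egen's group() function assigns one category to each combination that occurs in the data.

    ```stata
    * Toy data (hypothetical): three 0/1 indicators, one row per firm
    clear
    input byte(has_formal has_informal has_semi)
    1 1 0
    0 1 0
    1 0 0
    0 0 0
    1 1 0
    end

    * One category per observed combination of the three indicators;
    * the label option builds value labels from the indicator values
    egen credit_mix = group(has_formal has_informal has_semi), label

    * Frequency of each combination
    tabulate credit_mix
    ```

    By default group() returns missing whenever any of the indicators is missing, so firms with incomplete information are left out of the categories rather than being assigned to one.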
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Thanks for your reply. This is what I did:

      Code:
      generate formal_dummy = .
      replace formal_dummy = 0 if Formal_credit == 0
      replace formal_dummy = 1 if Formal_credit > 0 
      
      generate informal_dummy = .
      replace informal_dummy = 0 if informal_credit == 0
      replace informal_dummy = 1 if informal_credit > 0
      
      generate Semi_formal_dummy = .
      replace Semi_formal_dummy = 1 if Semi_formal_credit  > 0
      replace Semi_formal_dummy = 0 if Semi_formal_credit  == 0
      So I got these results:

      Code:
      sum formal_dummy informal_dummy Semi_formal_dummy
      
        Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
      formal_dummy |     81,342    .3695016    .4826728          0          1
      informal_d~y |     82,974    .9377998    .2415203          0          1
      Semi_forma~y |     78,439    .0534173    .2248656          0          1
      But in that case the percentages are formal = 36%, informal = 93%, semi-formal = 5%, which add up to more than 100%.
      Is that right?


      • #4
        You need to look carefully at missings too. With your definitions, each indicator (you say dummy) will be

        1 if the value is positive or missing

        0 if the value is zero

        missing if the value is negative.

        Is that what you want?
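
        A small check on toy data can make this visible. This is only a sketch: the variable names pct and d and the three values are hypothetical, and the three generate/replace lines copy the pattern used in #3.

        ```stata
        * Toy data (hypothetical): one zero, one positive, one missing value
        clear
        input pct
        0
        25
        .
        end

        * Same pattern as in #3
        generate d = .
        replace d = 0 if pct == 0
        replace d = 1 if pct > 0

        * Show the source value next to the resulting indicator
        list pct d

        * Count observations where the source is missing but the indicator is 1
        count if missing(pct) & d == 1
        ```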


        • #5
          The problem Nick pointed out is that Stata treats a missing value as larger than any non-missing number. So if Formal_credit is missing on an observation, then Formal_credit > 0 will evaluate to true for that observation, which is probably not what you want. One way of dealing with Nick's comment is:

          Code:
          generate formal_dummy = .
          replace formal_dummy = 0 if Formal_credit == 0
          replace formal_dummy = 1 if Formal_credit > 0 & !missing(Formal_credit)
          
          generate informal_dummy = .
          replace informal_dummy = 0 if informal_credit == 0
          replace informal_dummy = 1 if informal_credit > 0 & !missing(informal_credit)  
          
          generate Semi_formal_dummy = .
          replace Semi_formal_dummy = 1 if Semi_formal_credit  > 0 & !missing(Semi_formal_credit)
          replace Semi_formal_dummy = 0 if Semi_formal_credit  == 0
          The missing() function returns a 1 (true) when its argument is a missing value and a 0 (false) otherwise. The ! negates, so true (1) becomes false (0) and false (0) becomes true (1). So in my mind I read !missing(Formal_credit) as "not missing on Formal_credit"
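
          The same logic can also be written in one line per variable, because a logical expression such as (x > 0) itself evaluates to 1 or 0. The following is a sketch on toy data, not the only way to do it: the three rows of data and the _any suffix for the new variables are hypothetical.

          ```stata
          * Toy data (hypothetical) using the variable names from #1
          clear
          input Formal_credit informal_credit Semi_formal_credit
          0 100 0
          30 70 0
          . 50 .
          end

          foreach v of varlist Formal_credit informal_credit Semi_formal_credit {
              * (`v' > 0) is 1 or 0; the if qualifier leaves missing inputs as missing
              generate byte `v'_any = (`v' > 0) if !missing(`v')
              * each indicator must be missing exactly where its source is missing
              assert missing(`v'_any) == missing(`v')
          }

          list
          ```

          The assert inside the loop stops with an error if an indicator is ever non-missing for a missing percentage, which is a cheap guard against the problem raised in #4.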

          After you have created the indicator variables correctly, the proportions will in all likelihood still not add up to 1. They would have added up to 1 if the categories were mutually exclusive, but we established in #1 that that is not the case. That is not a problem; it is the reason you created these variables this way.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------
