  • Help with generating a new variable from categorical variables

    Please help me with the relevant command. I want to create a new variable that is equal to 1 if MOVED_1 is equal to 2 and STATE is the same as ADDRESS_YR1.

    My data is as follows:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte MOVED_1 long STATE byte ADDRESS_YR1
    77 1 77
     1 1  1
     1 1  1
     2 1  1
     2 1  1
     2 1  1
     2 1  1
     2 1  1
    end
    label values MOVED_1 MOVED_1
    label def MOVED_1 1 "No", modify
    label def MOVED_1 2 "Yes,within australia", modify
    label def MOVED_1 77 "something else", modify
    label values STATE STATE
    label def STATE 1 "NSW", modify
    label values ADDRESS_YR1 ADDRESS_YR1
    label def ADDRESS_YR1 1 "NSW", modify
    label def ADDRESS_YR1 77 "something else", modify

    I use the following code
    Code:
    gen INSTATE=1 if ( MOVED_1 =2 & STATE=ADDRESS_YR1)
    But I keep getting the error "MOVED_1 is an invalid name".

    Can anyone please help me figure out the right code to correct the error?

    Thanks
    Sunganani Kalemba
    PhD Student.
    Queensland

  • #2
    Code:
    gen INSTATE=1 if ( MOVED_1 ==2 & STATE==ADDRESS_YR1)
    You need two equal signs (==) to test for equality; a single equal sign (=) is for assignment.
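
    As a minimal sketch of that distinction, the lines below recreate the example data from #1 (value labels omitted for brevity) and apply the corrected command. The final tabulate is an added check, not part of the original answer.

    ```stata
    * Recreate the example data posted in #1 (value labels omitted)
    clear
    input byte MOVED_1 long STATE byte ADDRESS_YR1
    77 1 77
     1 1  1
     1 1  1
     2 1  1
     2 1  1
     2 1  1
     2 1  1
     2 1  1
    end

    * A single = assigns the value 1; each == tests equality in the if qualifier
    generate INSTATE = 1 if MOVED_1 == 2 & STATE == ADDRESS_YR1

    * Observations that fail the condition are left as missing
    tabulate INSTATE, missing
    ```

    With these data, five observations satisfy the condition and the other three are left missing.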

    • #3
      Eric answered the question, spot on, but where did this fashion for (1, missing) indicators come from? They are awkward in several ways:

      1. It can be hard if not impossible to show them in graphs.

      2. It can be hard if not impossible to use them to get descriptive statistics. So, with the variable above

      Code:
      su INSTATE
      reduces to telling you that you have a mean of 1 and a SD of 0 for all the values of 1. Oh, and by the way the minimum and maximum are both 1. (The count of such values can be useful.)

      3. They are typically useless for modelling. Indicators that are 1 or missing fail twice over: the observations with missing values are omitted from the estimation and the other values are just a constant, which can't be useful either as a predictor or as a response.

      These problems are all consequences of a single over-arching fact: Stata tends to ignore missing values unless you ask otherwise. That's usually what you want when the values genuinely are missing, but that's not true here.
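
      A small sketch of the contrast, using the example data from #1. The variable names in_missing and in_01 are hypothetical, chosen only for this illustration.

      ```stata
      * Recreate the example data posted in #1 (value labels omitted)
      clear
      input byte MOVED_1 long STATE byte ADDRESS_YR1
      77 1 77
       1 1  1
       1 1  1
       2 1  1
       2 1  1
       2 1  1
       2 1  1
       2 1  1
      end

      * (1, missing) indicator: 1 where the condition holds, missing elsewhere
      generate byte in_missing = 1 if MOVED_1 == 2 & STATE == ADDRESS_YR1

      * (0, 1) indicator: the logical expression itself evaluates to 0 or 1
      generate byte in_01 = MOVED_1 == 2 & STATE == ADDRESS_YR1

      * summarize ignores the missing values in in_missing
      summarize in_missing in_01
      ```

      Here summarize uses only the 5 nonmissing values of in_missing, so its mean is 1 and its SD is 0, while the mean of in_01 over all 8 observations is the proportion for which the condition is true.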

      So, working with (0, 1) indicators is advised. If you want more on 0 for false and 1 for true, see https://www.stata.com/support/faqs/d...rue-and-false/

      Code:
       gen INSTATE = MOVED_1==2 & STATE==ADDRESS_YR1
      The following is usually an even better idea, because the result is missing whenever any of the variables involved is missing:

      Code:
      gen INSTATE = MOVED_1==2 & STATE==ADDRESS_YR1 if !missing(MOVED_1, STATE, ADDRESS_YR1)
      Then you have

      0 for false condition
      1 for true condition
      missing if we can't tell whether the condition is true or false.
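
      The example data in #1 contain no missing values, so the three outcomes can't all be seen there. The sketch below uses a small hypothetical dataset, loosely modelled on #1, in which the last observation has a missing STATE added purely for illustration.

      ```stata
      * Hypothetical data (not from #1): the last row has a missing STATE
      clear
      input byte MOVED_1 long STATE byte ADDRESS_YR1
       1 1 1
       2 1 1
       2 2 1
       2 . 1
      end

      * 1 if true, 0 if false, missing if any of the three inputs is missing
      generate byte INSTATE = MOVED_1 == 2 & STATE == ADDRESS_YR1 if !missing(MOVED_1, STATE, ADDRESS_YR1)

      * Show all three outcomes, including the missing one
      tabulate INSTATE, missing
      list, clean
      ```

      Without the if qualifier, the last observation would be set to 0, because a missing STATE is simply not equal to ADDRESS_YR1.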

      PS Never say dummy. That's my advice. I have heard too many stories about presentations in which the word dummy was misconstrued.

      Vince Wiggins of StataCorp had a story where most people ended up laughing: skip to the PS at the end in https://www.stata.com/statalist/arch.../msg00594.html

      In other stories some people got very angry.

      The word indicator is, so far as I know, perfectly neutral and has a splendid mathematical pedigree.

      • #4
        Thanks, Eric and Nick, for your responses.

        Nick, what you have said has me thinking, and in fact questioning my approach. I wonder, then, whether I should proceed as planned to generate my next variable:

        Code:
        gen OSTATE= MOVED_1==2 & STATE !=ADDRESS_YR1 !(MOVED_1 STATE ADDRESS_YR1)
        But I am thinking there must be a way (I am not completely sure how yet) to generate a single variable for “migrate” with INSTATE and OSTATE as the category labels, right? It won’t greatly improve the usefulness of the data, but I will be able to get decent proportions, right?

        Sunganani Kalemba
        PhD Student.
        Queensland

        • #5
          sunga: Sorry, but I can't follow what you're suggesting. Your line of code is meaningless as well as illegal. Was it meant to be just what I suggested?

          Please note https://www.statalist.org/forums/help#stata (give example data) and https://www.statalist.org/forums/help#realnames

          • #6
            Nick,

            understood. Thanks again.
            Sunganani Kalemba
            PhD Student.
            Queensland
