  • r(109) keep if type mismatch

    Hello everybody, I am new to Stata and I have a problem that I have not been able to solve, even with the help of your valuable forum.

    I want to regress 2 variables: one about age (s2a_03) and another about employment status (ocupado). Below is the description of the variables:

                  storage   display    value
    variable name   type    format     label      variable label
    s2a_03          byte    %8.0g                 S2 3. ¿Cuántos años cumplidos tiene?SI TIENE MENOS DE 1 AÑO ANOTE 00.SI TIENE 98

                  storage   display    value
    variable name   type    format     label      variable label
    ocupado         byte    %8.0g      OCUPADO    Población Ocupada
    This last variable confuses me because it is stored in a numeric format, but its values are a dummy: YES (Si) or NO (NO).

    Now, the problem is that I need a subset defined by 2 criteria:
    1. people aged at least 7 and younger than 13
    2. employment status YES (ocupado="Si")

    The first one is no problem; I run:
    keep if (s2a_03>=7)
    keep if (s2a_03<13)

    The second one gives me lots of trouble. I run:

    keep if ocupado=="Si"

    The result is: type mismatch r(109)

    I have tried many other options (drop if, encode...), but I am going crazy. If you can help me, that would be great.

    Thank you in advance

  • #2
    ocupado is numeric, not string. So you can't test ocupado for equality with one of its value labels like that. Consider this as a parallel problem.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . keep if foreign == "Foreign"
    type mismatch
    r(109);
    
    . keep if foreign == "Foreign":origin
    (52 observations deleted)
    So, using this syntax you can go (I guess)

    Code:
    keep if ocupado=="Si":OCUPADO
    What most experienced Stata users would do here is different. Assuming that you coded 1 for Si and 0 for No you could just go something like

    Code:
    regress s2a_03 ocupado if inrange(s2a_03, 7, 12) & ocupado == 1
    except that now all that would do is calculate the mean age in question. A constant predictor is no predictor.

    That's a different question. The main point is that there is no need to throw out observations you don't want to use.
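    A minimal sketch of checking which numeric code a value label stands for before subsetting. It uses the auto data as a stand-in for the poster's dataset: there the value label attached to foreign is origin, whereas in the poster's data the label name would be OCUPADO. No observations are dropped; the condition is used as an -if- qualifier.

```stata
sysuse auto, clear

* Show the numeric codes behind the value label attached to foreign
label list origin

* "Foreign":origin evaluates to the numeric code labelled "Foreign",
* so this is a numeric comparison and no type mismatch occurs
count if foreign == "Foreign":origin

* The same observations, selected with the numeric code directly
count if foreign == 1

* Restrict a later command with -if- instead of deleting observations
summarize mpg if foreign == "Foreign":origin
```

    With the poster's data the same idea would read `count if ocupado == "Si":OCUPADO`, assuming the label text is exactly "Si".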





    • #3
      Alessandro:
      welcome to this forum.
      Your -ocupado- numeric variable is -label-led.
      Just type
      Code:
      label list OCUPADO
      from within Stata and the -label-led numeric values will appear.
      Then you can do something like:
      Code:
      keep if ocupado==1 // assuming that 1 identifies employed people
      to reduce your dataset to the observations you're interested in.

      As an aside, it's probably better not to delete observations (as they might be useful for future, currently unexpected analyses), but to flag them with a categorical variable to be used as an -if- qualifier for further analyses.
      In your case, you may want to consider:
      Code:
      gen flag=1 if s2a_03>=7 & s2a_03<13 // amended as per Nick's comment (see #4) on my previous mistake
      replace flag=0 if flag==.
      regress s2a_03 ocupado if flag==1 & ocupado==1
      PS: Crossed in cyberspace with Nick's reply; his code is, as usual, much more efficient than mine!
      Last edited by Carlo Lazzaro; 30 Jun 2018, 03:25.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Carlo:

        Code:
         
         gen flag=1 if s2a_03>=7|s2a_03<13
        isn't what is wanted as all values qualify as less than 13 or more than 6. You need & not |.
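        Nick's point can be checked directly. The sketch below uses the auto data, with mpg standing in for s2a_03 and arbitrary cut-offs of 20 and 25 chosen only for illustration. With | every value satisfies at least one side of the condition, while & or inrange() picks out the intended band. Also, Stata treats a missing value as larger than any number, so a missing age would pass `>= 7` on its own. inrange() returns 0 for missing values.

```stata
sysuse auto, clear

* With | every value (even a missing one) satisfies at least one side,
* so all 74 cars qualify
count if mpg >= 20 | mpg < 25

* With & only the intended band 20 <= mpg < 25 qualifies
count if mpg >= 20 & mpg < 25

* For integer data inrange() gives the same band (20 to 24 inclusive)
count if inrange(mpg, 20, 24)

* A 0/1 flag that marks the subset without deleting anything (cf. #3)
generate byte flag = inrange(mpg, 20, 24)
tabulate flag
```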



        • #5
          Nick is correct:
          his code is also much more efficient than mine!
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you so much, Nick!

            The :OCUPADO syntax now works for my subset (when I list the variable, all values are "Si").

            However, when I run the regression I do not get the result I expected. From a descriptive analysis in absolute terms, I know that the average age is higher among the employed. However, I cannot demonstrate this with either regress or correlate.
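                  Source |       SS           df       MS      Number of obs   =       372
            -------------+----------------------------------   F(0, 371)       =      0.00
                   Model |           0         0           .   Prob > F        =         .
                Residual |  924.731183       371  2.49253688   R-squared       =    0.0000
            -------------+----------------------------------   Adj R-squared   =    0.0000
                   Total |  924.731183       371  2.49253688   Root MSE        =    1.5788

            ------------------------------------------------------------------------------
                  s2a_03 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                 ocupado |          0  (omitted)
                   _cons |   10.13978   .0818558   123.87   0.000     9.978825    10.30074
            ------------------------------------------------------------------------------

                         |  ocupado   s2a_03
            -------------+------------------
                 ocupado |        .
                  s2a_03 |        .   1.0000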
            Do you have any hints?



            • #7
              Alessandro:
              your post has severe formatting issues that make it unreadable as far as your regression model is concerned.
              Please share what you typed and what Stata gave you back via CODE delimiters (see the FAQ on this). Thanks.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Sorry Carlo, I'll try:

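                Code:
                      Source |       SS           df       MS      Number of obs   =       372
                -------------+----------------------------------   F(0, 371)       =      0.00
                       Model |           0         0           .   Prob > F        =         .
                    Residual |  924.731183       371  2.49253688   R-squared       =    0.0000
                -------------+----------------------------------   Adj R-squared   =    0.0000
                       Total |  924.731183       371  2.49253688   Root MSE        =    1.5788

                ------------------------------------------------------------------------------
                      s2a_03 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     ocupado |          0  (omitted)
                       _cons |   10.13978   .0818558   123.87   0.000     9.978825    10.30074
                ------------------------------------------------------------------------------

                Code:
                             |  ocupado   s2a_03
                -------------+------------------
                     ocupado |        .
                      s2a_03 |        .   1.0000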




                • #9
                  correlate output:

                  Code:
                               |  ocupado   s2a_03
                  -------------+------------------
                       ocupado |        .
                        s2a_03 |        .   1.0000



                  • #10
                    Alessandro:
                    are all your variables in numeric format now?
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Yes Carlo, here are the details:

                      Code:
                      describe ocupado

                                    storage   display    value
                      variable name   type    format     label      variable label
                      ocupado         byte    %8.0g      OCUPADO    Población Ocupad

                      Code:
                      describe s2a_03

                                    storage   display    value
                      variable name   type    format     label      variable label
                      s2a_03          byte    %8.0g                 S2 3. ¿Cuántos años cumplidos tiene?SI TIENE MENOS DE 1 AÑO ANOTE 00.SI TI



                      • #12
                        You still have problems with formatting output (please keep reading FAQ Advice #12 and don't try variations of your own) but the problem reported in #6 and #8 and #9 is just the problem warned about in #2. Your predictor has just one distinct value. You threw out all the observations for the other value. So, your set-up is analogous to trying to fit a regression to data like this:


                        Code:
                        sysuse auto, clear 
                        scatter mpg foreign if foreign, xla(1, valuelabel)

                        [Graph (meanonly.png): scatter of mpg against foreign, foreign cars only]

                        So, remember that regression with one predictor means that you are trying to fit a straight line. That's easy: the straight line has intercept the mean of the response or outcome and slope zero. Regression makes perfect sense, but it's equally true that the predictor has no predictive value.

                        Further, the correlation is indeterminate because the variance of one variable, one term in the denominator, is precisely zero.

                        At a minimum you need both values for the predictor and perhaps other predictors too.
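                        A sketch reproducing the point above with the auto data, with mpg standing in for the outcome and foreign for the 0/1 predictor. Restricting the sample to foreign == 1 leaves a constant predictor: regress omits it, _cons is just the subgroup mean, and the correlation is missing. Keeping both groups lets the coefficient compare them. The final commented line adapts this to the variables in this thread. It is only a hypothetical illustration and is not run, because those variables are not in the auto data.

```stata
sysuse auto, clear

* Only foreign cars: the predictor is constant, so regress omits it
* and _cons equals the mean of mpg among foreign cars
regress mpg foreign if foreign == 1
summarize mpg if foreign == 1

* The correlation is missing: foreign has zero variance in this subset
correlate mpg foreign if foreign == 1

* Keep both values of the predictor: the coefficient on 1.foreign is the
* difference in mean mpg between foreign and domestic cars
regress mpg i.foreign

* Hypothetical adaptation to this thread's data (not run here):
* regress s2a_03 i.ocupado if inrange(s2a_03, 7, 12)
```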
