  • Fail to create dummy by using forvalues with multiple construct variables in a panel dataset.

    Hi community,

    I want to create a categorical variable from a varlist (two categories of variables, each measured in three non-consecutive waves) using forvalues, but Stata reports "invalid syntax". The new variable should take three values, one for each band of the original scale. First, my forvalues loop fails. Second, when I do it by hand with replace, the result shows only one of the three values; the other two never appear.

    Could someone help me check the code and find a solution? Thanks,

    //----------- My code is below ------------------
    Code:
    use data.dta
    
    tab r1shlt, m
    
    tab s1shlt, m
    
    
    forvalues i = 1/2,4 {
        foreach var of varlist r`i'shlt s`i'shlt {
            tab `var', m
            replace `var' = 1 if `var' == 5 & !missing(`var')
            replace `var' = 2 if `var' >= 3 & `var' <= 4 & !missing(`var')
            replace `var' = 3 if `var' >= 1 & `var' <= 2 & !missing(`var')
            replace `var' = . if missing(`var')
            rename `var' r`i'SelfH 
            label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
            label val r`i'SelfH selfh_labl`i'
        }
    }
    //------------ The result is as below: -----------------
    . use data.dta

    .
    end of do-file

    . tab r1shlt, m

         r1shlt |      Freq.     Percent        Cum.
    ------------+-----------------------------------
                |      7,796       30.57       30.57
              . |      5,095       19.98       50.55
    1.Excellent |        106        0.42       50.96
    2.Very good |      1,146        4.49       55.45
         3.Good |      2,274        8.92       64.37
         4.Fair |      5,918       23.20       87.57
         5.Poor |      3,169       12.43      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    .
    end of do-file


    . tab s1shlt, m

         s1shlt |      Freq.     Percent        Cum.
    ------------+-----------------------------------
                |      7,796       30.57       30.57
              . |      6,983       27.38       57.95
    1.Excellent |         91        0.36       58.30
    2.Very good |        980        3.84       62.15
         3.Good |      1,962        7.69       69.84
         4.Fair |      5,072       19.89       89.73
         5.Poor |      2,620       10.27      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    .
    end of do-file

    . forvalues i = 1/2,4 {
    2. foreach var of varlist r`i'shlt s`i'shlt {
    3. tab `var', m
    4. replace `var' = 1 if `var' == 5 & !missing(`var')
    5. replace `var' = 2 if `var' >= 3 & `var' <= 4 & !missing(`var')
    6. replace `var' = 3 if `var' >= 1 & `var' <= 2 & !missing(`var')
    7. replace `var' = . if missing(`var')
    8. rename `var' r`i'SelfH
    9. label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
    10. label val r`i'SelfH selfh_labl`i'
    11. }
    12. }
    invalid syntax
    r(198);

    end of do-file

    r(198);

    //---------- I changed my code (below) to find where it goes wrong -------------------
    [CODE]
    tab r1shlt, m
    tab s1shlt, m

    gen r_r1SelfH = .
    encode r1shlt, generate(r1shlt_num)
    encode s1shlt, generate(s1shlt_num)
    tab r1shlt_num, m
    tab s1shlt_num, m

    replace r_r1SelfH = 1 if (r1shlt_num == 5 | s1shlt_num ==5) & !missing(r1shlt_num) | !missing(s1shlt_num)
    replace r_r1SelfH = 2 if (r1shlt_num == 3/4 | s1shlt_num == 3/4) & !missing(r1shlt_num) | !missing(s1shlt_num)
    replace r_r1SelfH = 3 if (r1shlt_num == 1/2 | s1shlt_num == 1/2) & !missing(r1shlt_num) | !missing(s1shlt_num)
    replace r_r1SelfH = . if (r1shlt_num == .| s1shlt_num == .)

    label define r_r1selfh_labl 1 "poor" 2 "fair" 3 "good"
    label val r_r1SelfH r_r1selfh_labl1
    tab r_r1SelfH, m

    tab r2shlt, m
    gen r_r2SelfH = .
    encode r2shlt, generate(r2shlt_num)
    encode s2shlt, generate(s2shlt_num)
    replace r_r2SelfH = 1 if (r2shlt_num == 5 | s2shlt_num ==5) & !missing(r2shlt_num) & !missing(s2shlt_num)
    replace r_r2SelfH = 2 if (r2shlt_num == 3/4 | s2shlt_num == 3/4) & !missing(r2shlt_num) & !missing(s2shlt_num)
    replace r_r2SelfH = 3 if (r2shlt_num == 1/2 | s2shlt_num == 1/2) & !missing(r2shlt_num) & !missing(s2shlt_num)

    label define r_r2selfh_labl 1 "poor" 2 "fair" 3 "good"
    label val r_r2SelfH r_r2selfh_labl1

    tab r_r2SelfH, m

    [/CODE]

    //--------- It produces the unexpected output below ---------------------
    . tab r1shlt, m

         r1shlt |      Freq.     Percent        Cum.
    ------------+-----------------------------------
                |      7,796       30.57       30.57
              . |      5,095       19.98       50.55
    1.Excellent |        106        0.42       50.96
    2.Very good |      1,146        4.49       55.45
         3.Good |      2,274        8.92       64.37
         4.Fair |      5,918       23.20       87.57
         5.Poor |      3,169       12.43      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    . tab s1shlt, m

         s1shlt |      Freq.     Percent        Cum.
    ------------+-----------------------------------
                |      7,796       30.57       30.57
              . |      6,983       27.38       57.95
    1.Excellent |         91        0.36       58.30
    2.Very good |        980        3.84       62.15
         3.Good |      1,962        7.69       69.84
         4.Fair |      5,072       19.89       89.73
         5.Poor |      2,620       10.27      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    .
    . gen r_r1SelfH = .
    (25,504 missing values generated)

    . encode r1shlt, generate(r1shlt_num)

    . encode s1shlt, generate(s1shlt_num)

    . tab r1shlt_num, m

     r1shlt_num |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              . |      5,095       19.98       19.98
    1.Excellent |        106        0.42       20.39
    2.Very good |      1,146        4.49       24.89
         3.Good |      2,274        8.92       33.80
         4.Fair |      5,918       23.20       57.01
         5.Poor |      3,169       12.43       69.43
              . |      7,796       30.57      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    . tab s1shlt_num, m

     s1shlt_num |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              . |      6,983       27.38       27.38
    1.Excellent |         91        0.36       27.74
    2.Very good |        980        3.84       31.58
         3.Good |      1,962        7.69       39.27
         4.Fair |      5,072       19.89       59.16
         5.Poor |      2,620       10.27       69.43
              . |      7,796       30.57      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    .
    end of do-file


    . replace r_r1SelfH = 1 if (r1shlt_num == 5 | s1shlt_num ==5) & !missing(r1shlt_num) | !missing
    > (s1shlt_num)
    (17,708 real changes made)

    .
    end of do-file


    . replace r_r1SelfH = 2 if (r1shlt_num == 3/4 | s1shlt_num == 3/4) & !missing(r1shlt_num) | !mi
    > ssing(s1shlt_num)
    (17,708 real changes made)

    .
    end of do-file

    . do "C:\Users\ACER\AppData\Local\Temp\STD400c_000000.tmp"

    . replace r_r1SelfH = 3 if (r1shlt_num == 1/2 | s1shlt_num == 1/2) & !missing(r1shlt_num) | !mi
    > ssing(s1shlt_num)
    (17,708 real changes made)

    .
    end of do-file


    . replace r_r1SelfH = . if (r1shlt_num == .| s1shlt_num == .)
    (0 real changes made)

    .
    end of do-file


    . label define r_r1selfh_labl 1 "poor" 2 "fair" 3 "good"

    . label val r_r1SelfH r_r1selfh_labl1

    . tab r_r1SelfH, m

      r_r1SelfH |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |     17,708       69.43       69.43
              . |      7,796       30.57      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    .
    end of do-file


    . tab r2shlt, m

         r2shlt |      Freq.     Percent        Cum.
    ------------+-----------------------------------
                |      6,892       27.02       27.02
              . |      1,021        4.00       31.03
    1.Excellent |        241        0.94       31.97
    2.Very good |      1,774        6.96       38.93
         3.Good |      2,512        9.85       48.78
         4.Fair |      9,232       36.20       84.97
         5.Poor |      3,832       15.03      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    . gen r_r2SelfH = .
    (25,504 missing values generated)

    . encode r2shlt, generate(r2shlt_num)

    . encode s2shlt, generate(s2shlt_num)

    . replace r_r2SelfH = 1 if (r2shlt_num == 5 | s2shlt_num ==5) & !missing(r2shlt_num) & !missin
    > g(s2shlt_num)
    (12,714 real changes made)

    . replace r_r2SelfH = 2 if (r2shlt_num == 3/4 | s2shlt_num == 3/4) & !missing(r2shlt_num) & !mi
    > ssing(s2shlt_num)
    (0 real changes made)

    . replace r_r2SelfH = 3 if (r2shlt_num == 1/2 | s2shlt_num == 1/2) & !missing(r2shlt_num) & !mi
    > ssing(s2shlt_num)
    (0 real changes made)

    .
    . label define r_r2selfh_labl 1 "poor" 2 "fair" 3 "good"

    . label val r_r2SelfH r_r2selfh_labl1

    .
    . tab r_r2SelfH, m

      r_r2SelfH |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |     12,714       49.85       49.85
              . |     12,790       50.15      100.00
    ------------+-----------------------------------
          Total |     25,504      100.00

    .
    end of do-file

    .
    .
    .

  • #2
    Your coding assumes generalizations of Stata syntax that are simply not allowed in Stata.

    Code:
    forvalues i = 1/2, 4 {
    is illegal: no commas are allowed in the list of numbers. This is what causes the "invalid syntax" error message. The entire loop is therefore skipped and nothing that you do after the loop will make any sense as a result. General principle: never ignore error messages. If you are running code and get an error message, execution stops (except under -capture noisily-; let's leave that aside). To then go ahead and run the rest of the code is just inviting garbage in to turn into garbage out. Whenever you get an error message, do not proceed until you identify and fix the problem that caused it. Don't ignore it. And don't build in a "work around" that suppresses the error message but does not fix the problem that caused it. Either way you are simply crunching garbage from that point on.
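
    A minimal sketch of one legal way to loop over non-consecutive waves such as 1, 2 and 4: -foreach ... of numlist- accepts a general numlist, whereas -forvalues- accepts only a single range. The toy data here are an assumption for illustration, not the poster's dataset.

    Code:
    * toy data: three observations (made up for illustration)
    clear
    set obs 3

    * the numlist 1/2 4 expands to 1 2 4
    foreach i of numlist 1/2 4 {
        generate byte r`i'shlt = _n
    }
    describe r1shlt r2shlt r4shlt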

    Code:
    replace r_r2SelfH = 2 if (r2shlt_num == 3/4 | s2shlt_num == 3/4) & !missing(r2shlt_num) & !mi
    > ssing(s2shlt_num)
    Here we have an instance of legal syntax, but it does not do what I believe you think it does. The variables r2shlt_num and s2shlt_num, by the way they were constructed, take on only integer values. The comparison of either of these variables to 3/4 will always be false because 3/4 means three-fourths (0.75). It does not mean == 3 or == 4. Yes, in some contexts Stata allows 3/4 to refer to the numlist consisting of 3 and 4. But logical expressions are not among those contexts. If you were thinking that this command would compare those two variables to either 3 or 4, then you have misunderstood the code; it is not doing that. If you really did intend a comparison to the fraction three-fourths, then it seems you do not understand how you created those variables, because they are necessarily integers and cannot equal 0.75.
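
    A minimal sketch of how "equals 3 or 4" can be written inside a logical expression, using -inlist()- or -inrange()-. The five toy observations are made up; only the variable names come from the thread.

    Code:
    * toy data (an assumption, not the poster's dataset)
    clear
    input byte(r2shlt_num s2shlt_num)
    5 1
    3 2
    4 .
    1 1
    . .
    end

    * true when either variable is 3 or 4; missing values do not match
    generate byte fair_a = inlist(r2shlt_num, 3, 4) | inlist(s2shlt_num, 3, 4)
    generate byte fair_b = inrange(r2shlt_num, 3, 4) | inrange(s2shlt_num, 3, 4)

    * 3/4 is evaluated as 0.75 here, so this is 0 in every observation
    generate byte never = (r2shlt_num == 3/4)
    list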



    • #3
      Thanks for your advice, Clyde. I removed the comma from my loop, but the loop still does not work. Could you help me check how to write a loop over variables indexed in two dimensions (wave number and r/s prefix)? Thanks,
      Code:
      . forvalues i = 1(1)3 {
        2.     foreach var of varlist r`i'shlt s`i'shlt {
        3.         encode `var', generate(`var'_num)
        4.         recode `var'_num (5=1)(3/4=2)(1/2=3) ///
      >                 (else=.), gen(r`i'SelfH)
        5.         label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
        6.         label val r`i'SelfH selfh_labl`i'
        7.         drop `var'_num
        8.     }
        9. }
      (17708 differences between r1shlt_num and r1SelfH)
      variable r1SelfH already defined
      r(110);
      //----------- Failing loop below -----------------
      . forvalues i = 1(1)3 {
      2. foreach var of varlist r`i'shlt s`i'shlt {
      3. encode `var', generate(`var'_num)
      4. recode `var'_num (5=1)(3/4=2)(1/2=3) ///
      > (else=.), gen(r`i'SelfH)
      5. label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
      6. label val r`i'SelfH selfh_labl`i'
      7. drop `var'_num
      8. }
      9. }
      (17708 differences between r1shlt_num and r1SelfH)
      variable r1SelfH already defined
      r(110);

      end of do-file

      r(110);

      . drop r1SelfH

      . drop r2SelfH
      variable r2SelfH not found
      r(111);

      end of do-file

      r(111);

      . drop r3SelfH
      variable r3SelfH not found
      r(111);

      end of do-file

      r(111);

      . drop s1shlt_num

      .
      end of do-file

      . drop s2shlt_num
      variable s2shlt_num not found
      r(111);

      end of do-file

      r(111);

      . drop s3shlt_num
      variable s3shlt_num not found
      r(111);

      end of do-file

      r(111);

      . drop var_num
      variable var_num not found
      r(111);

      .



      • #4
        First time around the outer loop with i set to 1, you are looking at variables

        Code:
        r1shlt s1shlt
        and in each case you are trying to produce a new variable

        Code:
        r1SelfH
        that works with r1shlt but not with s1shlt as r1SelfH already exists, which is the explicit error message.

        In the second case you perhaps would prefer a new variable

        Code:
        s1SelfH 
        but your code doesn't produce it.

        It might be easier not to loop over two variables and just write code for those two cases.
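
        If a nested loop is kept anyway, a minimal sketch of giving each new variable its own name by looping over the prefix as well as the wave. The toy data are numeric and made up (an assumption; the thread's variables are strings), and the single label name selfh_labl is borrowed from later in the thread.

        Code:
        * toy numeric data for two waves (an assumption for illustration)
        clear
        input byte(r1shlt s1shlt r2shlt s2shlt)
        1 5 2 4
        3 4 5 1
        5 2 . 3
        end

        label define selfh_labl 1 "poor" 2 "fair" 3 "good"
        forvalues i = 1/2 {
            foreach p in r s {
                * the prefix `p' is part of the new name, so r1SelfH and s1SelfH cannot collide
                recode `p'`i'shlt (5=1) (3/4=2) (1/2=3) (else=.), generate(`p'`i'SelfH)
                label values `p'`i'SelfH selfh_labl
            }
        }
        describe *SelfH

        With this naming, the second pass of the inner loop creates s1SelfH rather than trying to create r1SelfH again.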
        Last edited by Nick Cox; 05 Aug 2023, 05:57.



        • #5
          Your two loops can be rewritten as follows, I think, although I can't test anything.

          Code:
          label def selfh_labl 1 poor 2 fair 3 good
          
          foreach v in r1 r2 r3 s1 s2 s3 {
          
              encode `v'shlt, gen(`v'SelfH)
              recode `v'SelfH (5=1) (3/4=2) (1/2=3) (else=.)
              label var `v'SelfH selfh_labl
          
          }
          There are perhaps three points of programming or Stata principle worth commenting on.

          1. I create one set of value labels and apply it in turn to the results of the loop, the 6 new variables. I see no point in 6 identical sets of value labels. You could do the application outside the loop.

          2. A loop over 3 cases and a loop over 2 cases can both be trivial. In this case if we combine them, you get simpler code.

          3. You don't have to generate a new variable with recode. Its purpose is to recode an existing variable.
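
          One thing worth checking before trusting any recode of an encoded variable: the tabulation of r1shlt_num in #1 lists "." as a labelled category ahead of 1.Excellent, which suggests that -encode- gave the string "." the code 1 and shifted every real category up by one. A minimal sketch that sidesteps this by reading the leading digit of the string instead, assuming the strings look like "1.Excellent" to "5.Poor" as in the tabulations; the variable names enc and code are hypothetical.

          Code:
          * toy data mimicking the string values seen in the tabulations
          clear
          input str12 r1shlt
          ""
          "."
          "1.Excellent"
          "2.Very good"
          "3.Good"
          "4.Fair"
          "5.Poor"
          end

          * encode numbers the distinct strings alphabetically, so "." takes code 1
          encode r1shlt, generate(enc)

          * leading digit as a number; "" and "." both become missing
          generate byte code = real(substr(r1shlt, 1, 1))
          list r1shlt enc code, nolabel

          recode code (5=1) (3/4=2) (1/2=3) (else=.), generate(r1SelfH)
          label define selfh_labl 1 "poor" 2 "fair" 3 "good"
          label values r1SelfH selfh_labl
          tabulate r1shlt r1SelfH, missing

          The final cross-tabulation is a quick check that each original category lands in the intended group.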
          Last edited by Nick Cox; 05 Aug 2023, 06:30.



          • #6
            Thanks Nick, your code is elegant. But my target outcome variables are r1SelfH (built on r1shlt & slshlt), r2SelfH (built on r2shlt & s2shlt) and r3SelfH (built on r3shlt & s3shlt). If I need to create s1SelfH, s2SelfH and s3SelfH as intermediate variables, how can I map, or perhaps rename, them into riSelfH (i = 1, 2, 3)? That seems ambiguous. Besides, when I run your code, the value 3 disappears from the generated variables, which seems strange. Do you have any idea why?

            //--------- Running results in Stata ------------------
            . label def selfh_labl 1 poor 2 fair 3 good

            .
            end of do-file

            . foreach v in r1 r2 r3 s1 s2 s3 {
            2.
            . encode `v'shlt, gen(`v'SelfH)
            3. recode `v'SelfH (5=1) (3/4=2) (1/2=3) (else=.)
            4. label var `v'SelfH selfh_labl
            5.
            . }
            (r1SelfH: 17708 changes made)
            (r2SelfH: 18612 changes made)
            (r3SelfH: 21097 changes made)
            (s1SelfH: 17708 changes made)
            (s2SelfH: 18604 changes made)
            (s3SelfH: 21093 changes made)

            .
            end of do-file

            . tab r1SelfH, m

             selfh_labl |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      . |      4,035       15.82       15.82
            1.Very good |     11,581       45.41       61.23
                 2.Good |      1,246        4.89       66.12
                      . |      8,642       33.88      100.00
            ------------+-----------------------------------
                  Total |     25,504      100.00

            .
            end of do-file


            . tab s1SelH, m
            variable s1SelH not found
            r(111);

            end of do-file

            r(111);


            . tab r2SelfH, m

             selfh_labl |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      . |      9,232       36.20       36.20
            1.Excellent |      4,286       16.81       53.00
            2.Very good |      1,262        4.95       57.95
                      . |     10,724       42.05      100.00
            ------------+-----------------------------------
                  Total |     25,504      100.00

            .
            end of do-file


            . tab s2SelH, m
            variable s2SelH not found
            r(111);

            end of do-file

            r(111);


            . tab r3SelfH, m

             selfh_labl |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      . |     10,537       41.32       41.32
            1.Excellent |      4,828       18.93       60.25
            2.Very good |      1,675        6.57       66.81
                      . |      8,464       33.19      100.00
            ------------+-----------------------------------
                  Total |     25,504      100.00

            .
            end of do-file

            .
            . tab s3SelfH, m

             selfh_labl |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      . |      9,064       35.54       35.54
            1.Excellent |      4,194       16.44       51.98
            2.Very good |      4,587       17.99       69.97
                      . |      7,659       30.03      100.00
            ------------+-----------------------------------
                  Total |     25,504      100.00

            .
            end of do-file

            .



            • #7
              You asked for

              Code:
              tab s2SelH, m
              which doesn't exist because the code created s2SelfH instead. So that was a typo.

              Otherwise your aim is now unclear. How are you going to combine e.g. r1SelfH and s1SelfH if that is what you want? This has not been explained as an aim, and how you want to do that is not clear to me.

              Now I see your results I notice a typo in my own code

              Code:
               
               label var 
              should be
              Code:
              label val
              Sorry about that.




              • #8
                Thanks, Nick. My outcome should be one variable per wave rather than two. My aim is r1SelfH (constructed based on r1shlt & s1shlt1) for wave 1, r2SelfH (constructed based on r2shlt & s2shlt1) for wave 2, and r3SelfH (constructed based on r3shlt & s3shlt1) for wave 3. But after running your modified code I get r1SelfH (constructed based on r1shlt) and s1SelfH (constructed based on s1shlt) for wave 1; r2SelfH (constructed based on r2shlt) and s2SelfH (constructed based on s2shlt) for wave 2; and r3SelfH (constructed based on r3shlt) and s3SelfH (constructed based on s3shlt) for wave 3.

                //------------ Running results in Stata -----------------
                . foreach v in r1 r2 r3 s1 s2 s3 {
                2.
                . encode `v'shlt, gen(`v'SelfH)
                3. recode `v'SelfH (5=1) (3/4=2) (1/2=3) (else=.)
                4. label val `v'SelfH selfh_labl
                5.
                . }
                (r1SelfH: 17708 changes made)
                (r2SelfH: 18612 changes made)
                (r3SelfH: 21097 changes made)
                (s1SelfH: 17708 changes made)
                (s2SelfH: 18604 changes made)
                (s3SelfH: 21093 changes made)

                .
                end of do-file

                . tab r1SelfH, m

                    r1SelfH |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                       poor |      4,035       15.82       15.82
                       fair |     11,581       45.41       61.23
                       good |      1,246        4.89       66.12
                          . |      8,642       33.88      100.00
                ------------+-----------------------------------
                      Total |     25,504      100.00

                .
                end of do-file

                . tab s1SelfH, m

                    s1SelfH |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                       poor |      5,072       19.89       19.89
                       fair |      2,942       11.54       31.42
                       good |      7,074       27.74       59.16
                          . |     10,416       40.84      100.00
                ------------+-----------------------------------
                      Total |     25,504      100.00

                . tab r2SelfH, m

                    r2SelfH |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                       poor |      9,232       36.20       36.20
                       fair |      4,286       16.81       53.00
                       good |      1,262        4.95       57.95
                          . |     10,724       42.05      100.00
                ------------+-----------------------------------
                      Total |     25,504      100.00

                .
                end of do-file

                . tab s2SelfH, m

                    s2SelfH |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                       poor |      7,858       30.81       30.81
                       fair |      3,648       14.30       45.11
                       good |      3,952       15.50       60.61
                          . |     10,046       39.39      100.00
                ------------+-----------------------------------
                      Total |     25,504      100.00

                .
                end of do-file

                . tab r3SelfH, m

                    r3SelfH |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                       poor |     10,537       41.32       41.32
                       fair |      4,828       18.93       60.25
                       good |      1,675        6.57       66.81
                          . |      8,464       33.19      100.00
                ------------+-----------------------------------
                      Total |     25,504      100.00

                .
                end of do-file

                . tab s3SelfH, m

                    s3SelfH |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                       poor |      9,064       35.54       35.54
                       fair |      4,194       16.44       51.98
                       good |      4,587       17.99       69.97
                          . |      7,659       30.03      100.00
                ------------+-----------------------------------
                      Total |     25,504      100.00

                .
                end of do-file




                • #9
                  I can't tell from this whether you have any remaining question. "based on r1shlt & s1shlt1" in #8 (where I think you mean s1shlt) is not clearer to me than "built on r1shlt & slshlt" in #6 (where again I think you mean s1shlt).

                  If one variable is to be formed based on, built on or by combining two variables, you need a rule, perhaps using max(,) or min(,), for how to do that.
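
                  A minimal sketch of one such rule, assuming the rule is "take the worse of the two ratings" (with 1 = poor, that is the minimum). The rule itself, the toy values and the name w1SelfH are assumptions for illustration. -min()- ignores missing arguments and returns missing only when all arguments are missing.

                  Code:
                  * toy recoded ratings: 1 = poor, 2 = fair, 3 = good (values made up)
                  clear
                  input byte(r1SelfH s1SelfH)
                  1 3
                  2 .
                  . .
                  3 1
                  end

                  label define selfh_labl 1 "poor" 2 "fair" 3 "good"

                  * worse of the two ratings; missing only if both are missing
                  generate byte w1SelfH = min(r1SelfH, s1SelfH)
                  label values w1SelfH selfh_labl
                  list

                  Swapping min() for max() would take the better of the two ratings instead.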



                  • #10
                    You're right, Nick. It was a typo: it should be s1shlt rather than slshlt1. I'll use them as two separate vectors. Thanks, Nick.

