Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace missing value in one variable with value in another variable

    PLEASE HELP...

    I have 100+ observations. I have the following 7 variables but for each observation I have data for only 4 of the variables. I don't want to eliminate any data, I just want to tell STATA that if the value in Avg(period 0, section 1)t=0 is missing then it should replace that missing value with the value in the variable below it i.e. Avg(period 0, section 1)t=-1,
    if the value in Avg(period 1, section 1)t=0 is missing then it should replace that missing value with the value in the variable below it i.e. Avg(period 1, section 1)t=-1,
    if the value in Avg(period 1, section 2)t=0 is missing then it should replace that missing value with the value in the variable below it i.e. Avg(period 1, section 2)t=-1,

    Avg(period 0, section 0)t=0 Avg(period 0, section 1)t=0 Avg(period 1, section 1)t=0 Avg(period 1, section 2)t=0
    Avg(period 0, section 1)t=-1 Avg(period 1, section 1)t=-1 Avg(period 1, section 2)t=-1

    How can I tell STATA to do that so that it would not think that I have missing data and would allow me to conduct my regression?

    I am hoping to be able to get a coefficient for each of the 7 variables, and a standard error value. Would that be possible if I bunged in a command that tells STATA to replace missing values in one variables with values from another variable?

    I would be so incredibly grateful if any of you can shed any light on this dilemma.

  • #2
    Variable below it or record below it? i.e. is this panel data? Listing a few records might help clarify how your data is set up and what you want.

    Incidentally Statalist etiquette is to have your user name be your real full name. Otherwise we'd have users with names like Struggling with Stata 17, Struggling with Stata 264, Struggling with Stata 592...;-) If you'd like to change your username you can click on the Contact Us button on the lower right.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam

    Comment


    • #3
      The answer to your question depends on how you have named your variables and how they are arrayed. Things like "Avg(period 0, section 0)t=0" are not legal Stata variable names, so they can't be the names in your data set. Please show us a sample of your actual data, and then another version showing the results you would like to get from that data. Then somebody may be able to help you.

      Also, please click on "CONTACT US" and change your user name to your real first and last name. It is the expectation on this forum that people will use their real names.

      Comment


      • #4
        Thank you for the quick replies Richard and Clyde. I have taken note of what you said regarding my username and have submitted a request for that to be altered via CONTACT US.

        As for the Stata dilemma:
        P(A)(0)(0)d=0 P(A)(0)(1)d=0 P(A)(1)(1)d=0 P(A)(1)(2)d=0 P(A)(0)(1)d=-1 P(A)(1)(1)d=-1 P(A)(1)(2)d=-1
        Above are the variable names used in Stata.

        Here is an example of the data that I have:
        P(A)(0)(0)d=0 P(A)(0)(1)d=0 P(A)(1)(1)d=0 P(A)(1)(2)d=0 P(A)(0)(1)d=-1 P(A)(1)(1)d=-1 P(A)(1)(2)d=-1
        0.1667 0.6875 1.0000 0.3125
        0.5000 0.3667 0.3750 0.4375
        0.3333 0.2813 0.6667 0.3438
        0.0000 0.1333 0.1250 0.0882
        0.6667 0.5313 0.0882 0.1667
        0.6667 0.2333 0.3750 0.3000
        0.3750 0.3929 0.2000 0.6000
        0.3750 0.2000 0.2500 0.2000
        0.2500 0.5667 0.1875 0.3571
        0.1250 0.6875 1.0000 0.2667
        0.3750 0.5000 0.4000 0.3000
        0.5000 0.2143 0.4000 0.2857
        0.1667 0.2000 0.5000 0.2333
        0.3333 0.3333 0.2500 0.1875
        0.1667 0.4375 0.6667 0.0882
        0.6250 0.4333 0.2500 0.3929
        0.2500 0.4333 0.6250 0.0882
        0.3750 0.3667 0.5000 0.3000
        0.0000 0.0000 0.1250 0.1429
        0.2500 0.3333 0.2500 0.2000
        0.0000 0.2500 0.0000 0.1071
        At the moment, when I try to run a regression I get an error message stating that there are no observations. I need to somehow tell stata to do the above stated replacements whilst still keeping track of the fact that there are 7 variables because I would like it to generate 7 coefficients.

        I am looking to generate a table that looks like the following, except mine would only contain the 'current division (d=0) and 'one division lower (d=-1)'.


        1. Win ratios over previous 24 months ( P d , Pd )


        [IMG]file:///page6image3128[/IMG] Home team (i)[IMG]file:///page6image4352[/IMG] [IMG]file:///page6image4512[/IMG] [IMG]file:///page6image4672[/IMG]


        0–12 months (y=0) 12–24 months (y=1)



        Current Last Last Two
        Matches played season season season seasons
        (s=0) (s=1) (s=1) ago (s=2)



        Two divisions higher (d = 2) 0.362
        (0.610)
        One division higher (d = 1) 1.915*** 0.828*** 0.493**
        (0.256) (0.213) (0.206)

        Current division (d = 0) 1.769*** 1.207*** 0.659*** 0.464***
        (0.153) (0.139) (0.131) (0.131)

        One division lower (d = -1) 0.888*** 0.470*** 0.421***
        (0.124) (0.117)
        (0.109)

        Two divisions lower (d = -2) -0.051
        (0.191)

        [IMG]file:///page6image21984[/IMG] [IMG]file:///page6image22408[/IMG] [IMG]file:///page6image22568[/IMG] [IMG]file:///page6image22728[/IMG] [IMG]file:///page6image22888[/IMG] [IMG]file:///page6image23312[/IMG]


        2. Most recent match results (Ri,m, Ri,n, R,j,n, R,j,m)









        Last edited by Owen Keating; 31 Jul 2014, 21:30.

        Comment


        • #5
          Sorry about the margins. That happened in a hurry. The order of the column is still consistent with the order of the headings but if visually the above is distracting I can fix it up and repost.

          Comment


          • #6
            The variable names you give above cannot possibly be the variable names in your Stata data set. To quote from the on-line help about variable names:
            Variable names may be 1 to 32 characters long and must start with a-z, A-Z, or _, and the remaining characters may be a-z, A-Z, _, or 0-9.
            Perhaps what you are showing is from a spreadsheet application? When you import that data to Stata, those names will necessarily be changed.

            But at least I glean from this that you have your data in wide format. I'm going to pretend that the variable names are pa_0_0_0, pa_0_1_0, pa_1_1_0, pa_1_2_0, pa_0_1_1, pa_1_1_1, and pa_1_2_1. Then, to replace the missing values of the *_0 variables with the corresponding values of the *_1 variables you can do:
            Code:
            foreach x in pa_0_1 pa_1_1 pa_1_2 {
                 replace `x'_0 = `x'_1 if missing(`x'_0)
            }
            That should accomplish what you originally requested. But I do not understand the last sentence of your most recent post, so I can't help you with that.

            This will still leave you with very few observations that have non-missing values for the *_1 (your d=-1) variables, so you will probably end up with a very small sample for your regression. I also note that if there are a large number of observations with missing values in the *-0 (your d=0) variables, then you will also have near-collinearity between the *_0 and *-1 variables which, especially in a total sample of only 100, is likely to lead to uninterpretable results. (But if only a few observations are affected, then this won't be a big problem.)

            Comment


            • #7
              Thank you for the feedback. You are right about Stata changing the names - it removed the parenthesis and the d=x at the end of the variables. Sorry about that. You can probably already guess that I am a bit of a novice when it comes to Stata.
              As for the last part - the image I pasted into the above comment wasn't showing up so I typed up the way the results are supposed to look but the alignment has changed completely. I have now attached a document with what the results table should look like. Do you think you could maybe have a look at it and give me some more feedback?


              Attached Files

              Comment


              • #8
                I'm glad Clyde has stuck with this. I have a feeling this may not be that hard, but part of the problem is that everything is super hard to read. I suggest that you get a Stata file created if you haven't already. Make sure the var names are intelligible. Then, give a command like: list in 1/10. Copy the output. Then, click on the Advanced button (next to the smiley on the upper right), and choose code. Then, paste the output you got from Stata. That will be much, much easier to read. Use this kind of procedure whenever you want to present syntax and output. There is a Sandbox forum if you want to try out different forum and formatting features.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                Comment


                • #9
                  Code:
                  | FullTi~t PL_y0_s0 PL_y0_s1 PL_y1_s1 PL_y1_s2 CL_y0_s1 CL_y1_s1 CL_y1_s2 |
                  |----------------------------------------------------------------------------------------------|
                  1. | 1 1 .91666667 1 .72222222 . . . |
                  2. | 1 .5 .66666667 0 .58333333 . . . |
                  3. | 1 .5 .67647059 .5 .41666667 . . . |
                  4. | 1 .5 .79411765 .5 .80555556 . . . |
                  5. | 2 .5 .76470588 .5 .64705882 . . . |
                  |----------------------------------------------------------------------------------------------|
                  6. | 2 0 .55882353 0 .58333333 . . . |
                  7. | 2 1 . . .47222222 .76190476 .75 . |
                  8. | 1 0 .5 .66666667 .63888889 . . . |
                  9. | 2 .75 .65625 .66666667 .44117647 . . . |
                  10. | 3 .5 .76470588 .5 .64705882 . . . |
                  |----------------------------------------------------------------------------------------------|
                  11. | 1 1 .85294118 1 .91666667 . . . |
                  12. | 3 .16666667 .53125 .33333333 .58823529 . . . |
                  13. | 1 .75 .29411765 0 .6875 . . . |
                  14. | 2 .75 .67647059 .5 .61764706 . . . |
                  15. | 2 .75 .78125 .66666667 .8125 . . . |
                  |----------------------------------------------------------------------------------------------|
                  16. | 3 1 .82352941 1 .67647059 . . . |
                  17. | 3 .5 . . .44117647 .86842105 1 . |
                  18. | 1 .5 .8 .75 .75 . . . |
                  19. | 2 .66666667 .63333333 .75 .40625 . . . |
                  20. | 1 1 .90625 1 .75 . . . |

                  Comment


                  • #10
                    Code:
                    FullTi~t    PL_y0_s0    PL_y0_s1    PL_y1_s1    PL_y1_s2    CL_y0_s1    CL_y1_s1    CL_y1_s2 
                    ----------------------------------------------------------------------------------------------
                    1.         1           1   .91666667           1   .72222222           .           .           . 
                    2.         1          .5   .66666667           0   .58333333           .           .           . 
                    3.         1          .5   .67647059          .5   .41666667           .           .           . 
                    4.         1          .5   .79411765          .5   .80555556           .           .           . 
                    5.         2          .5   .76470588          .5   .64705882           .           .           . 
                    ----------------------------------------------------------------------------------------------
                    6.         2           0   .55882353           0   .58333333           .           .           . 
                    7.         2           1           .           .   .47222222   .76190476         .75           . 
                    8.         1           0          .5   .66666667   .63888889           .           .           . 
                    9.         2         .75      .65625   .66666667   .44117647           .           .           . 
                    10.         3          .5   .76470588          .5   .64705882           .           .           . 
                    ----------------------------------------------------------------------------------------------
                    11.         1           1   .85294118           1   .91666667           .           .           . 
                    12.         3   .16666667      .53125   .33333333   .58823529           .           .           . 
                    13.         1         .75   .29411765           0       .6875           .           .           . 
                    14.         2         .75   .67647059          .5   .61764706           .           .           . 
                    15.         2         .75      .78125   .66666667       .8125           .           .           . 
                    ----------------------------------------------------------------------------------------------
                    16.         3           1   .82352941           1   .67647059           .           .           . 
                    17.         3          .5           .           .   .44117647   .86842105           1           . 
                    18.         1          .5          .8         .75         .75           .           .           . 
                    19.         2   .66666667   .63333333         .75      .40625           .           .           . 
                    20.         1           1      .90625           1         .75           .           .           .

                    Comment


                    • #11
                      Thank you for the instructions Richard. It finally looks presentable.

                      Comment


                      • #12
                        Looking much better. But, how did you get that listing of the data? Usually the output from list looks even better, e.g.

                        Code:
                        . webuse nhanes2f, clear
                        
                        . list age height weight in 1/10
                        
                             +------------------------+
                             | age    height   weight |
                             |------------------------|
                          1. |  54   174.598    62.48 |
                          2. |  41   152.297    48.76 |
                          3. |  21   164.098    67.25 |
                          4. |  63   162.598    94.46 |
                          5. |  64   163.098    74.28 |
                             |------------------------|
                          6. |  63   147.098       66 |
                          7. |  67   153.898    54.55 |
                          8. |  57       160    58.97 |
                          9. |  68       164    68.95 |
                         10. |  68   176.598    65.43 |
                             +------------------------+
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 19.5 MP (2 processor)

                        EMAIL: [email protected]
                        WWW: https://www3.nd.edu/~rwilliam

                        Comment


                        • #13
                          Getting back to the reason you came here in the first place -- did you try Clyde's code and if so did it work? I am not sure if we are part way to where you want to be or still at square 1.
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          StataNow Version: 19.5 MP (2 processor)

                          EMAIL: [email protected]
                          WWW: https://www3.nd.edu/~rwilliam

                          Comment


                          • #14
                            Explanations involving descriptions like this:
                            i.e. Avg(period 0, section 1)t=-1,
                            if the value in Avg(period 1, section 1)t=0 is missing then it should replace that missing value with the value in the variable below it i.e. Avg(period 1, section 1)t=-1,
                            if the value in Avg(period 1, section 2)t=0 is missing then it should replace that missing value with the value in the variable below it i.e. Avg(period 1, section 2)t=-1,

                            Avg(period 0, section 0)t=0 Avg(period 0, section 1)t=0 Avg(period 1, section 1)t=0 Avg(period 1, section 2)t=0
                            Avg(period 0, section 1)t=-1 Avg(period 1, section 1)t=-1 Avg(period 1, section 2)t=-1
                            and variable names like this
                            FullTi~t PL_y0_s0 PL_y0_s1 PL_y1_s1 PL_y1_s2 CL_y0_s1 CL_y1_s1 CL_y1_s2
                            are really hard to follow (for me, at least). If you want to get people's attention, you should put more effort in explaining coherently your problem. This includes reading the complete advice on posting questions, found in the forum FAQ.

                            Richard was asking about the listing. Here is one from -list-:

                            Code:
                            . list
                            
                                 +-----------------------------------------------------------------------------------+
                                 | Full   PL_y0_s0   PL_y0_s1   PL_y1_s1   PL_y1_s2   CL_y0_s1   CL_y1_s1   CL_y1_s2 |
                                 |-----------------------------------------------------------------------------------|
                              1. |    1          1   .9166667          1   .7222222          .          .          . |
                              2. |    1         .5   .6666667          0   .5833333          .          .          . |
                              3. |    1         .5   .6764706         .5   .4166667          .          .          . |
                              4. |    1         .5   .7941176         .5   .8055556          .          .          . |
                              5. |    2         .5   .7647059         .5   .6470588          .          .          . |
                                 |-----------------------------------------------------------------------------------|
                              6. |    2          0   .5588235          0   .5833333          .          .          . |
                              7. |    2          1          .          .   .4722222   .7619048        .75          . |
                              8. |    1          0         .5   .6666667   .6388889          .          .          . |
                              9. |    2        .75     .65625   .6666667   .4411765          .          .          . |
                             10. |    3         .5   .7647059         .5   .6470588          .          .          . |
                                 |-----------------------------------------------------------------------------------|
                             11. |    1          1   .8529412          1   .9166667          .          .          . |
                             12. |    3   .1666667     .53125   .3333333   .5882353          .          .          . |
                             13. |    1        .75   .2941177          0      .6875          .          .          . |
                             14. |    2        .75   .6764706         .5   .6176471          .          .          . |
                             15. |    2        .75     .78125   .6666667      .8125          .          .          . |
                                 |-----------------------------------------------------------------------------------|
                             16. |    3          1   .8235294          1   .6764706          .          .          . |
                             17. |    3         .5          .          .   .4411765   .8684211          1          . |
                             18. |    1         .5         .8        .75        .75          .          .          . |
                             19. |    2   .6666667   .6333333        .75     .40625          .          .          . |
                             20. |    1          1     .90625          1        .75          .          .          . |
                                 +-----------------------------------------------------------------------------------+
                            You should:

                            1. Read the FAQ carefully.

                            2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

                            3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

                            4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

                            Comment


                            • #15
                              So, I think my earlier approach will work, with appropriate modification to reflect the actual variable names:

                              Code:
                              foreach x in y0_s1    y1_s1    y1_s2     {
                                   replace PL_`x' = CL_`x' if missing(PL_`x')
                              }

                              Comment

                              Working...
                              X