Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using loop to replace missing value of the expenditure of an item with the median value of the particular item within a PSU

    Dear All

    I am new to the group and I would like to know how can we use loop to replace missing value of the different food item with the median value of the food item within a PSU. If I use "replace" I have to type the command 132 times since there are 132 items.

    Please advise if there is an easier way to do
    I wanted to attach my dataset but the attachment does not allow me to do. In my data set there is a variable called "missing" which identifies the six HH with missing info for all food items, For these HH I want to replace the values of each item by the median value of the item within that PSU. The item names are fexp_1111, fexp_1112........till fexp_2232
    The names of the PSU with missing info are :
    1510506
    2040002
    7080001
    10030003
    11510404
    17020003
    Thank you in advance. If there is way where we can upload the STATA dataset please let me know.



  • #2
    Dear All

    I was able to upload it in excel so please find attached my STATA file exported in Excel

    Thank you
    Attached Files

    Comment


    • #3
      The best way to attach an example of your data is with dataex. See more explanation in point 12.2 of the FAQ: https://www.statalist.org/forums/help#stata

      Comment


      • #4
        Thank you but I was able to export it in excel and the have attached my dataset

        Comment


        • #5
          Yes, but using dataex is much preferred.
          This gives us all the required info on your data, and makes importing it into Stata a whole lot easier for people on this forum.
          Pleae have another look at that section of the FAQ, this is all explained there.

          Comment


          • #6
            I have uploaded an example set using dataex. In houseid =="127b3ea7729b4e048da45f75378193d8" all items have missing values so I want to replace all items with the median value of that item within that psu (in this HH the psu_num==1010001) and similarly for other HHs with missing values for all items replace with the median value of the item within that PSU. How to do that .

            Thank you for showing how to use dataex.
            Attached Files

            Comment


            • #7
              Example data:
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str96 houseid long psu_num float(fexp1111_1 fexp1111_2 fexp1111_3 fexp1113_1 fexp1113_2 fexp1113_3 fexp1114_1 fexp1114_2 fexp1114_3 fexp1115_2 fexp1115_3 fexp1116_2 fexp1116_3 fexp1117_1 fexp1117_2 fexp1117_3 fexp1118_2 fexp1118_3)
              "127b3ea7729b4e048da45f75378193d8" 1010001         .         .         .         .         .         .        .         .         .        .         .         .        .         .        .     .         .         .
              "4b98219f2a8241c0af3f825bc9acc4e9" 1010001  260.7143       250         .         .         .         .        .         . 1086.3096        .         .       200        .       100        .     .       200         .
              "e6cec72958af415284fdea43f64174a8" 1010001  816.6667       500         .  466.6667         .         .        .         .       250        .         . 166.66667        .       300        .     .         .       600
              "43f9a06ef3f543b7acfbba2533b70e53" 1010001  2666.667       375         .         .         .       250        .         .       200        .         .      87.5        .        70        .     .         .       450
              "d1084dfc6b35470c919694abcfeb7966" 1010001 1191.6666       150         .         .         .         .        .         .      1500        .         .  83.33334        .       525        .     .       250         .
              "81245b4540a848e9a711f6db435e8118" 1010002         .         .         .         .         .         .        .         .         .        .         .         .        .         .        .     .         .         .
              "f32a3f29ba324b9598b6d7b18e25fdc9" 1010002    1237.5       625         .  566.6667         .         .        .         .       625        .         .       625        .       345        .     .         .       875
              "7299386bd3a0429890c443416c3aef80" 1010002 1533.3334       700         .         .         . 108.33334        .         .      1620        .         .  333.3333        .         .        . 287.5         .       625
              "1556b5383482446f85ee5e6e0cd05bac" 1010002      1350 2172.6191         .       225         .       500        .         .  651.7857        .         .  869.0476        .       750        .     .         .       300
              "da62f6abf63241d0ada1ebb86101b0f9" 1010002         .         .         .         .         .         .        .         .         .        .         .         .        .         .        .     .         .         .
              "337d60584ab04e2a8b48417780e7294c" 1010002         .         .         .         .         .         .        .         .         .        .         .         .        .         .        .     .         .         .
              "94fd9fce8eb845c4b45978a98e9f8edb" 1010002         .         .         .         .         .         .        .         .         .        .         .         .        .         .        .     .         .         .
              "46be0fa372c64415bad8b0afa093757a" 1010003       750       300         .         .         .         .        .         .       160        .         .       200        .       180        .     .         .         .
              "24ea49f6ee2c4a17a8afb7a964de8027" 1010003      1300         .      1000         .         .         .        .         .       300      100         .       220        .        90        .     .         .        80
              "6190349c4ab945c7a6e85322a30a8e21" 1010003      1300      1000         .         .         .         .        .         .       250        .         .      1000        .      1250        .     .         .       300
              "79a10522d5fc4f948b618768db84edfa" 1010003      2000         .         .         .         .         .        .         .       500      200         .      1000        .       700        .     .         .      1200
              "849e16354df84a2eb7dca2d699b3706e" 1010003      1250  5214.286         .         .         .         .        . 1303.5714  869.0476        .         . 1738.0952        . 208.33333        .     . 1738.0952         .
              "f18a3dcdba0e4674a9df6e7b06535f34" 1010003      1200  83.33334         .         .         .       500        .         .       400        .         .        50        .       700        .     .         .       750
              "de7ac9416a394561b29eaf8d8ccb1fd5" 1010003       750 1303.5714         .       150         .         .        .         . 325.89285        .         .         .       25  391.0714        .     .         . 325.89285
              "779749672d3047f48273e45d8c504848" 1010003         .       250         .         .         .         .        .       500       600        .         .         .        .       105        .     .         .         .
              "d418df9ad99e43bea2c3d1e167e00169" 1010003      2500  869.0476         .         .         .         .        . 1303.5714  651.7857        .         .       200        .       250        .     .  41.66667         .
              "ce7ebca88de64eb89c35db3e1428eb03" 1010003      1400       250         .         .         .  733.3333        .  46.66667     112.5        .         . 133.33333        . 166.66667        .     .         .       250
              "a8aa1e77b6e34365b28a2ac623d7fdfa" 1010004  260.7143       500         .         .         .         .        .       300         .        .         .       300        . 116.66666        .     .         .         .
              "26985645e02d4995a6ce853c61273cec" 1010004       750  651.7857         .         .         .  217.2619        .         .  651.7857        .         . 108.63095        .  391.0714        .     .         .  217.2619
              "9a3ccb94199f465392028a4ed5eb4dbc" 1010004       680         .         .         .         .         .        .         . 33.333332        .         .      37.5        .        70        .     .         .         .
              "65b801d64b304381864458391f7ce681" 1010004      1350       500         .         .         .         .        .         .        75        .         .       105        .       350        .     .       285         .
              "c3c53c4663ad4c71911a04933964292e" 1010004       600      1000         .         .         .         .        .         .       300        .         .       400        .       350        .     .         .         .
              "a252391a1a644ee7b30e071c53fd6025" 1010004      1350  83.33334         .         .         .         .        .       240         .        .         .         .        .       350        .     .       110         .
              "d7d232d7763844b4ba353abea0b0e46e" 1010004  391.0714      2500         .         .         .         .        .       500         . 869.0476         . 1738.0952        .      1000        .     .         .       300
              "ba13c21e75764cc88447d72706288478" 1010004       870       450         .         .         .         .        .         .      1000        .         . 66.666664        .       100        .     .         .       125
              "a2e2e2164d314f83b95bdadf9cd4bd84" 1010004       650         .         .         .         .         .        .         .         .        .         .         .        .        35        .     .         .         .
              "31884813f65e4da1b76f905a1d465fb7" 1010004       890  83.33334         .         .         .         .        .       100         .        .         .        75        .       245        .     .         .         .
              "5caae12b154f4ec9b9e9146cfddb3115" 1010005      1400 325.89285         .         .         . 325.89285        .         .  434.5238        .         .         .        .  195.5357        .     .         .  651.7857
              "c8089623857149efb72d91f103c70bbe" 1010005      1400       200         .         .         .         .        .       120         .     1000         .       400        .       200        .     . 66.666664         .
              "b3dbe494399441e48e233a2733b38885" 1010005      1600  1955.357         .         .         .  651.7857 869.0476  651.7857         . 260.7143         .  651.7857        .  195.5357        .     .         . 1303.5714
              "fe7d360957454132801f7f154d2d03c0" 1010005       250         .         .         .         .         .        .        80         .      180         .         .        .        40        .     .         .         .
              "43a4f7a30f7f41bdabeac9b309cec6d6" 1010005  583.3333         .         .         .         .         .        .         .  83.33334        .         .         .        .  5.833333        .     .         .       150
              "fced044d186e4d28b9f3bebd165dca2e" 1010005      1450  83.33334         .      8400         .         .        .         .      1050        .         .       200        .       125        .     .         .       250
              "8217a955ad0b487c901b6e5a51681b34" 1010005       880      1500         .  604.1667         .         .        .       150       125        .         . 33.333332        .       525        .     .        25       250
              end
              Code:
              Code:
              foreach var of varlist fexp1111_1-fexp1118_3{
              bysort psu_num: egen `var'_meanbypsu=mean(`var')
              replace `var'=`var'_meanbypsu if `var'==.
              drop `var'_meanbypsu
              }
              Note the line
              Code:
              drop `var'_meanbypsu
              is optional and only needs to be included if you wouldnt use these variables afterwards.

              Also note that attaching a dta is already a lot better than an xls file, but see how I used dataex here. Please do the same in future posts. People on here will answer your questions much quicker if you do so.


              Edit: I also see you wrote median, not mean. If that is indeed what you want, then the synatx is the same except for that one word:
              Code:
              foreach var of varlist fexp1111_1-fexp1118_3{
              bysort psu_num: egen `var'_medbypsu=median(`var')
              replace `var'=`var'_medbypsu if `var'==.
              drop `var'_medbypsu
              }
              Last edited by Jorrit Gosens; 16 Aug 2018, 03:53.

              Comment


              • #8
                Thank you so much but the code you have written seems to replace all items with missing value. I want to replace only if all items in a particular HHS have missing values (for example houseid =="27b3ea7729b4e048da45f75378193d8) have all items missing. How can i include that in the loop. I could not upload all my dataset since the sample is very large so the HH with all items have been missed out. I have about 100 HH with all items missing info so for only those HHs I want to replace the median of the item in the PSU where the HH is so the replacement should be for only 100 hhs.

                Thank you

                Comment


                • #9
                  Okay, I missed that bit of info.
                  Code:
                  egen nonmissing = rownonmiss(fexp1111_1-fexp1118_3)
                  foreach var of varlist fexp1111_1-fexp1118_3{
                  bysort psu_num: egen `var'_medbypsu=median(`var')
                  replace `var'=`var'_medbypsu if nonmissing==0
                  drop `var'_medbypsu
                  }

                  Comment


                  • #10
                    Thank you so much. Working now

                    Comment

                    Working...
                    X