Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling missing price values for specific observations

    Hi everyone,
    I have a dataset of 2304 observation.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str39 market str46 commodity float price int year
    "Alto Molócuè" "Maize (white)"                  6.8571 2014
    "Alto Molócuè" "Maize (white)"                 20.5714 2016
    "Alto Molócuè" "Maize meal (white with bran) "       . 2014
    "Alto Molócuè" "Maize meal (white with bran) "    18.5 2016
    "Alto Molócuè" "Groundnuts (small shelled) "     43.75 2015
    "Alto Molócuè" "Maize meal (white with bran) "       . 2016
    "Alto Molócuè" "Sugar (brown local) "                . 2015
    "Alto Molócuè" "Groundnuts (large shelled) "         . 2015
    "Alto Molócuè" "Oil (vegetable local) "             75 2014
    "Alto Molócuè" "Groundnuts (large shelled) "       130 2016
    "Alto Molócuè" "Maize (white)"                  9.1429 2014
    "Alto Molócuè" "Oil (vegetable local) "           82.5 2016
    "Alto Molócuè" "Groundnuts (large shelled) "        20 2015
    "Alto Molócuè" "Oil (vegetable local) "            150 2016
    "Alto Molócuè" "Cowpeas"                            35 2014
    "Alto Molócuè" "Groundnuts (large shelled) "        30 2014
    "Alto Molócuè" "Sugar (brown local) "          35.3333 2014
    "Alto Molócuè" "Maize (white)"                 28.5714 2016
    "Alto Molócuè" "Cowpeas"                       18.6667 2015
    "Alto Molócuè" "Maize meal (white with bran) "      35 2016
    "Alto Molócuè" "Oil (vegetable local) "             50 2014
    "Alto Molócuè" "Groundnuts (small shelled) "        25 2014
    "Alto Molócuè" "Rice (imported)"                    25 2014
    "Alto Molócuè" "Maize (white)"                 10.2857 2015
    "Alto Molócuè" "Rice (imported)"               26.6667 2015
    "Alto Molócuè" "Oil (vegetable local) "            130 2016
    "Alto Molócuè" "Maize meal (white with bran) "       . 2014
    "Alto Molócuè" "Maize (white)"                       . 2016
    "Alto Molócuè" "Oil (vegetable local) "             50 2015
    "Alto Molócuè" "Maize meal (white with bran) "       . 2016
    "Alto Molócuè" "Maize (white)"                      10 2014
    "Alto Molócuè" "Groundnuts (large shelled) "   41.6667 2014
    "Alto Molócuè" "Cowpeas"                             . 2016
    "Alto Molócuè" "Rice (imported)"                    25 2014
    "Alto Molócuè" "Cowpeas"                            30 2016
    "Alto Molócuè" "Rice (imported)"                    25 2014
    "Alto Molócuè" "Groundnuts (small shelled) "        40 2014
    "Alto Molócuè" "Maize meal (white with bran) "   12.25 2015
    "Alto Molócuè" "Oil (vegetable local) "             80 2015
    "Alto Molócuè" "Maize meal (white with bran) "      35 2016
    "Alto Molócuè" "Oil (vegetable local) "           50.5 2014
    "Alto Molócuè" "Rice (imported)"                  32.5 2016
    "Alto Molócuè" "Groundnuts (large shelled) "       100 2016
    "Alto Molócuè" "Sugar (brown local) "               70 2016
    "Alto Molócuè" "Maize (white)"                 22.8571 2016
    "Alto Molócuè" "Groundnuts (small shelled) "     41.25 2015
    "Alto Molócuè" "Cowpeas"                            17 2015
    "Alto Molócuè" "Sugar (brown local) "                . 2015
    "Alto Molócuè" "Rice (imported)"                    40 2016
    "Alto Molócuè" "Groundnuts (large shelled) "       120 2016
    "Alto Molócuè" "Oil (vegetable local) "             70 2014
    "Alto Molócuè" "Rice (imported)"                    25 2014
    "Alto Molócuè" "Maize (white)"                       8 2014
    "Alto Molócuè" "Cowpeas"                             . 2015
    "Alto Molócuè" "Oil (vegetable local) "             70 2014
    "Alto Molócuè" "Sugar (brown local) "               35 2014
    "Alto Molócuè" "Maize (white)"                 20.5714 2016
    "Alto Molócuè" "Rice (imported)"                    45 2016
    "Alto Molócuè" "Groundnuts (large shelled) "         . 2015
    "Alto Molócuè" "Maize meal (white with bran) "      24 2016
    "Alto Molócuè" "Groundnuts (large shelled) "       130 2016
    "Alto Molócuè" "Groundnuts (large shelled) "        20 2014
    "Alto Molócuè" "Cowpeas"                            20 2015
    "Alto Molócuè" "Groundnuts (small shelled) "        35 2014
    "Alto Molócuè" "Sugar (brown local) "               45 2015
    "Alto Molócuè" "Groundnuts (large shelled) "        31 2015
    "Alto Molócuè" "Cowpeas"                          37.5 2016
    "Alto Molócuè" "Sugar (brown local) "                . 2015
    "Alto Molócuè" "Oil (vegetable local) "             73 2014
    "Alto Molócuè" "Groundnuts (small shelled) "         . 2016
    "Alto Molócuè" "Cowpeas"                          32.5 2016
    "Alto Molócuè" "Oil (vegetable local) "              . 2015
    "Alto Molócuè" "Rice (imported)"                  26.5 2015
    "Alto Molócuè" "Maize meal (white with bran) "      15 2015
    "Alto Molócuè" "Rice (imported)"               27.3333 2014
    "Alto Molócuè" "Groundnuts (large shelled) "        20 2014
    "Alto Molócuè" "Rice (imported)"                 25.25 2015
    "Alto Molócuè" "Oil (vegetable local) "             50 2014
    "Alto Molócuè" "Cowpeas"                            30 2016
    "Alto Molócuè" "Maize meal (white with bran) "    11.2 2015
    "Alto Molócuè" "Maize meal (white with bran) "      14 2015
    "Alto Molócuè" "Maize meal (white with bran) "       . 2014
    "Alto Molócuè" "Oil (vegetable local) "             75 2015
    "Alto Molócuè" "Groundnuts (large shelled) "        90 2016
    "Alto Molócuè" "Sugar (brown local) "               45 2015
    "Alto Molócuè" "Groundnuts (large shelled) "      32.5 2015
    "Alto Molócuè" "Oil (vegetable local) "             50 2014
    "Alto Molócuè" "Maize meal (white with bran) "       . 2014
    "Alto Molócuè" "Cowpeas"                            15 2015
    "Alto Molócuè" "Sugar (brown local) "               38 2014
    "Alto Molócuè" "Sugar (brown local) "               35 2014
    "Alto Molócuè" "Maize (white)"                       . 2016
    "Alto Molócuè" "Groundnuts (small shelled) "        43 2015
    "Alto Molócuè" "Maize meal (white with bran) "      12 2015
    "Alto Molócuè" "Groundnuts (small shelled) "        41 2014
    "Alto Molócuè" "Rice (imported)"               51.3333 2016
    "Alto Molócuè" "Maize meal (white with bran) "      15 2015
    "Alto Molócuè" "Oil (vegetable local) "             75 2014
    "Alto Molócuè" "Rice (imported)"                    25 2014
    "Alto Molócuè" "Groundnuts (small shelled) "        50 2014
    end
    In order to proceed with my analysis I have to fill the missing observations in the dataset. They consist in prices of different commodities.
    I tried this command
    Code:
    ipolate price year if commodity=="Cowpeas", generate(cowpeas_price)
    but this is not exactly what I want, since I am asking STATA to estimate the price if the commodity is "mycommodity".
    I want to fill the missing price for the different commodities, taking into account their specific trend, and I don't know how to.

    Thank you
    Last edited by Fabio Delisio; 21 Aug 2023, 04:06.

  • #2
    I don't know what mycommodity is, but I would imagine that at best it's only sensible to interpolate for the same market and commodity

    You need a by: prefix such as

    Code:
    bysort market commodity:
    See help ipolate, especially the last example.

    For prices, interpolating on log price and then exponentiating may be a better idea.

    Comment


    • #3
      Nick Cox Thank you for the answer. I wrote "mycommodity" to indicate a commodity that could be "cowpeas", "maize" or something else. I will check the last example.

      Comment


      • #4
        Nick Cox I followed the last example on "help ipolate" and i wrote this code
        Code:
        bysort price year: ipolate price year, gen(price_i) epolate
        The output of the code didn't generated any estimation, the code generated missing values in correspondence with the original missing values. I attach here a part of the dataset as an example

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str33 admin1 float price int year double price_i
        "Niassa"   . 2015 .
        "Zambezia" . 2016 .
        "Niassa"   . 2016 .
        "Niassa"   . 2014 .
        "Zambezia" . 2014 .
        "Zambezia" . 2014 .
        "Niassa"   . 2016 .
        "Niassa"   . 2016 .
        "Zambezia" . 2014 .
        "Niassa"   . 2015 .
        "Zambezia" . 2016 .
        "Manica"   . 2015 .
        "Niassa"   . 2016 .
        "Zambezia" . 2015 .
        "Manica"   . 2016 .
        "Niassa"   . 2016 .
        "Zambezia" . 2014 .
        "Niassa"   . 2016 .
        "Niassa"   . 2016 .
        "Manica"   . 2015 .
        "Niassa"   . 2015 .
        "Zambezia" . 2014 .
        "Niassa"   . 2016 .
        "Manica"   . 2014 .
        "Zambezia" . 2016 .
        "Manica"   . 2015 .
        "Zambezia" . 2014 .
        "Zambezia" . 2014 .
        "Zambezia" . 2016 .
        "Manica"   . 2015 .
        "Zambezia" . 2015 .
        "Zambezia" . 2014 .
        "Niassa"   . 2016 .
        "Niassa"   . 2014 .
        "Niassa"   . 2016 .
        "Niassa"   . 2016 .
        "Zambezia" . 2015 .
        "Niassa"   . 2016 .
        "Niassa"   . 2015 .
        "Niassa"   . 2016 .
        "Zambezia" . 2016 .
        "Niassa"   . 2016 .
        "Niassa"   . 2016 .
        "Niassa"   . 2015 .
        "Zambezia" . 2015 .
        "Manica"   . 2014 .
        "Niassa"   . 2016 .
        "Niassa"   . 2016 .
        "Manica"   . 2014 .
        "Zambezia" . 2015 .
        "Zambezia" . 2016 .
        "Manica"   . 2015 .
        "Niassa"   . 2015 .
        "Niassa"   . 2016 .
        "Zambezia" . 2015 .
        "Zambezia" . 2016 .
        "Manica"   . 2015 .
        "Niassa"   . 2014 .
        "Niassa"   . 2016 .
        "Niassa"   . 2015 .
        "Zambezia" . 2015 .
        "Zambezia" . 2015 .
        "Niassa"   . 2015 .
        "Zambezia" . 2014 .
        "Niassa"   . 2016 .
        "Niassa"   . 2015 .
        "Niassa"   . 2016 .
        "Zambezia" . 2016 .
        "Zambezia" . 2016 .
        "Zambezia" . 2016 .
        "Zambezia" . 2014 .
        "Niassa"   . 2015 .
        "Zambezia" . 2016 .
        "Niassa"   . 2016 .
        "Zambezia" . 2015 .
        "Niassa"   . 2016 .
        "Niassa"   . 2015 .
        "Zambezia" . 2015 .
        "Manica"   . 2015 .
        "Niassa"   . 2016 .
        "Niassa"   . 2016 .
        "Zambezia" . 2014 .
        "Niassa"   . 2015 .
        "Zambezia" . 2015 .
        "Zambezia" . 2015 .
        "Zambezia" . 2016 .
        "Zambezia" . 2014 .
        "Zambezia" . 2014 .
        "Manica"   . 2015 .
        "Niassa"   . 2014 .
        "Manica"   . 2014 .
        "Zambezia" . 2015 .
        "Niassa"   . 2016 .
        "Niassa"   . 2016 .
        "Zambezia" . 2016 .
        "Niassa"   . 2016 .
        "Manica"   . 2015 .
        "Niassa"   . 2016 .
        "Zambezia" . 2014 .
        "Niassa"   . 2014 .
        end

        Comment


        • #5
          That was not at all what I recommended. Consider
          Code:
           
           bysort market commodity: ipolate price year, generate(ipolate_price)

          Comment


          • #6
            Nick Cox I tried the code you wrote before, but it gives me strange results. It estimates the prices even if they are observed.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str39 market str46 commodity float price int year double ipolate_price
            "Alto Molócuè" "Cowpeas"                      31.6667 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           30 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                            . 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                         32.5 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                           20 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           41 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                      28.3333 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           30 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                         37.5 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                         16.2 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           15 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           15 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                        16.25 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           15 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                      23.3333 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           19 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           35 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                        28.33 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                        17.75 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           30 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           17 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           37 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           15 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           20 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                      38.3333 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                         32.5 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           30 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                           35 2014  25.82610829671224
            "Alto Molócuè" "Cowpeas"                           15 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                      18.6667 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                           35 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                            . 2015 20.101518284190785
            "Alto Molócuè" "Cowpeas"                            . 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                           30 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                           60 2016  37.43333015441895
            "Alto Molócuè" "Cowpeas"                           40 2016  37.43333015441895
            "Alto Molócuè" "Groundnuts (large shelled) "       20 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "      120 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "       30 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "        . 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "       35 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "       20 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "       20 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "      130 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "       31 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "       90 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "     37.5 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "      100 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "  86.6667 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "        . 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "       20 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "    122.5 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "        . 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "       30 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "        . 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "     27.5 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "       70 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "       25 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "     32.5 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "    28.33 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "       30 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "     42.5 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "       20 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "       20 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "       36 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "    21.25 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "        . 2015  33.72222222222222
            "Alto Molócuè" "Groundnuts (large shelled) "      105 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "  41.6667 2014  27.89555819829305
            "Alto Molócuè" "Groundnuts (large shelled) "      115 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "      130 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (large shelled) "      130 2016 112.91667022705079
            "Alto Molócuè" "Groundnuts (small shelled) "      125 2016  131.5370330810547
            "Alto Molócuè" "Groundnuts (small shelled) "      180 2016  131.5370330810547
            "Alto Molócuè" "Groundnuts (small shelled) "    33.33 2014  33.93722518285116
            "Alto Molócuè" "Groundnuts (small shelled) "       40 2015  42.36363636363637
            "Alto Molócuè" "Groundnuts (small shelled) "        . 2016  131.5370330810547
            "Alto Molócuè" "Groundnuts (small shelled) "    147.5 2016  131.5370330810547
            "Alto Molócuè" "Groundnuts (small shelled) "        . 2016  131.5370330810547
            "Alto Molócuè" "Groundnuts (small shelled) "       38 2015  42.36363636363637
            "Alto Molócuè" "Groundnuts (small shelled) "       35 2015  42.36363636363637
            "Alto Molócuè" "Groundnuts (small shelled) "  26.6667 2014  33.93722518285116
            "Alto Molócuè" "Groundnuts (small shelled) "    41.25 2015  42.36363636363637
            "Alto Molócuè" "Groundnuts (small shelled) "       25 2014  33.93722518285116
            "Alto Molócuè" "Groundnuts (small shelled) "       25 2014  33.93722518285116
            "Alto Molócuè" "Groundnuts (small shelled) "    43.75 2015  42.36363636363637
            "Alto Molócuè" "Groundnuts (small shelled) " 142.3333 2016  131.5370330810547
            "Alto Molócuè" "Groundnuts (small shelled) "       41 2014  33.93722518285116
            end

            Comment


            • #7
              The results should not seem surprising if you glance at the code to see exactly what ipolate does. In fact the help explains:

              Because interpolation requires that yvar be a function of xvar, yvar is also interpolated for tied values of xvar.
              If that is a little too terse, the manual entry spells it out:

              If there are multiple observations with the same value for x_0, then y_0 is taken as the average of
              the corresponding y values for those observations. (x_1; y_1) is handled in the same way.
              (The manual uses TeX/LaTeX so that _0 and _1 are printed as subscript 0 and 1 respectively.)

              You have at least in some instances multiple prices for the same market, commodity and year. In such cases, ipolate averages first and then interpolates.

              Here are the cowpeas subset from your data example:

              Click image for larger version

Name:	cowpeas.png
Views:	2
Size:	28.5 KB
ID:	1724568

              The jittering of data points is just cosmetic to shake identical or similar points apart and nothing to do with the interpolation.

              You can't do better than that, I suspect, without a different recipe (geometric means?) or more information (time of year?).

              Comment


              • #8
                The results should not seem surprising if you glance at the code to see exactly what ipolate does. In fact the help explains:

                Because interpolation requires that yvar be a function of xvar, yvar is also interpolated for tied values of xvar.
                If that is a little too terse, the manual entry spells it out:

                If there are multiple observations with the same value for x_0, then y_0 is taken as the average of
                the corresponding y values for those observations. (x_1; y_1) is handled in the same way.
                (The manual uses TeX/LaTeX so that _0 and _1 are printed as subscript 0 and 1 respectively.)

                You have at least in some instances multiple prices for the same market, commodity and year. In such cases, ipolate averages first and then interpolates.

                Here are the cowpeas subset from your data example:

                Click image for larger version

Name:	cowpeas.png
Views:	2
Size:	28.5 KB
ID:	1724571

                The jittering of data points is just cosmetic to shake identical or similar points apart and nothing to do with the interpolation.

                You can't do better than that, I suspect, without a different recipe (geometric means?) or more information (time of year?).

                Comment


                • #9
                  Sorry for the repetition. The forum software takes a while to add a new post and even if you know that, you can get confused about whether you have posted yet.

                  Comment


                  • #10
                    Nick Cox Thank you for the answer and the clarification. Actually, in the dataset there is another variable which describes when the price is observed (monthly date).
                    How would you suggest me to fill the gaps? I thought about the mean between the previous and the next observation between months. But what if the values are missing for an entire year?

                    Comment


                    • #11
                      Naturally you should use the month too if you think it's relevant. If there are multiple sellers of a commodity in a market at any one time, the most the interpolation can provide is (some kind of) average. Also, the whole point of interpolation is to fill in gaps as best as one can. How well that can be done depends on how smooth the variation is in time, as well as on how much variability there is at any one time.

                      With agricultural products I would expect some seasonality in prices.

                      I don't know how far you're going to get with this approach. Is it really a modelling problem?

                      Comment

                      Working...
                      X