
  • #16

    On the full dataset (so about a million on the Y-axis). The dataex example I provided came straight from that dataset, if that’s what you're asking.



    • #17
      Okay, I think I understand what is going on. The problem comes from observations where share_region is missing. This occurs when tot_pat_cpt is zero or missing, which in turn occurs when all observations of sh_pat for a combination of country, ipc2, and year are zero or missing. You should decide how to handle these cases, but one simple option is to return missing values for the inequality measures. If that is what you want, you just need to add a check for this missingness to your code.
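
      Before deciding how to handle these cases, it may help to see which cells are affected. The following is only a rough sketch: it assumes that cod_tech, year and share_region already exist in memory (as in your Step 2 code), and n_pos and first_in_cell are hypothetical helper names I made up for illustration.

      ```stata
      * Sketch: count usable (positive, nonmissing) share_region values per cell.
      * Assumes cod_tech, year and share_region already exist in memory;
      * n_pos and first_in_cell are hypothetical helper variable names.
      bysort cod_tech year: egen n_pos = total(share_region > 0 & !missing(share_region))

      * Tag one observation per cod_tech-year cell so each cell is listed once.
      egen byte first_in_cell = tag(cod_tech year)

      * Cells with fewer than two usable values are candidates for returning missing.
      list cod_tech year n_pos if first_in_cell & n_pos < 2, noobs
      ```

      If the list is short, you may prefer to inspect those cells by hand rather than blanket-assign missing values.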

      The data extract below reproduces your problem:
      Code:
      clear
      input str5 nuts3 int year str4 ipc4 float sh_pat
      "BE211" 2020 "ABCD" 197
      "BE211" 2021 "ABCD" 202
      "BE211" 2022 "ABCD" 163
      "BE211" 2023 "ABCD" 194
      "BE211" 2024 "ABCD" 186
      "BE211" 2025 "ABCD" 198
      "BE213" 2020 "ABCD" 207
      "BE213" 2021 "ABCD" 189
      "BE213" 2022 "ABCD" 199
      "BE213" 2023 "ABCD" 206
      "BE213" 2024 "ABCD" 198
      "BE213" 2025 "ABCD" 176
      "FRJ22" 2020 "ABCD" 173
      "FRJ22" 2021 "ABCD" 197
      "FRJ22" 2022 "ABCD" 199
      "FRJ22" 2023 "ABCD" 198
      "FRJ22" 2024 "ABCD" 227
      "FRJ22" 2025 "ABCD" 191
      "FRK14" 2020 "ABCD" 209
      "FRK14" 2021 "ABCD" 217
      "FRK14" 2022 "ABCD" 207
      "FRK14" 2023 "ABCD" 223
      "FRK14" 2024 "ABCD" 192
      "FRK14" 2025 "ABCD" 194
      "BE211" 2020 "EFGH" 228
      "BE211" 2021 "EFGH" 206
      "BE211" 2022 "EFGH" 0
      "BE211" 2023 "EFGH" 197
      "BE211" 2024 "EFGH" 219
      "BE211" 2025 "EFGH" 207
      "BE213" 2020 "EFGH" 178
      "BE213" 2021 "EFGH" 199
      "BE213" 2022 "EFGH" .
      "BE213" 2023 "EFGH" 206
      "BE213" 2024 "EFGH" 195
      "BE213" 2025 "EFGH" 210
      "FRJ22" 2020 "EFGH" 201
      "FRJ22" 2021 "EFGH" 202
      "FRJ22" 2022 "EFGH" 184
      "FRJ22" 2023 "EFGH" 204
      "FRJ22" 2024 "EFGH" 216
      "FRJ22" 2025 "EFGH" 196
      "FRK14" 2020 "EFGH" 220
      "FRK14" 2021 "EFGH" 196
      "FRK14" 2022 "EFGH" 195
      "FRK14" 2023 "EFGH" 189
      "FRK14" 2024 "EFGH" 202
      "FRK14" 2025 "EFGH" 180
      end
      Running your code on this extract gives the error:
      Code:
      no observations
      r(2000);
      Below I modify the Step 2 code to check for missingness in share_region:

      Code:
      * STEP 2: Theil and Gini Indices
      tempfile tgresults
      tempname tgpost
      postfile `tgpost' double cod_tech int year double theil gini using `tgresults', replace
      
      levelsof cod_tech, local(countries)
      levelsof year, local(years)
      
      foreach c of local countries {
          foreach y of local years {
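              local if_cond cod_tech == `c' & year == `y'
              * Count usable (positive, nonmissing) values in this cod_tech-year cell
              qui count if `if_cond' & share_region > 0 & !missing(share_region)
              if r(N) >= 2 {
                  * Enough data: compute the indices and keep Gini and Theil (GE(1))
                  qui ineqdeco share_region if `if_cond'
                  local gini = r(gini)
                  local theil = r(ge1)
              }
              else {
                  * Too few usable values: post missing instead of failing with r(2000)
                  local gini = .
                  local theil = .
              }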
              post `tgpost' (`c') (`y') (`theil') (`gini')
          }
      }
      
      postclose `tgpost'
      use `tgresults', clear
      save theil_gini_cod_tech.dta, replace
      And this code runs just fine. Here is the output file listed:
      Code:
      . list, noobs sepby(cod_tech)
      
        +----------------------------------------+
        | cod_tech   year       theil       gini |
        |----------------------------------------|
        |        1   2020   .00030637   .0123762 |
        |        1   2021   .00055282    .016624 |
        |        1   2022   .00495308   .0497237 |
        |        1   2023   .00045007       .015 |
        |        1   2024   .00048836    .015625 |
        |        1   2025    .0017311   .0294118 |
        |----------------------------------------|
        |        2   2020   .00760258   .0615764 |
        |        2   2021   .00014937    .008642 |
        |        2   2022           .          . |
        |        2   2023   .00024939   .0111663 |
        |        2   2024   .00168126   .0289855 |
        |        2   2025   .00002588   .0035971 |
        |----------------------------------------|
        |        3   2020   .00444726   .0471204 |
        |        3   2021   .00116734   .0241546 |
        |        3   2022   .00019415   .0098522 |
        |        3   2023   .00176417   .0296912 |
        |        3   2024   .00349288   .0417661 |
        |        3   2025   .00003036   .0038961 |
        |----------------------------------------|
        |        4   2020   .00101873   .0225653 |
        |        4   2021   .00011364   .0075377 |
        |        4   2022   .00042125   .0145119 |
        |        4   2023   .00072857    .019084 |
        |        4   2024   .00056099   .0167464 |
        |        4   2025   .00090566   .0212766 |
        +----------------------------------------+



      • #18
        Hi again, thank you so much for your help. Out of interest, would you have any idea how long a command like this should take? It has not yet finished running on the full dataset, apparently because of the sheer processing power required. Is this normal?
