  • "Missing" predictor error using allsynth / synth_runner

    I'm trying to do a synthetic control analysis with multiple treated units and treatment timings. I have decennial county data from 1970 to 2010, and treatment is at the county level. Below is the code I run to make it panel data:

    tsset uniqueid year, yearly delta(10)

    It tells me that the data is strongly balanced, so there shouldn't be any issues there.

    Below is a simplified example of the structure of my data:
    year state county log_med_house_value treat treat_year
    1970 01 001 10.5 0 .
    1980 01 001 10.6 0 .
    1990 01 001 10.7 0 .
    2000 01 001 10.8 0 .
    2010 01 001 10.8 0 .
    1970 03 012 10.1 1 2000
    1980 03 012 10.1 1 2000
    1990 03 012 10.3 1 2000
    2000 03 012 10.5 1 2000
    2010 03 012 10.5 1 2000
    1970 10 003 11.1 0 .
    1980 10 003 11.4 0 .
    1990 10 003 11.4 0 .
    2000 10 003 11.5 0 .
    2010 10 003 11.3 0 .
    1970 24 092 10.9 1 2010
    1980 24 092 11.1 1 2010
    1990 24 092 11.2 1 2010
    2000 24 092 11.2 1 2010
    2010 24 092 11.1 1 2010
    This is very simplified as my actual dataset has 184 variables, but none of the variables that I plan on using as either outcomes or predictors have any missing values. This is one example I've tried running from allsynth:

    allsynth log_med_house_value log_med_house_value(1970) log_med_house_value(1980) log_med_house_value(1990), transform(log_med_house_value, normalize) bcorrect(merge) keep(Results, replace) stacked(trunits(treat) trperiods(treat_year), clear eventtime(-3 1) figure(classic bcorrect, save(Results/ate, replace) xtitle(Year relative to treatment)))

    It returns the error:

    "control units: for at least one unit, predictor log_med_house_value(1970_ is missing for ALL periods specified"

    This clearly isn't the case in my data. I also don't know why in the error the closing parenthesis is shown as an underscore, but I checked my code and I did indeed close the parentheses. Is allsynth somehow not reading my data correctly since it's decennial instead of annual? I've tried other outcomes and predictors and I get the same error for every variable I've tried. Before using allsynth I was using synth_runner and the same was happening with that. Any help would be really appreciated!

    In addition to that issue, I was wondering how I can make the years for the predictors differ depending on the treatment timing with allsynth. With synth_runner it seemed I could do something like below:

    program my_pred, rclass
        args tyear
        return local preds "log_med_house_value(`=`tyear'-30'(10)`=`tyear'-10') totalpop(`=`tyear'-30'(10)`=`tyear'-10')"
    end


    I would then use pred_prog(my_pred) as an option in synth_runner. Is there a way to do something similar in allsynth? I haven't found a way. Thanks in advance for any help with either issue!

  • #2
    This clearly isn't the case in my data
    How can you tell? Either way, you'll need to provide a minimal worked example: not a table, but a real dataset that others can use to reproduce your error. Use the dataex command to share the data, and make sure the example actually reproduces the error. Otherwise, we can't help you. Please see the FAQ for advice on how to ask a question.

    Welcome to Statalist!

    • #3
      Thank you for the reply! Apologies for not including actual data. Below are some observations from Ohio using dataex:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int year str24 state str2 statea str57 county str3 countya long uniqueid float log_med_house_value byte treat float treat_year
      1970 "Ohio" "39" "Adams"            "001" 39001 11.027865 0    .
      1980 "Ohio" "39" "Adams"            "001" 39001  11.26958 0    .
      1990 "Ohio" "39" "Adams"            "001" 39001  11.08128 0    .
      2000 "Ohio" "39" "Adams"            "001" 39001 11.396032 0    .
      2010 "Ohio" "39" "Adams County"     "001" 39001 11.388495 0    .
      1970 "Ohio" "39" "Allen"            "003" 39003 11.480237 0    .
      1980 "Ohio" "39" "Allen"            "003" 39003 11.600936 0    .
      1990 "Ohio" "39" "Allen"            "003" 39003 11.426234 0    .
      2000 "Ohio" "39" "Allen"            "003" 39003 11.589664 0    .
      2010 "Ohio" "39" "Allen County"     "003" 39003  11.57308 0    .
      1970 "Ohio" "39" "Ashland"          "005" 39005 11.532816 0    .
      1980 "Ohio" "39" "Ashland"          "005" 39005 11.631242 0    .
      1990 "Ohio" "39" "Ashland"          "005" 39005 11.454618 0    .
      2000 "Ohio" "39" "Ashland"          "005" 39005 11.748693 0    .
      2010 "Ohio" "39" "Ashland County"   "005" 39005 11.694413 0    .
      1970 "Ohio" "39" "Ashtabula"        "007" 39007 11.456345 0    .
      1980 "Ohio" "39" "Ashtabula"        "007" 39007 11.616204 0    .
      1990 "Ohio" "39" "Ashtabula"        "007" 39007 11.297353 0    .
      2000 "Ohio" "39" "Ashtabula"        "007" 39007  11.63156 0    .
      2010 "Ohio" "39" "Ashtabula County" "007" 39007 11.583384 0    .
      1970 "Ohio" "39" "Athens"           "009" 39009  11.35023 0    .
      1980 "Ohio" "39" "Athens"           "009" 39009  11.39998 0    .
      1990 "Ohio" "39" "Athens"           "009" 39009 11.335902 0    .
      2000 "Ohio" "39" "Athens"           "009" 39009  11.61977 0    .
      2010 "Ohio" "39" "Athens County"    "009" 39009 11.643076 0    .
      1970 "Ohio" "39" "Auglaize"         "011" 39011 11.451454 0    .
      1980 "Ohio" "39" "Auglaize"         "011" 39011 11.646057 0    .
      1990 "Ohio" "39" "Auglaize"         "011" 39011 11.542096 0    .
      2000 "Ohio" "39" "Auglaize"         "011" 39011  11.69184 0    .
      2010 "Ohio" "39" "Auglaize County"  "011" 39011 11.786762 0    .
      1970 "Ohio" "39" "Belmont"          "013" 39013  11.24383 0    .
      1980 "Ohio" "39" "Belmont"          "013" 39013   11.4696 0    .
      1990 "Ohio" "39" "Belmont"          "013" 39013 11.213117 0    .
      2000 "Ohio" "39" "Belmont"          "013" 39013 11.353601 0    .
      2010 "Ohio" "39" "Belmont County"   "013" 39013 11.404226 0    .
      1970 "Ohio" "39" "Brown"            "015" 39015 11.312754 0    .
      1980 "Ohio" "39" "Brown"            "015" 39015 11.542872 0    .
      1990 "Ohio" "39" "Brown"            "015" 39015 11.368962 0    .
      2000 "Ohio" "39" "Brown"            "015" 39015 11.684085 0    .
      2010 "Ohio" "39" "Brown County"     "015" 39015 11.650075 0    .
      1970 "Ohio" "39" "Butler"           "017" 39017 11.637247 0    .
      1980 "Ohio" "39" "Butler"           "017" 39017 11.849398 0    .
      1990 "Ohio" "39" "Butler"           "017" 39017  11.76353 0    .
      2000 "Ohio" "39" "Butler"           "017" 39017 11.999196 0    .
      2010 "Ohio" "39" "Butler County"    "017" 39017 11.959533 0    .
      1970 "Ohio" "39" "Champaign"        "021" 39021  11.35792 0    .
      1980 "Ohio" "39" "Champaign"        "021" 39021 11.553683 0    .
      1990 "Ohio" "39" "Champaign"        "021" 39021  11.48765 0    .
      2000 "Ohio" "39" "Champaign"        "021" 39021 11.744514 0    .
      2010 "Ohio" "39" "Champaign County" "021" 39021 11.704372 0    .
      1970 "Ohio" "39" "Lucas"            "095" 39095  11.66294 1 2000
      1980 "Ohio" "39" "Lucas"            "095" 39095  11.67266 1 2000
      1990 "Ohio" "39" "Lucas"            "095" 39095  11.52137 1 2000
      2000 "Ohio" "39" "Lucas"            "095" 39095 11.692945 1 2000
      2010 "Ohio" "39" "Lucas County"     "095" 39095  11.57402 1 2000
      1970 "Ohio" "39" "Pike"             "131" 39131 11.032483 1 2010
      1980 "Ohio" "39" "Pike"             "131" 39131 11.361637 1 2010
      1990 "Ohio" "39" "Pike"             "131" 39131 11.217856 1 2010
      2000 "Ohio" "39" "Pike"             "131" 39131 11.534374 1 2010
      2010 "Ohio" "39" "Pike County"      "131" 39131  11.44143 1 2010
      end
      format %ty year
      There aren't any missing values for log_med_house_value, but I still get the error. Also, is it even possible to get treatment effects for observations with a treatment year of 2010, or would I need 2020 data to get those treatment effects, since eventtime() needs a positive value as the second number in the numlist?
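
      A quick way to back up the claim of no missing values, and to see which years the panel actually contains, is a check along the lines of the sketch below. It uses only built-in Stata commands and the variable names from the dataex example above. The idea that the estimation command might look up a year such as "treatment year minus one" is an assumption to test here, not something the error message confirms.

      ```stata
      * Count missing values of the outcome; 0 supports the claim above
      count if missing(log_med_house_value)

      * Confirm each county appears exactly once per year, then list the years present
      isid uniqueid year
      tabulate year

      * Assumption: a command may internally reference (treatment year - 1).
      * Count how many observations exist for that year in each cohort;
      * a count of 0 means that year is absent from the panel.
      levelsof treat_year, local(tyears)
      foreach t of local tyears {
          display "Observations with year == " `t' - 1 ":"
          count if year == `t' - 1
      }
      ```

      If the last loop reports zero observations, any step that depends on those years would see only missing values even though the variable itself is complete.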

      • #4
        I figured out what the issue was when using allsynth. When I ran it for only the treated units with a treatment year of 2000, I noticed that because I specified transform(log_med_house_value, normalize), it reported normalizing log_med_house_value with respect to its 1999 values. Since my data are decennial, there are no 1999 observations, so this created missing values. There doesn't seem to be a way to specify that the values should be normalized with respect to the 1990 values instead, so I will normalize the variable myself before running allsynth.
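
        For anyone hitting the same problem, a minimal sketch of normalizing by hand is below. It assumes that "normalize" means dividing each county's series by its own value in a base year, and that a single base year (1990, the last census year before a 2000 treatment) is appropriate for the run; both are assumptions, and a different cohort would need a different base year. The names base_year, base_value, and lmhv_norm are made up for illustration.

        ```stata
        * Hypothetical base year: last observed year before a 2000 treatment
        local base_year 1990

        * Spread each county's base-year value across all of its rows
        bysort uniqueid: egen double base_value = ///
            max(cond(year == `base_year', log_med_house_value, .))

        * Normalized outcome equals 1 in the base year for every county
        generate double lmhv_norm = log_med_house_value / base_value

        * Stop if any county lacks a base-year value
        assert !missing(lmhv_norm)
        drop base_value
        ```

        The normalized variable would then replace log_med_house_value in the allsynth call, with transform() left out.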

        When I remove transform() from the allsynth specification it runs properly, but I have 1598 observations in the donor pool and it's putting weights of 0.001 on a ton of these observations. The synthetic control also always has much lower house values than the treated units in the pre-periods. I suppose I just need to add predictors and hopefully that would narrow down the counties that allsynth puts weights on? Apologies for rambling but if anyone has advice on this or any of my previous questions I'd greatly appreciate it.

        • #5
          So wait, let me make sure I'm understanding... are the weights not sparse? Or is it the case that many units are getting very small weights? 1598 observations in the donor pool? Good lord, man! Is that how many donors you have, 1598? By the way, I'm one of the more knowledgeable folks on SC here, so I'm equipped to give methodological advice in this area.
