Hi,
I have a parish register dataset with 3.3 million burials spanning four centuries. My main struggle is its inconsistency, particularly the sporadically missing years (deathyear). The first variable in the data extract (pid) is the only thing that provides a chronology I can rely on, but it is chronological only within each parish (kommnr2), not across parishes.
I tried filling in the missing years using:
gen fillyear = deathyear
replace fillyear = fillyear[_n-1] if fillyear == .
This worked only partially. The filled year is accurate only when a parish's first observation has a year (otherwise the last year of the previous parish in the list is carried over into it) and when the gaps within a parish's chronology are not too big.
I have tried using "egen" to work around this, without luck. I cannot think of any way to conditionally limit "fillyear[_n-1]" so that it does not fill in the years where the gaps are too big or the parish changes.
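For concreteness, a minimal sketch of the kind of conditional fill described above, not a tested solution. It assumes that sorting by pid within kommnr2 reproduces the register order, and the names obsno, seq, revseq, prevyear, nextyear, fillyear2 and the threshold maxgap are illustrative, not part of the dataset. A missing year is filled only when the nearest known years before and after it, within the same parish, differ by at most maxgap.

```stata
* Sketch only: maxgap and all helper variable names are illustrative assumptions.
local maxgap 1

* Record the current row order so ties in pid are broken reproducibly.
gen long obsno = _n

* Position of each record within its parish, in pid order.
bysort kommnr2 (pid obsno): gen long seq = _n
gen long revseq = -seq

* Last known year at or before each record, never crossing a parish boundary.
gen int prevyear = deathyear
bysort kommnr2 (seq): replace prevyear = prevyear[_n-1] if missing(prevyear)

* Next known year at or after each record: the same fill, walking each parish backwards.
gen int nextyear = deathyear
bysort kommnr2 (revseq): replace nextyear = nextyear[_n-1] if missing(nextyear)

* Fill only where known years bracket the gap and differ by at most maxgap.
gen int fillyear2 = deathyear
replace fillyear2 = prevyear if missing(deathyear) & !missing(prevyear, nextyear) & abs(nextyear - prevyear) <= `maxgap'

* Restore parish-by-parish register order.
sort kommnr2 seq
```

With maxgap set to 1, records before a parish's first known year or after its last known year stay missing, as do parishes with no years at all. Raising maxgap trades accuracy for coverage.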
Considering the number of observations I have, I am willing to sacrifice some accuracy in order to fill in more of these. At this point I have, through different methods, assigned 61% of the "deathyears", but the issue again is inconsistency: some parishes have no years at all, while others are missing only a few, and I cannot condition the fill on other variables because those are inconsistent too.
Please advise.
Thank you.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str19 pid int(kommnr2 deathyear) float fillyear
"pg00000002539927" 613 . 1810
"pg00000002539929" 613 . 1810
"pg00000002539931" 613 1810 1810
"pg00000002539933" 613 1810 1810
"pg00000002539935" 613 1810 1810
"pg00000002539936" 613 1810 1810
"pg00000002539937" 613 . 1810
"pg00000002539939" 613 1810 1810
"pg00000002539940" 613 . 1810
"pg00000002539942" 613 1810 1810
"pg00000002539943" 613 . 1810
"pg00000002539945" 613 . 1810
"pg00000002539946" 613 . 1810
"pg00000002539948" 613 . 1810
"pg00000002539950" 613 . 1810
"pg00000002539951" 613 . 1810
"pg00000002539952" 613 . 1810
"pg00000002539954" 613 1810 1810
"pg00000002539956" 613 1810 1810
"pg00000002539957" 613 1810 1810
"pg00000002539958" 613 . 1810
"pg00000002539959" 613 1810 1810
"pg00000002539960" 613 1810 1810
"pg00000002539961" 613 1810 1810
"pg00000002539962" 613 1810 1810
"pg00000002539963" 613 1810 1810
"pg00000002539964" 613 1810 1810
"pg00000002539965" 613 1810 1810
"pg00000002539966" 613 . 1810
"pg00000002539967" 613 1810 1810
"pg00000002539969" 613 1810 1810
"pg00000002539970" 613 1810 1810
"pg00000002539971" 613 1810 1810
"pg00000002539972" 613 . 1810
"pg00000002539974" 613 1810 1810
"pg00000002539975" 613 . 1810
"pg00000002539977" 613 1810 1810
"pg00000002539978" 613 1810 1810
"pg00000002539980" 613 1811 1811
"pg00000002539981" 613 . 1811
"pg00000002539983" 613 1811 1811
"pg00000002539984" 613 1811 1811
"pg00000002539985" 613 1810 1810
"pg00000002539986" 613 1810 1810
"pg00000002539987" 613 1810 1810
"pg00000002539988" 613 1810 1810
"pg00000002539990" 613 . 1810
"pg00000002539992" 613 . 1810
"pg00000002539994" 613 1810 1810
"pg00000002539995" 613 1810 1810
"pg00000002539997" 613 1811 1811
"pg00000002539998" 613 . 1811
"pg00000002540000" 613 1811 1811
"pg00000002540002" 613 1811 1811
"pg00000002540004" 613 1811 1811
"pg00000002540005" 613 . 1811
"pg00000002540007" 613 . 1811
"pg00000002540009" 613 . 1811
"pg00000002540011" 613 . 1811
"pg00000002540013" 613 . 1811
"pg00000002540015" 613 1811 1811
"pg00000002540016" 613 . 1811
"pg00000002540018" 613 . 1811
"pg00000002540019" 613 . 1811
"pg00000002540021" 613 1811 1811
"pg00000002540023" 613 1811 1811
"pg00000002540024" 613 1811 1811
"pg00000002540025" 613 1811 1811
"pg00000002540026" 613 . 1811
"pg00000002540028" 613 1811 1811
"pg00000002540029" 613 . 1811
"pg00000002540031" 613 . 1811
"pg00000002540032" 613 1811 1811
"pg00000002540033" 613 . 1811
"pg00000002540034" 613 1811 1811
"pg00000002540035" 613 1811 1811
"pg00000002540036" 613 1811 1811
"pg00000002540037" 613 1811 1811
"pg00000002540038" 613 1811 1811
"pg00000002540039" 613 1811 1811
"pg00000002540041" 613 . 1811
"pg00000002540043" 613 1811 1811
"pg00000002540044" 613 1811 1811
"pg00000002540045" 613 . 1811
"pg00000002540047" 613 . 1811
"pg00000002540049" 613 1811 1811
"pg00000002540050" 613 1811 1811
"pg00000002540051" 613 1811 1811
"pg00000002540052" 613 1811 1811
"pg00000002540053" 613 1811 1811
"pg00000002540055" 613 1811 1811
"pg00000002540057" 613 1811 1811
"pg00000002540058" 613 1811 1811
"pg00000002540059" 613 . 1811
"pg00000002540061" 613 1811 1811
"pg00000002540062" 613 . 1811
"pg00000002540064" 613 1811 1811
"pg00000002540065" 613 1811 1811
"pg00000002540067" 613 1811 1811
"pg00000002540068" 613 1811 1811
"pg00000002540069" 613 1811 1811
"pg00000002540070" 613 . 1811
"pg00000002540071" 613 . 1811
"pg00000002540073" 613 . 1811
"pg00000002540075" 613 . 1811
"pg00000002540075" 613 . 1811
"pg00000002540078" 613 . 1811
"pg00000002540080" 613 1811 1811
"pg00000002540081" 613 . 1811
"pg00000002540083" 613 1811 1811
"pg00000002540084" 613 1811 1811
"pg00000002540086" 613 1811 1811
"pg00000002540087" 613 1811 1811
"pg00000002540089" 613 1811 1811
"pg00000002540090" 613 1811 1811
"pg00000002540091" 613 1811 1811
"pg00000002540092" 613 1811 1811
"pg00000002540093" 613 1811 1811
"pg00000002540094" 613 1811 1811
"pg00000002540096" 613 . 1811
"pg00000002540099" 613 . 1811
"pg00000002540102" 613 1811 1811
"pg00000002540103" 613 1811 1811
"pg00000002540104" 613 1811 1811
"pg00000002540105" 613 1811 1811
"pg00000002540106" 613 . 1811
"pg00000002540108" 613 1811 1811
"pg00000002540109" 613 . 1811
"pg00000002540111" 613 1811 1811
"pg00000002540112" 613 . 1811
"pg00000002540114" 1934 . 1811
"pg00000002540115" 1934 . 1811
"pg00000002540116" 1934 . 1811
"pg00000002540117" 1934 . 1811
"pg00000002540119" 1934 . 1811
"pg00000002540120" 1934 . 1811
"pg00000002540121" 1934 . 1811
"pg00000002540122" 1934 . 1811
"pg00000002540123" 1934 . 1811
"pg00000002540125" 1934 . 1811
"pg00000002540126" 1934 . 1811
"pg00000002540127" 1934 . 1811
"pg00000002540128" 1934 . 1811
"pg00000002540129" 1934 . 1811
"pg00000002540131" 1934 . 1811
"pg00000002540133" 1934 . 1811
"pg00000002540134" 1934 . 1811
"pg00000002540135" 1934 . 1811
"pg00000002540136" 1934 . 1811
"pg00000002540138" 1934 . 1811
"pg00000002540140" 1934 . 1811
"pg00000002540142" 1934 . 1811
"pg00000002540144" 1934 . 1811
"pg00000002540145" 1934 . 1811
"pg00000002540147" 1934 . 1811
"pg00000002540148" 1934 . 1811
"pg00000002540150" 1934 . 1811
"pg00000002540152" 1934 . 1811
"pg00000002540154" 1934 . 1811
"pg00000002540156" 1934 . 1811
"pg00000002540158" 1934 . 1811
"pg00000002540160" 1934 . 1811
"pg00000002540162" 1934 . 1811
"pg00000002540164" 1934 . 1811
"pg00000002540166" 1934 . 1811
"pg00000002540167" 1934 . 1811
"pg00000002540169" 1934 . 1811
"pg00000002540171" 1934 . 1811
"pg00000002540173" 1934 . 1811
"pg00000002540175" 1934 . 1811
"pg00000002540176" 1934 . 1811
"pg00000002540177" 1934 . 1811
"pg00000002540178" 1934 . 1811
"pg00000002540180" 1934 . 1811
"pg00000002540181" 1934 . 1811
"pg00000002540183" 1934 . 1811
"pg00000002540184" 1934 . 1811
"pg00000002540186" 1934 . 1811
"pg00000002540188" 1934 . 1811
"pg00000002540190" 1934 . 1811
"pg00000002540191" 1934 . 1811
"pg00000002540193" 1934 . 1811
"pg00000002540195" 1934 . 1811
"pg00000002540197" 1934 . 1811
"pg00000002540199" 1934 . 1811
"pg00000002540200" 1934 . 1811
"pg00000002540201" 1934 . 1811
"pg00000002540202" 1934 . 1811
"pg00000002540203" 1934 . 1811
"pg00000002540205" 1934 . 1811
"pg00000002540206" 1934 . 1811
"pg00000002540207" 1934 . 1811
"pg00000002540209" 1934 1914 1914
"pg00000002540211" 1934 . 1914
"pg00000002540213" 1934 . 1914
"pg00000002540215" 1934 . 1914
"pg00000002540217" 1934 . 1914
"pg00000002540220" 1934 . 1914
"pg00000002540221" 1934 . 1914
"pg00000002540223" 1934 . 1914
"pg00000002540225" 1934 . 1914
"pg00000002540227" 1934 . 1914
"pg00000002540228" 1934 . 1914
"pg00000002540230" 1934 . 1914
"pg00000002540232" 1934 . 1914
"pg00000002540233" 1934 . 1914
"pg00000002540236" 1934 . 1914
"pg00000002540238" 1934 . 1914
"pg00000002540239" 1934 . 1914
"pg00000002540240" 1934 . 1914
"pg00000002540241" 1934 . 1914
"pg00000002540242" 1934 . 1914
"pg00000002540243" 1934 . 1914
"pg00000002540245" 1934 . 1914
"pg00000002540247" 1934 . 1914
"pg00000002540249" 1934 . 1914
"pg00000002540250" 1934 . 1914
"pg00000002540251" 1934 . 1914
"pg00000002540253" 1934 . 1914
"pg00000002540255" 1934 . 1914
"pg00000002540256" 1934 . 1914
"pg00000002540258" 1934 . 1914
"pg00000002540260" 1934 . 1914
"pg00000002540262" 1934 . 1914
"pg00000002540263" 1934 . 1914
"pg00000002540265" 1934 . 1914
"pg00000002540267" 1934 . 1914
"pg00000002540268" 1934 . 1914
"pg00000002540269" 1934 . 1914
"pg00000002540270" 1934 . 1914
"pg00000002540271" 1934 . 1914
"pg00000002540272" 1934 . 1914
"pg00000002540273" 1934 . 1914
"pg00000002540274" 1934 . 1914
"pg00000002540276" 1934 . 1914
"pg00000002540278" 1934 . 1914
"pg00000002540280" 1934 . 1914
"pg00000002540282" 1934 . 1914
"pg00000002540283" 1934 . 1914
"pg00000002540285" 1934 . 1914
"pg00000002540287" 1934 . 1914
"pg00000002540289" 1934 . 1914
"pg00000002540291" 1934 . 1914
"pg00000002540293" 1934 . 1914
"pg00000002540295" 1934 . 1914
"pg00000002540296" 1934 . 1914
"pg00000002540297" 1934 . 1914
"pg00000002540299" 1934 . 1914
"pg00000002540300" 1934 . 1914
"pg00000002540302" 1934 . 1914
"pg00000002540304" 1934 . 1914
"pg00000002540306" 1934 . 1914
"pg00000002540308" 1934 . 1914
"pg00000002540310" 1934 . 1914
"pg00000002540312" 1934 . 1914
"pg00000002540314" 1934 . 1914
"pg00000002540316" 1934 . 1914
"pg00000002540318" 1934 . 1914
"pg00000002540320" 1934 . 1914
"pg00000002540322" 1934 . 1914
"pg00000002540324" 1934 . 1914
"pg00000002540325" 1934 . 1914
"pg00000002540326" 1934 . 1914
"pg00000002540327" 1934 . 1914
"pg00000002540329" 1934 . 1914
"pg00000002540330" 1934 . 1914
"pg00000002540331" 1934 . 1914
"pg00000002540332" 1934 . 1914
"pg00000002540333" 1934 . 1914
"pg00000002540336" 1934 . 1914
"pg00000002540338" 1934 . 1914
"pg00000002540339" 1934 1915 1915
"pg00000002540341" 1934 1915 1915
"pg00000002540343" 1934 . 1915
"pg00000002540345" 1934 . 1915
"pg00000002540347" 1934 . 1915
"pg00000002540349" 1934 . 1915
"pg00000002540350" 1934 . 1915
"pg00000002540351" 1934 . 1915
"pg00000002540353" 1934 . 1915
"pg00000002540355" 1934 . 1915
"pg00000002540357" 1934 . 1915
"pg00000002540359" 1934 . 1915
"pg00000002540360" 1934 . 1915
"pg00000002540361" 1934 . 1915
"pg00000002540362" 1934 . 1915
"pg00000002540364" 1934 . 1915
"pg00000002540366" 1934 . 1915
"pg00000002540367" 1934 . 1915
"pg00000002540369" 1934 . 1915
"pg00000002540370" 1934 . 1915
"pg00000002540372" 1934 . 1915
"pg00000002540374" 1934 . 1915
"pg00000002540376" 1934 . 1915
"pg00000002540378" 1934 . 1915
"pg00000002540379" 1934 . 1915
"pg00000002540381" 1934 . 1915
"pg00000002540383" 1934 . 1915
"pg00000002540385" 1934 . 1915
"pg00000002540387" 1934 . 1915
end