Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summing over a value of variable for repeating county

    Hello respected stata community,

    I have a dummy variable which takes the value 0 and 1. The observation for different counites are given below in a single year 2017. Here, the counties get repeated several times in a single year. I need to find the combined value of the variable length for each unique county of a unique state in a single year when dummy variable takes the value 1. And, if in any observation of a unique county never takes the value 1 , then the desired value of variable combined length will show zero.

    Can anyone kindly tell me how I can do it ? I've thought for couple of days but havent come up with any efficient coding yet.

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int Year_Recor byte State_Code int County_Cod byte dummy float length
    2017 1  89 0    .1
    2017 1   1 1  .043
    2017 1 123 0    .1
    2017 1  45 0  .097
    2017 1  97 0  .438
    2017 1 123 0    .1
    2017 1 117 1    .1
    2017 1  25 0    .1
    2017 1 123 1    .1
    2017 1  51 0    .1
    2017 1 123 0    .1
    2017 1   3 0    .1
    2017 1 115 0    .1
    2017 1  71 0    .1
    2017 1  77 0  .034
    2017 1  61 0    .1
    2017 1  39 0    .1
    2017 1 125 0    .1
    2017 1  59 0    .1
    2017 1  15 0    .1
    2017 1  89 1    .1
    2017 1  81 1    .1
    2017 1  77 0    .1
    2017 1 127 0  .036
    2017 1 103 0    .1
    2017 1  49 0  .018
    2017 1  81 0    .1
    2017 1 101 0    .1
    2017 1  91 0    .1
    2017 1  91 0  .087
    2017 1  49 1    .1
    2017 1 115 0    .1
    2017 1 107 0    .1
    2017 1  89 0  .004
    2017 1  43 0  2.22
    2017 1  31 0    .1
    2017 1  79 0 1.013
    2017 1 125 1  .056
    2017 1 117 0    .1
    2017 1   3 1   .04
    2017 1  49 1    .1
    2017 1 125 1  .052
    end


  • #2
    Tariq:
    I'm not sure I've fully understood your question but you may want to try what follows:
    Code:
    . bysort State_Code County_Cod Year_Recor: egen wanted=sum( length) if dummy==1
    
    . bysort State_Code County_Cod Year_Recor: egen wanted2=max( length)
    
    . bysort State_Code County_Cod Year_Recor: egen wanted3=min( length)
    
    . bysort State_Code County_Cod : replace wanted=0 if wanted2==0 & wanted3==0
    
    . drop wanted2 wanted3
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Mr. Lazzaro,

      I am so humbled for you taking the time to guide me despite my vague description of my goal.

      Let me be a little more clear about what I'm trying to do: Here, the length is road length and I need to sum up the total interstate road length in different counties. Since there are a lot of interstate highways in a county.

      So, a county like 123, 125, and so on show up several times. My target is to find the combined length of the interstate highway ( that's when dummy ==1, 0 = otherwise ). So, after combining the length of the interstate highways of a single county. I want that county to show up only once with the combined value of the length of the highway so that I can create a panel of counties in a single year. I tried your code but I believe that's not giving me the combined value of length for a unique county when dummy ==1 nor it's showing up once ( the repeated presence of the same county is still there ).

      If you again kindly guide me to get my desired outcome, that'd be immensely kind of you!

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int Year_Recor byte State_Code int County_Cod byte dummy float(length wanted)
      2017 1 0 0  .029      .
      2017 1 0 0    .1      .
      2017 1 0 0    .1      .
      2017 1 0 0    .1      .
      2017 1 0 0  .001      .
      2017 1 0 0    .1      .
      2017 1 0 0  .001      .
      2017 1 1 0    .1      .
      2017 1 1 0  .091      .
      2017 1 1 1    .1 88.989
      2017 1 1 0  .004      .
      2017 1 1 0    .1      .
      2017 1 1 0    .1      .
      2017 1 1 0    .1      .
      2017 1 1 1    .1 88.989
      2017 1 1 0    .1      .
      2017 1 1 0  .016      .
      2017 1 1 1  .055 88.989
      2017 1 1 0    .1      .
      2017 1 1 0    .1      .
      2017 1 1 0  .824      .
      2017 1 1 1    .1 88.989
      2017 1 1 1    .1 88.989
      end
      Last edited by Tariq Abdullah; 25 Jun 2022, 12:39.

      Comment


      • #4
        Code:
        egen x = total(length) if dummy, by(County_Cod)
        bysort County_Cod (x): gen wanted = cond(x[1]<., x[1], 0)

        Comment


        • #5
          Mr. Snilsberg,

          Thanks so much for your kind guidance to solve the issue I was having.

          I'll have a panel data with 50 states with 20 years with thier multiple counties. So, I need to sort by state and county once I merge all the data. After following your code, my data looks like the folowing - which gives me the combined length of the interstate highway when dummy takes the value 1 but it's still not showing the panel data form ( like there is still repetition of the same county in the dataset ). Do I need to change your code in a certain way to get the desired result ? Any guidance is highly appreciated!

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int Year_Recor byte State_Code int County_Cod byte dummy float(length x wanted)
          2017 1 0 0   .1      .      0
          2017 1 0 0   .1      .      0
          2017 1 0 0   .1      .      0
          2017 1 0 0   .1      .      0
          2017 1 0 0 .001      .      0
          2017 1 0 0 .001      .      0
          2017 1 0 0 .029      .      0
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1 .056 88.989 88.989
          2017 1 1 1 .038 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1 .013 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1 .039 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1 .009 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          2017 1 1 1   .1 88.989 88.989
          end

          Comment


          • #6
            Code:
            egen x = total(length) if dummy, by(State_Code County_Cod)
            bysort State_Code County_Cod (x): gen wanted = cond(x[1]<., x[1], 0)

            Comment


            • #7
              MY DATA AFTER RUNNING THE CODE - where you can see the repetition of the same county
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input int Year_Recor byte State_Code int County_Cod byte dummy float(length x wanted)
              2017 1 0 0   .1      .      0
              2017 1 0 0   .1      .      0
              2017 1 0 0   .1      .      0
              2017 1 0 0   .1      .      0
              2017 1 0 0 .001      .      0
              2017 1 0 0 .001      .      0
              2017 1 0 0 .029      .      0
              2017 1 1 1   .1 88.989 88.989
              2017 1 1 1   .1 88.989 88.989
              2017 1 1 1   .1 88.989 88.989
              2017 1 1 1   .1 88.989 88.989
              2017 1 1 1   .1 88.989 88.989
              2017 1 1 1   .1 88.989 88.989
              2017 1 1 1   .1 88.989 88.989
              2017 1 1 1 .056 88.989 88.989
              2017 1 1 1 .038 88.989 88.989
              end
              After running my code I get my combined length for each county, but I need to delete the repeated value of same counties to make it a panel data.

              Comment


              • #8
                Tariq:
                you may want to try:
                Code:
                bysort State_Code County_Cod (year): keep if _n==1
                Kind regards,
                Carlo
                (Stata 19.0)

                Comment

                Working...
                X