Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing and Assigning Data By Year

    Hello,

    I'm working on a research paper in which I'm examining the relationship between sea surface temperature and lobster landings in the Gulf of Maine from 2000 - 2021. I've isolated three regions of ocean off the coast of Maine and designated them areas 1, 2, and 3, as seen below. I then found the average temperature by year for each separate region.

    replace area = 1 if (lat>40 & lat<45) & (lon<-69.2 & lon>-71)
    replace area = 2 if (lat>40 & lat<45) & (lon<-68.25 & lon>-69.2)
    replace area = 3 if (lat>40 & lat<45) & (lon<-67 & lon>-68.25)

    keep if (area==1) | (area==2) | (area==3)

    bys year: egen mean_temp1 = mean(v) if (area==1)
    bys year: egen mean_temp2 = mean(v) if (area==2)
    bys year: egen mean_temp3 = mean(v) if (area==3)

    My data appears in Stata as seen below.

    Year month lat lon v(temp) area mean_temp1 mean_temp2 mean_temp3
    2003 1 40.5 -70.5 9.07 1 11.03568
    2003 1 40.5 -69.5 7.32 1 11.03568
    2003 1 40.5 -68.5 9.94 2 10.59574
    2003 1 40.5 -67.5 10.67 3 10.34583
    2003 1 41.5 -70.5 5.28 1 11.03568
    2003 1 41.5 -69.5 6.89 1 11.03568


    Now, I'm attempting to import county level landings data to find a correlation between sea surface temperature and landings from year to year. I'm examining 7 counties (Cumberland, Hancock, Knox, Lincoln, Sagadahoc, Washington, and York). Due to my limited experience in Stata, I'm having two issues:

    1.What's the best way to import the county data? I have each county's data in a separate CSV file within which the data is presented as seen below (2000-2021). How can I import the data so that the landings (pounds/value) is arranged next to the years already in my dataset (to make for smooth regressions)?
    Year Pounds Value
    2000 9,519,281 $30,514,219
    2001 6,520,260 $19,803,951
    2002 9,591,508 $31,220,418
    2003 5,595,642 $20,519,959
    2004 9,255,280 $36,591,790
    2. Once each county's dataset is imported, what command do I use to assign counties to each of the three temperature areas (1,2,3) that I described above? For context, I'm trying to show that counties with ports in region 3, which has the coldest sea surface temperatures, have experienced the most growth in lobster landings over the past 20 years.


    Apologies if this was a confusing explanation of my questions-- I have limited experience doing research in Stata and any advice is appreciated. Happy to answer any clarifying questions. Thank you.

  • #2
    This is a reupload of a table demonstrating how my Stata results appear because the first depiction is confusing.
    Click image for larger version

Name:	Screen Shot 2021-12-05 at 3.27.24 PM.png
Views:	1
Size:	70.4 KB
ID:	1639549

    Comment


    • #3
      My data appears in Stata as seen below.

      Year month lat lon v(temp) area mean_temp1 mean_temp2 mean_temp3
      2003 1 40.5 -70.5 9.07 1 11.03568
      2003 1 40.5 -69.5 7.32 1 11.03568
      2003 1 40.5 -68.5 9.94 2 10.59574
      2003 1 40.5 -67.5 10.67 3 10.34583
      2003 1 41.5 -70.5 5.28 1 11.03568
      2003 1 41.5 -69.5 6.89 1 11.03568
      No, it doesn't. It can't possibly look like that because v(temp) is not a legal variable name, and you show names for 9 variables but values only for 3.

      1.What's the best way to import the county data? I have each county's data in a separate CSV file within which the data is presented as seen below (2000-2021). How can I import the data so that the landings (pounds/value) is arranged next to the years already in my dataset (to make for smooth regressions)?
      Take it one step at a time. First import each county's CSV file separately into Stata and save as a Stata .dta file, preferably named after the county. Inspect the resulting .dta files to assure that the data are truly compatible in the different data sets in terms of having the same measurement units, identical value labeling (if any), and identical data storage types (i.e. string vs. numeric). If you are unsure about the latter, Mark Chatfield's -precombine- command, available from Stata Journal (-findit precombine-) is immensely helpful in warning you of incompatibilities that aren't always obvious. Fix up any incompatibilities, and then combine them into a single file, adding in a variable that identifies the name of the county. This is done by -appending- them in a loop. The code for that would look something like this:

      Code:
      local counties Cumberland Hancock Knox Lincoln Sagadahoc Washington York
      tempfile building
      save `building', emptyok
      foreach c of local counties {
          use `"`c'"', clear
          gen county = `"`c'"'
          append using `buliding'
          save `"`building'"', replace
      }
      save all_counties_temperature_data, replace
      Now, you face another problem, on which I cannot advise you. You need to know which counties belong in which area. I can't advise you on how to find that information if you don't know. Assuming you have it, it needs to be added to the all_counties_data.dta set as a new variable. If you have the information already available in some other Stata data set, -merge-ing them on county will do the job. If not, perhaps you can create a small data set with just 6 observations, one for each county, and just 2 variables: county and area. Then -merge- that with all_counties_data.dta.

      That takes care of the temperature data.

      Do something similar with the landing data: import each county's data separately into a Stata data set, and then append them in a loop, adding a variable identifying the county. Be sure to use the same spelling and case for the county names both times! Save the combined data as a Stata data set, let's call it all_counties_landing_data.dta.

      Now you can combine them:
      Code:
      use all_counties_temperature_data, clear
      merge m:1 county year using all_counties_landing_data
      That will bring together all of the variables you have mentioned in a layout suitable for the kind of analysis you seem to want to do.

      Now you have a design question to confront. You have monthly temperature data but yearly landing data. You can aggregate the temperature data up to the yearly level, perhaps taking means or medians or max or min, or whatever. The -collapse- command is your friend for that. Or, you can keep the combined data at the monthly level, with the same temperature values for a given year being shown for every month (which is what the data look like at this point if you do nothing further to them). There are pros and cons to either approach and the decision between them depends on substantive non-statistical issues--so you might want to consult the literature or a colleague in your field about that.

      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2.

      Last edited by Clyde Schechter; 05 Dec 2021, 13:52.

      Comment


      • #4
        Seeing more clearly the data in #2, I realize that some of the advice given in #3 is incorrect.

        I see now that the first data set identifies the areas but not the counties, and already contains information on each area. So for the temperature data, all you need to do is add a county variable. This requires first creating a Stata data set with 6 observations (1 for each county) and 2 variables (county and area). Let's call that one county_area_crosswalk.dta. Then you

        Code:
        use all_areas_temperature_data, clear
        merge 1:m area using county_area_crosswalk
        drop _merge
        save all_counties_temperature_data, replace
        Then you need to import the landing data for each county into a separate Stata data set, named after the county. Then verify that they are all compatible for combining, and run an append loop:

        Code:
        local counties Cumberland Hancock Knox Lincoln Sagadahoc Washington York
        tempfile building
        save `building', emptyok
        foreach c of local counties {
             use `"`c'"', clear gen county = `"`c'"'
             append using `buliding'
             save `"`building'"', replace
        }
        save all_counties_landing_data, replace
        Then you can combine that with the temperature data:

        Code:
        use all_counties_temperature_data, clear
        merge m:1 county year using all_counties_landing_data
        save all_data, replace
        Again, you still have to decide how to handle the discrepancy between annual landing data and monthly temperature data.

        Comment

        Working...
        X