  • Question about formatting for logistic regression

    Hi all,

    I am working on a dataset with 37 variables and 380 observations. Individuals from three non-randomized groups were given 0, 1 or 2 interventions.

    Each individual was surveyed twice: once after the intervention and again two years later. The same individuals were contacted for the second survey; the two responses are linked by a unique caseid and distinguished by a TimeSeries variable (0 for the first survey, 1 for the second). The second survey repeated only 4 questions from the first survey (the dependent measures) and also collected demographic information, including gender, age, education, and occupation.

    The data are currently entered in long format, as follows. (REdithSS, REdithWealth, and Age are filled in for all observations with TimeSeries == 1.)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long Village float nQ19 long(REdithSS REdithWealth) int Age float TimeSeries
    2 1 . . . 0
    1 3 . . . 0
    1 2 . . . 0
    3 3 . . . 0
    1 1 . . . 0
    2 1 . . . 0
    3 1 . . . 0
    1 2 . . . 0
    2 2 . . . 0
    2 2 . . . 0
    1 2 . . . 0
    2 3 . . . 0
    3 4 . . . 0
    1 1 . . . 0
    1 4 . . . 0
    end
    label values Village village
    label def village 1 "Bugembe", modify
    label def village 2 "Kijinjomi", modify
    label def village 3 "Kyakabuzi", modify
    label values REdithSS LEdithSS
    label values REdithWealth LEdithWealth
    The team I am with wants to make a new dependent variable that is the measure of the change for nQ19 between time 0 and time 1 and then run a multiple linear regression with Village (intervention received) and demographics as the predictors.

    From what I understand this would require reshaping to wide format. However, when I try reshape this is what I get:

    Code:
    reshape wide nQ19 nQ20 nQ21 nQ22, i(caseid) j(TimeSeries)
    (note: j = 0 1)
    variable Age not constant within caseid
    variable gender not constant within caseid
    variable Rtribe not constant within caseid
    variable Roccupation not constant within caseid
    variable Reducation not constant within caseid
    variable REdithWealth not constant within caseid
    variable REdithSS not constant within caseid
        Your data are currently long.  You are performing a reshape wide.  You typed something like
    
            . reshape wide a b, i(caseid) j(TimeSeries)
    
        There are variables other than a, b, caseid, TimeSeries in your data.  They must be constant within caseid because that is the only way they can fit into wide
        data without loss of information.
    
        The variable or variables listed above are not constant within caseid.  Perhaps the values are in error.  Type reshape error for a list of the problem
        observations.
    
        Either that, or the values vary because they should vary, in which case you must either add the variables to the list of xij variables to be reshaped, or drop
        them.
    r(9);
    The Stata output seems to suggest that I would have to manually copy the responses from time 1 to time 0 for each caseid to make the variables constant (the research team has already decided to treat demographics as constant over the two-year period).

    Is there any other way to conduct this regression? If not, is there code that would allow me to copy responses over from time 1 to time 0 for the same caseid?

    Thank you for your help,
    Christopher Tracey

  • #2
    You haven't given us enough info to replicate your problem. Caseid is not included in the dataex output. Also 3 of the variables have nothing but missing data. Many vars in your reshape command are not included in dataex.

    You say the team considers demographic info to be consistent over time, but your data seem to disagree. Do you mean they are filled in for time 1 but missing for all later time periods?

    You might want to redo dataex, including caseid. Then make sure it is adequate to produce a reproducible example.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam


    • #3
      You say the team considers demographic info to be consistent over time, but your data seem to disagree. Do you mean they are filled in for time 1 but missing for all later time periods?
      If that is the problem, I suspect you can do something like

      Code:
      * List gender in parentheses so nonmissing values sort first within caseid
      bysort caseid (gender): gen xgender = gender[1]
      Or, if you are sure you won't be wiping out data incorrectly,

      Code:
      bysort caseid (gender): replace gender = gender[1] if missing(gender)
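
      The same idea can be looped over every demographic variable that reshape complained about. This is only a sketch: the variable list is an assumption copied from the reshape error message in #1, and the assert is an added safety check, not part of the suggestion above. The variable is named in parentheses after caseid because missing values sort last, so element [1] is the nonmissing value whenever one exists.

      ```stata
      * Assumed variable list, taken from the reshape error message in #1
      local demog Age gender Rtribe Roccupation Reducation REdithWealth REdithSS

      foreach v of local demog {
          * Sort within caseid so that nonmissing values of `v' come first
          bysort caseid (`v'): replace `v' = `v'[1] if missing(`v')
          * Stop with an error if `v' still varies within any caseid
          bysort caseid: assert `v' == `v'[1]
      }
      ```

      If the assert fails for some variable, that caseid has two different nonmissing values, and the data should be inspected before anything is overwritten.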
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology


      • #4
        Here is the dataex output redone to show the crossover from TimeSeries 0 to TimeSeries 1. It includes caseid and all demographic and outcome variables, and should be enough to produce a reproducible example.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int Age byte HHNum str7 caseid float(TimeSeries LSTotal) long(Village gender Rtribe Roccupation Reducation Rroof RGeneralLook RFloor) float rooms long(REdithWealth REdithSS RProximityForest RForestFrequency) float(nQ19 nQ20 nQ21 nQ22)
         . . "1"   0  . 1 . . . . . . . . . . . . 3 3 2 3
        28 8 "1"   1  7 1 2 2 1 3 2 2 1 3 2 2 4 5 2 5 5 3
        30 5 "10"  1  7 1 2 4 1 3 1 2 1 3 2 2 4 5 2 4 6 5
         . . "10"  0  . 1 . . . . . . . . . . . . 2 3 2 3
         . . "102" 0  . 2 . . . . . . . . . . . . 2 2 2 1
        24 4 "102" 1  4 2 3 2 1 2 1 2 1 2 2 2 2 4 2 3 2 3
         . . "103" 0  . 2 . . . . . . . . . . . . 2 2 1 0
        23 3 "103" 1  3 2 3 4 1 2 1 2 1 2 3 2 1 5 2 2 4 2
        38 8 "104" 1 12 2 3 2 1 2 2 1 1 4 2 2 3 3 3 2 3 3
         . . "104" 0  . 2 . . . . . . . . . . . . 3 1 1 3
        end
        label values Village village
        label def village 1 "Bugembe", modify
        label def village 2 "Kijinjomi", modify
        label values gender gender
        label def gender 2 "F", modify
        label def gender 3 "M", modify
        label values Rtribe Ltribe
        label def Ltribe 2 "Mukiga", modify
        label def Ltribe 4 "Other", modify
        label values Roccupation Loccupation
        label def Loccupation 1 "Peasant", modify
        label values Reducation Leducation
        label def Leducation 2 "Some Primary", modify
        label def Leducation 3 "Some Secondary", modify
        label values Rroof Lroof
        label def Lroof 1 "Old Iron Sheets", modify
        label def Lroof 2 "Newer Iron Sheets", modify
        label values RGeneralLook LGeneralLook
        label def LGeneralLook 1 "Permanent", modify
        label def LGeneralLook 2 "Temporary", modify
        label values RFloor LFloor
        label def LFloor 1 "Dirt", modify
        label values REdithWealth LEdithWealth
        label def LEdithWealth 2 "Middle", modify
        label def LEdithWealth 3 "Low", modify
        label values REdithSS LEdithSS
        label def LEdithSS 2 "Middle", modify
        label values RProximityForest LProximityForest
        label def LProximityForest 1 "<100m", modify
        label def LProximityForest 2 "100m-300m", modify
        label def LProximityForest 3 "300m-1km", modify
        label def LProximityForest 4 ">1km", modify
        label values RForestFrequency LForestFrequency
        label def LForestFrequency 3 "2/week", modify
        label def LForestFrequency 4 "2/month", modify
        label def LForestFrequency 5 "never", modify
        As you can see, the demographic variables are recorded only in the TimeSeries 1 observations, but Village and the outcome measures nQ19-nQ22 were recorded during both surveys. Each caseid has one time 0 observation and one time 1 observation.

        Originally posted by Richard Williams
        You say the team considers demographic info to be consistent over time, but your data seem to disagree. Do you mean they are filled in for time 1 but missing for all later time periods?
        Yes, they are filled in for time 1 but missing for time 0; there are no other time points.

        Originally posted by Richard Williams

        If that is the problem, I suspect you can do something like

        Code:
        * List gender in parentheses so nonmissing values sort first within caseid
        bysort caseid (gender): gen xgender = gender[1]
        Or, if you are sure you won't be wiping out data incorrectly,

        Code:
        bysort caseid (gender): replace gender = gender[1] if missing(gender)

        Since the demographic variables have no entries for TimeSeries == 0, I believe that:

        Code:
        bysort caseid (gender): replace gender = gender[1] if missing(gender)
        should work for filling in the variables.

        Is reshaping the data the best option to run a multiple mixed regression where the dependent measure is the change in nQ## from TimeSeries 0 to TimeSeries 1?

        Thank you for your time
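
        For reference, one possible sequence for the change-score approach is to fill in the demographics, reshape wide, and difference the two waves. This is a sketch under several assumptions: the list of variables to carry over is read off the dataex example above and may be incomplete for the full dataset, Village is assumed to be constant within caseid as it appears to be in the example, dQ19 is a hypothetical name, and the predictors in the regression are illustrative only.

        ```stata
        * Variables recorded only at TimeSeries == 1 in the dataex example (assumed list)
        local fixed Age HHNum LSTotal gender Rtribe Roccupation Reducation Rroof
        local fixed `fixed' RGeneralLook RFloor rooms REdithWealth REdithSS
        local fixed `fixed' RProximityForest RForestFrequency

        foreach v of local fixed {
            * Missing values sort last, so `v'[1] is the nonmissing value if one exists
            bysort caseid (`v'): replace `v' = `v'[1] if missing(`v')
        }

        * After the fill, only the four outcomes vary within caseid
        reshape wide nQ19 nQ20 nQ21 nQ22, i(caseid) j(TimeSeries)

        * reshape appends the TimeSeries value to each stub: nQ190 is time 0, nQ191 is time 1
        generate dQ19 = nQ191 - nQ190

        * Illustrative model only: Village as a factor plus a few demographics
        regress dQ19 i.Village i.gender c.Age i.Reducation
        ```

        If each caseid has exactly one row per survey, the same difference could probably also be computed without reshaping, for example with bysort caseid (TimeSeries): generate dQ19 = nQ19[2] - nQ19[1], and the model then fitted on one row per caseid.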


