I have seen two different ways of adding an independent dummy variable to a regression model. One approach is to list the varname on its own; the other is to add i. before the varname. The two approaches produce different results when I run the same model.
My goal is to estimate the probability of an individual dropping out from a dataset of subreddit posts, controlling for the user's average sentiment in the posts, the timing of the reddit posts that they wrote, and the average number of responses the individual received per reddit post. The outcome variable drop_out equals 1 if a user drops out (i.e., stops using the subreddit once the government launches a specific economic policy) and 0 otherwise. I want to test whether individuals were more likely to drop out in the month right before the policy's implementation, but I am not sure how to structure my time variable in the logistic regression model.
First, here is a data example:
```
* Example generated by -dataex-. For more info, type help dataex
clear
input byte drop_out float month_year double avg_sentiment float avg_response
1 625 -1 1
1 623 -1 0
1 631 0 0
. 632 0 .
. 632 0 .
1 625 1 6
0 624 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 629 -.2105263157894737 .2105263
0 630 -.2105263157894737 .2105263
0 630 -.2105263157894737 .2105263
0 630 -.2105263157894737 .2105263
0 630 -.2105263157894737 .2105263
0 630 -.2105263157894737 .2105263
0 632 -.2105263157894737 .
0 632 -.2105263157894737 .
0 632 -.2105263157894737 .
0 632 -.2105263157894737 .
0 633 -.2105263157894737 .
1 623 -.2307692307692307 .6923077
1 623 -.2307692307692307 .6923077
1 623 -.2307692307692307 .6923077
1 623 -.2307692307692307 .6923077
1 624 -.2307692307692307 .6923077
end
format %tm month_year
```
Here is the first model, without using i.varname:
```
xtlogit drop_out month_year avg_sentiment avg_response
```
Is the result below telling us that higher month_year values are associated, on average, with a lower probability of dropout, given the coefficient of -.335?

I then ran the same model using i.varname for the month_year dummy variable:
```
xtlogit drop_out i.month_year avg_sentiment avg_response
```
Here is the result, but I am a bit confused as to why the coefficients for month_year differ from those in the prior model by (1) being positive and (2) being much larger in magnitude.
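
For reference, here is a minimal sketch of how the two specifications treat month_year, plus one possible way to code a single indicator for the month before the policy. It assumes the full data have already been declared as a panel with xtset, as the models above require. The policy month `tm(2012m7)` and the variable name pre_policy are hypothetical placeholders, not values taken from the data.

```
* Sketch only: assumes the panel has already been declared with -xtset-.

* Continuous time: a single slope, the change in the log-odds of dropout
* for each one-month increase in month_year.
xtlogit drop_out c.month_year avg_sentiment avg_response

* Factor-variable time: one indicator per month. Each coefficient is the
* difference in log-odds relative to the base month (the lowest value of
* month_year by default), so it is not comparable to the slope above.
xtlogit drop_out i.month_year avg_sentiment avg_response

* Hypothetical: a single indicator for the month right before the policy.
* tm(2012m7) is a placeholder policy month chosen only for illustration.
local policy_month = tm(2012m7)
generate byte pre_policy = (month_year == `policy_month' - 1) if !missing(month_year)
xtlogit drop_out i.pre_policy avg_sentiment avg_response
```

In the first model the month_year coefficient is a per-month trend; in the second, each month gets its own contrast against the base month, which is why the signs and magnitudes need not resemble the single slope.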