I want to run a logistic regression where my outcome variable (i.e. drop_status) equals 1 if an individual stops using a given sub-reddit after a government policy is introduced, and 0 otherwise.
My objective is to learn about each user's probability of dropping out of Reddit based on their online behavior, but I am not sure whether my unit of analysis should be the post or the username.
Specifically, each post is scored on sentiment (1 for positive, -1 for negative, 0 for neutral). If I want to learn about Kenny's probability of dropout based on the sentiment of all of his posts, should I compute the average sentiment for each username and then fit the model with that average as a continuous predictor?
Would this mean running the following, so that each row represents one user and their average sentiment:
```
collapse mood, by(username)
```
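As far as I understand, a bare `collapse mood, by(username)` computes the mean of mood but keeps only mood and username, so drop_status would be lost. Below is a minimal sketch of what I think the username-level version would look like. It assumes drop_status takes a single value per username, and `avg_mood` is just a name I made up for the averaged variable:

```
* Sketch only: assumes drop_status is constant within each username
* avg_mood is a hypothetical name for the per-user mean of mood
collapse (mean) avg_mood = mood (first) drop_status, by(username)

* One row per user: model dropout on average post sentiment
logit drop_status avg_mood
```

With this layout the user is the unit of analysis, at the cost of discarding within-user variation in sentiment across posts.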
Here is a data example:
```
dataex username date mood drop_status
```
```
----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. For more info, type help dataex
clear
input str36 username int date long mood byte drop_status
username date mood drop_status
Kenny 2020-09-02. -1 1
Kenny 2020-09-03. -1 1
Kenny 2020-09-07. 1 1
Cartman 2020-09-03. -1 0
Cartman 2020-09-06. -1 0
Cartman 2020-09-08. -1 0
Mackey 2020-09-03. 0 0
Mackey 2020-09-04. 0 0
Mackey 2020-09-08. 1 0
Kyle 2020-09-13. -1 1
Kyle 2020-09-14. -1 1
end
[/CODE]
------------------ copy up to and including the previous line ------------------
```