  • Convert R command into Stata

    Hi everyone,

    I have a piece of R code, but I work in Stata and am not familiar with R. Running the code in R and then moving the data to Stata is not an option, because I need to present the Stata command itself. Could someone with experience in both Stata and R help me convert the code below?

    Most=rep(0, length(AB))
    for (i in 1:length(AB)){
    if (AB[i]==3)Most[i]=1
    if (AC[i]==3)Most[i]=1
    if (AC[i]==4)Most[i]=1
    if (AD[i]==4)Most[i]=1
    if (AD[i]==5)Most[i]=1
    if (AE[i]==4)Most[i]=1
    if (AE[i]==5)Most[i]=1
    if (AF[i]==4)Most[i]=1
    if (AF[i]==5)Most[i]=1
    if (AG[i]==4)Most[i]=1
    if (AG[i]==5)Most[i]=1
    }

    Thank you for your help in advance!

  • #2
    Fleur:
    welcome to the list.
    Maybe this link will be helpful http://www.stata.com/statalist/archi.../msg01280.html
    Kind regards,
    Carlo
    (Stata 18.0 SE)


    • #3
      Thanks Carlo. Unfortunately, I have to present the Stata command. Running the command in R and transferring the data to Stata is therefore not an option.


      • #4
        Fleur:
        if you describe what you're after, possibly some lister may reply positively.
        Kind regards,
        Carlo
        (Stata 18.0 SE)


        • #5
          The example below, with made-up data, would seem to do what you want, given my understanding of R. Note that the code could be written more tightly, given the repeated comparisons to 4 and 5, but what I have is very close to where you started.
          Code:
          clear
          input byte (AB AC AD AE AF AG)
          1 1 1 1 1 1
          2 2 2 2 2 2
          3 0 0 0 0 0
          0 3 0 0 0 0
          0 4 0 0 0 0
          0 0 4 0 0 0
          0 0 5 0 0 0
          0 0 0 4 0 0
          0 0 0 5 0 0
          0 0 0 0 4 0
          0 0 0 0 5 0
          0 0 0 0 0 4
          0 0 0 0 0 5
          6 6 6 6 6 6
          end
          generate byte Most=0
          replace Most=1 if AB==3
          replace Most=1 if AC==3
          replace Most=1 if AC==4
          replace Most=1 if AD==4
          replace Most=1 if AD==5
          replace Most=1 if AE==4
          replace Most=1 if AE==5
          replace Most=1 if AF==4
          replace Most=1 if AF==5
          replace Most=1 if AG==4
          replace Most=1 if AG==5
          list, clean


          • #6
            In addition to William's helpful code, the generate and replace statements could be collapsed to one:
            Code:
             
             gen Most = inlist(3, AB, AC) | inlist(4, AC, AD, AE, AF, AG) | inlist(5, AD, AE, AF, AG)
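One way to check that a collapsed expression matches the step-by-step version is to build both and let assert compare them. The sketch below reuses a few rows of William's made-up data; the names Most_steps and Most_short are invented for illustration. Note that the R loop also sets Most to 1 when AB is 3, so the one-line expression here covers AB and AC together with inlist(3, AB, AC).

```stata
clear
input byte (AB AC AD AE AF AG)
1 1 1 1 1 1
3 0 0 0 0 0
0 3 0 0 0 0
0 4 0 0 0 0
0 0 5 0 0 0
0 0 0 0 0 4
6 6 6 6 6 6
end

* step-by-step version, mirroring the R loop
generate byte Most_steps = 0
replace Most_steps = 1 if AB == 3
replace Most_steps = 1 if inlist(AC, 3, 4)
foreach v of varlist AD AE AF AG {
    replace Most_steps = 1 if inlist(`v', 4, 5)
}

* one-line version; inlist(3, AB, AC) is true when AB or AC equals 3
generate byte Most_short = inlist(3, AB, AC) | inlist(4, AC, AD, AE, AF, AG) | inlist(5, AD, AE, AF, AG)

* stops with an error if the two versions disagree in any observation
assert Most_steps == Most_short
list, clean
```

If the assert line passes silently, the two versions agree on every row of the example data; that is only as convincing as the rows you feed it, so it helps to include at least one row per condition.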


            • #7
              Thanks everyone for your help, I have figured it out thanks to your contributions.


              • #8
                Hello everyone,

                I have a problem similar to Fleur’s: I need to translate some R code into Stata.

                All I know is that the following R code builds an event-study figure showing average daily electricity consumption by households in the treated groups (3 treatment groups, variable ‘treatgr’, treatgr=1/2/3) compared to the control group (treatgr=0) over time. The date of the treatment is May 2, 2017.


                library(tidyverse)
                dat <- read_csv("../Data/complete_data/daily_data.csv")

                day_avg <- dat %>%
                group_by(date, treatgrp) %>%
                summarise(mean_consumption = mean(elec_consumption)) %>%
                spread(key=treatgrp, value=mean_consumption) %>%
                mutate(diff1 = `1` - `0`,
                diff2 = `2` - `0`,
                diff3 = `3` - `0`) %>%
                select(date, contains("diff")) %>%
                gather(key=treatment, value=mean_consumption,contains("diff"))

                ggplot(day_avg,
                aes(x=date, y=mean_consumption, colour=factor(treatment))) +
                theme_bw() +
                xlab("") +
                geom_line() +
                ylab("Electricity consumption per day (kWh/day)") +
                geom_hline(yintercept=0) +
                geom_vline(xintercept = as.numeric(as.Date("02-05-2017","%d-%m-%Y")), linetype="dashed")


                Thanks in advance!!


                • #9
                  I am clear that we don't rule out or advise against this kind of question. But I am advising that it's unlikely to get answered in practice. What you're asking is for someone fluent in both languages to read some R code, understand what it does, and then translate into Stata.

                  At the same time you're not even showing us what your data look like in Stata.

                  I will sincerely be happy for you if someone shows me up and does it for you, but my guess is that you're asking too much. This isn't a single statement (function, whatever): it's a script doing data management and then graphics.


                  • #10
                    Nick,

                    Yes, you are right: unfortunately, I cannot provide you with the data example in Stata. Thank you for your reply anyway!


                    • #11
                      Katherine Adams, you would increase your chances of someone helping you if you can provide example data of what you have and what you would like it to look like. It need not be real data, but should mimic the data that you have.

                      The graphing portion of your script is effectively just a scatter plot with connecting lines and some annotations. Here is some (untested) example code; beware of typos and errors.
                      Code:
                      graph twoway (scatter yvar xvar if trt==1, msym(none) c(L) sort) || (scatter yvar xvar if trt==2, msym(none) c(L) sort), yline(0) xline(x intercept)


                      • #12
                        I may be able to help, but this involves a guess as to the data structure. Frequently, we can't post our data, but I would nonetheless encourage people to construct some fake data that's similar in concept, or to at least describe the variables and observations for us. It helps us help you.

                        Here, it sounds like the unit of observation is household-date (i.e. each row is electric consumption for, say, a household, on a specific date). You also have a variable called treatment group. You want to plot mean daily electric consumption over time, by treatment group. You could try something like this:

                        Code:
                        preserve
                        collapse (mean) elec_consumption, by(treatment date)
                        reshape wide elec_consumption, i(date) j(treatment)
                        sort date
                        twoway line elec_consumption0 elec_consumption1 elec_consumption2 elec_consumption3 date, connect(L) ///
                        ytitle("Electricity consumption per day (kWh/day)") xtitle("Date")
                        From there, your R code definitely looks like it's calculating the difference for each treatment group relative to the control. I'm not 100% sure that it's plotting the difference and not the means of each group, but I think it is. So:

                        Code:
                        forvalues g = 1/3 {
                        gen diff`g' = elec_consumption`g' - elec_consumption0
                        }
                        twoway line diff1 diff2 diff3 date, connect(L) ///
                        ytitle("Electricity consumption per day (kWh/day)" "Difference vs Control Group") xtitle("Date") yline(0, lpattern(dash))
                        Your ggplot code draws a dashed vertical line at a specific date. I'm not sure how your date variable is going to import from csv into Stata. You are likely to have to convert the dates into Stata date formats. Try

                        Code:
                        help datetime
                        And look for the bit about converting human readable form (HRF) to Stata internal form (SIF). Once your dates are in Stata format, the option for the vertical line could be

                        Code:
                        xline(`=td(02may2017)', lpattern(dash))
                        At least, I think that will work.
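To make the date step concrete, here is a small sketch with made-up data. It assumes the date arrives as a string like "2017-05-02"; the names ddate and tday and the diff1 values are invented for illustration. xline() expects a number rather than an expression, which is why the date constant goes into a local macro first.

```stata
clear
input str10 date float diff1
"2017-04-30"  1.5
"2017-05-01"  0.5
"2017-05-02" -0.5
"2017-05-03" -1.0
end

* convert the string (HRF) to a Stata daily date (SIF) and give it a display format
generate ddate = date(date, "YMD")
format ddate %td

* td() builds a daily-date constant; store it so xline() receives a plain number
local tday = td(02may2017)
twoway line diff1 ddate, sort yline(0) xline(`tday', lpattern(dash)) xtitle("")
```

The same local macro can be reused in any later graph command in the same do-file run, so the treatment date is written down only once.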

                        If I have mis-imagined how your data look, then again, you'll need to fake some similar data and post it, or at the very least describe your data.
                        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


                        • #13
                          Nick, Leonardo, Weiwen, thank you! I will try to get some data for my question...


                          • #14
                            Originally posted by Weiwen Ng
                            I may be able to help, but this involves a guess as to the data structure. Frequently, we can't post our data, but I would nonetheless encourage people to construct some fake data that's similar in concept, or to at least describe the variables and observations for us. It helps us help you.

                            Here, it sounds like the unit of observation is household-date (i.e. each row is electric consumption for, say, a household, on a specific date). You also have a variable called treatment group. You want to plot mean electric daily consumption over treatment group. You could try something like this:

                            Code:
                            preserve
                            collapse (mean) elec_consumption, by(treatment date)
                            reshape wide elec_consumption, i(date) j(treatment)
                            sort date
                            twoway line elec_consumption0 elec_consumption1 elec_consumption2 elec_consumption3 date, connect(L) ///
                            ytitle("Electricity consumption per day (kWh/day)") xtitle("Date")
                            ....
                            Ahem. preserve tells Stata to keep a copy of your data as they currently are. The collapse and reshape steps above drastically alter the dataset in memory. However, when you're done plotting, type

                            Code:
                            restore
                            And Stata will go back to the dataset it had preserved in its memory. Worst case is that you can re-import the .csv file. My apologies for forgetting that!
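As a small illustration of the preserve/restore pattern, here is a sketch using the auto dataset that ships with Stata as a stand-in for the electricity data (so the variable names are not the ones from this thread):

```stata
sysuse auto, clear

* keep a copy of the full dataset before reducing it to group means
preserve
collapse (mean) price, by(foreign)
list

* bring back the original 74 observations
restore
describe, short
```

Inside a do-file, Stata also restores the data automatically when the do-file ends, so an explicit restore matters mainly when you want the original data back before that point or are working interactively.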

                            • #15
                              Weiwen,


                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input long location str10 date str5 elec_consumption str2 treatgr
                              600001 "2017-01-01" "66.1" "0"
                              600003 "2017-01-01" "46.7" "3"
                              600004 "2017-01-01" "10.8"  "3"
                              600005 "2017-01-01" "24"   "3"
                              600006 "2017-01-01" "42.5" "2"
                              600007 "2017-01-01" "7.1"  "3"
                              600008 "2017-01-01" "41.1" "1"
                              600009 "2017-01-01" "41.4" "2"
                              600010 "2017-01-01" "96.1" "3"
                              600011 "2017-01-01" "22.1" "0"
                              600012 "2017-01-01" "31"   "1"
                              600013 "2017-01-01" "33.1" "2"
                              600014 "2017-01-01" "139"  "2"
                              600015 "2017-01-01" "44.9" "0"
                              600016 "2017-01-01" "76.9" "2"
                              600017 "2017-01-01" "34"   "0"
                              600018 "2017-01-01" "4.9"  "3"
                              600019 "2017-01-01" "27.1" "3"
                              600020 "2017-01-01" "50.5" "1"
                              600022 "2017-01-01" "47.1" "3"
                              end

                              I have data for 2017-2018 (sorted by date). As mentioned above, the following R code is supposed to build an event-study figure showing, for each of the 3 treatment groups (variable ‘treatgr’, treatgr=1/2/3), the average daily electricity consumption of its households minus the average for the control group (treatgr=0), over time. The date of the treatment is May 2, 2017 (it should appear as the vertical line on the figure).

                              day_avg <- dat %>%
                              group_by(date, treatgrp) %>%
                              summarise(mean_consumption = mean(elec_consumption)) %>%
                              spread(key=treatgrp, value=mean_consumption) %>%
                              mutate(diff1 = `1` - `0`,
                              diff2 = `2` - `0`,
                              diff3 = `3` - `0`) %>%
                              select(date, contains("diff")) %>%
                              gather(key=treatment, value=mean_consumption,contains("diff"))

                              ggplot(day_avg,
                              aes(x=date, y=mean_consumption, colour=factor(treatment))) +
                              theme_bw() +
                              xlab("") +
                              geom_line() +
                              ylab("Electricity consumption per day (kWh/day)") +
                              geom_hline(yintercept=0) +
                              geom_vline(xintercept = as.numeric(as.Date("02-05-2017","%d-%m-%Y")), linetype="dashed")


                              I have a follow-up question on this. I guess that this figure will not actually be an event-study one... If I want to draw an event-study figure for my data (i.e., the one which typically displays point estimates from an event study regression of electricity consumption before and after the treatment), how can I do this? I have been desperately trying to find any information about an event study, but all I have found so far is related to financial data.

                              I would appreciate any help!
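On the first part (the difference-in-means figure, not a regression-based event study), the steps suggested earlier in the thread might be put together roughly as follows for data laid out like the listing above. This is a sketch under assumptions: a small fake dataset stands in for the real one, all variables arrive as strings as in the listing and so are converted first, the group variable is taken to be treatgr (the R code calls it treatgrp), and the names ddate, mean_cons, diff1 to diff3 and tday are made up.

```stata
clear
input long location str10 date str5 elec_consumption str2 treatgr
600001 "2017-05-01" "40"   "0"
600002 "2017-05-01" "60"   "0"
600003 "2017-05-01" "55"   "1"
600004 "2017-05-01" "45"   "2"
600005 "2017-05-01" "50"   "3"
600001 "2017-05-02" "50"   "0"
600002 "2017-05-02" "50"   "0"
600003 "2017-05-02" "40"   "1"
600004 "2017-05-02" "42.5" "2"
600005 "2017-05-02" "47"   "3"
end

* strings to numbers, string date to a Stata daily date
destring elec_consumption treatgr, replace
generate ddate = date(date, "YMD")
format ddate %td

* mean consumption per group and day, then one column per group
collapse (mean) mean_cons = elec_consumption, by(treatgr ddate)
reshape wide mean_cons, i(ddate) j(treatgr)

* treated-minus-control difference for each treatment group
forvalues g = 1/3 {
    generate diff`g' = mean_cons`g' - mean_cons0
}

* dashed vertical line at the treatment date, solid horizontal line at zero
local tday = td(02may2017)
twoway line diff1 diff2 diff3 ddate, sort yline(0) ///
    xline(`tday', lpattern(dash)) xtitle("") ///
    ytitle("Electricity consumption per day (kWh/day)" "Difference vs control group")
```

Wrapping the collapse-to-graph part in preserve and restore keeps the household-level data available afterwards.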
