  • Stata to GitHub integration - collaborating using shared files

    Hi Everyone,

    Our team uses Stata and collaborates on code through SharePoint, i.e., we all work on the same files in a shared folder. We are exploring Stata-to-GitHub integration to improve version control and collaboration.

    Does anyone know if there is a way for multiple users to work in shared folders without having to each create a new local folder when initialising/cloning a repository? We want to continue to have a single folder with all our code and output and track changes to those files made by different users.

    I'd also be interested to know if others use other version control software to circumvent this issue.

    Thanks in advance!

    Elise
    Last edited by Elise Gordon; 16 Aug 2023, 03:28.

  • #2
    I am not an expert but keeping a GitHub repo in a shared folder seems contrary to the principles of git and likely to lead to all sorts of confusion, corrupted files, etc.

    At the risk of some self-promotion, my co-authors and I discuss integrating version control with filesharing in our recent Stata Journal paper:

    Guiteras, Raymond, Ahnjeong Kim, Brian Quistorff and Clayson Shumway, "statacons: An SCons-based build tool for Stata," The Stata Journal, 23(1):149-196, March 2023, doi:10.1177/1536867X231162032. Final pre-publication draft: https://osf.io/preprints/metaarxiv/qesx6/download.

    See especially Section 7. The example in the paper uses Dropbox but the principles should be similar.

    We provide a worked example on our project wiki page here:
    https://github.com/bquistorff/statac...he-SCons-cache


    • #3
      I'm also not an expert in Git, though I am somewhat familiar with it. Most of my projects these days don't require collaboration with others, though I use Git for my own work.

      I have to agree with Raymond that keeping the source repo on a shared drive has many risks. If you don't want to host on a service like GitHub, that's okay, but you'll need to work out those details with your IT dept (or roll your own server).

      Not having a local copy of a repo is antithetical to the git paradigm. The whole reason collaborative development works (with git, at least) is that everyone has a local copy to work with. Only when changes are ready to share would you open a pull request; once approved, it gets incorporated into the central source repo. The entire change history for whatever gets committed is retained in the source repo and downloaded when the repo is cloned to a local drive.
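As a rough illustration of the clone-locally, share-when-ready cycle described above, here is a minimal Python sketch that only builds the git command lines for one contributor. The repository URL, folder, branch, and file names are hypothetical placeholders, and `git switch -c` assumes git 2.23 or newer (`git checkout -b` is the older equivalent).

```python
import subprocess
from typing import List


def workflow_commands(repo_url: str, local_dir: str, branch: str,
                      files: List[str], message: str) -> List[List[str]]:
    """Build the git commands for one contributor's cycle.

    All argument values are supplied by the caller; nothing here is
    specific to any real repository.
    """
    return [
        # 1. Each user gets their own local copy, including full history.
        ["git", "clone", repo_url, local_dir],
        # 2. Work happens on a branch inside that local copy.
        ["git", "-C", local_dir, "switch", "-c", branch],
        # (edit the do-files here, between steps 2 and 3)
        # 3. Stage and commit the changed files locally.
        ["git", "-C", local_dir, "add", "--", *files],
        ["git", "-C", local_dir, "commit", "-m", message],
        # 4. Publish the branch; the pull request is then opened on GitHub.
        ["git", "-C", local_dir, "push", "-u", "origin", branch],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values, printed rather than executed.
    for cmd in workflow_commands(
        "https://example.com/team/project.git",
        "project",
        "fix-regression",
        ["analysis.do"],
        "Fix regression specification",
    ):
        print(" ".join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try; `run_all` would execute them against a real remote.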


      • #4
        Thanks Raymond and Leonardo, I appreciate the advice and will take a look at your paper, Raymond.


        • #5
          Hey Elise Gordon, did you get the help you needed on this? Struggling with the same, too.


          • #6
            If you are using GitHub, then GitHub is the place where you "continue to have a single folder with all our code and output and track changes". However, each developer will get a copy of all the code and changes when they update their copy of the repository. So it might seem to be a lot of duplication - perhaps the extra disk space is what you object to? I don't have an answer. Git is intended for very decentralized development.


            • #7
              Originally posted by Daniel Feenberg:
              If you are using GitHub, then GitHub is the place where you "continue to have a single folder with all our code and output and track changes". However, each developer will get a copy of all the code and changes when they update their copy of the repository. So it might seem to be a lot of duplication - perhaps the extra disk space is what you object to? I don't have an answer. Git is intended for very decentralized development.
              I have found that if I keep outputs on GitHub I run into storage space limits very quickly. This is especially the case when the outputs are binary files (e.g., dta, pdf, etc), since you are "charged" the entire size of the file each time it changes under version control.

              However, if you have just a few relatively small outputs and do not need to update them many times then perhaps you will not find this to be a constraint.
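To put rough numbers on the storage point above, here is a back-of-the-envelope sketch in Python. It assumes the worst case, in which every committed version of a binary file is kept in full; git does compress objects, so the true footprint may be smaller. The file sizes and revision counts are hypothetical.

```python
def history_size_mb(file_size_mb: int, n_versions: int) -> int:
    """Worst-case repository growth from one binary file.

    Assumes each committed version is stored at full size
    (an upper bound, not a measurement).
    """
    return file_size_mb * n_versions


# Hypothetical case 1: a 50 MB .dta file regenerated and committed 40 times.
large_output = history_size_mb(50, 40)

# Hypothetical case 2: a 1 MB PDF table committed 5 times.
small_output = history_size_mb(1, 5)

print(f"Large, frequently updated output: {large_output} MB of history")
print(f"Small, rarely updated output:     {small_output} MB of history")
```

Under these assumptions the first file alone accounts for about 2 GB of history, while the second stays at a few megabytes, which matches the distinction drawn above between large, frequently changing outputs and a few small ones.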

              I briefly looked into GitHub's LFS (large file storage) but did not find it very useful in my case, although this was a long time ago and I don't remember why it was not a great solution for me.
