Thursday, May 27, 2010

Submodules and Git (yet another Tutorial)

Suppose you have a Rails project which uses git and you want to add in a plugin that also uses git. It may be that you own the plugin or it may be that you want to be able to follow the updates using git. I faced this situation this past weekend.

git submodule confused me, so I sat down this past weekend, created some toy projects, and poked around until I understood how submodules behave. Here are the steps I followed (after several restarts). A quick summary of how to remember all this is at the end.

  1. First, make ourselves a play area
    mkdir temp
    
    cd temp
    /Users/pedz/temp
    
    
  2. Create a play rails project
    rails play1
          create  
          create  app/controllers
          create  app/helpers
          create  app/models
          create  app/views/layouts
          create  config/environments
          create  config/initializers
          create  config/locales
          create  db
          create  doc
          create  lib
          create  lib/tasks
          create  log
          create  public/images
          create  public/javascripts
          create  public/stylesheets
          create  script/performance
          create  test/fixtures
          create  test/functional
          create  test/integration
          create  test/performance
          create  test/unit
          create  vendor
          create  vendor/plugins
          create  tmp/sessions
          create  tmp/sockets
          create  tmp/cache
          create  tmp/pids
          create  Rakefile
          create  README
          create  app/controllers/application_controller.rb
          create  app/helpers/application_helper.rb
          create  config/database.yml
          create  config/routes.rb
          create  config/locales/en.yml
          create  db/seeds.rb
          create  config/initializers/backtrace_silencers.rb
          create  config/initializers/inflections.rb
          create  config/initializers/mime_types.rb
          create  config/initializers/new_rails_defaults.rb
          create  config/initializers/session_store.rb
          create  config/environment.rb
          create  config/boot.rb
          create  config/environments/production.rb
          create  config/environments/development.rb
          create  config/environments/test.rb
          create  script/about
          create  script/console
          create  script/dbconsole
          create  script/destroy
          create  script/generate
          create  script/runner
          create  script/server
          create  script/plugin
          create  script/performance/benchmarker
          create  script/performance/profiler
          create  test/test_helper.rb
          create  test/performance/browsing_test.rb
          create  public/404.html
          create  public/422.html
          create  public/500.html
          create  public/index.html
          create  public/favicon.ico
          create  public/robots.txt
          create  public/images/rails.png
          create  public/javascripts/prototype.js
          create  public/javascripts/effects.js
          create  public/javascripts/dragdrop.js
          create  public/javascripts/controls.js
          create  public/javascripts/application.js
          create  doc/README_FOR_APP
          create  log/server.log
          create  log/production.log
          create  log/development.log
          create  log/test.log
    
  3. We now make a git repository in the usual way.
    cd play1
    
    /Users/pedz/temp/play1
    
    echo Initial README for play1 > README 
    
    git init
    Initialized empty Git repository in /Users/pedz/temp/play1/.git/
    
    git add .
    
    git commit -a -m 'Initial commit'
    
    [master (root-commit) 25662af] Initial commit
     42 files changed, 8219 insertions(+), 0 deletions(-)
     create mode 100644 README
     create mode 100644 Rakefile
     create mode 100644 app/controllers/application_controller.rb
     create mode 100644 app/helpers/application_helper.rb
     create mode 100644 config/boot.rb
     create mode 100644 config/database.yml
     create mode 100644 config/environment.rb
     create mode 100644 config/environments/development.rb
     create mode 100644 config/environments/production.rb
     create mode 100644 config/environments/test.rb
     create mode 100644 config/initializers/backtrace_silencers.rb
     create mode 100644 config/initializers/inflections.rb
     create mode 100644 config/initializers/mime_types.rb
     create mode 100644 config/initializers/new_rails_defaults.rb
     create mode 100644 config/initializers/session_store.rb
     create mode 100644 config/locales/en.yml
     create mode 100644 config/routes.rb
     create mode 100644 db/seeds.rb
     create mode 100644 doc/README_FOR_APP
     create mode 100644 log/development.log
     create mode 100644 log/production.log
     create mode 100644 log/server.log
     create mode 100644 log/test.log
     create mode 100644 public/404.html
     create mode 100644 public/422.html
     create mode 100644 public/500.html
     create mode 100644 public/favicon.ico
     create mode 100644 public/images/rails.png
     create mode 100644 public/index.html
     create mode 100644 public/javascripts/application.js
     create mode 100644 public/javascripts/controls.js
     create mode 100644 public/javascripts/dragdrop.js
     create mode 100644 public/javascripts/effects.js
     create mode 100644 public/javascripts/prototype.js
     create mode 100644 public/robots.txt
     create mode 100755 script/about
     create mode 100755 script/console
     create mode 100755 script/dbconsole
     create mode 100755 script/destroy
     create mode 100755 script/generate
     create mode 100755 script/performance/benchmarker
     create mode 100755 script/performance/profiler
     create mode 100755 script/plugin
     create mode 100755 script/runner
     create mode 100755 script/server
     create mode 100644 test/performance/browsing_test.rb
     create mode 100644 test/test_helper.rb
    
  4. We now repeat the same steps to make a play2 Rails project and git repository.
    cd ..
    
    rails play2
    
    
    cd play2
    /Users/pedz/temp/play2
    
    echo Initial README for play2 > README
    
    git init
    Initialized empty Git repository in /Users/pedz/temp/play2/.git/
    
    git add .
    
    git commit -a -m 'initial commit for play2'
    [master (root-commit) 64223e4] initial commit for play2
     42 files changed, 8219 insertions(+), 0 deletions(-)
    
    cd ..
    
  5. We now have play1 and play2 Rails projects, each in its own git repository. Next we make bare clones of them, so we have a nice place to push changes back to. Generally, you want a bare repository as the target of pushes. A bare repository has no working files, so a push cannot leave a checked-out copy out of sync. The repositories on GitHub are bare.
    git clone --bare play1 play1.bare
    Initialized empty Git repository in /Users/pedz/temp/play1.bare/
    
    git clone --bare play2 play2.bare
    Initialized empty Git repository in /Users/pedz/temp/play2.bare/
    
  6. Now, we create our first working clone from play1.bare and cd into it.
    git clone -l play1.bare play1-1
    
    Initialized empty Git repository in /Users/pedz/temp/play1-1/.git/
    
    cd play1-1
    /Users/pedz/temp/play1-1
    
  7. We add play2 as a submodule. For demonstration purposes, we add it under vendor/plugins, since using a submodule for a Rails plugin is going to be a common use case.
    git submodule add ~/temp/play2.bare vendor/plugins/play2
    Initialized empty Git repository in /Users/pedz/temp/play1-1/vendor/plugins/play2/.git/
    
  8. Let's look at what that did.
    git status
    # On branch master
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    # new file:   .gitmodules
    # new file:   vendor/plugins/play2
    #
    
    We see that it created a .gitmodules file and plopped a copy of play2 into vendor/plugins/play2. The key thing to remember is that the submodule is tied to a particular commit (SHA1), not to a branch or to the latest version.
    cat vendor/plugins/play2/README 
    Initial README for play2
    
    
  9. We commit the addition of play2 as a submodule and push that change back up to the play1 server.
    git add .
    
    git commit -a -m 'Added play2 as plugin submodule'
    [master b737408] Added play2 as plugin submodule
     2 files changed, 4 insertions(+), 0 deletions(-)
     create mode 100644 .gitmodules
     create mode 160000 vendor/plugins/play2
    
    git push
    
    Counting objects: 6, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (5/5), 489 bytes, done.
    Total 5 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (5/5), done.
    To /Users/pedz/temp/play1.bare
       25662af..b737408  master -> master
    
  10. Now, let's suppose someone changes play2 somewhere else. We simulate this by creating a new play2 clone and modifying it.
    cd ..
    
    Create the clone.
    git clone play2.bare play2-1
    Initialized empty Git repository in /Users/pedz/temp/play2-1/.git/
    
    Go into the new copy.
    cd play2-1
    /Users/pedz/temp/play2-1
    
    Add a change.
    echo New Line Added to Play2 README >> README 
    
    Review the change.
    cat README
    
    Initial README for play2
    New Line Added to Play2 README
    
    See what git status says.
    git status
    # On branch master
    # Changed but not updated:
    #   (use "git add <file>..." to update what will be committed)
    #   (use "git checkout -- <file>..." to discard changes in working directory)
    #
    # modified:   README
    #
    no changes added to commit (use "git add" and/or "git commit -a")
    
    Commit the change and push it back to the play2 server.
    git add .
    
    git commit -a -m 'Change added via play2-1'
    [master 4cc3d8a] Change added via play2-1
     1 files changed, 1 insertions(+), 0 deletions(-)
    
    git push
    Counting objects: 5, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 338 bytes, done.
    Total 3 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    To /Users/pedz/temp/play2.bare
       64223e4..4cc3d8a  master -> master
    
    Review the log. Note the SHA1 of the new commit; we'll see it again later in this tutorial.
    git log
    commit 4cc3d8afffe9ef12d654eeaa5775c3c1f44bd96f
    Author: Perry Smith <pedz@newtoy.easesoftware.com>
    Date:   Sun Nov 15 09:27:47 2009 -0600
    
        Change added via play2-1
    
    commit 64223e4e06889a538e6adaf4604b44d5a6e50c98
    Author: Perry Smith <pedz@newtoy.easesoftware.com>
    Date:   Sun Nov 15 09:24:07 2009 -0600
    
        initial commit for play2
    
    
  11. Notice that these changes to play2 are not seen by the play1 project or by clones based on play1. As far as play1 knows, it is still pointing at the right play2 commit. This makes sense, since we have not yet tested the new play2 changes in the play1 project.
    cd ../play1-1
    
    git status
    # On branch master
    nothing to commit (working directory clean)
    
    git pull
    
    Already up-to-date.
    
    git fetch
    
    Note that the README for play2 has not changed.
    cat vendor/plugins/play2/README 
    Initial README for play2
    
  12. In fact, new clones of play1 still refer to the play2 commit that was recorded when play2 was added as a submodule. Let's see...
    Create a fresh clone of play1
    cd ..
    
    git clone play1.bare play1-2
    Initialized empty Git repository in /Users/pedz/temp/play1-2/.git/
    
    cd play1-2
    /Users/pedz/temp/play1-2
    
  13. First, cloning a project that has submodules does not fetch the submodules or populate their directories. Note that we have a play2 directory but nothing beneath it.
    cat vendor/plugins/play2/README
    cat: vendor/plugins/play2/README: No such file or directory
    
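  14. To get the submodules, we use a two-step process: init, then update. The init step copies each submodule's URL from .gitmodules into the clone's .git/config. The update step clones the submodule and checks out the commit that play1 records.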
    git submodule init
    Submodule 'vendor/plugins/play2' (/Users/pedz/temp/play2.bare) registered for path 'vendor/plugins/play2'
    
    git submodule update
    Initialized empty Git repository in /Users/pedz/temp/play1-2/vendor/plugins/play2/.git/
    Submodule path 'vendor/plugins/play2': checked out '64223e4e06889a538e6adaf4604b44d5a6e50c98'
    
  15. Now we have play2/README, but notice that it is the original. This is because the play1 project still records the original play2 commit. That will not change until the play1 project itself is updated to point at a newer play2 commit.
    cat vendor/plugins/play2/README
    Initial README for play2
    
  16. We get the play1 project to use a new version of play2 by working in a play1 clone. We cd down to the submodule's directory and check out the version we want. Then we commit that change in play1 and push it back to the play1 server. Let's see...
    First, go to a play1 sandbox and go down to the play2 directory.
    cd ..
    
    cd play1-2
    /Users/pedz/temp/play1-2
    
    cd vendor/plugins/play2/
    /Users/pedz/temp/play1-2/vendor/plugins/play2
    
  17. Let's look at the branches. Notice that HEAD is detached: we are on no branch.
    git branch
    * (no branch)
      master
    
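  18. We check out the master branch of play2 while we are in the play2 directory. In this example we do not add more changes, but we could. If we did, we would have to push them back up to the play2 server. To keep it simple, let's suppose we just want to update our play2 submodule to the latest version, which we do by checking out the master branch. That is enough here because this submodule was cloned after the play2 change was pushed, so its local master already contains that change. In a submodule cloned earlier, we would also need a git pull to bring master up to date.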
    git checkout master
    Previous HEAD position was 64223e4... initial commit for play2
    Switched to branch 'master'
    
  19. Let's see what that did in the play2 directory...
    git branch
    * master
    
    It now looks like a normal clone, and we see that we have the latest README.
    cat README
    Initial README for play2
    New Line Added to Play2 README
    
    git status says everything is clean.
    git status
    # On branch master
    nothing to commit (working directory clean)
    
  20. Now let's see what it looks like from the play1 main project directory.
    cd ../../..
    
    pwd
    /Users/pedz/temp/play1-2
    
    
    git status
    # On branch master
    # Changed but not updated:
    #   (use "git add <file>..." to update what will be committed)
    #   (use "git checkout -- <file>..." to discard changes in working directory)
    #
    # modified:   vendor/plugins/play2
    #
    no changes added to commit (use "git add" and/or "git commit -a")
    
    We see that the play1 project now has changes.
  21. We commit that change and push it back to the play1 server.
    git add .
    
    
    git commit -a -m 'Pulled in new version of play2'
    [master 06b82e0] Pulled in new version of play2
     1 files changed, 1 insertions(+), 1 deletions(-)
    
    git push
    Counting objects: 7, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (2/2), done.
    Writing objects: 100% (4/4), 351 bytes, done.
    Total 4 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (4/4), done.
    To /Users/pedz/temp/play1.bare
       b737408..06b82e0  master -> master
    
    The submodule status now shows the new play2 version.
    git submodule status
     4cc3d8afffe9ef12d654eeaa5775c3c1f44bd96f vendor/plugins/play2 (heads/master)
    
    
    To recap: inside a play1 clone, we moved the play2 submodule to the latest play2 version. We then committed and pushed that change (the move of play1 to the latest play2 version) back to the play1 server.
  22. Let's see what a new play1 clone looks like.
    cd ..
    
    Make the clone.
    git clone play1.bare play1-3
    Initialized empty Git repository in /Users/pedz/temp/play1-3/.git/
    
    
    cd play1-3
    /Users/pedz/temp/play1-3
    
    Remember, making the clone does not populate the submodules.
    cat vendor/plugins/play2/README
    cat: vendor/plugins/play2/README: No such file or directory
    
    We can combine the two steps into one by passing the --init option to update.
    git submodule update --init
    
    Submodule 'vendor/plugins/play2' (/Users/pedz/temp/play2.bare) registered for path 'vendor/plugins/play2'
    Initialized empty Git repository in /Users/pedz/temp/play1-3/vendor/plugins/play2/.git/
    Submodule path 'vendor/plugins/play2': checked out '4cc3d8afffe9ef12d654eeaa5775c3c1f44bd96f'
    
    We now see that a fresh clone of play1 has the latest version of the play2 submodule.
    cat vendor/plugins/play2/README
    Initial README for play2
    New Line Added to Play2 README
    
    pwd
    /Users/pedz/temp/play1-3
    
  23. What about the other clones? Have they changed?
    Let's go back to our first clone, the one we used to add the play2 submodule.
    cd ../play1-1
    /Users/pedz/temp/play1-1
    
    The git status of that clone is clean.
    git status
    # On branch master
    nothing to commit (working directory clean)
    
    I probably should have done this earlier. Even in the clone where we added the submodule, we need to do the submodule init and update steps to get everything into the proper state. We discover this when we run submodule status. Note the leading minus sign, which means the submodule is not initialized in this clone.
    
    git submodule status
    -64223e4e06889a538e6adaf4604b44d5a6e50c98 vendor/plugins/play2
    
    Let's do that now.
    git submodule update --init
    Submodule 'vendor/plugins/play2' (/Users/pedz/temp/play2.bare) registered for path 'vendor/plugins/play2'
    
    But notice that we are pointing back to the original SHA1.
    git submodule status
    
     64223e4e06889a538e6adaf4604b44d5a6e50c98 vendor/plugins/play2 (heads/master)
    
    That is because this clone has not yet pulled the latest play1 changes.
    git status
    # On branch master
    nothing to commit (working directory clean)
    
    cat vendor/plugins/play2/README 
    Initial README for play2
    
    Let's update this clone and then look around.
    git pull
    remote: Counting objects: 7, done.
    remote: Compressing objects: 100% (2/2), done.
    remote: Total 4 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (4/4), done.
    From /Users/pedz/temp/play1.bare
       b737408..06b82e0  master     -> origin/master
    Updating b737408..06b82e0
    Fast forward
     vendor/plugins/play2 |    2 +-
     1 files changed, 1 insertions(+), 1 deletions(-)
    
    Note that git status sees a change in play2. That is because we have not yet updated the submodule itself. The play1 project now says play2 should be at a different SHA1, but the submodule is still checked out at the original.
    git status
    # On branch master
    # Changed but not updated:
    #   (use "git add <file>..." to update what will be committed)
    #   (use "git checkout -- <file>..." to discard changes in working directory)
    #
    # modified:   vendor/plugins/play2
    #
    no changes added to commit (use "git add" and/or "git commit -a")
    
    We can further verify this with submodule status. Again, note the leading character. This time it's a plus sign.
    
    git submodule status
    +64223e4e06889a538e6adaf4604b44d5a6e50c98 vendor/plugins/play2 (heads/master)
    
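    The plus sign means that the commit checked out in the submodule does not match the commit recorded in play1; in other words, we have not updated our submodules. We check and see that this is true.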
    cat vendor/plugins/play2/README 
    Initial README for play2
    
  24. We bring the submodule to the proper version with submodule update. We leave off the --init option because the submodule is already initialized in this clone.
    
    git submodule update
    remote: Counting objects: 5, done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 3 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From /Users/pedz/temp/play2.bare
       64223e4..4cc3d8a  master     -> origin/master
    Submodule path 'vendor/plugins/play2': checked out '4cc3d8afffe9ef12d654eeaa5775c3c1f44bd96f'
    
    We now see that we have the new README.
    cat vendor/plugins/play2/README 
    Initial README for play2
    New Line Added to Play2 README
    
    And we have the proper SHA1. Note that the first character is a blank, meaning the submodule matches the commit play1 records.
    git submodule status
     4cc3d8afffe9ef12d654eeaa5775c3c1f44bd96f vendor/plugins/play2 (remotes/origin/HEAD)
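If you want to replay the core of this walkthrough (steps 5 through 15) without a Rails 2 install, here is a minimal sketch under a few assumptions:

- README-only repositories stand in for the Rails projects.
- A `mktemp -d` directory stands in for ~/temp.
- The author name and email are placeholders.

The `protocol.file.allow=always` setting is needed because git 2.38.1 and later refuse, by default, to clone submodules from local paths. Older versions of git allow it without that setting. The script ends by showing the pinning behavior: a fresh play1 clone checks out the play2 commit that play1 recorded, not the newest commit on play2.bare.

```shell
#!/bin/sh
# Replays steps 5-15 with README-only repositories in place of the
# Rails projects (an assumption: no Rails install is needed).
set -e

# Placeholder identity so the commits work on a fresh machine.
export GIT_AUTHOR_NAME="Play User" GIT_AUTHOR_EMAIL="play@example.com"
export GIT_COMMITTER_NAME="Play User" GIT_COMMITTER_EMAIL="play@example.com"

# A throwaway play area instead of ~/temp.
WORK=$(mktemp -d)
cd "$WORK"

# Steps 1-5: two projects on branch master, each with a bare "server" clone.
for p in play1 play2; do
  git -c init.defaultBranch=master init -q "$p"
  echo "Initial README for $p" > "$p/README"
  git -C "$p" add README
  git -C "$p" commit -q -m "Initial commit for $p"
  git clone -q --bare "$p" "$p.bare"
done

# Steps 6-9: add play2 to a play1 clone as a submodule and publish it.
# protocol.file.allow=always lets git 2.38.1 and later clone a submodule
# from a local path; older versions do not need it.
git clone -q play1.bare play1-1
(
  cd play1-1
  git -c protocol.file.allow=always submodule add "$WORK/play2.bare" vendor/plugins/play2
  git commit -q -m 'Added play2 as plugin submodule'
  git push -q
)

# Step 10: someone else pushes a new play2 commit.
git clone -q play2.bare play2-1
echo 'New Line Added to Play2 README' >> play2-1/README
git -C play2-1 commit -q -a -m 'Change added via play2-1'
git -C play2-1 push -q

# Steps 12-15: a fresh play1 clone gets the play2 commit that play1
# recorded, not the newest commit on play2.bare.
git clone -q play1.bare play1-2
cd play1-2
git -c protocol.file.allow=always submodule update --init
echo "play1 records play2 at: $(git ls-tree HEAD vendor/plugins/play2 | awk '{print $3}')"
echo "play2.bare master is at: $(git --git-dir="$WORK/play2.bare" rev-parse master)"
cat vendor/plugins/play2/README
```

The two SHA1s printed at the end differ, and the README shows only its initial line. That is the same situation as steps 13 through 15.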
    
    

We can see why so many people get frustrated with submodules: there are a lot of steps. But let's drop back five yards and look at this.

The key point is that plugins are usually stable and do not change that often, and they are usually updated by other people. A particular project we are working on does not want to always follow the leading edge of those repositories. Instead, it wants to move to a new plugin version deliberately, after testing it. This boils down to two fairly easy processes, each with only a few steps.

Adding submodules to an existing project
  1. Add the submodule
    1. git submodule add <repo> <dir>
  2. Commit and push those changes
    1. git add .
    2. git commit -a -m 'Add submodule'
    3. git push
  3. Initialize submodule git files to be in the proper state
    1. git submodule init
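As a shell function, the three steps for adding a submodule might look like the sketch below. The name add_submodule is made up for illustration; it is not a git command. It assumes it is run from the top of a clone of the main project that has an upstream to push to.

It skips `git add .` because git submodule add already stages .gitmodules and the new submodule entry, as the git status in step 8 showed. Running `git add .` could also sweep in unrelated files.

```shell
# add_submodule REPO DIR
# Hypothetical helper (not a git command) wrapping the three steps for
# adding a submodule. Run it from the top level of a clone of the main
# project. REPO is the submodule's URL or path; DIR is where it goes.
add_submodule() {
  # 1. Clone REPO into DIR. This also stages .gitmodules and the new
  #    submodule entry, so no `git add .` is needed.
  #    protocol.file.allow=always only matters when REPO is a local path
  #    and git is 2.38.1 or newer; drop it for remote URLs.
  git -c protocol.file.allow=always submodule add "$1" "$2" || return 1

  # 2. Commit what submodule add staged and push it to the project's server.
  git commit -m "Add submodule $2" || return 1
  git push || return 1

  # 3. Register the submodule's URL in this clone's .git/config. Newer
  #    versions of git may already have done this during the add, in
  #    which case init has nothing left to do.
  git submodule init "$2"
}
```

For example, from the top of play1-1: add_submodule ~/temp/play2.bare vendor/plugins/play2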
Updating submodules to the latest version
There are two cases here:
We move our clone up and publish it
  1. Change directory to the submodule
    1. cd vendor/plugins/...
  2. Change that directory to the desired version
    1. git checkout master
  3. Normally, we would run the test bucket here before committing and publishing the changes.
  4. Add, commit, and push that change for the main project
    1. cd ../../..
    2. git add .
    3. git commit -m 'Moved to latest of submodule'
    4. git push
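The steps for moving our clone up can be wrapped the same way. bump_submodule is a hypothetical helper, not a git command. The git pull inside the submodule is an addition to the steps above. Without it, git checkout master only gives us the latest play2 when the submodule was cloned after that change was pushed, as it was in step 18.

```shell
# bump_submodule DIR [BRANCH]
# Hypothetical helper (not a git command) for moving the main project to
# the latest BRANCH (default: master) of the submodule in DIR. Run it from
# the top level of a clone of the main project.
bump_submodule() {
  # 1-2. In the submodule, leave the detached HEAD for the tip of BRANCH.
  #      The pull is an addition to the listed steps: it brings BRANCH up
  #      to date if the submodule was cloned before the new version was
  #      pushed.
  ( cd "$1" && git checkout "${2:-master}" && git pull ) || return 1

  # 3. Normally, run the test bucket here before committing.

  # 4. Record the submodule's new commit in the main project and push it.
  git add "$1" || return 1
  git commit -m "Moved $1 to latest of ${2:-master}" || return 1
  git push
}
```

For example, from the top of a play1 clone: bump_submodule vendor/plugins/play2. If the submodule is already at the tip of the branch, there is nothing to commit, and the function returns a failure status instead of pushing.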
Someone else moves the project to a new version and we need to follow
  1. Get the latest version
    1. git pull
  2. Get the submodules
    1. git submodule update
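After the git pull in this second case, git submodule status is the quickest way to see which submodules need the update. Here is a small sketch of a hypothetical shell helper (not a git command) that spells out the leading character of each status line. It uses the meanings from the git submodule documentation:

- a blank means the checkout matches the recorded commit;
- `-` means the submodule is not initialized;
- `+` means the checked-out commit differs from the recorded one;
- `U` means merge conflicts.

It assumes submodule paths contain no spaces.

```shell
# explain_submodule_status
# Hypothetical helper (not a git command): reads `git submodule status`
# output on stdin and spells out the leading character of each line.
# Assumes submodule paths contain no spaces.
explain_submodule_status() {
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    rest=${line#?}            # everything after the status character
    flag=${line%"$rest"}      # the status character itself
    sha=${rest%% *}           # the SHA1 shown on the line
    path=${rest#* }           # path, possibly followed by "(describe)"
    path=${path%% *}
    case $flag in
      ' ') meaning='in sync with the recorded commit' ;;
      '-') meaning='not initialized' ;;
      '+') meaning='checked-out commit differs from the recorded commit' ;;
      'U') meaning='has merge conflicts' ;;
      *)   meaning='unrecognized status' ;;
    esac
    printf '%s at %.7s: %s\n' "$path" "$sha" "$meaning"
  done
}

# Example input: the three states seen in steps 23 and 24.
printf '%s\n' \
  '-64223e4e06889a538e6adaf4604b44d5a6e50c98 vendor/plugins/play2' \
  '+64223e4e06889a538e6adaf4604b44d5a6e50c98 vendor/plugins/play2 (heads/master)' \
  ' 4cc3d8afffe9ef12d654eeaa5775c3c1f44bd96f vendor/plugins/play2 (remotes/origin/HEAD)' |
  explain_submodule_status
```

In day-to-day use, we would pipe the real output into it: git pull && git submodule status | explain_submodule_status. A plus sign right after a pull usually means git submodule update is needed. A plus sign after we moved a submodule ourselves, as in steps 18 through 20, means the new commit still needs to be committed in the main project.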
