Managing software project dependencies with git submodules

Hardly any software project today is built from the ground up. Frameworks and libraries have made developers’ lives so much easier that there is rarely a need to reinvent the wheel. But these frameworks and libraries become dependencies of our projects, and as the software grows in complexity over time, managing them efficiently can become quite challenging.

Sooner or later, developers find their code depending on other developers’ projects, which may be open source and hosted online, or developed in-house, perhaps in another department of the organisation. These dependencies are evolving too, and need to be kept up to date and in sync with your main source tree. This ensures that a small change does not break anything, and that your project is not outdated and does not carry known bugs or security vulnerabilities.

A good recent example of this is log4j, a popular logging framework initially released in 1999, which became a huge headache at the end of 2021 for many businesses, including Apple, Microsoft and VMware. log4j was a dependency in a wide variety of software, and the vulnerabilities discovered affected all of it. This is a classic example of how big a role dependencies play in the software lifecycle, and why managing them efficiently is important.

While there are a variety of ways and frameworks to manage software dependencies, depending on software complexity, today I’ll cover one of the most common and easy-to-use methods: “git submodule”. As the name suggests, it is built right into git itself, which is the de facto version control system for the majority of software projects.

Hands-on with git submodules:

Let us assume your project, named “hello-world”, depends on an open source library called “print”.

A not-so-great way to manage the project is to clone the “print” library code and push it alongside the “hello-world” code tree to GitHub (or any version control server). This works and everything runs as expected. But what happens when the author of “print” changes its code or fixes a bug? Since you’ve used your own local copy of print and nothing tracks the upstream project, you won’t get these new changes automatically. You need to either patch your copy manually or re-fetch the code and push it once again. Is this the best way of doing it, one may ask?

git has a feature baked in which allows you to add other git repos (dependency projects) as submodules. This means your project follows a modular approach, and you can update the submodules independently of your main project. You can add as many submodules to your project as you want and assign rules such as “where to fetch it from” and “where to store the code once it is fetched”. This obviously works only if you use git for your project’s version control.

Let’s see this in action:

I’ve created a new git project named “hello-world” on my GitHub account, which has two directories:

src – where my main source code is stored

lib – where all the libraries (a.k.a. dependencies) used by my source code are stored.

These libraries are hosted on GitHub by their maintainers as independent projects. For this example, I’m using two libraries.

  1. print – which I also created, as a separate GitHub repo
  2. resources – which is another git repository, in the Developer Nation account

To add these two libraries as submodules to my project, let’s open the terminal and change to the directory in the main project where I want them to be located. In this case, I want them in my lib directory, so I’ll execute the following commands:

cd hello-world/lib

Add each submodule with the command: git submodule add <link to repo>

git submodule add git@github.com:iayanpahwa/print.git
git submodule add git@github.com:devnationworld/resources.git

This will fetch the source code of these libraries and save it in your lib folder. You’ll also find a new hidden file named .gitmodules in the root of your main project directory, with the following metadata:

```
[submodule "lib/print"]
path = lib/print
url = git@github.com:iayanpahwa/print.git
[submodule "lib/resources"]
path = lib/resources
url = git@github.com:devnationworld/resources.git
```

This tells git:

  • which submodules are used in this project
  • where to fetch them from
  • where to store them
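
If you’d like to try the add step without touching GitHub, here is a minimal offline sketch. It assumes a throwaway temporary directory and a tiny local repository standing in for the real print library; the file names and the demo identity are made up for illustration. The protocol.file.allow=always setting is only there because the “remote” is a local path, which newer git versions refuse for submodules by default.

```shell
set -eu

# Scratch area; every path and repo below is a stand-in for illustration.
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so no global git config is required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny local repository playing the role of the upstream "print" library.
git init -q print
echo 'print library' > print/README
git -C print add README
git -C print commit -q -m 'initial print commit'

# The main project, with the src/ and lib/ layout described above.
git init -q hello-world
cd hello-world
mkdir src lib
echo 'main program' > src/main.txt

# Add the library as a submodule under lib/.
# protocol.file.allow=always is needed only because the source is a local path.
git -c protocol.file.allow=always submodule add "$work/print" lib/print

# "submodule add" has already staged .gitmodules and lib/print; stage src too.
git add src
git commit -q -m 'add print as a submodule'

# Show what git recorded about the submodule.
cat .gitmodules
git submodule status
```

With a real hosted library, the only line that changes is the submodule add command, which takes the repository URL instead of a local path.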

Now, whenever someone clones the project, they can fetch the submodules separately using the following commands:

git clone <Your project URL>
cd <Your project directory>
git submodule init
git submodule update

Alternatively, this can be done in one command:

git clone <Your Project URL> --recursive

which in this case is:

git clone git@github.com:iayanpahwa/hello-world.git --recursive
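
To see both cloning routes side by side, here is a rough offline sketch. As before, it assumes local stand-in repositories in a temporary directory rather than the real GitHub projects, and the clone directory names (clone-two-step, clone-one-step) are made up. The protocol.file.allow=always setting is again needed only because the submodule URL is a local path.

```shell
set -eu

# Scratch area; all repositories here are local stand-ins for illustration.
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so no global git config is required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream library.
git init -q print
echo 'print library' > print/README
git -C print add README
git -C print commit -q -m 'initial print commit'

# Main project that uses the library as a submodule.
git init -q hello-world
git -C hello-world -c protocol.file.allow=always submodule add "$work/print" lib/print
git -C hello-world commit -q -m 'add print as a submodule'

# Route 1: plain clone, then initialise and fetch the submodules.
git clone -q "$work/hello-world" clone-two-step
# At this point lib/print exists but is empty.
git -C clone-two-step submodule init
git -C clone-two-step -c protocol.file.allow=always submodule update

# Route 2: clone the project and its submodules in one go.
git -c protocol.file.allow=always clone -q --recursive "$work/hello-world" clone-one-step

# Both clones now contain the library's files.
ls clone-two-step/lib/print clone-one-step/lib/print
```

Either route ends in the same state. The one-step form is handy for fresh clones, while the two-step form is what you need in a project you have already cloned without its submodules.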

One more thing you’ll notice in the project repo on GitHub is that, in the lib directory, the folders are named like this:

print @ fa3f …

resources @ c22

The hash after the @ denotes the commit at which the print and resources libraries were last fetched. This is a very powerful feature: by default, a submodule is added at the latest commit available upstream, i.e. the HEAD of the master branch, but you can fetch from different branches as well. More details and options can be found in the official git submodule documentation.
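
Here is a small sketch of what that recorded commit means in practice, again assuming local stand-in repositories and a made-up version.txt file. It uses two git options not shown above: git submodule add -b, which records the branch to follow, and git submodule update --remote, which moves the submodule to the latest commit on that branch. Treat it as one possible workflow, not the only one.

```shell
set -eu

# Scratch area; all repositories here are local stand-ins for illustration.
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so no global git config is required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream library, first version.
git init -q print
echo v1 > print/version.txt
git -C print add version.txt
git -C print commit -q -m 'print v1'

# Whatever branch name this git installation created by default.
branch=$(git -C print symbolic-ref --short HEAD)

# Main project: add the library and record which branch to follow (-b).
git init -q hello-world
git -C hello-world -c protocol.file.allow=always \
    submodule add -b "$branch" "$work/print" lib/print
git -C hello-world commit -q -m 'add print as a submodule'

# Upstream moves on with a new commit.
echo v2 > print/version.txt
git -C print commit -q -am 'print v2'

# The main project still points at the commit it recorded earlier.
git -C hello-world submodule status
cat hello-world/lib/print/version.txt

# Move the submodule to the latest commit on the tracked branch...
git -C hello-world -c protocol.file.allow=always submodule update --remote lib/print

# ...and record the new commit in the main project.
git -C hello-world add lib/print
git -C hello-world commit -q -m 'update print to latest upstream'
cat hello-world/lib/print/version.txt
```

Nothing changes in the main project until you run the update and commit the new pointer, which is what keeps a dependency from shifting under you unnoticed.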

Now you can track and update dependency projects independently of your main source tree. One thing to note is that your dependencies need not all be on the same hosting site, as long as they use git. For example, if hello-world were hosted on GitHub and print on GitLab, git submodules would still work the same way.

I hope this was a useful tutorial and that you can now leverage git submodules to better manage your project dependencies. If you have any questions or ideas for more blog posts, I’d love to hear from you in the comments below.