Version Control And Deployments
January 24, 2021
Version control systems are at the center of software development these days. Version control lets you track any change made to the codebase, recreate any revision of your software from any given moment in time, and work independently (and in parallel) with other developers on the same codebase. I encourage everyone unfamiliar with version control to go over the documentation of one of the most widely used VCS tools out there: Git.
For the purposes of this write-up I will focus on Git, although the broader concepts should apply to all VCS tools.
This is another incoherent thought dump, so if you are interested in particular topics, feel free to jump ahead.
A Primer On Git
When you do a git init, you mark that folder as a Git repository. Every item within that folder becomes part of that repository, which means Git can track changes to every file in it once the file has been added. You can explicitly exclude items from tracking: anything you list in the .gitignore file will not be tracked.
If you are aware of how state machines work, you might understand Git better in those terms. A Git repository is like a state machine where each change to the state of the repository is tracked. When you do a git init, you create a new immutable linked list to keep track of all the changes you make within that folder.
Branches
A Git branch is a series of commits. Think of it as an immutable linked list, with each node representing a change made to the repo (any change to any number of files, as long as those files are tracked), and the head pointing to the tip of that list. From any node of that list, you can spawn a new branch (or a new linked list!).
Git has a main branch that is everlasting. You can name it whatever you want, but the most commonly used name these days is main. Beyond that, a repo can have as many branches as you want. You can merge branches into each other, rebase one branch on top of another, and so on.
Commits
A commit is a unit of change you apply to the repo. In linked-list terms, it is a new node recording all the changes you made to the repository.
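The linked-list picture of commits and branches can be sketched in a few lines of Python. This is only a toy model for illustration, not how Git stores data internally; the Commit class, the history helper, and the branch names are all made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen: a commit never changes once created
class Commit:
    message: str
    parent: Optional["Commit"] = None  # link to the previous node


def history(tip: Optional[Commit]) -> List[str]:
    """Walk from a tip back to the first commit, newest message first."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages


# A branch is just a name pointing at the tip of a list of commits.
branches: Dict[str, Commit] = {}

first = Commit("initial commit")
second = Commit("add README", parent=first)
branches["main"] = second

# Spawn a new branch from an older node; it shares history up to that node.
branches["feature"] = Commit("start feature", parent=first)

print(history(branches["main"]))     # ['add README', 'initial commit']
print(history(branches["feature"]))  # ['start feature', 'initial commit']
```

Both branches end at the same first node, which is what makes branching cheap in this model: a new branch only adds a pointer and whatever new nodes you commit on it.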
Origins, Upstreams, And Local Dev
Git is a distributed version control system, so everyone working on a repo keeps a full copy of it. There is usually also a remote location where the repo is stored, a centralized server that hosts your repositories. Some commonly used cloud options are GitHub and Bitbucket.
In such a setup, origin is the remote location where the repository lives, and until you explicitly push your branch or commit to origin, your code changes will not be shared with others.
GitHub lets you fork a repository, which creates a copy of the repository under your own personal GitHub account. You can treat that fork as a separate repo and diverge from the parent, with your own branching strategies. More often than not, however, forks are created to contribute to the main repo. You can keep working in the fork on your personal account, but when you are ready to contribute your changes to the main repo, you will have to merge them into it. This is where upstream comes into play. You can define the main repository you forked from as an upstream and then use simple git commands to sync your fork.
To fetch changes from the main repo:
git fetch upstream
and then rebase onto its main branch (assuming that branch is named main):
git rebase upstream/main
If you add an upstream and do a git remote -v, you will see what origin and upstream point to. If you want to pull from, or rebase on, any other fork of that repo, you can add it as a remote: git remote add some_random_string some_random_url
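To see what fetching from an upstream and then rebasing does to history, here is a toy Python model. It is a sketch under simplifying assumptions (commits carry only a message, and nothing ever conflicts), and the Commit, log, and rebase names are invented for this example rather than taken from Git.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None


def log(tip: Optional[Commit]) -> List[str]:
    """Messages from oldest to newest."""
    out = []
    while tip is not None:
        out.append(tip.message)
        tip = tip.parent
    return out[::-1]


def rebase(local_tip: Commit, base: Commit, new_base: Commit) -> Commit:
    """Replay the commits made after `base` on top of `new_base`.

    The original commits are left untouched; new copies are created.
    """
    to_replay = []
    node = local_tip
    while node is not None and node is not base:
        to_replay.append(node)
        node = node.parent
    tip = new_base
    for commit in reversed(to_replay):  # oldest first
        tip = Commit(commit.message, parent=tip)
    return tip


# The fork and the main repo share commit A.
a = Commit("A")
my_work = Commit("my change", parent=a)  # commit made on the fork
# What a fetch from upstream would bring in: B and C on top of A.
upstream_tip = Commit("C", parent=Commit("B", parent=a))

rebased = rebase(my_work, base=a, new_base=upstream_tip)
print(log(rebased))  # ['A', 'B', 'C', 'my change']
```

Note that the rebased tip is a new object: the old "my change" commit still hangs off A, which is why a branch that was already pushed no longer matches its remote copy after a rebase.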
Git Branching Strategies
I am a big advocate of git flow, which was originally proposed by Vincent Driessen in his post "A successful Git branching model".
TL;DR: It is good to keep your main branch separate from your integration branch. The HEAD of your main branch should ideally point to your current production revision. Developers should pull from and push to the integration branch, and once it is ready for production, it should be merged into the main branch. For production hotfixes, you can create a hotfix branch off of main, since main should always be in a production-ready state, and then merge the hotfix branch back into both the main and integration branches.
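The flow described above can be walked through with a small Python simulation. This is a deliberately simplified sketch: branches are plain lists of change names, merge just copies over missing changes, and the change names (v1, feature-a, fix-login) are placeholders.

```python
from typing import Dict, List

# Each branch is modeled as an ordered list of change names (oldest first).
repo: Dict[str, List[str]] = {
    "main": ["v1"],
    "integration": ["v1"],
}


def merge(repo: Dict[str, List[str]], source: str, target: str) -> None:
    """Append to `target` every change from `source` it does not have yet."""
    for change in repo[source]:
        if change not in repo[target]:
            repo[target].append(change)


# Day-to-day work lands on the integration branch.
repo["integration"].append("feature-a")

# A production bug appears: branch the hotfix off main, not integration.
repo["hotfix"] = list(repo["main"])
repo["hotfix"].append("fix-login")

# The hotfix goes back to both long-lived branches.
merge(repo, "hotfix", "main")
merge(repo, "hotfix", "integration")
print(repo["main"])         # ['v1', 'fix-login']
print(repo["integration"])  # ['v1', 'feature-a', 'fix-login']

# When integration is ready for production, it is merged into main.
merge(repo, "integration", "main")
print(repo["main"])         # ['v1', 'fix-login', 'feature-a']
```

Because the hotfix starts from main, the unfinished feature-a never reaches production along with the fix.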
People might have differing opinions on this but here are the best practices when working with Git, per my experience:
Always Rebase: Always prefer rebasing over merging, since it keeps your git history much cleaner. Beware, though, that rebasing rewrites commits: once you have pushed your branch to the remote, a rebased local branch diverges from its remote copy and a plain push will be rejected. In that case you have two options: 1) if your branch is truly yours, and not shared with anyone else, you can rebase your local branch and force push it to the remote, or 2) keep your remote branch as is, and merge it into the main branch instead of rebasing.
Squash And Merge: I am a big advocate of squashing your commits and then merging; it keeps the main branch history much cleaner.
Good Commit Message: Everyone agrees with this one, yet it is rarely practiced. A good commit message is a lifesaver when going through git history.
Commit Often: I have been burnt too many times to not commit my changes often - that is the only way your local changes would be tracked by git. If you are worried about adding too many commits (which you shouldn’t be if you squash and merge) - you can always squash your local commits later too.
Publish When Ready: Unless you are working on a shared branch (which you shouldn’t be!!), there is really no reason to publish your changes with every commit. Once published, your branch can only be rebased by rewriting history that already exists on the remote, which makes rebasing a pain. Publish your branch when you are ready.
Branch Out: Create a new branch for your own work and only publish it when it is ready to be merged with the base of that branch.
Tag Your Releases: Tags are usually used to mark the commits that were released with a particular version (this allows easier rollbacks and points to the snapshot of code that was released). Such tags should never be deleted, and version numbers should only ever be incremented. Hopefully this was an obvious one.
This is not an exhaustive list, but hopefully a handy one for day-to-day development.
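Since squashing comes up in several of the practices above, here is a rough Python sketch of what it amounts to. It assumes a commit can be modeled as a message plus a mapping of file paths to their new content, which is a simplification for illustration; the squash function and the sample commits are hypothetical.

```python
from typing import Dict, List, Tuple

# A commit is modeled as (message, {path: new file content}).
Commit = Tuple[str, Dict[str, str]]


def squash(commits: List[Commit], message: str) -> Commit:
    """Fold several commits into one, keeping the latest content per file."""
    combined: Dict[str, str] = {}
    for _, changes in commits:  # oldest first, so later commits win
        combined.update(changes)
    return (message, combined)


work_in_progress: List[Commit] = [
    ("wip", {"app.py": "print('helo')"}),
    ("fix typo", {"app.py": "print('hello')"}),
    ("add docs", {"README": "Run app.py"}),
]

squashed = squash(work_in_progress, "Add greeting script with docs")
print(squashed)
```

The three noisy local commits collapse into a single commit with one meaningful message, which is why committing often and squashing later work well together.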
Continuous Integration And Continuous Deployment
The integration branch should always be stable. It is the branch everyone works off of, so it's imperative that it stays in a healthy state and does not slow down development across teams. This is why the integration branch should be tested and deployed with every commit. Tools these days let us run automated pre-merge tests to ensure that a bad commit does not get merged into the integration branch, and post-merge jobs can ensure the code works when deployed in an integrated environment. Every successful build should be merged to main and, ideally with continuous deployment, tagged and deployed to production.
- Integrate Automated Review Tools: Use tools like SonarQube for automated reviews and linting as part of pre-merge checks.
- Use Unit Test Coverage: While test coverage is always a good thing, we have to be careful not to add unneeded complexity around what to mock and when. There are ways to be smart about this, which vary from one programming language to another.
- Integration Tests Should Be A Part Of Pre-Merge Checks: Self-contained integration tests can ensure our application can handle integration with external resources.
- Run Daily Builds (at the least): Daily builds with full test coverage help us stay ahead of any issues.
- Automated And Frequent Deploys: Deployments should be automated and frequent. Everyone on the team should feel comfortable doing deploys, rollbacks, and hotfixes.
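The idea of a pre-merge gate can be sketched as a function that runs every check and only allows the merge when all of them pass. This is a minimal illustration, not the API of any particular CI tool; the check names and the lambdas standing in for real linters and test runners are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


def run_pre_merge_checks(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every check and report which ones failed.

    All checks run even after a failure so the author sees every problem at once.
    """
    failed = [name for name, check in checks.items() if not check()]
    return (not failed, failed)


# Stand-in checks; a real pipeline would invoke a linter and test runner here.
checks: Dict[str, Check] = {
    "lint": lambda: True,
    "unit tests": lambda: True,
    "integration tests": lambda: False,  # simulate a failing suite
}

ok, failed = run_pre_merge_checks(checks)
print("merge allowed" if ok else f"merge blocked by: {failed}")
# merge blocked by: ['integration tests']
```

Running all checks instead of stopping at the first failure is a design choice: it costs more build time but saves the author a round trip per failure.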