Branching and Merging Strategy in Git for SSIS Projects
We have a team of multiple developers working on the same Integration Services project. The project is integrated into Git for source control, but from time to time conflicts occur because developers are working on the same object. How can we avoid this?
Git is a free and open-source distributed version control system. For an introduction to Git, please refer to the following tips:
- Getting Started with GitHub for SQL Server Developers
- Continuous SQL Server Database Integration with Visual Studio and Azure DevOps
Microsoft offers hosted Git repositories through GitHub and Azure DevOps. Both have free and paid tiers. For an introduction on how to add Integration Services (SSIS) projects to Azure DevOps, refer to the tip How to Add SQL Server Integration Services Projects to Azure Devops for Version Control. I recommend reading that tip before continuing with this one.
Setting up Branches
When working with Git, you can create multiple branches. A branch is a pointer to a commit (a snapshot of the repository) and serves as a separate line of development. For example, if you want to add a new feature to your product, start a new branch and work on this feature in that branch alone. This way, your development work doesn't interfere with the work of others. Once your development is done, merge the branch back into the original branch, where you can run integration tests.
There are many branching strategies, and comparing them is outside the scope of this tip. To keep things simple, we focus on two main branches:
- Main - Typically a new Git repo has the branch master, but you can rename it to main using the steps explained in this blog post. You can think of the main branch as representing production.
- Dev - This branch represents all the development work. When a release happens, the commits from the dev branch are merged into the main branch.
When you only have the main branch, you can create the dev branch from within Visual Studio.
In Team Explorer, go to Branches and choose to create a new branch.
Name the new branch dev and check it out. This creates the branch in the local repository. To create it on the remote repository as well (the origin repo corresponds with the repo in Azure DevOps), right-click the dev branch and choose Push Branch.
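If you prefer the command line over Team Explorer, the same two steps can be scripted. The following is a minimal Python sketch, not something Visual Studio itself runs: the function name create_dev_branch and its parameters are hypothetical, and it assumes the remote is called origin. By default it only returns the Git commands so you can inspect them first.

```python
import shlex
import subprocess


def create_dev_branch(branch="dev", remote="origin", dry_run=True):
    """Return (and optionally run) the git commands that create and publish a branch.

    The function name and parameters are illustrative assumptions.
    """
    commands = [
        # Create the branch in the local repository and check it out.
        ["git", "checkout", "-b", branch],
        # Create the branch on the remote and track it from the local branch.
        ["git", "push", "--set-upstream", remote, branch],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if git reports an error
    return commands


# Dry run: only print what would be executed.
for cmd in create_dev_branch():
    print(shlex.join(cmd))
```

Pass dry_run=False from inside the repository folder to actually execute the commands.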
The dev branch is now also present on the remote which means other developers will be able to pull this branch onto their machines. Tip: you can see which branch is currently checked out in your local repo by checking which branch name is displayed in bold. You can also see the branch name that is displayed between brackets next to the repo name.
The key to avoiding conflicts (developers working on the same object and making conflicting changes) is to use proper branching. However, it's not always possible to avoid them. When you merge two branches and a conflict arises, you can perform merge conflict resolution, where you manually inspect the conflicting file and decide which changes are applied and which are not. In theory, you can merge any text file this way. With SSIS, however, there are some issues:
- Behind the scenes SSIS files are described using XML. Even a small change (e.g. dragging a task to another location on the canvas) can result in many lines changing in the XML. Script tasks/components also add extra complexity to the XML. Over the years the SSIS team has tried to simplify the XML, but it is still not always simple to resolve a merge conflict.
- Other factors can cause the XML to change even though no actual change was made. For example, developers using different versions of Visual Studio can have an impact. Git then flags the package as changed, which leads to "false positives".
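One way to spot a layout-only change, such as a task dragged to another location, is to compare two versions of a package with the designer layout removed. The Python sketch below assumes the layout is stored in a DTS:DesignTimeProperties element under the namespace www.microsoft.com/SqlServer/Dts; verify this against your own .dtsx files before relying on it. The function names and the sample XML are hypothetical, and the sketch does not cover differences introduced by other Visual Studio versions.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace and element name used for designer layout in .dtsx files.
DTS_NS = "www.microsoft.com/SqlServer/Dts"
LAYOUT_TAG = f"{{{DTS_NS}}}DesignTimeProperties"


def logical_content(package_xml: str) -> str:
    """Serialize the package XML with every layout element stripped out."""
    root = ET.fromstring(package_xml)
    # Take a snapshot of the elements first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == LAYOUT_TAG:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def only_layout_changed(old_xml: str, new_xml: str) -> bool:
    """True when the files differ but are identical once layout is ignored."""
    return old_xml != new_xml and logical_content(old_xml) == logical_content(new_xml)


# Hypothetical, heavily simplified package XML for illustration only.
TEMPLATE = (
    '<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" '
    'DTS:ObjectName="{name}">'
    "<DTS:DesignTimeProperties><![CDATA[<Layout x='{x}' y='20'/>]]>"
    "</DTS:DesignTimeProperties></DTS:Executable>"
)
original = TEMPLATE.format(name="LoadDimCustomer", x=10)
moved = TEMPLATE.format(name="LoadDimCustomer", x=300)
renamed = TEMPLATE.format(name="LoadDimProduct", x=10)

print(only_layout_changed(original, moved))
print(only_layout_changed(original, renamed))
```

A check like this only tells you that a diff is cosmetic; it does not merge anything, so a real conflict still has to be resolved by hand.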
The easiest way to deal with this is to make sure there is as little conflict as possible. This can be done by following some guidelines:
- Developers don't work on the same package at the same time. This requires some communication between the different team members.
- Create small, modular packages. This means a package should have only one task, e.g. update one single dimension or load data into one staging table. Giant packages doing many things are not only more difficult to maintain, they also lead to more conflicts as more logic is encapsulated in one package. Smaller packages make it easier to follow the first rule.
- Don't let branches live for a long time. Once the development task is done and tested, the branch should be merged into the dev branch and deleted. If you keep branches around for too long, the chance of conflicts rises as the branch starts lagging behind other branches. If too many commits need to be merged, it becomes harder to manage.
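The first guideline relies on communication, but a small check can back it up. The following Python sketch is an illustration under stated assumptions: it takes, per branch, the list of files that branch changed (which you could collect with git diff --name-only dev...branchname) and reports the packages touched by more than one branch. The function name, branch names, and file names are hypothetical.

```python
def packages_touched_by_several_branches(changes_by_branch):
    """Map each .dtsx file to the branches that changed it, keeping only
    the files changed on two or more branches."""
    touched = {}
    for branch, files in changes_by_branch.items():
        for path in files:
            if path.lower().endswith(".dtsx"):
                touched.setdefault(path, set()).add(branch)
    return {
        path: sorted(branches)
        for path, branches in touched.items()
        if len(branches) > 1
    }


# Hypothetical input: files changed on each open branch compared to dev.
changes = {
    "feature/dim-customer": ["DimCustomer.dtsx", "Project.params"],
    "feature/stg-orders": ["StgOrders.dtsx", "Project.params"],
    "feature/dim-customer-fix": ["DimCustomer.dtsx"],
}

for package, branches in packages_touched_by_several_branches(changes).items():
    print(f"{package} is being changed on: {', '.join(branches)}")
```

Only packages are reported here on purpose; files such as project parameters are expected to overlap more often and are easier to merge.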
Some files in an SSIS project are shared by all packages and will therefore regularly end up in conflicts: the project connection managers, the project parameters, and the SSIS project file itself (the file with the extension .dtproj). However, these files are much simpler and easier to merge.
A typical workflow could look like this:
- Two changes need to be made to the project, each change on a different package. Two different developers will make the changes.
- Each developer creates a new branch from the dev branch.
- They implement their changes, commit them, and test them locally.
- Once a developer feels the work is done, the branch is merged to the dev branch and deleted. Everything that is merged to the dev branch should be ready to be deployed to the development/test server.
- On the development/test server, integration tests can be run to check the integration of the new code with the existing code.
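For one developer, the workflow above can be written down as a sequence of Git commands. This is a minimal Python sketch, assuming the shared branch is called dev and that you want an explicit merge commit (hence --no-ff); the function name, the branch name, and the commit message are hypothetical placeholders. It only builds and prints the commands so you can review or adapt them.

```python
import shlex


def feature_branch_workflow(feature, base="dev", message="Describe the change"):
    """Return the git commands for one short-lived branch, in order."""
    return [
        ["git", "checkout", base],             # start from the shared branch
        ["git", "pull"],                       # make sure it is up to date
        ["git", "checkout", "-b", feature],    # create and check out the new branch
        ["git", "add", "--all"],               # stage the changed package(s)
        ["git", "commit", "-m", message],      # commit; repeat add/commit as needed
        ["git", "checkout", base],             # work is done and tested locally
        ["git", "pull"],                       # pick up merges made by other developers
        ["git", "merge", "--no-ff", feature],  # merge with an explicit merge commit
        ["git", "push"],                       # publish the updated shared branch
        ["git", "branch", "-d", feature],      # delete the merged local branch
    ]


for step in feature_branch_workflow("feature/load-dim-customer"):
    print(shlex.join(step))
```

If the branch was also pushed to the remote, it needs to be deleted there as well, for example through the Azure DevOps web interface.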
Graphically, the process looks like this:
Both branches are created from the same commit in the dev branch (the top blue circle). You can think of a commit as a snapshot of the state of the repo at a point in time. On branch #1, the developer commits three times before merging the branch back into the dev branch. The merge creates a new commit on the dev branch. The other developer creates four commits and then merges branch #2 into the dev branch, again creating a new commit. It's important to notice that work created by either developer is not visible to the other developer. They work isolated in their own branch. At the third commit in the dev branch, the work of both developers is available for testing.
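The same picture can be reproduced with a toy model. The Python sketch below is not how Git is implemented; it is a simplified illustration in which a branch is just a name pointing at a commit and a merge is a commit with two parents. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Commit:
    id: int
    parents: tuple = ()  # one parent for a normal commit, two for a merge


class Repo:
    def __init__(self):
        self._ids = count(1)
        self.branches = {"dev": Commit(next(self._ids))}  # first commit on dev

    def branch(self, name, source="dev"):
        # A new branch is only a new pointer to an existing commit.
        self.branches[name] = self.branches[source]

    def commit(self, name):
        # Committing moves the branch pointer to a new child commit.
        self.branches[name] = Commit(next(self._ids), (self.branches[name],))

    def merge(self, source, target="dev"):
        # A merge commit has both branch tips as parents.
        tips = (self.branches[target], self.branches[source])
        self.branches[target] = Commit(next(self._ids), tips)

    def history(self, name):
        """Ids of all commits reachable from the tip of a branch."""
        seen, stack = set(), [self.branches[name]]
        while stack:
            current = stack.pop()
            if current.id not in seen:
                seen.add(current.id)
                stack.extend(current.parents)
        return seen


repo = Repo()
repo.branch("branch1")
repo.branch("branch2")
for _ in range(3):
    repo.commit("branch1")
for _ in range(4):
    repo.commit("branch2")

# Before merging, the two branches share only the starting commit.
shared_before_merge = repo.history("branch1") & repo.history("branch2")

repo.merge("branch1")  # second commit on dev
repo.merge("branch2")  # third commit on dev: all work is now reachable
print(sorted(shared_before_merge), len(repo.history("dev")))
```

In this model the dev branch ends up with ten reachable commits: the starting commit, three from the first branch, four from the second, and the two merge commits.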
If both developers modified the same file, it's possible that the last merge into the dev branch results in a conflict (if Git cannot resolve the merge automatically). In the second part of this tip, we'll look at how we can resolve such conflicts.
- More tips on how to use branching with Git:
- Continuous database deployments with Azure DevOps
Article Last Updated: 2020-12-21