Chapter 10
Git

Collaborative work
Reading time: 5 minutes


Who the *!#@? deleted my code??? <a dev from the year 2000>

How do thousands of developers manage to work on the same projects without stepping on each other's toes?

Imagine 10 people in a room. Each one receives a copy of a 500-line speech. Isolated in their corners, they have to correct the speech and rephrase parts of it. In the end, everyone's work must be combined into a single speech that makes sense and keeps a consistent tone.

It's an impossible mission without a tool that allows you to:

  • See additions and deletions easily.
  • Keep track of the different versions of the text.
  • Combine everyone's changes and resolve conflicts, which occur when two people modify the same line.

Code is nothing more than text. Developers face this problem daily. The larger the team, the greater the need for control.

Every request for a new feature can lead to a conflict with another ongoing development. In small teams, we try to defuse the situation as much as possible in advance by defining a target architecture and steps to achieve it. Often, we even try to eliminate the problem entirely by negotiating a delay on one of the features.

But some conflicts are inevitable!

Git

In 1972, to meet all these needs, Marc Rochkind, an engineer at Bell Labs, created the Source Code Control System (SCCS) software.
Others followed, each bringing a different vision and their own set of features: work speed, data compression, cost, version history security, decentralization of repositories, etc.

Unsatisfied with the existing solutions for Linux kernel development, Linus Torvalds launched the creation of Git in 2005. It has since become the undisputed leader in version control. It is natively integrated into dozens of software tools, bringing even more features, including graphical interfaces that facilitate visualization. (For now, we’ll stick with the command line; otherwise, it's no fun 🤭.)

Git is available for all major operating systems. After installation, you just need to go to an existing folder and turn it into a Git repository. A repository is a space managed by Git that keeps track of all changes.

git init
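As a minimal sketch, here is what that looks like from start to finish. The folder name my_speech and the temporary location are assumptions made for the example, not something Git requires.

```shell
# Create a throwaway folder (the name "my_speech" is hypothetical).
workdir="$(mktemp -d)"
mkdir "$workdir/my_speech"
cd "$workdir/my_speech"

# Turn the folder into a Git repository.
git init -q

# Git keeps its history in a hidden .git directory inside the folder.
ls -a

# Ask Git for the current state of the repository.
git status
```

Deleting the .git directory is all it takes to turn the folder back into an ordinary one, history included.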

Git is a decentralized system, meaning that everyone has their own repository on their machine with a complete copy of the entire history. You can work without needing to connect to a centralized repository. In practice, in a company, there is almost always a central repository for sharing changes between developers.

It is mainly used for text files, but it works on any type of file. For example, if you are a graphic designer, you can version your PNG files.

You can do crazy things with Git, but the basics are very simple and boil down to a few main actions.
Let’s jump straight into an example: the speech to be modified is composed of 2 files, the body and the conclusion.

body.txt
What do you call a hamster in space?

A hamsteroid 😂.

conclusion.txt
Bisoux <3

There is a mistake; I want to correct it. I replace "Bisoux" with "Bisous".

1 - I make my modification in Word or any other text editor.

2 - I add the file to the list of changes I want to save.

git add conclusion.txt

3 - I commit this version of the changes. Basically, I save the changes I just added as a new version in the history of my local repository.

git commit -m "A message to explain the change"

4 - I push this commit (this version) to the central repository so that my change is accessible to my collaborators.

git push

5 - My collaborators have also worked on the files. When one of them tries to push, Git rejects the push, indicating that the central repository has changed and that they need to update their local copy before they can share their work.

git pull

This is where the trouble begins. If, by chance, we haven't modified the same lines, Git automatically assembles the different parts. Otherwise, we find ourselves in an unstable state, with a conflict that has to be resolved manually. It's easy when it's just one word, as it is here, but much harder when it involves hundreds of lines 😅.

6 - Everyone can browse the history and analyze the different versions.

git log
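The steps above can be replayed end to end in a sandbox. This is only a sketch under a few assumptions: the "central repository" is a local bare repository standing in for a real server, the identity is a placeholder, and the edit is made with sed instead of a text editor. Step 5 (git pull) is left out because there is only one contributor here.

```shell
# A throwaway sandbox; central.git is a local stand-in for a real server.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/central.git"

# My local copy of the project.
git clone -q "$sandbox/central.git" "$sandbox/me"
cd "$sandbox/me"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

# Initial version of the speech, typo included.
printf 'What do you call a hamster in space?\n\nA hamsteroid 😂.\n' > body.txt
printf 'Bisoux <3\n' > conclusion.txt
git add body.txt conclusion.txt
git commit -q -m "Add the speech"

# 1 - Fix the typo (sed replaces the text editor in this sketch).
sed 's/Bisoux/Bisous/' conclusion.txt > conclusion.tmp
mv conclusion.tmp conclusion.txt

# 2 - Add the file to the list of changes to save.
git add conclusion.txt

# 3 - Commit this version.
git commit -q -m "Fix typo in the conclusion"

# 4 - Push the current branch to the central repository.
git push -q origin HEAD

# 6 - Browse the history, one line per commit.
git log --oneline
```

Running git status between the steps is a good way to see what Git considers modified, added, or already committed.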

Branches

I have a feature to develop. It will take me several days, during which my work will be unstable and unfinished. During this time, I might also have an urgent issue, something that needs to be quickly fixed in the version of the code currently in production. In this situation, I would need to start with a clean base and apply just the fix. I thus need to work in parallel on multiple versions of the code.

This is where the concept of "branches" comes in.

You've probably seen a movie with parallel universes. A point in time creates a divergence, leading to two realities. Creating a branch in Git is like creating a divergence in history. The two branches can evolve completely differently.

git checkout -b my_new_branch

--(commit 1)------(commit 2A)---
              \
               \____(commit 2B)___
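To see this divergence for real, the diagram can be reproduced in a throwaway repository. A minimal sketch: the file contents and the identity are made up for the example, and the commit messages simply reuse the labels from the diagram.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

# commit 1: the common starting point.
printf 'Bisoux <3\n' > conclusion.txt
git add conclusion.txt
git commit -q -m "commit 1"
git branch -M master                        # make sure the branch is named master

# Create the new branch, switch to it, and add commit 2B there.
git checkout -q -b my_new_branch
printf 'See you soon!\n' >> conclusion.txt
git commit -q -am "commit 2B"

# Back on master, history continues independently with commit 2A.
git checkout -q master
printf 'Thanks for listening.\n' > thanks.txt
git add thanks.txt
git commit -q -m "commit 2A"

# Draw both histories and their common ancestor.
git log --oneline --graph --all
```

The last command prints a small text graph in which the two branches fork from "commit 1".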

The community has established methodologies to standardize practices regarding branch management. The main one is GitFlow.

In a simplified version:

  • The main branch, "master," represents the current state of production.
  • When a developer starts a feature, they create a dedicated branch.
  • If they need to address an urgent issue, they can switch back to master and create a third branch dedicated to the fix.

The real GitFlow has a few additional subtleties, which you can explore on your own. Here, I will keep it simple by only talking about master.
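The simplified flow above could look like this on the command line. It is a sketch, not the official GitFlow tooling: the branch names feature_new_joke and hotfix_typo, the file joke.txt, and the identity are all hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

# master holds what is currently in production.
printf 'Bisoux <3\n' > conclusion.txt
git add conclusion.txt
git commit -q -m "Version in production"
git branch -M master

# A feature that takes several days lives on its own branch.
git checkout -q -b feature_new_joke
printf 'Work in progress...\n' > joke.txt
git add joke.txt
git commit -q -m "Start the new joke (unfinished)"

# Urgent issue: return to the clean production state and branch from it.
git checkout -q master
git checkout -q -b hotfix_typo
sed 's/Bisoux/Bisous/' conclusion.txt > conclusion.tmp
mv conclusion.tmp conclusion.txt
git commit -q -am "Fix typo in the conclusion"

# Three branches now exist, and the unfinished joke.txt is not part of the fix.
git branch
ls
```

Switching back with git checkout feature_new_joke brings the unfinished work back exactly as it was left.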

At any time, the developer can check the differences between master and their branch:

git diff master
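Here is a small sketch of what that comparison returns, reusing the typo from the speech. The repository and identity are throwaway placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

# State of master.
printf 'Bisoux <3\n' > conclusion.txt
git add conclusion.txt
git commit -q -m "Add the conclusion"
git branch -M master

# The branch fixes the typo.
git checkout -q -b my_new_branch
printf 'Bisous <3\n' > conclusion.txt
git commit -q -am "Fix typo in the conclusion"

# Lines starting with "-" are the master version, lines starting with "+" are mine.
git diff master

# Only list the names of the files that differ.
git diff --name-only master
```

In the first output, the line "-Bisoux <3" is followed by "+Bisous <3": one line removed, one line added.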

Once the work is finished, these differences can be applied to master. This means they are ready to be deployed in the next production release. This is called a "merge."

git checkout master
git merge my_new_branch

Imagine trying to combine two universes with different histories. In one, Zidane missed his Panenka. Obviously, it doesn't go well; there will be conflicts. You have to rewrite the version of the story you want to become reality. Once the conflicts are resolved, this new history is ready to be pushed for everyone.
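The two-universe situation can be provoked on purpose. In this sketch, both branches change the same line of conclusion.txt, so the merge stops and the final version has to be written by hand. The file contents and the identity are invented for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

printf 'Bisoux <3\n' > conclusion.txt
git add conclusion.txt
git commit -q -m "commit 1"
git branch -M master

# First universe: the branch rewrites the line one way.
git checkout -q -b my_new_branch
printf 'Bisous <3\n' > conclusion.txt
git commit -q -am "commit 2B"

# Second universe: master rewrites the same line another way.
git checkout -q master
printf 'Kisses <3\n' > conclusion.txt
git commit -q -am "commit 2A"

# The merge stops because both sides changed the same line.
git merge my_new_branch || echo "Merge stopped: conflict to resolve"

# Git leaves both versions in the file, separated by conflict markers.
cat conclusion.txt

# Resolve by writing the version we want to keep, then conclude the merge.
printf 'Bisous <3\n' > conclusion.txt
git add conclusion.txt
git commit -q -m "Merge my_new_branch into master"

git log --oneline --graph
```

The resulting merge commit has two parents, one from each universe, which is what the graph in the last command shows.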

One of the major problems with GitFlow is that branches last too long: the longer they live, the further they drift from master, and the more often they cause conflicts.

More and more developers, including myself, are adopting "trunk-based development." The goal is to shorten the life of branches as much as possible so that developments are made available on the main trunk (master) as quickly as possible. In this approach, we emphasize the iterative nature of development and accept deploying unfinished features to production, as long as they don't negatively impact the customer experience.

How do we ensure the quality of deliveries?

GitHub / GitLab

In any serious project, code does not go into production and is not even mergeable to master without going through a series of checks: automated tests, syntax verification, quality analysis. This is called a Continuous Integration (CI) pipeline.
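Stripped of all tooling, the principle is a list of checks run in order, where the first failure blocks everything. The sketch below illustrates only that principle: the three functions are hypothetical stand-ins for real tools, one of them is made to fail on purpose, and real pipelines are described in the configuration format of the hosting platform rather than in a script like this one.

```shell
# Hypothetical checks: each function stands in for a real tool
# and returns 0 on success, non-zero on failure.
run_tests()       { echo "running automated tests"; return 0; }
check_syntax()    { echo "checking syntax";         return 0; }
analyze_quality() { echo "analyzing code quality";  return 1; }   # simulated failure

# Run the checks in order and stop at the first one that fails.
run_pipeline() {
  for step in run_tests check_syntax analyze_quality; do
    if ! "$step"; then
      echo "pipeline failed at: $step"
      return 1
    fi
  done
  echo "pipeline passed: merge allowed"
}

if run_pipeline; then
  pipeline_status="passed"
else
  pipeline_status="failed"
fi
echo "status: $pipeline_status"
```

Changing the last function to return 0 makes the whole pipeline pass.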

Different pipelines are triggered automatically at each stage of the branch's lifecycle on the central repository: creation, merge request to master, completed merge.

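In more advanced setups, pipelines can also deliver the code automatically to pre-production or even production environments. This is called Continuous Delivery (CD); when the release to production is fully automatic, it is often called Continuous Deployment.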

Platforms, the best known being GitHub and GitLab, are used to host Git projects and organize the post-development phases: CI/CD pipeline configuration and execution, better history visualization, and code review.

Code Review

The developer pushes their branch to the central repository. They log in to GitHub or GitLab and open a "merge request." This means they are requesting their code to be merged into master. On GitHub, this merge request is called a "Pull Request" (PR) and on GitLab, it's called a "Merge Request" (MR).
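On the command line, the first half of that paragraph could be sketched as follows. The central repository is again a local bare repository standing in for GitHub or GitLab, and the identity is a placeholder. Opening the merge request itself happens in the platform's interface and is not shown.

```shell
# central.git is a local stand-in for the repository hosted on GitHub or GitLab.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/central.git"
git clone -q "$sandbox/central.git" "$sandbox/me"
cd "$sandbox/me"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

# master as it exists on the central repository.
printf 'Bisoux <3\n' > conclusion.txt
git add conclusion.txt
git commit -q -m "Add the conclusion"
git branch -M master
git push -q origin master

# The work to be reviewed lives on its own branch.
git checkout -q -b my_new_branch
printf 'Bisous <3\n' > conclusion.txt
git commit -q -am "Fix typo in the conclusion"

# Publish the branch; -u links it to its remote copy for later pushes and pulls.
git push -q -u origin my_new_branch

# List the branches known on the central repository.
git branch -r
```

Once the branch is on the central repository, it can be selected as the source of the merge request.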

We saw that opening a merge request automatically triggers the execution of CI pipelines. This is also when the dreaded code review step occurs!

Code review is the step where other developers on the team will study what you have produced. In GitHub and GitLab, the graphical interface displays the differences between the two branches and allows comments to be added. Code review is not an exact science; everyone has their own way of organizing it: pair reviews, optional vs. mandatory, workshop organization, etc.

A poorly organized review can quickly turn into an ego war and create tensions 💣.