I'm trying to set up a pre-commit hook for formatting code, one that formats files and includes the changes in the commit. Several scripts say they do this, but the ones I tried all have the same problem: they leave files "half staged".

See for example this script. It properly adds files after modifying them, and says it should work on Windows. The fact that hooks don't work for me when they work for other people leads me to believe that something is up with my environment.

This happens when the hook modifies a file with a superfluous line break:

$ git status -s
A  src/hello.c

$ git commit src/hello.c

Add 'Hello World!'
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#   new file:   src/hello.c
# Changes not staged for commit:
#   modified:   src/hello.c

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   src/hello.c

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   src/hello.c

$ git diff
warning: LF will be replaced by CRLF in src/hello.c.
The file will have its original line endings in your working directory
diff --git a/src/hello.c b/src/hello.c
index 5e4b595..768d31a 100644
--- a/src/hello.c
+++ b/src/hello.c
@@ -1,6 +1,5 @@
 #include <stdio.h>

 int main() {
   printf("Hello, World!");
   return 0;

$ git diff --staged
diff --git a/src/hello.c b/src/hello.c
index 768d31a..5e4b595 100644
--- a/src/hello.c
+++ b/src/hello.c
@@ -1,5 +1,6 @@
 #include <stdio.h>

 int main() {
   printf("Hello, World!");
   return 0;

I would have expected the hook to leave a clean index. Instead, it leaves the file staged without the hook's modifications, while the file itself is left modified. Why does this behavior occur, and how can I make it stop?


Warning: this answer is kind of long, but that's because it's really about all the pitfalls of this sort of pre-commit hook. There are several, and it gets complicated in complex cases.

You didn't show the hook directly, but you did link to the GitHub repository containing the hook (here's a more direct link to the hook itself). I will quote a few lines from the hook.

The hook makes some rather brash assumptions, because when you run git commit, there are at least three of what I like to call "active copies" of each file, and this hook is not sophisticated enough to notice discrepancies between them.

Three copies of files, sometimes with different content

The three copies are:

The committed copy in the current or HEAD commit. This file literally cannot be changed—it's frozen for all time—but it is important because it's the basis we will use for comparisons.

The index copy. This file can be changed. It's what you are proposing to commit: if your pre-commit and commit-message hooks permit the commit and all else goes right, the copy of the file that's in the index is the copy that will be committed. Hence, you can think of the index—which Git also calls the staging area—as, essentially, the proposed next commit.

These first two files—frozen HEAD copy, and index copy—are in a special, Git-only, compressed format. While the index copy can be changed, that's always done by replacing it, typically using git add to overwrite it. The git add command compresses a file into the Git-only format and places the compressed copy—well, technically, a reference to the compressed copy—into the index.

The work-tree copy. This file is an ordinary file that you can see and manipulate.
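
To make the three copies concrete, here is a minimal sketch that builds a throwaway repository in which all three differ and then prints each one. The file name demo.txt and the identity settings are made up for the illustration; the commands that matter are git show HEAD:path (committed copy), git show :path (index copy), and cat (work-tree copy).

```shell
# Sketch: one file, three different "active copies", in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"                 # placeholder identity
git config user.email "example@example.invalid"

echo "version 1" > demo.txt
git add demo.txt
git commit -q -m "first"                       # HEAD copy is now "version 1"

echo "version 2" > demo.txt
git add demo.txt                               # index copy is now "version 2"

echo "version 3" > demo.txt                    # work-tree copy is now "version 3"

head_copy=$(git show HEAD:demo.txt)            # read the frozen HEAD copy
index_copy=$(git show :demo.txt)               # read the index (staged) copy
worktree_copy=$(cat demo.txt)                  # read the ordinary file
printf 'HEAD:      %s\nindex:     %s\nwork-tree: %s\n' \
    "$head_copy" "$index_copy" "$worktree_copy"

short_status=$(git status --short)             # X and Y columns are both M here
echo "$short_status"
```

The two-letter status at the end is the same summary the hook reads: the first letter compares HEAD with the index, the second compares the index with the work-tree.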

Now, you're using Git's LF/CRLF translation, as indicated by:

warning: LF will be replaced by CRLF in src/hello.c

The actual translation happens when Git copies the file from the work-tree to the index—i.e., during git add—or when it copies the file from the index to the work-tree, e.g., during git checkout. The extract-to-work-tree step changes LF-only line endings to CRLF line endings; the add-to-index step changes CRLF line endings to LF-only line endings. (You can control this and change it around somewhat, but that's the usual scheme.)
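
If you want to check what the translation is doing to a particular file, git ls-files --eol reports the line endings Git sees in the index and in the work-tree. The following is only a sketch in a scratch repository: it forces core.autocrlf=true locally and uses a made-up file name, so your own configuration may well behave differently.

```shell
# Sketch: observe CRLF (work-tree) being stored as LF (index) under autocrlf.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.autocrlf true                  # assumption: the "usual" Windows setting

printf 'line one\r\nline two\r\n' > demo.txt   # work-tree file with CRLF endings
git add demo.txt 2>/dev/null                   # add-to-index step converts CRLF to LF

eol_info=$(git ls-files --eol demo.txt)        # reports i/<index eol> and w/<work-tree eol>
echo "$eol_info"
```

The i/ field describes the index copy and the w/ field describes the work-tree copy, which is a quick way to confirm which side of the translation a tool such as a formatter has changed.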

git status, git add, and the existing hook

Let's go to the script now, and look at a few lines:

for line in $(git status -s)

(technically this should be git status --porcelain, but at the moment they pretty much do the same thing: the main danger is that the --short output could be colorized, which would break the next bit)

  if [[ $line == A* || $line == M* ]]

Now it's time to consider what git status prints. The documentation says, about the short format:

... the status of each path is shown as one of these forms

    XY PATH
    XY ORIG_PATH -> PATH

where ORIG_PATH is where the renamed/copied contents came from. ORIG_PATH is only shown when the entry is renamed or copied. The XY is a two-letter status code. [snippage] X shows the status of the index, and Y shows the status of the work tree.

(Aside: copied is not currently a possible status for git status. The internal diff engine that git status invokes can set this, but to do so, the caller has to enable it, and git status just doesn't. If git status got new command-line flags or configuration entries that enabled copy detection, you could get C status-es, but as of now you cannot.)

The key item here is that the first letter, which is what the script is testing here, is based on the status of the index. That is, it's a summary of the result of comparing the HEAD commit to the index—to the proposed commit. A file will be Added if it's new in the index (does not appear in the HEAD commit), or Modified if it's in both the index and the HEAD commit, but the index copy is different from the HEAD commit.

The thing to realize here is that whether or not the index copy matches the HEAD copy, the work-tree copy is a third file entirely. It might be quite different from one or both of these other two copies! That's OK, and in fact, that's deliberately the case if you use git add -p to selectively stage only part of the work-tree file. Just keep it in mind as we plow on.
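
As an aside, if all you want is "files whose index status is A or M", you do not have to parse git status at all. One alternative (my suggestion, not something the hook in question does) is to ask the diff engine directly with git diff --cached and a --diff-filter. The sketch below sets up a scratch repository with made-up file names to show that a file changed only in the work-tree is not listed.

```shell
# Sketch: list paths that are Added or Modified in the index relative to HEAD.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"                 # placeholder identity
git config user.email "example@example.invalid"

printf 'int a;\n' > a.c
printf 'int c;\n' > c.c
git add a.c c.c
git commit -q -m "base"

printf 'int a = 1;\n' > a.c; git add a.c       # staged modification (X column: M)
printf 'int b;\n' > b.c;     git add b.c       # staged addition     (X column: A)
printf 'int c = 1;\n' > c.c                    # work-tree change only (Y column: M)

# --cached compares HEAD with the index; the work-tree is not consulted.
staged=$(git diff --cached --name-only --diff-filter=AM)
echo "$staged"
```

Here c.c is left out because its index copy still matches HEAD, even though its work-tree copy does not.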

Now let's go back to the pre-commit hook script:

    if [[ $line == *.c || $line == * || $line == *.h || $line == *.cpp ]]
      # format the file
      clang-format -i -style=file $(pwd)/${line:3}

       # and then add the file (so that any formatting changes get committed)
      git add $(pwd)/${line:3}

If the file name at the end of the line ends in .c, .h, .cpp, etc., this runs clang-format. For A and M status files the line holds just one file name; only R status files have two names. The script is faulty in not checking for R status files, though, since a file could be both renamed and modified.

The input to clang-format is the work-tree file. The input almost certainly should be the index copy of the file, but it isn't. So the script assumes that the index and work-tree copy match.

Having run clang-format, the script then runs git add to copy the (updated) work-tree file back into the index. If we wanted to do this right, we'd need to format the index copy and then add the formatted index copy, which is pretty tricky. That's probably why the script is a little lazy, but it's definitely worth noting.
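
For the curious, here is roughly what "format the index copy and add the formatted index copy" could look like. Treat it as a sketch under assumptions, not a drop-in hook: format_stdin is a hypothetical stand-in that merely strips trailing whitespace (a real hook might pipe through clang-format with --assume-filename instead), and the file name is made up. The idea is to read the blob from the index, filter it, write the result as a new blob, and point the index entry at that blob without touching the work-tree file.

```shell
# Sketch: reformat the staged blob only, leaving the work-tree file alone.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical stand-in formatter: strip trailing whitespace from stdin.
format_stdin() { sed 's/[[:space:]]*$//'; }

printf 'int x;   \n' > demo.c
git add demo.c                                 # index copy: one badly formatted line
printf 'int x;   \nint y;   \n' > demo.c       # further unstaged work in the work-tree

path=demo.c
# First field of "git ls-files -s" is the file mode recorded in the index.
mode=$(git ls-files -s -- "$path" | cut -d' ' -f1)
# Filter the index copy and store the result as a new blob object.
blob=$(git show ":$path" | format_stdin | git hash-object -w --stdin)
# Replace the index entry so it refers to the formatted blob.
git update-index --cacheinfo "$mode,$blob,$path"

index_copy=$(git show ":$path")
worktree_lines=$(wc -l < "$path")
echo "index copy: [$index_copy], work-tree lines: $worktree_lines"
```

Note that the work-tree copy is now different from the index copy in two ways (the unstaged line and the formatting), which is exactly the kind of discrepancy the rest of this answer is about.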

The work-tree file written by clang-format is probably going to have LF-only line endings (see https://reviews.llvm.org/D19031). This fits with the text of the warning:

warning: LF will be replaced by CRLF in src/hello.c.

This is telling you that the current work-tree copy, src/hello.c, has LF-only line endings. Git has been told that when Git copies from index back to work-tree, Git should change LF-only endings to CRLF endings.

More than three copies

Now things get complicated. I mentioned above that there are at least three copies of each file, and then described the places these three copies live. There's the HEAD commit, the index, and the work-tree. The one flaw with this description is the phrase the index, as Git will sometimes use a temporary index. That's the case for some git commit commands, but not for all of them.

The full story of git commit is that it always builds your new commit from an index, but not necessarily from the index. There is a "the" index: one particular, distinguished index that goes with the work-tree.[1] Then there are extra index files that some Git commands create for various purposes; e.g., git stash creates a temporary index to save the work-tree, and git filter-branch creates lots of temporary index files as it runs. Here, though, we're interested in git commit, and git commit will sometimes create one or two of its own temporary index files.

If you run git commit—with no extra arguments at all—git commit just uses the index file. That's your proposed commit, and it already has all the files in it. If your pre-commit hook runs git add, it copies new files into the index, displacing the old ones that were in the index, and eventually git commit writes out the new commit using the new files. If the new files came from the work-tree, things mostly match up, except maybe for CRLF line endings.

But if you run git commit --only or git commit --include, or even just git commit -a, Git takes a twist. If you run git commit file1, that means git commit --only file1, for instance, unless you add --include in which case it means git commit --include file1.

To do these operations—actually including plain git commit—Git makes at least one temporary index file, although for plain git commit this happens as late as possible. One temporary index file is named index.lock (well, .git/index.lock, depending on where your .git directory is). This temporary index will be the true source of files for the new commit. When the commit is finished, if it all succeeds, Git releases the lock by renaming .git/index.lock to be .git/index.

We can see these in action through a dummy .git/hooks/pre-commit that just prints the value of the environment variable $GIT_INDEX_FILE, then exits with a failure to prevent the commit:

$ cat .git/hooks/pre-commit
$ git commit
$GIT_INDEX_FILE is .git/index
$ git commit -a
$GIT_INDEX_FILE is [path]/git/.git/index.lock
$ git commit --only cache.h
$GIT_INDEX_FILE is [path]/.git/next-index-53061.lock
$ git commit --include cache.h
$GIT_INDEX_FILE is [path]/.git/index.lock
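
If you want to repeat this experiment yourself, a hook of this kind can be as small as the one installed by the sketch below. The scratch repository, identity settings, and file name are made up for the illustration, and the exact paths printed will depend on your Git version and repository location.

```shell
# Sketch: install a pre-commit hook that reports $GIT_INDEX_FILE and aborts.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"                 # placeholder identity
git config user.email "example@example.invalid"

echo "hello" > demo.txt
git add demo.txt
git commit -q -m "base"                        # committed before the hook exists

mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Report which index file this commit is using, then refuse the commit.
echo "\$GIT_INDEX_FILE is $GIT_INDEX_FILE"
exit 1
EOF
chmod +x .git/hooks/pre-commit

echo "hello again" > demo.txt
git add demo.txt

# Both commits fail on purpose; capture whatever the hook printed.
plain=$(git commit -m "test" 2>&1 || true)
only=$(git commit -m "test" --only demo.txt 2>&1 || true)
echo "$plain"
echo "$only"
```

Because the hook always exits with a failure, no commit is ever made, so you can run the different git commit variants repeatedly against the same staged state.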


A plain git commit uses the regular index file. If your hook runs git add you'll replace files in the index. When Git gets around to creating the lock file index.lock, it creates it from index, and when git commit finishes (assuming success), your changes to the index, made by your hook, will take effect.

A git commit -a or git commit --include works similarly. The lock is made earlier, but git add should update the index.lock in place, and when git commit finishes, your main index should have the updates. (I have not tested this but it seems obvious.)

But git commit --only makes a temporary index (next-index-53061.lock) as well as locking the main index and git adding the --only files to the main locked index. When the commit finishes, the new commit's files will be those from the temporary index, including anything you updated; but the main index will come from index.lock, which is the old index with the specific files updated. When they got updated will control what is actually in that index.

[1] When you use git worktree add to create an additional work-tree, the extra work-tree gets its own singular index, so the index is the one that is paired with the work-tree: an added work-tree is a separate work-tree with a separate index. The added work-tree also gets its own HEAD, which makes things especially complicated on Windows, but we don't need to go there.


These are the pitfalls to be aware of in commit hooks. The implication of all of this is that, unless you want to get intimate with the innards of Git itself (checking the value of $GIT_INDEX_FILE, for instance, and/or adding things to multiple index files), it's usually a bad idea to have a hook modify the commit in progress. Instead, it's usually wiser to check the commit-in-progress. If the commit is good, let it proceed. If not, remind the user to run whatever is required, and have the commit fail.
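
A check-only hook along those lines might look like the following sketch. As before, format_stdin is a hypothetical stand-in for the real formatter (it only strips trailing whitespace), check_staged is a name I made up, and the scratch repository exists just to exercise the function. It compares the blob that is staged with the blob the formatter would produce and never writes to the index or the work-tree.

```shell
# Sketch: reject the commit if any staged .c/.h file is not already formatted.

# Hypothetical stand-in formatter: strip trailing whitespace from stdin.
format_stdin() { sed 's/[[:space:]]*$//'; }

# Return 0 when every staged .c/.h file is formatted, 1 otherwise.
# Caveat: the unquoted $(...) splits on whitespace, so paths containing
# spaces would need a more careful loop.
check_staged() {
    status=0
    for path in $(git diff --cached --name-only --diff-filter=AM -- '*.c' '*.h'); do
        staged=$(git rev-parse ":$path")       # blob ID of the index copy
        formatted=$(git show ":$path" | format_stdin | git hash-object --stdin)
        if [ "$staged" != "$formatted" ]; then
            echo "needs formatting, then git add: $path" >&2
            status=1
        fi
    done
    return $status
}

# Exercise the check in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'int x;   \n' > demo.c
git add demo.c
if check_staged; then first=ok; else first=rejected; fi

printf 'int x;\n' > demo.c
git add demo.c
if check_staged; then second=ok; else second=rejected; fi
echo "$first $second"
```

In an actual pre-commit hook you would keep only the two function definitions and finish with check_staged || exit 1, leaving it to the user to run the formatter and git add the result.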

You can modify the commit-in-progress; you just have to be aware of these weird cases.




