
Dennis Kaarsemaker

Automatically pushing local git repositories to other sources

One of the many things I do at booking.com is maintaining the infrastructure behind perl5.git.perl.org, the main git repository for Perl 5. The Perl 5 committers moved to git a few years ago and we have been hosting the repository ever since. However, the repository not being on GitHub caused some confusion: individual contributors pushed quite a few copies to GitHub anyway, and there was even a /mirrors/perl that we did not control. We couldn't even switch off pull requests there, so possible contributions kept getting lost.

So now there's an official mirror at Perl/perl5.git! Instead of doing pushes from cron, I've set the master repository up to push from its post-receive hook. Because pushing to GitHub directly from the hook would slow down pushes to our master repository, and could even cause errors, the hook only queues the push in a beanstalk tube and a separate worker does the actual pushing. It's a bit fiddly to set up, but works like a charm.

First the git post-receive hook. We already have a post-receive hook, so we need to do a little trick to run more than one hook:

#!/bin/sh
/usr/local/bin/pee /srv/gitcommon/contrib/hooks/post-receive-email \
                   /srv/gitcommon/contrib/hooks/submit-jenkins-job \
                   /srv/gitcommon/contrib/hooks/schedule-git-push
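Git hands the list of updated refs to a post-receive hook on standard input, so every hook in the chain needs its own copy of that input. A minimal sketch of the idea in Python 3, not the real pee implementation: the function name `pee` and its return value are made up for this illustration, and unlike the real tool it reads all of the input up front instead of streaming it.

```python
import subprocess


def pee(commands, data):
    """Feed the same bytes to the stdin of every shell command.

    Illustrative only: returns the exit codes, one per command.
    """
    # Start every command first so they all run side by side.
    procs = [subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE)
             for cmd in commands]
    exit_codes = []
    for proc in procs:
        # communicate() writes the data, closes stdin and waits for exit.
        proc.communicate(data)
        exit_codes.append(proc.returncode)
    return exit_codes
```

Calling `pee(["hook-one", "hook-two"], hook_input)` with two hypothetical hook commands would give each of them the full ref list, which is all the wrapper script needs.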

Pee is a tee-like utility to send input to multiple commands, making it really easy to have multiple hooks. So what does this hook do?

#!/bin/sh

remotes=$(git config hooks.push_to)
tube=$(git config hooks.push_tube)

push_to_github() {
    oldrev=$(git rev-parse $1)
    newrev=$(git rev-parse $2)
    refname="$3"

    if expr "$oldrev" : '0*$' >/dev/null
    then
        change_type="create"
    else
        if expr "$newrev" : '0*$' >/dev/null
        then
            change_type="delete"
        else
            change_type="update"
        fi
    fi

    # --- Get the revision types
    newrev_type=$(git cat-file -t "$newrev" 2> /dev/null)
    oldrev_type=$(git cat-file -t "$oldrev" 2> /dev/null)
    case "$change_type" in
    create|update)
        rev="$newrev"
        rev_type="$newrev_type"
        ;;
    delete)
        rev="$oldrev"
        rev_type="$oldrev_type"
        ;;
    esac

    # The revision type tells us what type of object the ref points to;
    # combined with the location of the ref we can decide between
    #  - working branch
    #  - tracking branch
    #  - un-annotated tag
    #  - annotated tag
    case "$refname","$rev_type" in
        refs/tags/*,commit)
            # un-annotated tag
            refname_type="tag"
            short_refname=${refname##refs/tags/}
            ;;
        refs/tags/*,tag)
            # annotated tag
            refname_type="annotated tag"
            short_refname=${refname##refs/tags/}
            ;;
        refs/heads/*,commit)
            # branch
            refname_type="branch"
            short_refname=${refname##refs/heads/}
            ;;
        refs/remotes/*,commit)
            # tracking branch
            refname_type="tracking branch"
            short_refname=${refname##refs/remotes/}
            echo >&2 "*** Push-update of tracking branch, $refname"
            echo >&2 "***  - no push scheduled."
            # return rather than exit, so the remaining refs are still handled
            return 0
            ;;
        *)
            # Anything else (is there anything else?)
            echo >&2 "*** Unknown type of update to $refname ($rev_type)"
            echo >&2 "***  - no push scheduled"
            return 1
            ;;
    esac

    # $remotes and $tube were read from the repository config at the top
    if [ "$change_type" = "delete" ]; then
        beanstalk-submit "$tube" repo="$(pwd)" push_to="$remotes" ref=":$refname"
    else
        beanstalk-submit "$tube" repo="$(pwd)" push_to="$remotes" ref="$refname:$refname"
    fi
}

while read oldrev newrev refname
do
    push_to_github "$oldrev" "$newrev" "$refname"
done

Those familiar with git may recognize parts of the post-receive-email hook there :) This hook interprets the input from git and schedules pushes or deletions by submitting them to beanstalk. The repository configuration determines which tube is used and which remotes are pushed to. For perl.git, this is

[hooks]
	push_to = git@github.com:Perl/perl5.git
	push_tube = github-push
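The core decision the hook makes can be restated compactly. The following Python 3 sketch is only an illustration of that logic, not a replacement for the shell hook; the helper names `classify_change` and `refspec_for` are invented here. It relies on the fact that git passes one `<old> <new> <refname>` line per updated ref, with an all-zero hash standing for "ref did not exist" or "ref was deleted".

```python
def classify_change(oldrev, newrev):
    """Return 'create', 'delete' or 'update' for one ref update."""
    if set(oldrev) == {"0"}:
        return "create"   # ref did not exist before this push
    if set(newrev) == {"0"}:
        return "delete"   # ref was removed by this push
    return "update"


def refspec_for(line):
    """Turn one line of post-receive input into a refspec for git push."""
    oldrev, newrev, refname = line.split()
    if classify_change(oldrev, newrev) == "delete":
        # An empty source side tells git push to delete the remote ref.
        return ":" + refname
    return refname + ":" + refname
```

For a deleted branch this yields a refspec such as `:refs/heads/example`, and for anything else `refs/heads/example:refs/heads/example`, which is what the hook passes to beanstalk-submit as `ref=`.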

The beanstalk-submit utility is a really simple Python script that takes key=value arguments and turns them into a JSON document that gets submitted to the given tube.

#!/usr/bin/python

import beanstalkc
import json
import sys

tube = sys.argv[1]
args = dict([x.split('=', 1) for x in sys.argv[2:]])

bs = beanstalkc.Connection('localhost', 11300)
bs.use(tube)
bs.put(json.dumps(args))
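To see what ends up in the tube, the argument handling can be tried without a beanstalkd server. This is a small Python 3 sketch with a hypothetical helper name, `args_to_json`; it mirrors the `key=value` splitting above but reports a malformed argument explicitly.

```python
import json


def args_to_json(argv):
    """Convert ['key=value', ...] into a JSON object string."""
    pairs = []
    for arg in argv:
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % arg)
        pairs.append((key, value))
    return json.dumps(dict(pairs), sort_keys=True)
```

With the arguments the hook passes, the job body is a flat JSON object with the keys `push_to`, `ref` and `repo`, which is exactly what the worker expects to find.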

So now the pushes are scheduled. What's next? Actually pushing them of course! I have another script running in a screen session (yeah, I should turn this into a proper daemon, but I have bigger plans for that). In an infinite loop, it reserves jobs from the github-push tube and pushes the given ref to the configured remote or remotes. It deletes malformed jobs without acting on them and buries failed jobs so they won't get retried automatically.

#!/usr/bin/python

import json
import beanstalkc
import traceback
import os
import sys
from whelk import Shell

shell = Shell(redirect=False, raise_on_error=True)

bs = beanstalkc.Connection('localhost', 11300)
bs.watch('github-push')
while True:
    print "Waiting for job"
    try:
        job = bs.reserve()
    except KeyboardInterrupt:
        break
    except Exception:
        print "Reconnecting"
        bs = beanstalkc.Connection('localhost', 11300)
        bs.watch('github-push')
        continue
    try:
        data = json.loads(job.body)
        # Drop malformed jobs: the body must be a dict with non-empty
        # 'repo', 'ref' and 'push_to' values
        if not data or not isinstance(data, dict):
            job.delete()
            continue
        if not all(data.get(key) for key in ('repo', 'ref', 'push_to')):
            job.delete()
            continue
        os.chdir(data['repo'])
        for remote in data['push_to'].split():
            shell.git('push', remote, data['ref'])
        job.delete()
    except Exception:
        traceback.print_exc()
        sys.stderr.flush()
        job.bury()
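The validation step is the part most worth testing in isolation, since it decides whether a job is acted on at all. Below is a Python 3 sketch of it as a pure function. The name `plan_pushes` is hypothetical, it uses `git -C <repo>` instead of `os.chdir`, and it treats unparseable JSON as malformed, whereas the worker above would bury such a job.

```python
import json

REQUIRED_KEYS = ("repo", "ref", "push_to")


def plan_pushes(body):
    """Return the git commands for a job body, or None if it is malformed."""
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    # Every required key must be present and hold a non-empty string.
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value:
            return None
    # push_to may list several remotes separated by whitespace.
    return [["git", "-C", data["repo"], "push", remote, data["ref"]]
            for remote in data["push_to"].split()]
```

Each returned list could be handed to `subprocess.check_call`, with a failure leading to the job being buried as in the worker.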

So there you go, now we have a near-instantly mirrored perl.git on GitHub while still having our own main archive!

Comments

  1. Marius Gedminas on 10/16/2013 11:13 p.m.

    I don't think your real script has $(tube) in it.

  2. Dennis Kaarsemaker on 10/16/2013 11:59 p.m.

    Good catch. I did some cleanup for this post. Looks like a bit too much :)
