
Dennis Kaarsemaker

Irssi for X-chat refugees

I'm a pretty horrible terminal junkie. As a large scale sysadmin, my life is spent mostly in a terminal, either fixing problems or writing puppet recipes to prevent them. I also manage mail with mutt (though I still mostly read it with evolution) and my favourite editor is vim.

As such, it's rather odd that I use X-chat for IRC, with a bouncer on my VPS, instead of using irssi in screen. Truth is that I've tried to switch from X-chat to irssi many times, but never could get used to it. A week ago I tried again. Determined to make it work this time, I started off by searching for a theme that resembles X-Chat. A quick search uncovered Anton Fagerberg's irssi setup, including an X-chat theme.

/set theme xchat
/set indent 25
/set autolog yes
/set join_auto_chans_on_invite no
/set show_names_on_join off
/set channel_sync off

That made a world of difference! I felt a lot more at home already, so was quite motivated to really dig into what was left that annoyed me and fix all those issues. Step one was the nicklist: I really wanted it back. Even in larger channels, where it is less than useful, I like to see and browse through the list of nicks. And on our corporate jabber rooms (yay bitlbee) I find it essential. Fortunately there is a script available that can abuse screen to do this. Problem sorted.

/script load nicklist.pl
/nicklist screen
/set nicklist_automode SCREEN
/set nicklist_width 24
/bind mup command nicklist scroll -1
/bind mdown command nicklist scroll 1

Next up was the list of windows. In X-chat I always use the channel tree view to get a quick overview of activity. In irssi I've always found this utterly impossible: the constantly changing window numbers always made me abandon it. A bit of searching, and I found that irssi can assign static numbers to windows. Problem solved! Or so I thought... irssi would now use the gaps in numbering to assign new windows to. This being the last thing to hold me back from really switching to irssi, I went ahead and dove into the code to find a solution. None existed, so I patched irssi myself. The patch adds an option to create new windows at the end of the list, or even at a much higher starting number. New windows now are numbered 200 and up, making it trivial to identify private messages.

/set windows_auto_renumber off
[... renumber all windows to my liking ...]
/set create_windows_at_end on
/set autocreate_window_min_refnum 200
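
To make the difference between the two numbering strategies concrete, here is a minimal Python sketch. It is not the actual irssi patch (irssi is written in C); the function name next_refnum and its parameters are made up for illustration, and it assumes that a new window gets whichever is higher: the minimum refnum or one past the highest existing window.

```python
def next_refnum(used, at_end=False, min_refnum=1):
    """Pick a refnum for a new window (illustrative only, not irssi's code)."""
    used = set(used)
    if at_end:
        # Append after the highest existing window, but never below min_refnum.
        return max(max(used, default=0) + 1, min_refnum)
    # Behaviour described above: reuse the first gap in the numbering.
    n = 1
    while n in used:
        n += 1
    return n


windows = [1, 2, 3, 5, 10]
print(next_refnum(windows))                               # fills the gap at 4
print(next_refnum(windows, at_end=True))                  # goes after window 10
print(next_refnum(windows, at_end=True, min_refnum=200))  # starts the 200+ range
```

With hand-numbered channel windows in the low range, anything that shows up at 200 or above is then a newly created window, which is what makes private messages easy to spot.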

My irssi setup was now usable! I didn't even want to switch back to X-chat, so victory could be declared there and then. But irssi is much more flexible, so why stop here? Another neat trick from Anton's page is the go.pl script. /go ubu<tab> would bring me to #ubuntu. Except it didn't, so I patched the script to make that work. As a bonus /go off now ...


Adventures in laziness

My laptop is sitting a few meters away from me. I'm behind a desktop in the same /24. I'd like to SSH to this laptop, but don't know its IP address. On this network there are quite a few machines, mostly macs. How do I find the IP address?

Arp and nc to the rescue! First we arp-scan the network, then we find SSH versions.

$ for h in $(arp-scan --localnet | grep 10.15 | cut -f1); do echo -ne "$h\t"; (echo "" | nc -w1 "$h" 22 || echo) | head -n1; done | grep ubuntu
10.15.3.28    SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1
10.15.3.73    SSH-2.0-OpenSSH_6.0p1 Debian-3ubuntu1
10.15.3.158    SSH-2.0-OpenSSH_5.8p1 Debian-7ubuntu1
10.15.3.185    SSH-2.0-OpenSSH_6.0p1 Debian-3ubuntu1

Of course it was the last one I needed :)
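
The nc half of that one-liner works because an SSH server sends its version banner as soon as you connect. A rough Python equivalent of that step could look like the sketch below. The function names are invented for this example, the host list is assumed to come from elsewhere (arp-scan in the one-liner above), and the match is case-insensitive, unlike the grep.

```python
import socket


def ssh_banner(host, port=22, timeout=1.0):
    """Return the first line a server sends on connect, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
    lines = data.decode("ascii", errors="replace").splitlines()
    return lines[0] if lines else None


def is_ubuntu(banner):
    """True if the banner looks like an Ubuntu-packaged OpenSSH."""
    return banner is not None and "ubuntu" in banner.lower()


def scan(hosts):
    """Print host and banner for every host that answers with an Ubuntu banner."""
    for host in hosts:
        banner = ssh_banner(host)
        if is_ubuntu(banner):
            print(f"{host}\t{banner}")
```

Calling scan() with the list of addresses found on the local network prints the same kind of host/banner lines as the shell version.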


Don't let your SSL certificates expire

It looks like Microsoft made a rather classic beginner mistake and forgot to renew an SSL certificate, taking down quite a few Azure services in the process. I will not comment on what I think of that, but here is a tip to make sure this doesn't happen to you: monitor it!

It is really easy to forget a task that only needs to be done once every few years. Staff comes and goes, or gets reassigned. Reports get forgotten and putting it on your todo list only serves to make sure it doesn't get done. So instead of trying to remember when it is time to buy a new certificate, let your monitoring system check the age of your certificate and warn you if it's time to get a new one.

If you use nagios, this is really easy to set up: the check_http plugin already has this functionality! If your webservers are all in the https hostgroup, the following service and command definition will make nagios check certificate age every two hours so this type of embarrassment doesn't happen to you:

define service {
    use                   generic-service
    hostgroup_name        https,https-auth
    service_description   HTTPS SSL Cert Age
    check_command         check_cert_age!14!443
    normal_check_interval 120
    notification_interval 360
    notification_period   workhours
}
define command {
    command_name    check_cert_age
    command_line    $USER1$/check_http -S -C $ARG1$ -I $HOSTADDRESS$ -p $ARG2$
}
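
If you don't run nagios, the same idea fits in a few lines of Python using only the standard library. This is a sketch of the concept, not a reimplementation of check_http: the function names and the status mapping are this example's own choices, with the 14-day warning threshold borrowed from the service definition above.

```python
import socket
import ssl
import time


def cert_not_after(host, port=443, timeout=5.0):
    """Fetch the certificate's notAfter field, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after, now=None):
    """Whole days between now (epoch seconds) and the expiry timestamp."""
    now = time.time() if now is None else now
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


def status(days, warn_days=14):
    """Nagios-style state; the thresholds are this sketch's choice."""
    if days < 0:
        return "CRITICAL"
    if days < warn_days:
        return "WARNING"
    return "OK"
```

Something like status(days_left(cert_not_after("www.example.com"))) then gives a value to alert on. Note that with the default context an already expired certificate makes the TLS handshake itself fail, so a real check should treat that exception as critical too.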

The python trademark is in danger in Europe

Python trademark at risk in Europe: The Python foundation needs your help!

There is a company in the UK that is trying to trademark the use of the term "Python" for all software, services, servers... pretty much anything having to do with a computer.

In my not so humble opinion this is rather ridiculous. Python is the programming language originally created by Guido van Rossum and now maintained by a large community, not some currently non-existent product by a British company.

According to our London counsel, some of the best pieces of evidence we can submit to the European trademark office are official letters from well-known companies "using PYTHON branded software in various member states of the EU" so that we can "obtain independent witness statements from them attesting to the trade origin significance of the PYTHON mark in connection with the software and related goods/services." We also need evidence of use throughout the EU.

So it's incredibly easy for any company using Python to help out and you should feel ashamed of yourself if you don't do it. At Booking.com we use Python quite a bit (internal django apps, yum, mockbuild, mailman, func and a host of scripts for example) so I just sent our letter to the PSF. Hope it helps!


CV writing, or the art of selling yourself

Note for American readers: I'm European. In Europe, CV and resume are synonyms and in the Netherlands, where I live, we call this thing a CV. An American reader of mine pointed out that what I'm describing is what Americans call a resume, not a CV. In his words: "Here a resume generally only contains professional accomplishments. When someone asks for a CV, which usually only happens in education, they want more details like published works, community involvement if you're going for a Dean or Chair, etc etc".

Over the last few months, a few friends have asked me for help in writing a CV and applying for a job. As a person who has interviewed dozens of candidates and reviewed hundreds of CV's, I think I can safely say I know what I'm looking for and I know what makes a great CV for me. Unfortunately, most people suck at writing CV's. If you want to make it easier for a hiring manager to think he should interview you, make sure you understand the importance of the CV. There are no rules for writing one, though the following guidelines should help you write a great CV.

You are writing an ad. Not a CV.

Your CV is the only thing a hiring manager knows about you. After reading your CV, he needs to be willing to invite you in for an interview, so make it easy for him to decide to do so. Advertise yourself! This means:

  • Tell the hiring manager who you are
  • Give good, detailed information about you
  • Don't waste his time

I like CV's that start with an objective. What do you want in your career? This tells me a lot about whether you will fit in my team or not and whether I can give you what you want.

When reviewing CV's, the worst thing there is, is a 14 page CV. Seriously. As a hiring manager I do not have enough time to waste it on 14 pages of a detailed description of your life; there are 30 other CV's waiting for me. On the flip side, I do want to know all the relevant things about you. So tell me what's relevant! For each education or experience item, tell me what you did. And tell me who you are.

Give facts about what you did, and start with important ones. Something like "Built a puppet environment for application X, allowing me and my team of 3 to install 200 servers in 3 days" is an excellent item. "Contributed to projects" (Yes, this is a quote from an actual CV) is not.

Also horrible is the common trap us techies fall into: mentioning everything we worked with. This is utterly useless. I "read" a CV the other day which was a 7-page list of each and every technology the person used in all of his jobs. This tells me absolutely nothing ...


Avoid flag days!

No, I don't mean holidays, go celebrate those as much as you can. I'm talking about the practice of making large, difficult and often irreversible changes to complex systems. They almost always cause more trouble than you expect and the work you now have to put in fixing the problem would have been better spent avoiding the need for a flag day.

As sysadmins and developers at a fast growing company, we've had to migrate many things to bigger and better systems or make other significant changes to our hardware and software stack over the years. Some of these migrations are easier than others, but one thing really stands out: flag days invariably suck. They're usually the easy route for making a change and if it works well it's awesome. The problem is that it never works well. As systems get bigger, dependencies become more complicated and it becomes harder and harder to plan your flag day.

Let's go for an example of a successful migration, no flag day.

Our mailsystem at the moment is a many-terabyte cluster of storage servers but began on a humble 1u server only a few years ago (I did say the company grows fast, didn't I?) It's grown over time and before we had this nice horizontally scalable cluster, we migrated it to ever bigger boxes when needed. These were simple migrations where even flag days worked, we just worked through the weekend.

The last single-server system however was a 4TB mailstore. Migrating that in one go was not an option, so what did we do instead?

First we built the new mailcluster next to the existing mailserver (we use cyrus+murder for this cluster, highly recommended) and tested it with bogus accounts. Next we tried to copy all data over in a timely manner. Unfortunately, this proved to be impossible as the combination of a 4TB rsync and cyrus' indexing processes simply takes too long. No more flag day. So instead of trying to do a big no-rollback migration, we did a per-user migration. This was made transparent by making our MX'es aware of multiple mailstores and adding perdition in front of the imap stores as semi-intelligent proxy. A lot more work, but it made the migration much less stressful. We could migrate per account as and when the relevant person was asleep and they wouldn't even notice. Success!
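
The core of a per-user migration is a routing decision that can be flipped one account at a time. The Python sketch below shows only that idea; it is not how perdition or the MX'es were actually configured, and the hostnames, the migrated set and the function names are hypothetical.

```python
OLD_STORE = "oldmail.example.com"        # hypothetical hostnames
NEW_CLUSTER = "mailcluster.example.com"

# Users that have already been moved; everyone else still lives on the old box.
migrated = set()


def backend_for(user):
    """Where the proxy (and the MX) should send this user's traffic."""
    return NEW_CLUSTER if user in migrated else OLD_STORE


def migrate(user, copy_mailbox):
    """Copy one mailbox, then flip the routing for that user only."""
    copy_mailbox(user, OLD_STORE, NEW_CLUSTER)
    migrated.add(user)


def rollback(user):
    """Undo a single migration without touching anyone else."""
    migrated.discard(user)
```

Because every step touches one account, a failed copy affects one person and can be undone for that person alone, which is exactly what a flag day cannot offer.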

So when does it break down? When you don't realize the complexity of your problem and don't get forced to realize it like in the example above.

Like when we moved our puppet tree from subversion to git. This sounds unbelievably simple, so why did it take me a solid 9 hours of work, including a call to the on-call network engineer for some emergency work? Because I tried to do too much at once. I wanted a flag day where one really was not appropriate. It would have been much ...


Goblet 0.1 alpha 2

I've just uploaded the source for goblet, my git web interface project, to github. It's not quite complete yet, but I already find myself using it more than github's own web interface as it has fewer distracting elements. Current TODO list includes a working search, a blame view and fixing the branch selector, which looks odd in Chrome.

If you feel like using it, the README document should help, though it helps if you're a bit familiar with python, pip (and possibly virtualenv). Another item on the TODO list is packaging goblet and all its dependencies and sticking it in a PPA.


My take on git web interfaces

Like many people, I use github for managing my code repositories. It's a great tool to use, but it's closed source and I don't like vendor lock-in. So I mirror all my repos (using git-hub) and want a web interface for that. However, when searching for git web interfaces, I found that most of them were ugly (gitweb), slow and unmaintained (Gitalist), closed source (Github) or tied into other products, such as issue trackers or complete project management tools (redmine).

I want a git web interface that looks better and is easier to customize than gitweb, but still remains just a web interface. Github can take care of issue tracking, I just want to be able to serve my repos from my own server, with a nice web frontend.

And thus Goblet was born.

Goblet is my take on a git web interface. Built on libgit2 and flask, it's easy to extend and customize. It currently does blob views (raw or rendered), tree views, logs and snapshots.

For the design, I borrowed quite a bit from github. Given that I'm a lousy designer, this is probably a good idea. Though if the design is not to your liking, you can easily write your own templates to theme it and integrate it with your own website/design.


Debugging random connection problems (or: never change DNS servers)

For the last few weeks, we've seen a pretty constant but very small trickle of connection issues in our internal network. Application servers would complain about connections to databases timing out. Not all database servers were affected, but a fair chunk of them were, including some of the more important master databases. Nothing could explain these errors: no errors or packet loss were reported on the network side, no errors were seen on the MySQL side. So how do you debug this? This is how we (well, they. I wasn't involved personally) walked through it.

Graph everything

We have a metric buttload of graphs for our system. Even in imperial units it's a huge number. Most of them are stored in graphite, but we also have our homegrown monitoring system. The latter includes a set of graphs for application errors. We can group these various ways, but the nice one here was grouping them by exact error message. This told us that the problem happened on many machines, but more importantly: there was a pattern! The problem would happen for a few seconds each hour. Not on all machines at the same time (so, not a cronjob), but still: every hour.
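
Spotting that kind of pattern doesn't need a graphing system. As a minimal sketch, assuming the errors are available as (host, timestamp) pairs, you can bucket them by minute of the hour per host and look for hosts where nearly everything lands in one bucket. The function names and the 80% threshold are arbitrary choices for this example, not part of the monitoring system described above.

```python
from collections import Counter, defaultdict
from datetime import datetime


def minute_histogram(events):
    """Map host -> Counter of minute-of-hour for each (host, timestamp) event."""
    per_host = defaultdict(Counter)
    for host, ts in events:
        per_host[host][ts.minute] += 1
    return per_host


def hourly_suspects(events, min_share=0.8):
    """Hosts whose errors cluster in a single minute of the hour."""
    suspects = {}
    for host, minutes in minute_histogram(events).items():
        minute, count = minutes.most_common(1)[0]
        if count / sum(minutes.values()) >= min_share:
            suspects[host] = minute
    return suspects


# Made-up sample: db1 fails at minute 17 of every hour, db2 at minute 42.
events = [("db1", datetime(2013, 3, 1, hour, 17, 3)) for hour in range(5)]
events += [("db2", datetime(2013, 3, 1, hour, 42, 5)) for hour in range(5)]
print(hourly_suspects(events))
```

Each host gets its own minute, which matches the observation above: hourly, but not at the same time everywhere. A burst that straddles a minute boundary would be split over two buckets, so a real version would want slightly wider buckets.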

tcpdump

Ok, so we can now predict when the problem will reoccur. This is great if you have no ideas, because to get an idea you need to collect data. Our database servers do an awful lot of traffic, and even if you manage to capture an error, finding it is ridiculously difficult if you can't narrow down your search window. So, we tcpdumped and caught one of the errors. Now the searching starts. What we found, was that during these problems there was some spurious DNS traffic going to unassigned IP addresses. The plot thickens...

DNS

Look at the post title. I gave away the solution didn't I? Not all of it though, because what follows now is a tale of hysterical raisins, paranoia, high availability and glibc.

It is reasonably well known that glibc's resolver (nss_dns) never reloads resolv.conf. This misfeature means that you need to use a local caching daemon, such as nscd or dnsmasq, to make sure DNS actually works for you if you don't use the same nameserver 100% of the time. Or you patch all your applications...

In a datacenter this usually is much less important than on a desktop or laptop, as nameservers rarely change. Though we still use nscd because mysql versions prior to 5.6 (not yet GA) have a ridiculously small host cache, which effectively means that it does a PTR lookup for every incoming connection. nscd has its own problems though: in CentOS 4 for instance, nscd was rather buggy so we had to make it restart itself every hour. Yes, that very same hour as the interval between problems.

So what happens if nscd restarts? You fall back to nss_dns. When it does that, these PTR queries go ...


Playing with API's: git, github and docopt

Github is awesome for hosting code repositories. One of the more awesome things is their API that lets you do (almost?) anything you can do via the website. I like that, because that lets me write things that make me avoid the web interface (yes, I'm a CLI addict). And a python addict, so I was excited to see Ian Cordasco's github3.py. One thing I really wanted was to make it easier to fork/clone a repository. Yes, I'm aware of hub, but don't like it for two opinionated reasons: It wraps git instead of being a git subcommand, which feels dirty. And it's ruby, which feels dirty too :)

Besides, the best way to learn something is to tinker. So tinker I did. The result is git hub, a git subcommand that does various github actions.

dennis@lightning:~$ git hub whoami
Dennis Kaarsemaker
Profile   https://github.com/seveas
Email     dennis@kaarsemaker.net
Blog      http://www.kaarsemaker.net
Location  Amsterdam
Company   Booking.com
Repos     36 public, 0 private
Gists     4 public, 0 private
RSA key   ...N0nFw3oW5l (Dennis)

And that's just the beginning. I've already used it to fork and/or clone repos, fix repo configs and file pull requests. And of course it has a graphviz thing that visualizes part of your social coding network. Another thing I used is docopt for creating the command line interface (options and arguments). Instead of having to declare all options manually with optparse or argparse and have your --help output be autogenerated, it works the other way around: you write your usage info and it generates the parser. Much less annoying!

More info on github (where else).

