Remote Working – 3 Year Retrospective

A number of people have recently written about remote & distributed teams (including my friend and colleague Avleen), some of them advocating the benefits of a remote-friendly and distributed culture, some arguing against.

It’s been a little over three years since I joined Etsy as a remote employee, so I thought I’d write a little about my experiences of remote working, both the good and bad, by way of throwing a dash of real-life experience into the mixing pot of the “remote or not” discussion.

Before I dive in though, I should probably give a little context to my working situation, as various factors will come into play as I work through this retrospective.

I currently work from home in a small town in the south-east of the UK for the Etsy operations team as a Staff Operations Engineer. We’re a heavily distributed team with people spanning 4 time zones, although I’m currently the only person outside the US, which means my work day is between 5 and 7 hours ahead of the rest of my team.

This is the only job I’ve ever had where I’ve worked remotely, and I should probably add the disclaimer up-front that I’m somewhat biased in favour of remote working.

The Good

Now that we’ve gotten the disclaimers, caveats and background info out of the way, let’s start off by taking a look at what I consider the positive aspects of remote working.

Productivity

One of the stand-out positives of working remotely for me has been the impact it’s had on my productivity. My natural tendency is to avoid crowded and noisy places, so being out of the typical busy office environment and able to get some peace and quiet on a regular basis has made me far more productive and relaxed.

The fact that I’m 5 hours ahead of the rest of my team has also turned out to be a benefit to my productivity here too – because I’m usually the only person on the team at work until 2PM or so in the UK, my entire morning is a block of time without any interruptions where I can get through tons of work. I’m also a morning person, so my brain is freshest when I start work.

The other nice thing here is that when the rest of the team get to work in the afternoon UK time, I can spend the afternoon on other parts of my job like mentoring more junior team members, planning, attending meetings, collaborative work etc etc. Although it might not be to everybody’s tastes, I’ve really come to value having those few hours in the morning to buckle down and get work done – I also find it makes it easier to balance the different demands of my role. As it happens, I’m actually using those productive hours to write this blogpost :p

Quality of Life

Another positive of working from home I’ve found particularly beneficial is the improvements it has made to my quality of life. Rather than waxing lyrical about this for too long, I’m going to highlight a few specific examples that I’ve come to particularly value.

  • No Commute. I often joke that a bad commute for me is having to walk around a clothes dryer on the way to my desk, but there’s a serious point to make here – rather than spend 2 hours a day commuting as I did when working in London, I have a 10 second walk to my desk. This also gives me an extra 2 hours a day to play with… in my case, I literally do *play* with this time – see the next bullet.
  • Quality Mornings.  I define quality mornings as the extra time I get to spend with my wife in the mornings due to working from home. Since I’m a morning person and my wife is most definitely not,  while she’s getting ready for work I can take care of making coffee, breakfast and her lunch, then we sit down and have breakfast together. Once she’s left for work I still have roughly an hour until I start, so I play Xbox for a while to get my brain into gear. Don’t laugh, it works for me 😀
  • Flexibility. Another major plus point to remote working is the flexibility that it affords – I’m always at home to receive deliveries. Car needs to go to the garage? No worries, I can pop by. Wife needs to attend a courthouse with bad parking? (She’s a paralegal, not a criminal 😉 ) I can take a few minutes to drop her off. Individually, these are all very small things, but the cumulative effect makes the trials and tribulations of daily adulting much easier to deal with.
  • Freedom of Location. One of the most tangible benefits to working from home (or remotely from a co-working space) is the fact that you no longer have to live in or near a major tech hub like New York, San Francisco or London, which makes life a *lot* cheaper. In fact (tax and immigration rules allowing) it doesn’t really matter *where* you live, as long as you have decently fast internet. When I started at Etsy I was still living in London, but since then I’ve moved back to Scotland for a year, and then on to my current home. My wife and I are now considering a move to Europe to be closer to her family, and I cannot overstate how much less stressful this is thanks to the fact that my job can move with us.

The Bad

Of course, remote working isn’t all unicorns and rainbows – there are negatives too, and some aspects of remote life that make things more difficult than working in an office. To balance things out, let’s take a look through what I’ve struggled with or found difficult.

Cabin Fever

The fact that your home and your office are in the same physical building can often lead to cabin fever in varying degrees. In my case I don’t find this too problematic due to my aforementioned tendency to naturally avoid crowded and noisy places, but there are occasions where I just need to get outside of these four walls.

There are various strategies for alleviating this problem, however, largely involving braving the outside world and the harsh glare of the day-star. Here are a few tips and tricks I make use of to make sure I don’t get too stir crazy:

  • Walk to the shops. We buy our groceries from a small supermarket a short distance away, and rather than doing a large weekly shop I’ll walk there maybe two or three times a week to do a smaller shop, just to get out of the house and stretch my legs a bit.
  • Exercise at lunchtime. Since you work from home and don’t have to commute anywhere, where possible, why not try going for a walk or exercising at lunchtime? In my case, my gym is within walking distance of home, so I make a point of going there at lunch to break up the work day and get some fresh air.
  • Separation of space. Don’t work from the couch in front of the TV. Set aside a space where “work” lives so that you have some sort of physical separation between work time and play time. In my case, living in a fairly small flat, I don’t have a separate room for my office, but rather have a corner of the living room set aside. Although still in the same physical location, I find being able to leave my desk at the end of the day hugely beneficial.

Which, as it happens, brings me on to another aspect of working life that’s made harder by working from home…

Stopping Work

When you work and live in the same place, actually stopping work at the end of the day gets harder. There’s always the temptation to quickly check your emails, or IRC, or just follow up on that one thing. In my case, the timezones of my team mean that they’re all working until around 11PM UK time, which exacerbates this problem.

One of the toughest parts of my particular working situation, and that which I’ve had to be the most disciplined about, is stopping work at 6PM and not starting again until the next day.

I would almost describe my approach to this problem as slightly militaristic in fact, and I’m often to be found recommending the tricks I’ve learned to other colleagues who are in danger of over-working themselves:

  • No work email on my phone. I’m lucky enough that I work in a fairly large Ops team where I’m not solely responsible for on-call etc…this means that, especially since most of my team are at work when I’m not, there is no good reason for me to have my work email on my phone. It’s just adding temptation to check my work email while I’m relaxing after work – learning to accept that everybody else is on top of things and that I’m actually allowed to not be at work took some learning, and it’s something I try to be very careful to remember.
  • When you leave your desk, leave your desk. At 6PM every night, I stop working unless I’m in the middle of something vitally urgent. I don’t attend meetings after this time, I don’t reply to emails, I don’t check IRC. Although my desk is in the living room and hence never more than 10 feet or so away, I am very strict with myself about downing tools at the end of the day and not going back to my desk until the next morning. Again, not being a solo Ops person helps a lot here – knowing my team has my back makes this mainly a matter of mental discipline on my part.
  • Timezones are a thing. Again unique to my particular work situation, my team are at work roughly between 2PM and 11PM UK time. They don’t turn up for work at 9AM my time (4AM in NY), so why should I be at work at 11PM? This is again a mental discipline thing – I force myself to stop working when I’m supposed to. Work life balance is crucial to get right.

Communication

The negative aspect of remote working you’ll most often read about when people are arguing against it is that of communication. When you have people working across physical locations, timezones and even countries, communication gets harder. People aren’t able to gather around the water cooler, it’s easy for people to feel left out if they’re the one who isn’t in the office, and including remotes in meetings and discussions can often be tricky.

I’m going to be completely honest here: many of these points are valid and are indeed things that a remote-friendly culture will most likely struggle with. I would argue, however, that none of these things are insurmountable. While I’m not going to claim that Etsy is perfect and that we never have communications issues and that I never get pissed off with video conferencing systems, there *are* a number of things we do which help a lot here.

  • Move communications online. At Etsy, the main venue for communication amongst the Ops team is IRC. True, we do have this a little easier than most since more of the Ops team is remote than is in NY, but I still maintain it’s not that hard for teams to adjust to. Whether IRC, Slack, HipChat or something else, if you’re going to be adding remote workers to your team you’re going to need to move communication to a remote-friendly venue. This also serves to ameliorate the “water cooler” problem. If the watercooler is actually the #watercooler IRC channel, your remotes are just as included as everybody else is.
  • Over Communicate. It’s often very easy to assume that everybody on your team knows what you’re working on, or maybe that the thing you just did is too trivial to mention – in many cases, especially when working with remote colleagues, this is not the case. At Etsy we try to, if anything, *over* communicate what we’re working on, changes we’ve made, tools we’ve come up with etc, both to make sure that everybody’s included in discussions and that people get praised when they do cool stuff. I’m not gonna lie, everybody likes to be praised, and if that only happens when somebody’s looking at what you did over your shoulder, it’s very easy to make remotes feel left out.
  • Invest in Video Conferencing. Please, for the love of all that is good and holy, if you only do *one* thing in this section, invest in decent video conferencing. When you have remote employees, *every* meeting they are in will necessitate a video conference. If it takes 20 mins out of every meeting to get people dialled in, it frankly sucks. Similarly, if you can’t see or hear the people at the other end of the call, it makes you feel excluded and not part of the meeting. In the 3 years I’ve been at Etsy I’ve seen significant investment and improvement in our video conferencing setup, and let me tell you it makes life as a remote a *lot* easier.

Sub-standard Treatment

I wasn’t completely sure what to call this section, but I think the title I chose works – one of the difficulties of being remote is that when you’re out of the office you’re often treated very differently to office-based colleagues.

I’m going to assume that many of you reading this blogpost will be in some way involved in the tech industry. We all know about the perks generally offered by tech companies, don’t we? Free meals, subsidised gym memberships, ping pong tables, all that jazz. If you’re a remote sitting in your own house, you’re often excluded from that. So it’s very very important to make remotes feel that they’re treated just as well as anybody else.

This is something that, to be honest, Etsy wasn’t that great at in some respects when I started – I’ve seen a lot of improvement in how Etsy handles these things as we’ve hired increasing numbers of remote employees, and in all honesty it makes me feel a lot more valued by the company as a whole, rather than just by my team.

So what exactly can you do to help your remotes feel like they’re not being excluded from nice desks and happy fun times?

  • Let Your Remotes Expense! If people in your offices have nice ergonomic chairs and height adjustable desks, why not let your remotes do the same? In the case of the Etsy Ops team, for example, we’re able to expense the same chairs that folks in the office get, the same external displays, and our company benefits package allows for stuff like letting remotes claim back gym membership costs and other benefits to bring them roughly to parity with office-based employees. When I started, we didn’t even *have* a benefits package in the UK part of the company, so this is something that has improved hugely – and makes me feel more valued as a result!
  • Bring your Remotes to the Office. Make it easy for your remotes to travel to the office every so often to hang out with everybody. Let them expense flights, hotels and food when they’re in town. Seriously. Regardless of how much you like working from home, assuming you like the people you work with it’s always gonna be nice to go and hang out with them. In my case, it’s also pretty cool to get to go to America, it’s something I’d never done before starting this job!

So that covers the good things about working remotely, and the things that aren’t so great.

So Can Everybody Work Remotely?

But Jon,  I hear you ask, does this mean that with a little elbow grease everybody can work remotely and everything will be amazing?

I would argue here that the answer is “maybe”. For remote working to work well, you need people who actually like working remotely, and a company prepared to support it. Both are equally important, and if one part is missing, then the whole thing falls apart.

On a personal level, you have to be prepared to deal with spending a lot of time in your home, and even with the best team in the world there will be a lot less inter-personal interaction than you’d get in an office. If you’re the kind of person who thrives on being in a group of people and in a busy environment, you may find remote working a bit of a shock to the system, and one you may not like.

On a company wide level, you have to be prepared to invest in supporting your remote employees and making them feel equally valued to those in your offices. From a paperwork perspective, it’s pretty trivial to tell people that they can work from home, but if you don’t invest in the remote experience you’re offering them, you can easily end up with excluded and unhappy employees.

Hopefully the points I’ve raised above and my own experiences at Etsy have helped to shed some light on what it’s like to be a remote worker…in my case, I freaking love working from home and have often said I honestly don’t know if I’d want to go back to an office environment.

In closing, if you want to work remotely or to have remote employees, it basically boils down to this.

Be prepared to work at it, and be awesome to each other. Remote working can be an amazingly empowering and positive experience, but it doesn’t come for free. Effort in, results out – from both company and employees.

Unplugging from the Matrix

Today’s my first day back at work after a week long holiday. A holiday is nothing unusual of course, but I wanted to write down a few thoughts on an aspect of my holiday which is a little more unusual. For the entire duration of that week, I was totally disconnected from the internet. I didn’t take my laptop with me. There was no wi-fi where I was staying. Data roaming on my phone was turned off.

This is a habit I acquired around two years ago now, and every year I make sure that I spend at least one week (but preferably two) completely cut off from the online world. Imagine that – no email, twitter, github, irc, skype or interwebs.

But guess what. The world kept on turning quite happily without me. Ops folk (and developers as well) are almost universally bad at unplugging in this way. Many of our professional lives revolve around the online world and the thought of forcibly disconnecting ourselves makes us all twitchy.

I must admit, my motivations for these annual offline weeks are purely selfish – I find it recharges my batteries and refreshes my mind. That’s been especially important this year as I’m entering the closing stages of writing a book, and had been skating on the edge of burnout-territory for a couple of months.

But there’s another side to this as well – I was catching up with one of my colleagues this morning, Pete Bellisano, who made the following observation:

I’d imagine it’s incredibly healthy thing to do, both from a personal and organizational view point

My feeling is that Pete was entirely correct on this point – although taking a week away from the constant barrage of information is good for me on a personal level, it’s also a healthy exercise for the company I work for and the people I work with.

The fact that I was able to take that week away without anybody having to call me on my mobile – the only means of communicating with me during that week – let us know that we’re doing a good job of making sure that I’m not a single point of failure for anything. It’s the classic “what if X gets hit by a bus” dilemma.

It’s my personal opinion that everybody should try taking an offline week at least once a year – without fail, I return to work feeling refreshed and ready to rock. I’m also writing the first post on this blog since October last year, which says something by itself!

Having said that, I realise that it’s not possible for everybody to do that – you might be the only person on your team, or your organization might not be all that “bus proof” at the moment. So my challenge to you is this:

  • Take 1 week off work per year and totally disconnect from the internet.
  • If you can’t do the above this year, aim to be able to do it next year. Figure out what you need to change to make that happen, and focus on fixing it.

Even if by this time next year you’re still not able to take that offline week, hopefully you’ll have gotten closer to it – and if you’re already working in a good team where you’re not a single point of failure you haven’t got any excuses. Leave the laptop at home, and go take a break. You won’t regret it.

 

knife-spork 1.3.0 released

I don’t usually write blogposts for each new knife-spork release, but along with the usual smattering of bugfixes this release has a couple of fairly significant new features I wanted to highlight and explain in greater detail.

Spork Omni

One of the most requested features in knife-spork was a more simplified workflow. A lot of the people who use knife-spork follow the bump, upload, promote (or promote --remote) pattern every time they change a cookbook.
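For reference, the individual-step version of that workflow looks something like this (the cookbook and environment names are only examples, and I’m promoting to a single environment here for brevity):

$ knife spork bump apache2
$ knife spork upload apache2
$ knife spork promote development apache2 --remote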

With that in mind, I’ve added a new command, spork omni. This essentially combines bump, upload and promote (or promote --remote) into one step. Here’s an example of omni in action:

$ knife spork omni apache2 --remote
OMNI: Bumping apache2
Successfully bumped apache2 to v0.3.99!

OMNI: Uploading apache2
Freezing apache2 at 0.3.99...
Successfully uploaded apache2@0.3.99!

OMNI: Promoting apache2
Adding version constraint apache2 = 0.3.99
Saving changes to development.json
Uploading development.json to Chef Server
Promotion complete at 2013-08-08 11:43:12 +0100!
Adding version constraint apache2 = 0.3.99
Saving changes to production.json
Uploading production.json to Chef Server
Promotion complete at 2013-08-08 11:43:12 +0100!

When using omni, all spork plugins run as if each individual step were being run, which is in actual fact what’s happening under the hood. Omni is really a convenient wrapper for the most common spork workflow.

Node, Role and Databag Commands

One of the annoyances that we’ve experienced at Etsy is that whilst spork gives us excellent visibility into cookbook changes, we’re still effectively in the dark when it comes to role, node and databag changes. With that in mind, I’ve added spork commands which wrap all of the destructive default knife node, role and databag commands.

By destructive commands, I mean those which in some way alter the chef server by changing a run_list, uploading new data bag items etc. All of the spork equivalents run the default knife command under the hood, they just wrap them in spork’s plugin API so that you’re able to see IRC notifications when you upload a role, say.

The following data bag commands are provided in knife-spork:

knife spork data bag create
knife spork data bag delete
knife spork data bag edit
knife spork data bag from file

The following node commands are provided in knife-spork:

knife spork node create
knife spork node delete
knife spork node edit
knife spork node from file
knife spork node run_list add
knife spork node run_list remove
knife spork node run_list set

The following role commands are provided in knife-spork:

knife spork role create
knife spork role delete
knife spork role edit
knife spork role from file

And here’s an example of the IRC notification you might see from running knife spork node run_list set mynode.mydomain.com

CHEF: Jon Cowie set the  run_list for mynode.mydomain.com to 
["role[Base]", "recipe[awesome::stuff]"]

You can find knife-spork-1.3.0 on Rubygems.org now, please do check out the CHANGELOG for details on the rest of the changes in the new version.

Why I retired mycrot.ch

Way back in 2009, I decided to buy myself an amusing vanity URL. After much careful thought, I ended up choosing mycrot.ch (I won’t lie, because I thought it was funny as hell), which I have been using for my personal site / blog until the end of last week. But I’ve now decided to retire it. Why? Read on…

Those of you who follow me on twitter may have seen part of a minor argument I had with some folks on Friday night. Here’s a link to as much of the thread as twitter will show me in one place (I’ll paste screenshots of tweets excluded from this below as needed) https://twitter.com/jonlives/status/272037883421020160

Now, for context as to what I was complaining about, have a look at the hall of fame at the bottom of this page.

The gist of the argument is that I think it’s totally unacceptable for Comic Relief to be suggesting it might be funny to place your underpants on the desk of your receptionist and tell her that you’re going commando – in my (non lawyer) opinion, this sort of thing is pretty clearly sexual harassment. As you can see from the above twitter thread, several people (including @choosenick and @stef who worked on the project) didn’t see my complaint as anything other than being overly sensitive. So I replied with the following:

The aforementioned @choosenick then called me out with the following:

From subsequent comments on twitter I’m fairly sure he meant the above comment to be flippant, but it got me thinking. He’s actually got a perfectly valid point. Sure, my domain name is personal and not specifically directed at women, but is it really *that* different to what I’m complaining about? A good friend of mine observed that mycrot.ch was what he called “undirected sexual humour”, the problem being the direction in which such humour is typically aimed – i.e. towards women. I have no actual evidence that my domain name ever offended anyone, but it’s still symptomatic of the sort of “bro” humour that pervades tech these days.

My employer, Etsy, have recently been doing an excellent job of encouraging more women to apply for tech roles, and it’s got me thinking a lot more about the issue recently. I generally fall into the apathetic bracket when it comes to issues and doing anything about them, but this time it struck me I could actually do something to demonstrate where I stand.

So, I shut down mycrot.ch and tweeted the following.

The following are the direct @replies I received in response to that tweet, all from people involved in the Comic Relief project:

It may be that I’m over-reacting, or making tiresome assumptions about what women find offensive, but my honest opinion is that by perpetuating (even non-directed) sexual humour, I’m also helping to perpetuate the idea that it is normal and accepted within the Tech Community at large.

I’ve now migrated my personal site and blog to the jonliv.es domain you’re reading this on now. My personal website is also directly linked to my reputation, both professional and personal, and I’m making a stand that I don’t want either to say “Hey, this guy’s a bro just like us”. I feel strongly that we as a tech community need to take steps to reverse the “testosterone and booth-babes” culture which has become ingrained in our industry. It may even be the case that until the scales are balanced again, we have to become more sensitive about our humour than we’d perhaps like.

Maybe one day, when any woman who wants to work in tech can be sure of being treated as fairly as her male counterparts, and isn’t deterred from working in tech for any reason other than genuinely not being interested, I’ll reclaim mycrot.ch. But for now, jonliv.es.

Incidentally, Comic Relief still have that video online, so please do give them your feedback if you feel the same way as I do.

knife-spork 1.0.0 released

Ohai chefs!

It’s been nearly 3 months since the last knife-spork release, but I haven’t forgotten about you all. Oh no. I’m happy to announce that finally knife-spork has hit version 1.0.0, and you can get it from Rubygems or Github right now.

I don’t usually write blogposts for new knife-spork releases, but since a lot of the changes in this release are behind the scenes, I thought I’d give you all a bit more insight into what’s been done than just pointing you at the Changelog.

Since the first version of knife-spork was released in Jan 2012, there’s been what developers like to call a lot of “organic growth” in the codebase. In short, what this means is that I added a ton of new stuff without cleaning up the old. The result of this was that (and I’ll be the first to admit this) the codebase was rather cluttered, and contained a lot of duplicated code. This isn’t exactly ideal for me as the main programmer, but it’s doubly annoying for those of you who’ve contributed code to knife-spork, because finding your way around the source for the first time to figure out where your contribution belongs isn’t the easiest.

With that in mind, I’d like to say a big thank you to Seth Vargo from Customink (@sethvargo) who took on the task of refactoring the entire thing! Seth’s cleaned up the code, cleaned up command outputs, structured things a lot more logically and generally given the code a good tidyup.

Seth also contributed a significant new feature to knife-spork with his implementation of a plugin framework. Support for several external systems like irccat, Hipchat and Git was already present in knife-spork, but Seth’s separated this out into a proper plugin framework which will make it far easier to work on specific integration points and integrate new systems.

Alongside Seth’s sterling work, I’ve also added a few new features:

  • The spork-config.yml schema has changed slightly to reflect the new plugin framework. Please check out README.md in the knife-spork repo root for more details – there’s also a rough sketch of what this might look like just after this list
  • “knife spork info” command to display the config Hash that spork is using, and show you the status of any Plugins you have installed
  • spork check now gracefully handles missing local / remote cookbooks (bug reported by Jason Perry)
  • spork check now has an optional “--fail” parameter to make it throw a non-zero exit code if any checks fail (suggestion by Jason Perry)
  • A new safety check has been added which will prompt you for confirmation if you’re promoting a version constraint more than version_change_threshold versions ahead
    • The abovementioned version_change_threshold is now a configurable parameter in your spork-config file which defaults to 2 if not set.
    • Version diffs are calculated as follows: A patch level release increments by 1, a minor level release increments by 10, a major release increments by 100.
    • The default threshold value corresponds to the patch level changing, i.e. from 1.0.1 to 1.0.3
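To give a rough idea of the new shape of the config – and I should stress that the plugin keys below are illustrative guesses on my part rather than the canonical schema, so do check the README – a spork-config.yml might look something along these lines:

plugins:
  irccat:
    # illustrative keys only – see README.md for the real irccat plugin options
    server: irccat.example.com
    port: 12345
    channel: "#chef"
# prompt for confirmation on promotes that jump more than this many versions (defaults to 2)
version_change_threshold: 2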

I’d also like to add a note of thanks to Bethany Benzur and Nick Marden who submitted patches for a couple of bugs pre-refactor. Although not present in the original form they were submitted in, both patches have been incorporated into the refactored code.

There’s a whole other pile of new features I wanted to get into this release, but I decided to get the new code out there for you all to use, and save the new shiny for version 1.0.1. Here’s a sneak preview of some of the stuff I’m working on for the next release:

  • Configurable git behavior modes to support different git workflows
  • The ability to “lock” and “unlock” cookbooks you’re working on
  • Expose organizations information in notification messages
  • Cookbook diffing to warn you about large changes

As always, thanks for using knife-spork and please keep the suggestions, Issue reports and Pull requests coming!

Scaling Chef with more API Workers

We’re big fans of Opscode’s chef software at Etsy, and are using it on close to 700 nodes. Recently though, we found that we were beginning to see a large number of connection timeouts during Chef runs. A little digging revealed that although the hardware on which we run Chef was by no means struggling, the API worker (the process running on port 4000 you point knife at by default) was continually maxing out a CPU core.

The default configuration which Chef ships with runs a single API worker, which is more than sufficient for most environments but evidently we’d hit the limit of what that worker could handle. Fortunately, scaling Chef to spawn more workers and make better use of a modern multi core machine is easy, though a little poorly documented. So, as with most of the posts I write here, I thought I’d document the process for anyone else hitting the same issues.

Please note, the following instructions are for Redhat / CentOS based systems, although most of the steps are platform agnostic.

The first step to multiple worker nirvana is to configure chef-server to start multiple worker processes. To do this, you’ll want to edit /etc/sysconfig/chef-server and change the OPTIONS line to the following, changing the number of processes as desired – in this example, we’re starting 8:

#Configuration file for the chef-server service
#CONFIG=/etc/chef/server.rb
#PIDFILE=/var/run/chef/server.pid
#LOCKFILE=/var/lock/subsys/chef-server
#LOGFILE=/var/log/chef/server.log
#PORT=4000
#ENVIRONMENT=production
#ADAPTER=thin
#CHILDPIDFILES=/var/run/chef/server.%s.pid
#SERVER_USER=chef
#SERVER_GROUP=chef
#Any additional chef-server options.
OPTIONS="-c 8"

Once you’ve done this, run /etc/init.d/chef-server restart, and then run “ps -ef | grep merb”. You should now see output similar to the following:

chef 16495 1 10 Feb23 ? 2-02:55:03 merb : chef-server (api) : worker (port 4000)
chef 16498 1 8 Feb23 ? 1-15:48:30 merb : chef-server (api) : worker (port 4001)
chef 16503 1 8 Feb23 ? 1-17:33:12 merb : chef-server (api) : worker (port 4002)
chef 16506 1 8 Feb23 ? 1-17:34:43 merb : chef-server (api) : worker (port 4003)
chef 16509 1 9 Feb23 ? 1-17:59:06 merb : chef-server (api) : worker (port 4004)
chef 16515 1 8 Feb23 ? 1-17:45:54 merb : chef-server (api) : worker (port 4005)
chef 16518 1 8 Feb23 ? 1-16:06:50 merb : chef-server (api) : worker (port 4006)
chef 16523 1 8 Feb23 ? 1-17:39:14 merb : chef-server (api) : worker (port 4007)

As you can see from the above output, the new worker processes have been started on ports 4000 through 4007. If we want our chef-clients to hit our new workers, we’re going to need a load balancer sitting in front of the workers. Luckily, since our worker processes communicate over HTTP, we can use Apache for this through the use of its mod_proxy_balancer module. I’m going to assume that you’re familiar with the basics of setting up Apache here, and just cover the specifics of load balancing our workers.
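One quick aside before the vhost itself: mod_proxy_balancer and friends need to be loaded for any of this to work. On a stock Redhat / CentOS Apache install, the relevant LoadModule lines look something like the following (module paths may differ slightly on your system):

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule rewrite_module modules/mod_rewrite.so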

The following vhost example shows how to use mod_proxy_balancer to balance across our new worker processes.

<VirtualHost *:80>
   ServerName chef.mydomain.com
   DocumentRoot /usr/share/chef-server/public
   ErrorLog /var/log/httpd/_error_log
   CustomLog /var/log/httpd/access_log combined
   <Directory /usr/share/chef-server/public>
     Options FollowSymLinks
     AllowOverride None
     Order allow,deny
     Allow from all
   </Directory>
   <Proxy balancer://chefworkers>
     BalancerMember http://127.0.0.1:4001
     BalancerMember http://127.0.0.1:4002
     BalancerMember http://127.0.0.1:4003
     BalancerMember http://127.0.0.1:4004
     BalancerMember http://127.0.0.1:4005
     BalancerMember http://127.0.0.1:4006
     BalancerMember http://127.0.0.1:4007
   </Proxy>
   <Location /balancer-manager>
     SetHandler balancer-manager
     Order Deny,Allow
     Deny from all
     Allow from localhost
     Allow from 127.0.0.1
   </Location>
  RewriteEngine On
  RewriteCond %{REQUEST_URI} !=/balancer-manager
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://chefworkers%{REQUEST_URI} [P,QSA,L]
</VirtualHost>

You might notice that I’ve omitted our original worker on port 4000 from the balancer pool – this is so that we can migrate traffic off our overloaded single worker without throwing any more at it. Once all of our nodes are talking to the load balanced pool, our original worker will be idle and can then safely be added into the pool with its fellows.

Once you’ve configured a suitable vhost with your worker pool, restart Apache and make sure that the host name you configured works properly. It’s also worth having a look at the balancer-manager we configured above as well (http://yourhost/balancer-manager) as this will show you the status of your worker pool and let you tweak weightings and so on if you so desire.

Now that our load balanced worker pool is up and running, all that remains is to point chef-client on our nodes at the new host name. I’m going to assume here that you’re cheffing out your client.rb file – you are cheffing out your client.rb, aren’t you? Anyway, this step is as simple as changing the chef-server line from port 4000 to port 80 (or whatever port you set up your Apache vhost on) – a sample snippet from client.rb is below:

# Main config
log_level :info
log_location "/var/log/chef/client.log"
ssl_verify_mode :verify_none
registration_url "http://chef.mydomain.com:80"
template_url "http://chef.mydomain.com:80"
remotefile_url "http://chef.mydomain.com:80"
search_url "http://chef.mydomain.com:80"
role_url "http://chef.mydomain.com:80"
client_url "http://chef.mydomain.com:80"
chef_server_url "http://chef.mydomain.com:80"

With that all done, presto chango – your chef-clients are now pointing at a shiny new pool of load balanced workers making use of as many CPU cores as you can throw at them. Once chef-client has run on all of your nodes, you’ll probably want to add our original worker on port 4000 into the loadbalancer pool again as well.

It’s worth noting that we found the optimum number of worker processes for our setup to be 10. We’re running close to 700 nodes with an interval of 450 seconds and a splay of 150 seconds, but your mileage may vary. Providing your chef-server’s underlying hardware can handle it, keep adding workers until you stop seeing connection timeout errors. I’d recommend you don’t add more workers than you have CPU cores, and remember that you need to leave enough free cores for the rest of Chef’s processes.

Chef Development with Shef

I thought I’d do a post on how to use Shef, the interactive chef console, for iterative cookbook development – for those times when you just want to experiment without uploading anything to the server. It’s a workflow I use heavily and have found really useful.

Getting Started

To get yourself set up for developing with shef, you’ll need to perform the following steps:

  • Make sure you have a copy of your chef repository on a machine similar to that on which your cookbooks normally run
  • Make sure you’re on the latest version of the chef gem (a bunch of shef bugs were fixed post-0.10.4)
  • Create a file called solo.rb somewhere, and paste the following into it (changing cookbook_path to wherever you have your chef repo):
file_cache_path "/var/chef-solo"
cookbook_path "/home/myuser/chef/cookbooks"
  • Create a file called anythingyoulike.json, containing the following (changing mycookbook as appropriate):
{ "run_list": [ "recipe[mycookbook::default]" ] }
  • Please note, this run_list should contain all the cookbooks you want to be available to you while using Shef. It doesn’t have to be the entire cookbooks directory, but make sure that if your cookbook has any dependencies, you include them here.
  • Run the following command (changing file paths as applicable):
sudo shef --solo --config ~/solo.rb -j ~/anythingyoulike.json
  • You’re now in the shef console, and should see output like the following:
loading configuration: /etc/chef/solo.rb

Session type: solo

Loading..[Tue, 07 Feb 2012 10:49:36 +0000] INFO: Run List is []

[Tue, 07 Feb 2012 10:49:36 +0000] INFO: Run List expands to []

done.

[Tue, 07 Feb 2012 10:49:36 +0000] INFO: Setting the run_list to ["recipe[mycookbook]"] from JSON

This is shef, the Chef shell.

Chef Version: 0.10.8

http://www.opscode.com/chef

http://wiki.opscode.com/display/chef/Home

run `help' for help, `exit' or ^D to quit.

Ohai2u me@mydomain.com!

chef >

You’re now using the shef console!

You can see that the run_list we specified in the JSON file has been picked up by shef. In the context of shef, this run_list is the list of cookbooks available to it – none of them will actually be run until we tell it to.

Performing a Chef Run

Now, to actually run our cookbook, we want to enter the recipe context. This is done as follows:

chef >
chef > recipe
chef:recipe >

Next, we need to load our cookbook into the current context. Please note, you can only do this with cookbooks that are shown in the run_list. Otherwise shef won’t know where to find them.

We do that with the include_recipe command:

chef:recipe>
chef:recipe > include_recipe "mycookbook::default"
=> [#<Chef::Recipe:0x00000004b5bdb0 @cookbook_name=:mycookbook, @recipe_name="default", @run_context=#<Chef::RunContext:0x00000004c1a300

<snip>

chef:recipe>

The above command will give you a huge amount of output, as it’s basically loading all of the resources from the recipe we gave it.

Next, to perform a chef run, use the following command:

chef:recipe>
chef:recipe > run_chef

At this point, you’ll see the mycookbook::default recipe run, producing the same output as you’d expect to see during a normal chef run, with the same sort of errors too.

Advanced Debugging

Trace Logging

If the standard output of a chef run doesn’t give you enough information, you can also turn on more verbose logging by using irb’s trace facility. This is enabled by running the following command:

chef > tracing on

Breakpoints

One of the most awesome features of Shef is the ability to add breakpoints to your recipes. This allows you to pause chef runs, and step forwards and backwards through the run between breakpoints.

The chef wiki goes into a lot of detail on breakpoints here, so I won’t repeat all of what it says, just give an outline.

To add a breakpoint to your recipe, simply add the following:

breakpoint "foo"

Breakpoints will be ignored during the course of a normal chef run (ie using chef-client), so don’t worry if you forget to remove one from your code. If you’re running using Shef, however, when the run hits the breakpoint, it will be paused.

You can now check the state of the system, make sure the recipe has done what you expected it to so far, and then assuming you’re happy to continue, run the following command:

chef:recipe > chef_run.resume

The opscode wiki page I linked above goes into more detail on actions like rewinding the chef run, and stepping through the run pausing at the next breakpoint, so if you’re still reading this post after all that, I’d recommend you have a look.

Creating your own Signed APT Repository and Debian Packages

We create a lot of our own debian packages at Aframe where I work, and until recently have been keeping them in a flat repository without properly signing any of our packages. However, since there’s a possibility some of those packages may be released publicly (since Opscode may not be providing debian lenny packages for Chef anymore), I decided that it was high time to properly organise the repository and sign all our packages plus the repository itself.

As anyone who has tried this will no doubt have found, there is a large amount of conflicting information out there as to how exactly this can be achieved. To ease the burdens of my fellow sysadmins, I thought I’d gather all the necessary info together into one easy post.

The first thing we’re going to need to do is to generate a GPG key to sign our repository and all of our packages with. If you’ve already done this, please skip this section. If not, you can follow these simple steps!

Creating a GPG Key for Signing Stuff

  • Run the following command: gpg --gen-key
  • Select Option 5 to generate an RSA Key
  • Choose your keysize from the options given (doesn’t really matter for our purposes)
  • Enter your name and email address when prompted

There, we now have a GPG key we’re going to use to sign everything. The next stage is to produce an ASCII formatted public key file which we can distribute to people so they’ll be able to tell apt to trust our repository.

Creating an ASCII Formatted Public Key File

By default, the gpg utility’s export function produces a binary formatted output of our public key. If you’re planning to distribute your public key via the web, this isn’t very handy so we’re going to use the gpg utility’s --armor option to produce an ASCII formatted version.

You’ll want to substitute the email address of the key you’re exporting, and the output filename as appropriate in the following command.

  • gpg --armor --export jon@aframe.com --output jon@aframe.com.gpg.key

Save this keyfile somewhere, we’ll be making it available over the web for people to add to their apt keychains – this is what’ll say that our repository is trusted for package installs.

Signing Some .deb Packages

I’m going to assume that you already know how to create .deb packages, by way of keeping this blogpost short. This section will simply cover signing a package you’re creating and resigning an existing package.

The good news is that if you’ve already generated a GPG key as detailed above, your packages will be automatically signed as long as the name and email address in your package’s changelog file are the same as that of the GPG key you created. This means that simply running dpkg-buildpackage will now give you signed packages. If you’re unsure of how to build debian packages at all, there’s plenty of information out there on doing this. I might write a blog post on that soon 🙂
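By way of illustration, a straightforward non-root build is typically invoked something like this – no signing-specific flags required, since the signing happens automatically as described above:

dpkg-buildpackage -rfakeroot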

If you want to resign an existing debian package, for example if you’re setting up your own backport of a package (as with my usecase, backporting Chef 0.9.16 into debian), then this is very easy too if you already have a GPG key set up. We use a tool called dpkg-sig.

Install the dpkg-sig tool by running the command:

apt-get install dpkg-sig

Then run the following command to sign the package, substituting the name of the deb package as appropriate:

dpkg-sig --sign builder mypackage_0.1.2_amd64.deb

The “builder” name mentioned is a debian convention, indicating that the builder of the package signed it. The GPG key used will be the one you set up above, providing you’re running the command as the same user you set up the key with.
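If you want to double-check a signature after the fact, dpkg-sig can also verify packages – using the same example package name as above:

dpkg-sig --verify mypackage_0.1.2_amd64.deb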

Creating a Signed APT Repository

OK, so you’ve got a bunch of signed packages, but now you need some way to make them available to the world. Preferably one which doesn’t throw up authentication warnings every time they try to install from your repository. What you need is a signed repository!

The first step to creating your repository is to create a base directory for it to live in. Doesn’t matter where, any directory will do. Now inside that directory, create a folder called conf.

Inside your conf folder, you’re going to create a text file called distributions. Below is the one I’ve used for my repository:

Origin: apt.aframe.com
Label: apt repository
Codename: lenny
Architectures: amd64 source
Components: main
Description: Aframe debian package repo
SignWith: yes
Pull: lenny

Some of the above will be self explanatory – for example, if you’re not packaging for debian lenny, you’d replace all occurrences of lenny with squeeze. Additionally, if you’re going to package for more than one distribution, you’ll want to copy the above section for each distro. It all stays in the same file, you just change the “lenny” parts as applicable.

Also, since I’m only creating 64-bit packages, I’ve said that my repository will only contain the amd64 and source architectures. If you’re packaging for i386 or other architectures, remember to add them!

The important line in the above example is the one that says SignWith: yes. That line states that we want to sign the repository with our GPG key. Again, if you’re running commands as the same user you used to create your GPG key, this should work fine.

Now that we have the distributions file all ready to go, we’re going to use a tool called reprepro to add our packages to the repository. You can install reprepro by running the following command:

apt-get install reprepro

The nice thing about this tool is that it will create all the structure it needs inside the repository automatically so you don’t need to worry about it. Here’s an example of how to add a package to our repository. PLEASE NOTE you have to run the below command from your repository directory, ie the directory containing the conf folder.

reprepro --ask-passphrase -Vb . includedeb lenny /home/joncowie/libjavascript-minifier-xs-perl_0.06-1_amd64.deb

So what does this command mean? Well, the --ask-passphrase part tells it to prompt you for the passphrase for your GPG Key. You did set one, right? The -Vb . includedeb lenny part tells the command to be verbose, and set the base directory for the command as the current directory. It then says we’re going to be passing the command a deb file (that’s the includedeb part) and then says that we’re going to be adding it to the lenny distribution in our repository. The last part is the absolute path to the .deb package we’re adding.

When you run this command, you’ll see a bunch of output telling you that various folders have been created, and once you’ve entered the password for your GPG key your package will be tucked away nicely in your properly structured, properly signed debian repository. Running the above command on a further package or packages will add the new package(s) into the existing structure. You’ll probably now notice that your repository directory is structured much more like the official debian repositories – that’s because it’s now structured “The Debian Way”.
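As a quick sanity check, you can also ask reprepro to list what’s currently in a given distribution – again, run this from the repository root and swap lenny for your codename as appropriate:

reprepro -Vb . list lenny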

Making your Repository Available to the World

The final section in this blog post is how to make your wonderful new repository available to the rest of the world. This comes in two parts. The first is making your repository directory available over the web. I’m going to assume if you’re creating packages you can probably do this part by yourself, so I’m going to skip over it 😉

The second part is making our public GPG key available to the world so they can mark our repository as trusted. I’d suggest keeping the public GPG key we created above in the root of your repository, to make it easy for people to find. Just make sure you only store the *public* part of the key there. That’s the part we created using the gpg --armor --export command.

I’d also recommend publishing a simple command for people to download your public key and import it into their apt keychain in one step – this makes it nice and easy for them to get up and running with your repo. This command is as follows (change the URL to your key as appropriate):

wget -O - http://apt.aframe.com/jon@aframe.com.gpg.key | sudo apt-key add -

Once your users have run the above command, they can add your nicely-formatted repo to their /etc/apt/sources.list file by adding the following line (change URL and distribution as appropriate):

deb http://apt.aframe.com/ lenny main

Then they just run apt-get update and they’re all ready to use your repository containing all of your signed packages – and not an authentication warning in sight.

Produce a members report for all your Mailman lists

I recently had cause to produce a report on the membership of all our Mailman mailing lists, so rather than doing it manually I knocked together the following handy bash script… change mailman location and output file as desired 🙂


#!/bin/bash
# Produce a membership report for all Mailman lists and email it out.
# Adjust the mailman paths, output file and recipients as needed.

OUTPUTFILE="/tmp/mailman_report"
CURRMONTH=`date +%m-%Y`
# Grab the list names, dropping the "N matching mailing lists found" header line
LISTS=`/usr/local/mailman/bin/list_lists | awk '{print $1}' | grep -v '[!0-9]'`

rm -f ${OUTPUTFILE}
echo "Mailman Report for ${CURRMONTH}" > ${OUTPUTFILE}
echo >> ${OUTPUTFILE}

# Append the membership of each list to the report
for x in ${LISTS}
do
  echo "Members of List ${x}:" >> ${OUTPUTFILE}
  LIST_MEMBERS=`/usr/local/mailman/bin/list_members ${x}`
  for mems in ${LIST_MEMBERS}
  do
    echo ${mems} >> ${OUTPUTFILE}
  done
  echo >> ${OUTPUTFILE}
done

# Mail the finished report out
/bin/mail -s "Mailman_Report_for_${CURRMONTH}" foo@foo.com -c blah@blah.com < ${OUTPUTFILE}

Shared Network Storage with iSCSI and OCFS2

So we got a bunch of new hardware at work recently to build a crunch farm for all our heavyweight data processing. Part of that system is two very beefy servers which share a SAN (This one for those interested) for the majority of their disk storage. The SAN uses iSCSI, which was fairly straightforward to set up (I’ll document it here anyway), so I got that all set up and then made a nice big ext3 partition for the servers to share. All so far so good – the servers were talking to the SAN, could see the partition, read and write to it etc. The only problem seemed to be that when one server changed a file, the other server wouldn’t pick up the change until the partition had been re-mounted. What I hadn’t accounted for was that ext3 doesn’t expect multiple machines to share the same block device, so it wasn’t synching changes.

I knew that filesystems designed for exactly this sort of sharing were available, but hadn’t done much with them. After investigating for a bit, it seemed like the Oracle Clustered File System (linky) was the best option as it was already supported by the Linux kernel and was pretty mature code. The main problem I had in setting all of this up was that the available documentation was very much geared towards people who already had in-depth experience of OCFS, whereas I’d never used it before. Hence this blog post, which details setting up iSCSI and then configuring both servers to talk to the same OCFS partition. The instructions are written for Ubuntu Server, but will work on any distro which uses apt. Packages are also available for rpm distros; the only instructions you need to change are the package fetching ones.

Setting up iSCSI

* Install Open-iSCSI

apt-get install open-iscsi

* Edit the Open-iSCSI configuration file

The default configuration file could be located at /etc/openiscsi/iscsid.conf or ~/.iscsid.conf. Open the file and set the parameters as required by your iSCSI device. I’ve included the (mostly default) options I used for reference:

node.startup = automatic
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 10
node.conn[0].timeo.noop_out_timeout = 15
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65536

* Save and close the file. Restart the open-iscsi service:

/etc/init.d/open-iscsi restart

Now you need to run a discovery against the iscsi target host, which basically finds all the iSCSI targets the SAN can give us:

iscsiadm -m discovery -t sendtargets -p ISCSI-SERVER-IP-ADDRESS

Finally restart the service again:

/etc/init.d/open-iscsi restart

Now you should see an additional drive on the system such as /dev/sdc. Have a look in /var/log/messages to find out the device name.
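For example, something along these lines will usually show you which device name the new iSCSI LUN was given (the exact log messages vary between kernel versions):

grep -i "attached scsi" /var/log/messages | tail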

Next, you need to use fdisk to create a blank partition on the device. This is pretty well documented so I’ll skip these steps, other than to say that I’ll assume the device was called /dev/sdc, and the new blank partition is called /dev/sdc1 for the remainder of this post. So now we’re talking to our iSCSI device and we’ve got a blank partition all ready to format as an OCFS drive. Next, how exactly we do that!

To be continued…