
Planet Linux Australia


Sridhar Dhanapalan: Twitter posts: 2015-06-29 to 2015-07-05

Sun 05th Jul 2015 23:07
Categories: thinktime

Donna Benjamin: CCR at OSCON

Sun 05th Jul 2015 11:07

I've given a "Constructive Conflict Resolution" talk twice now. First at DrupalCon Amsterdam, and again at DrupalCon Los Angeles. It's something I've been thinking about since joining the Drupal community working group a couple of years ago. I'm giving the talk again at OSCON in a couple of weeks. But this time, it will be different. Very different. Here's why.

After seeing tweets about Gina Likins' keynote at ApacheCon earlier this year, I reached out to her to ask if she'd be willing to collaborate with me on conflict resolution in open source, and ended up inviting her to co-present with me at OSCON. We've been working together over the past couple of weeks. It's been a joy, and a learning experience! I'm really excited about where the talk is heading now. If you're going to be at OSCON, please come along. If you're interested, please follow our tweets tagged #osconCCR.

Jen Krieger interviewed Gina and me about our talk - here's the article: Teaching open source communities about conflict resolution

In the meantime, do you have stories of conflict in Open Source Communities to share?

  • How were they resolved?
  • Were they intractable?
  • Do the wounds still fester?
  • Was positive change an end result?
  • Do you have resources for dealing with conflict?

Tweet your thoughts to me @kattekrab


Binh Nguyen: Python Decompilation, Max4Live Programming, Ableton Push Colour Calibration, Automated DJ'ing and More

Sat 04th Jul 2015 03:07
I was recently discussing with someone how Ableton programming/scripting works. This was particularly within the context of the Ableton Push device and possible hacking of other devices to allow for more sophisticated functionality. Apparently, many of the core scripts use Python. They need to be decompiled to allow you to have a proper look at them, though. Obviously, some of the scripts are non-trivial and will require a sufficient understanding of both music and programming to be useful.

A decompilation of all files in the following directory,

C:\ProgramData\Ableton\Live9Suite\Resources\MIDI Remote Scripts\

is available here. The reason I've done it is that others who have previously done it have removed it from their websites.

The decompilation was achieved using two small scripts I created (available here), which use uncompyle2 at their core. Since the current uncompyle2 code contains an error that prevents a successful RPM build, I've had to make a small modification.
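The two scripts themselves aren't reproduced in the post, but the idea is straightforward. As a rough sketch only (the directory walking and the uncompyle2 command line are my assumptions, not the author's actual scripts), pairing each compiled file with an uncompyle2 invocation might look like:

```python
import os

def find_pyc_files(root):
    """Recursively collect compiled Python (.pyc) files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)

def decompile_commands(root, out_dir="decompiled"):
    """Build one uncompyle2 command per .pyc file found under root.

    Commands are returned rather than executed so the caller can
    inspect or batch them (e.g. via subprocess.call).
    """
    return [["uncompyle2", "-o", out_dir, path]
            for path in find_pyc_files(root)]
```

Pointed at the 'MIDI Remote Scripts' directory, each returned command can then be run in turn to populate the output tree.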

For those who want to know, uncompyle2 currently only works with Python 2.7. To get it running in a Debian-based environment I had to change a symlink so that /usr/bin/python -> python2.7, as opposed to /usr/bin/python -> python2.6.

To get the RPM build working I had to copy README.rst to README. Running 'python setup.py bdist_rpm' would then give me an RPM package. Running 'alien' allows conversion of the RPM to a DEB package for easy installation on a Debian-based platform.

Successful RPM and DEB packages are available from my website.

One ZIP archive contains the updated code plus the RPM and DEB packages; the other contains the decompiled code and the scripts to automate decompilation of the Ableton code.

For those who are interested, Max4Live programming looks rather interesting for building devices and effects. It also looks like a perfect choice for those who may be on a limited budget and looking to extend Ableton's capabilities.

There have been some grumbles regarding Ableton Push quality control with regard to inconsistent colouring of LEDs. (Novation has had somewhat similar problems with their Launchpad series, but it hasn't been as obvious because most current models rely on a limited set of colours. Note that this issue isn't actually covered by warranty, and it's a difficult problem to fix from a manufacturing perspective; hence the need for this particular solution.) A small application was created but wasn't publicly released. It's called '' and basically allows white on the device to be calibrated by altering the internal colour balance of the primary colours. It's available on some file-sharing websites. You'll require firmware version 1.7 for it to run.

Someone recently asked me about automated DJ options. I've seen a few but they seem to be becoming increasingly sophisticated.

How To DJ - Phil K (Intermediate Level)

Apparently, some of my ideas and perspectives regarding the modern world and capitalism are similar to those of Thomas Piketty. However, the way in which we would set about rebalancing global economics to ensure a more fair and just global economic system for all is somewhat different. More on this in time...

Some options for purchasing used music equipment locally.

In case you've ever wanted to download videos from various websites, there are quite a few options out there.

If you've had minor scratches on your optical discs you know that they can be extraordinarily frustrating. There are quite a few solutions out there for it though.

If you ever have to use automated imaging/partitioning software, sometimes things don't turn out perfectly. Hidden partitions appear when they shouldn't, wreaking havoc with links throughout your system. Changing the partition type is the solution, though the actual 'type/code/number' may vary depending on the circumstances.

Options for locking down a device in case it is lost or stolen are increasingly popular nowadays, even in consumer class devices. It's interesting how far some companies are willing to take this and what their implementations are like.

Help evaluate, test, and design Windows 10.

David Rowe: WTF Internal Combustion?

Fri 03rd Jul 2015 13:07

At the moment I’m teaching my son to drive in my Electric Car. Like my daughter before him, it’s his first driving experience. Recently, he has started to drive his grandfather's pollution generator, which has a manual transmission. So I was trying to explain why the clutch is needed, and it occurred to me just how stupid internal combustion engines are.

Dad: So if you dump the clutch too early the engine stops.

Son: Why?

Dad: Well, a petrol engine needs a certain amount of energy to keep it running, for like compression for the next cycle. If you put too big a load on the engine, it doesn’t have enough power to move the car and keep the engine running.

Dad: Oh yeah and that involves a complex clutch that can be burnt out if you don’t use it right. Or an automatic transmission that requires a complex cooling system and means you use even more (irreplaceable) fossil fuel as it’s less efficient.

Dad: Oh, and petrol motors only work well in a very narrow range of RPM so we need complex gearboxes.

Dad thinks to himself: WTF internal combustion?

Electric motors aren’t like that. Mine works better at 0 RPM (more torque), not worse. When the car stops my electric motor stops. It’s got one moving part and one gear ratio. Why on earth would you keep using irreplaceable fossil fuels when stopped at the traffic lights? It just doesn’t make sense.

The reason of course is energy density. We need to store a couple of hundred km worth of energy in a reasonable amount of weight. Petrol has about 44 MJ/kg. Let’s see: one of my Lithium cells weighs 3.3kg, and is rated at 100AH at 3.2V. So that’s (100AH)(3600 seconds/H)(3.2V)/(3.3kg) ≈ 0.35MJ/kg, or over 100 times worse than petrol. However that’s not the whole story: an EV is about 85% efficient in converting that energy into movement, while a dinosaur juice combuster is only about 15% efficient.
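Spelling that arithmetic out as a quick sanity check (a sketch only; the 85%/15% efficiency figures are the rough ones from the paragraph above):

```python
# One Lithium cell: 100 Ah at a nominal 3.2 V, weighing 3.3 kg.
capacity_ah = 100.0
nominal_v = 3.2
mass_kg = 3.3

# Specific energy: Ah * s/h * V gives joules; divide by mass and 1e6 for MJ/kg.
cell_mj_per_kg = capacity_ah * 3600 * nominal_v / mass_kg / 1e6   # ~0.35 MJ/kg
petrol_mj_per_kg = 44.0

raw_ratio = petrol_mj_per_kg / cell_mj_per_kg        # ~126x in petrol's favour

# Adjust for drivetrain efficiency: ~85% for the EV, ~15% for petrol.
effective_ratio = raw_ratio * (0.15 / 0.85)          # ~22x, much less scary
```

So the headline "100 times worse" shrinks to roughly a factor of 20 once you account for how little of the petrol's energy actually reaches the wheels.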

Anyhoo it’s now possible to make EVs with 500 km range (hello Tesla) so energy density has been nailed. The rest is a business problem, like establishing a market for smart phones. We’re quite good at solving business problems, as someone tends to get rich.

I mean, if we can make billions of internal combustion engines with 1000′s of moving parts, cooling systems, gearboxes, anti-pollution, fuel injection, engine management, controlled detonation of an explosive (they also make napalm out of petrol) and countless other ancillary systems, I am sure humankind can make a usable battery!

Internal combustion is just a bad hack.

History is going to judge us as very stupid. We are chewing through every last drop of fossil fuel to keep driving to and from homes in the suburbs that we can’t afford, to buy stuff we don’t need, making plastic for gadgets we throw away, and flying 1000′s of km to exotic locations for holidays, and overheating the planet using our grandchildren’s legacy of hydrocarbons that took 75 million years to form.

Oh that’s right. It’s for the economy.


Rusty Russell: Wrapper for running perf on part of a program.

Fri 03rd Jul 2015 13:07

Linux’s perf competes with early git for the title of least-friendly Linux tool.  Because it’s tied to kernel versions, and the interfaces change fairly randomly, you can never figure out how to use the version you need to use (hint: always use -g).

But when it works, it’s very useful.  Recently I wanted to figure out where bitcoind was spending its time processing a block; because I’m a cool kid, I didn’t use gprof, I used perf.  The problem is that I only want information on that part of bitcoind.  To start with, I put a sleep(30) and a big printf in the source, but that got old fast.

Thus, I wrote “perfme.c“.  Compile it (requires some trivial CCAN headers) and link perfme-start and perfme-stop to the binary.  By default it runs/stops perf record on its parent, but an optional pid arg can be used for other things (eg. if your program is calling it via system(), the shell will be the parent).
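perfme.c itself is C and needs those CCAN headers. Just to illustrate the mechanism it uses (attach `perf record` to the parent process, then stop it so perf writes out perf.data), here is a hypothetical Python sketch; the function names are mine, not Rusty's:

```python
import os
import signal
import subprocess

def perf_record_cmd(pid=None):
    """Build the perf invocation: record with call graphs (-g) on pid,
    defaulting to this process's parent, as perfme does."""
    target = os.getppid() if pid is None else pid
    return ["perf", "record", "-g", "-p", str(target)]

def perfme_start(pid=None):
    """Launch perf record attached to the target process."""
    return subprocess.Popen(perf_record_cmd(pid))

def perfme_stop(proc):
    """SIGINT makes perf flush its buffers and write perf.data, then exit."""
    proc.send_signal(signal.SIGINT)
    return proc.wait()
```

The optional pid argument covers the same case Rusty mentions: if your program calls the wrapper via system(), the shell (not your program) is the parent, so you pass the real pid explicitly.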


Linux Users of Victoria (LUV) Announce: LUV Main July 2015 Meeting: Ansible / BTRFS / Educating People to become Linux Users

Thu 02nd Jul 2015 19:07
Start: Jul 7 2015 18:30 End: Jul 7 2015 20:30
Location: 200 Victoria St. Carlton VIC 3053

• Andrew Pam, An introduction to Ansible

• Russell Coker, BTRFS update

• Lev Lafayette, Educating People to become Linux Users: Some Key Insights from Adult Education

200 Victoria St. Carlton VIC 3053 (formerly the EPA building)

Before and/or after each meeting those who are interested are welcome to join other members for dinner. We are open to suggestions for a good place to eat near our venue. Maria's on Peel Street in North Melbourne is currently the most popular place to eat after meetings.

LUV would like to acknowledge Red Hat for their help in obtaining the venue and VPAC for hosting.

Linux Users of Victoria Inc. is an incorporated association, registration number A0040056C.



Donna Benjamin: Certification: Necessary Evil?

Thu 02nd Jul 2015 14:07

I wrote this as a comment in response to Dries' post about the Acquia certification program - I thought I'd share it here too. I've commented there before.

I've also been conflicted about certifications. I still am. And this is because I fully appreciate the pros and cons. The more I've followed the issue, the more conflicted I've become about it.

My current stand is this: certifications are a necessary evil. Let me say a little on why that is.

I know many in the Drupal community are not in favour of certification, mostly because it can't possibly adequately validate their experience.

It also feels like an insult to be expected to submit to external assessment after years of service contributing to the code-base, and to the broader landscape of documentation, training, and professional service delivery.

Those in the know, know how to evaluate a fellow Drupalist. We know what to look for, and more importantly where to look. We know how to decode the secret signs. We can mutter the right incantations. We can ask people smart questions that uncover their deeper knowledge, and reveal their relevant experience.

That's our massive head start. Or privilege. 

Drupal is now a mature platform for web and digital communications. The new challenge that comes with that maturity, is that non-Drupalists are using Drupal. And non specialists are tasked with ensuring sites are built by competent people. These people don't have time to learn what we know. The best way we can help them, is to support some form of certification.

But there's a flip side. We've all laughed at the learning curve cartoon about Drupal. Because it's true. It is hard. And many people don't know where to start. Whilst a certification isn't going to solve this completely, it will help to solve it, because it begins to codify the knowledge many of us take for granted.

Once that knowledge is codified, it can be studied. Formally in classes, or informally through self-directed exploration and discovery.

It's a starting point.

I empathise with the nay-sayers. I really do. I feel it too. But on balance, I think we have to do this. But even more, I hope we can embrace it with more enthusiasm.

I really wish the Drupal Association had the resources to run and champion the certification system, but the truth is, as Dries outlines above, it's a very time-consuming and expensive proposition to do this work.

So, Acquia - you have my deep, albeit somewhat reluctant, gratitude!


Thanks Dries - great post.



(Drupal Association board member)


James Purser: Tell your MP you support Same Sex Marriage

Thu 02nd Jul 2015 10:07

If you support the right for two people to get married regardless of gender, then please respectfully and politely contact your local federal member and let them know.

Those who oppose this have already started up their very effective networks, and we will need to work very hard to counter them.

If you're not sure who your local MP or Senators are, I recommend you use to find out. Just punch in your post code and it will let you know, as well as give you a run down of their voting history.

Do it, DO IT NOW!

This message brought to you by the realisation that I'm going to be rainbow haired soon.

Blog Categories: Politics, same sex marriage

Michael Still: Hunting for GC1D1NB

Thu 02nd Jul 2015 09:07
I went for an after work walk to try and find GC1D1NB on Tuggeranong Hill yesterday. It wasn't a great success. I was in the right area but I just couldn't find it. Eventually I ran out of time and had to turn back. I am sure I'll have another attempt at this one soon.


Interactive map for this route.

Tags for this post: blog pictures 20150701-tuggeranong_hill photo canberra bushwalk

Related posts: Goodwin trig; Big Monks; Geocaching; Confessions of a middle aged orienteering marker; A quick walk through Curtin; Narrabundah trig and 16 geocaches


David Rowe: FreeDV Robustness Part 5 – FreeDV 700

Wed 01st Jul 2015 15:07

We’ve just released FreeDV v0.98 GUI software, which includes the new FreeDV 700 mode. This new mode has poorer speech quality than FreeDV 1600 but is far more robust, close to SSB on low SNR fading HF channels. Mel Whitten and the test team have made contacts over 1000 km using just 1 Watt!

You can download the Windows version of FreeDV 0.98 here.

To build it you need the latest codec2-dev and fdmdv2-dev from SVN; follow the Quickstart 1 instructions in fdmdv2-dev/README.txt. I’ve been cross compiling for Windows on my Ubuntu Linux machine, which is a time saver for me. Thanks Richard Shaw for your help with the cmake build system.

Mel and the team have been testing the software for the past few weeks and we’ve removed most of the small UI bugs. Thanks guys! I’m working on some further improvements to the robustness which I will release in a few weeks. Once we are happy with the FreeDV 700 mode, it will be ported to the SM1000. If you have time, and gcc/embedded experience I’d love to have some help with this!

It sounds pretty bad at 700 bit/s, but so does SSB at 0dB SNR. The new modem uses a pilot symbol assisted coherent PSK modem (FreeDV 1600 uses a differential PSK modem). The new modem also has diversity; the 7 x 75 symb/s QPSK carriers are copied to form a total of 14 half power carriers. Overall this gives us a significantly lower operating point SNR than FreeDV 1600 for fading channels. However the bandwidth is a little wider (800 – 2400 Hz); let’s see how that goes through real radios.

Simulations indicate it has readability 4/5 at 0dB SNR on CCIR poor (fast) fading channels. It also has a PAPR of 7dB so if your PA can handle it you can hammer out 5dB more power than FreeDV 1600 (be careful).

For those of you who are integrating FreeDV into your own applications the FreeDV API now contains the 700 bit/s mode and freedv_tx and freedv_rx have been updated to demo it. The API interface has changed, we now have variables for the number of modem and speech samples which change with the mode. The coherent PSK modem has the very strange sample rate of 7500 Hz which at this stage the user (that’s you) has to deal with (libresample is your friend).
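To make the rate conversion concrete: here is a toy linear-interpolation converter from the modem's 7500 Hz to a sound card's 8000 Hz. This is for illustration only; linear interpolation is a poor anti-imaging filter, so real applications should use a proper resampler such as libresample, as suggested above.

```python
def resample_linear(samples, rate_in, rate_out):
    """Naive linear-interpolation sample-rate converter.

    For each output sample, find the fractional position in the input
    stream and blend the two neighbouring input samples.
    """
    if not samples:
        return []
    n_out = int(len(samples) * rate_out / rate_in)
    out = []
    for i in range(n_out):
        pos = i * rate_in / rate_out          # fractional index into input
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

Feeding it modem output at 7500 Hz yields a buffer about 8000/7500 times longer, ready for an 8 kHz sound card stream.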

The 700 bit/s codec (actually 650 bit/s plus 2 data bits/frame) band limits the input speech between 600 and 2200 Hz to reduce the amount of information we need to encode. This might be something we can tweak, however Mel and the team have shown we can communicate OK using this mode. Here are some samples at 1300 (the codec rate used in FreeDV 1600) and 700 bit/s with no errors for comparison.

Lots more to talk about. I’ll blog some more when I pause and take a breath.


Donna Benjamin: Comparing D7 and D8 outta the box

Wed 01st Jul 2015 12:07

I did another video the other day. This time I've got a D7 and D8 install open side by side, and compare the process of adding an article.


Linux Australia News: Linux Australia council meeting minutes to be published on the planet

Wed 01st Jul 2015 12:07

Last fortnight the Linux Australia council resolved to begin publishing their minutes to

While meeting minutes may seem boring, they in fact contain a lot of useful and interesting information about what the organisation and its various subcommittees are up to. As such we felt that this was useful information to publish wider and starting from now we'll be publishing them to the planet.

If you are interested in previous meetings and minute notes, you can find them at


David Rowe: New Charger for my EV

Wed 01st Jul 2015 09:07

On Sunday morning I returned home and plugged in my trusty EV to feed it some electrons. Hmm, something is wrong. No lights on one of the chargers. Oh, and the charger circuit breaker in the car has popped. Always out for adventure, and being totally incompetent at anything above 5V and 1 Amp, I connected it directly to the mains. The shed lights started to waver ominously. Humming sounds like a Mary Shelley novel. And still no lights on the charger.

Oh Oh. Since disposing of my nasty carbon burner a few years ago I only have one car and it’s the EV. So I needed a way to get on the road quickly.

But luck was with me. I scoured my local EV association web site, and found a 2nd hand Zivan NG3 charger that was configured for a 120V lead acid pack. I have a 36 cell Lithium pack that is around 120V when charged. Different batteries have different charging profiles, for example the way current tapers. However all I really need is a bulk current source; my external Battery Management System will shut down the charger when the cells are charged.

Using some residual charge I EVed down the road where I met Richard, a nice man, fellow engineer, and member of our local EV association. I arranged to buy his surplus NG3, took it home and fired it up. Away it went, fairly hosing electrons into my EV at 20A. The old charger was just 10A so this is a bonus – my charging time will be halved. I started popping breakers again, as I was sucking 2.4kW out of the AC. So I re-arranged a few AC wires, ripped out the older chargers, rewired the BMS module loop a little and away I went with the new charger.

Here is the lash up for the initial test. The new Zivan NG3 is the black box on the left, the dud charger the yellow box on the right. The NG3 replaces the 96V dud charger and two 12V chargers (all wired in series) that I needed to charge the entire pack. My current clamp meter (so useful!) is reading 17A.

Old chargers removed and looking a bit neater. I still need to secure the NG3 somehow. My BMS controller is the black box behind the NG3. It shuts down the AC power to the chargers when the batteries signal they are full.

Pretty red lights in the early morning. Each Lithium cell has a BMS module across it that monitors the cell voltage. The red light means “just about full”. When the first cell hits 4.1V, it signals the BMS controller to shut down the charger. Richard pointed out that the BMS modules are shunt regulators, so will discharge each cell back down to about 3.6V, ensuring they are all at about the same state of charge.

This is the only reason I go to petrol stations. For air. There is so little servicing on EVs that I forget to check the air for a year, some tyres were a bit low.

The old charger lasted 7 years and was used almost every day (say 2000 times) so I can’t complain. The NG3 was $875 2nd hand. Since converting to the Lithium pack in 2009 I have replaced the electric motor armature (about $900) as I blew it up from overheating, 2 cells ($150 ea) as we over discharged them, a DC-DC converter ($200 ish) and now this charger. Also tyres and brakes last year, which are the only wearing mechanical parts left. In that time I’ve done 45,000 electric km.


Michael Still: Percival trig

Tue 30th Jun 2015 21:06
I had a pretty bad day, so I knocked off early and went for a walk before going off to the meeting at a charity I help out with. The walk was to Percival trig, which I have to say was one of the more boring trigs I've been to. Some of the forest nearby was nice enough, but the trig itself is stranded out in boring grasslands. Meh.


Interactive map for this route.

Tags for this post: blog pictures 20150630-percival photo canberra bushwalk trig_point

Related posts: Goodwin trig; Big Monks; Narrabundah trig and 16 geocaches; Cooleman and Arawang Trigs; One Tree and Painter; A walk around Mount Stranger


Michael Still: A team walk around Red Hill

Tue 30th Jun 2015 09:06
My team at work is trying to get a bit more active, so a contingent from the Canberra portion of the team went for a walk around Red Hill. I managed to sneak in a side trip to Davidson trig, but it was cheating because it was from the car park at the top of the hill. A nice walk, with some cool geocaches along the way.


Interactive map for this route.

Tags for this post: blog pictures 20150629-davidson photo canberra bushwalk trig_point

Related posts: Goodwin trig; Big Monks; Narrabundah trig and 16 geocaches; Cooleman and Arawang Trigs; One Tree and Painter; A walk around Mount Stranger


Binh Nguyen: The Value of Money - Part 4

Tue 30th Jun 2015 01:06
- I previously remarked that since we use the concept of 'deterrence' so readily throughout the world we are in a de-facto state of 'Cold War' whose weapons are defense, intelligence, and economics. There's a lot of interesting information out there... 

- it makes sense that companies try to run lean rather than try to create. Everybody knows how to save. It's much more difficult to create something of value

- advertising is a broadcast means of achieving increased transactions, but in spite of targeted advertising it is still incredibly inefficient. Based on previous experience, even single-digit click-through rates for online advertising are considered suspect/possibly fraudulent

- the easiest way of establishing the difference between what's needed and what's wanted is to turn off all advertising around you. Once you've done that, the difference between need and want becomes very strange and the efficacy of advertising on your perspective becomes much, much clearer

- most businesses fail. A lot of people basically have trouble running a business, have flawed business models, or don't achieve enough transactions to make it worthwhile

- immigration is a good thing provided that the people in question bring something to the economy. I look at the Japanese situation and wonder whether or not immigration is a more cost effective means of dealing with their ageing problem than 'Abenomics'. Even if all they do is re-patriate former nationals...

- if you run through their numbers carefully, and think about where many of the world's top companies are headed, the performance (net profit in particular) of some of them isn't anywhere near as impressive (percentage-wise) as the share price growth in recent history. There are many small/mid cap firms that would outdo them (% net profit wise) if you're looking to invest

- in software engineering people continually harp on about the benefits of Agile, Extreme Programming and so on. Basically, all it is is maintaining regular contact between staff members to get the best out of a piece of work. Peer pressure and continual oversight also force you to remain productive. Think about this in the real world. The larger the teams are, the more difficult it is to maintain oversight, particularly if the manager in question is of a poor standard and there are no systems in place to maintain standards. There is also a problem with unfettered belief in this methodology. If, in general, the team members are unproductive or of a poor standard, this will ripple throughout your team

- GDP is a horrible measure of productivity. As I've stated previously, the difference between perceived, effective, and actual value basically disguises where true value lies. Go spend some time in other parts of the world. I guarantee that there will be a massive difference in the way you view productivity (productivity means amount of work completed per unit time, not overall work)

- a good measure of a person's productivity/value is what happens if they take a day off or have a break. Observe the increase in workload for each other staff member and how they deal with it

- people keep on harping on about self interest as the best way of maintaining productivity and encouraging people to work hard. However, I have a huge problem with this as it is incredibly hard to differentiate between actual, effective, and perceived value sometimes. At one particular firm, we had difficulties with this as well. I was therefore tasked with writing an application to monitor things (if you intend to write something along these lines, please be mindful of relevant HR and surveillance laws in your jurisdiction. Also, keep the program 'silent'. Staff will likely alter their behaviour if they know that the program is running.). The funny thing is that even people you think are productive tend to work in bursts. The main difference is the amount of time that transpires between each piece of work and the rate of work that occurs during each burst. The other thing that you should know is that even with senior members of staff, when you look at a lot of metrics it can be extremely difficult to justify their wage. Prepare to be surprised if you currently have poor oversight in your organisation. Lack of proper oversight breeds nepotism, lack of productivity, etc...

- you'll be shocked at what poor staff can do to your team. If the member in question is particularly bad, he in effect takes a number of other staff out of the equation at the same time. Think about this. You are all recruited for highly skilled jobs, but one team member is poor. If he continually has to rely on other staff, then he in effect takes out another member of your team simultaneously (possibly more). Think about this when training new staff. Give them enough time/training to get a gauge of what they'll be like, but if they can't hold up their part of the deal, be prepared to move them elsewhere within the organisation or let go of them. The same is also true in the opposite direction. Good employees have a multiplier effect. You'll only figure out the difference with proper oversight and monitoring. Without this, perceived value may completely throw you off

- we like to focus in on large companies because they supposedly bring in a lot of business. The problem is if they have a monopoly. If they strangle the market of all value and don't put back in via taxes, employment, etc... the state in question could be in a lot of trouble down the line. If/when the company moves, the economy would have evolved to see these companies as being a core component. Other surrounding businesses will likely be poorly positioned to adapt when they leave for a place which offers better terms and/or conditions. The other problem is this: based on experience, people are willing to accept a lower wage to work for such firms (mostly for reasons of financial safety). There is no guarantee that you will be paid what you are worth

- when and if a large company collapses or moves the problem is the number of others who rely on it for business

- people keep on saying that there are industries safe from off-shoring and automation. I think they're naive or haven't spent enough time around good technologists. Good employees will try to automate or develop processes to get things done more efficiently. Virtually all industries (or vast chunks of them) can be automated fully given time (trust me on this. I like to read a lot...).

The only way to keep yourself safe is to be multi-skilled and entrepreneurial, or else extremely skilled at a particular profession. Even then there's no guarantee that you'll be safe

- sometimes I think people just don't get it. A small number of outliers is all it takes in order to change group behaviour. Even if we ban or regulate automation, there will be those who adopt it without any misgivings, much like organised crime, the use of illegal migrants, the cash economy, etc... The only real way is to force a cashless society so that we can run algorithms to check for unusual behaviour and breed a more puritan society

- minimal but effective regulation helps to level out the playing field. Making it too complex creates possible avenues for loopholes to be exploited. Too simple and without enough coverage and you have the same problem

- obvious ways to make sustained, long term money include creating something that others need or want, or else being able to change perception, to see changes and adapt, to arbitrage, and to use a broadcast structure

- personal experience, and the history of others with emerging markets such as Asia and Africa, says that results can be extremely variable. Without on-the-ground knowledge and oversight you can just as easily make a substantial profit as a massive loss through fraud. There is very little you can do about this apart from taking due diligence and having measures/knowledge to be able to deal with it should it actually occur

- in reality, very few have a genuine chance of making it 'big', "Americans raised at the top and bottom of the income ladder are likely to remain there themselves as adults. Forty-three percent of those who start in the bottom are stuck there as adults, and 70 percent remain below the middle quintile. Only 4 percent of adults raised in the bottom make it all the way to the top, showing that the "rags-to-riches" story is more often found in Hollywood than in reality."

- use first mover advantage as quickly as you can but have defensive measures in place

- investment from third parties (angel investment, venture capital, etc...) can vary drastically. More and more want a guaranteed return on investment at least though

- based on what I've experienced, VC is much more difficult to get locally than in Europe or the United States. Luckily, more companies are willing to invest provided you are posting good numbers. One other thing I've discovered locally is that they are too lazy/unwilling to help even if the idea/s may be good (though this is changing)

- we don't want to live day by day or have creditors/shareholders to report to so seek the highest profit whenever possible

- you can select a lot of numbers and prove essentially anything in business but there are certain numbers that you simply can't ignore such as net profit/income

- pay a person with cash by the hour where he has to do the numbers versus a lump sum and he will look at things very differently. That goes for any profession, even high earning ones

- growth is great but only if it can be sustained and it is genuine. If you have substantial variation in growth, such as having a few fantastic years of growth and then a sudden drop off that is fed by massive debt, you could be in a bit of trouble. You may say that you can just sell off assets. If the growth wasn't good enough then do you see a problem? Moreover, what if you don't have something that is considered worthwhile or easy to sell off? For a state/business, your credit risk suddenly shoots up and you may possibly be priced out of the market. Targeted, sustainable growth should be the target, not growth at all costs. The Chinese position towards economic management is actually making a lot more sense to me now though I'm not certain that it would work quite as easily or be accepted in other states. You may say that we'll invest during good times? The problem is that we're often not wise enough to know when and where to invest

- in many places you are seeing a rise of left wing parties. The worrying thing is that they'll lose sight of the benefits of capitalism and fall into the trap of a more puritan communism/socialist system which hasn't really worked over the long term in the past. The other thing to be concerned about is that a lot of them don't have solid policies or answers to the problems which currently face us

- if more people could distinguish real value from perceived and effective value, needs and wants, we would have fewer asset bubbles and less price gouging across the board

- there will be those who say who cares about the collective. Capitalism is composed of boom and bust cycles. Here's the problem. Most companies require debt to survive. If they can't survive that bust cycle they will be part of a collective collapse in the economy. Moreover, based on information I've come across other developed countries have looked at the plans for the Eurozone and the ways of dealing with high debt and are basically using that as the blueprint for the future. Your assets can and will be raided in the event of the state or systemic entities getting into trouble 

- people say that we should get educated in order to have a high paying job but the problem is that we are increasingly siloed into specific roles. If we can't use the knowledge, the time and money we've spent on education has been for nothing. We require better organisation between educational curriculums and professional settings

- even if governments are aware that there are problems that are cropping up with our version of capitalism, it's possible that there are those that may be saying that we have no choice but to keep the cycle going. It's the best of the worst

- globalisation essentially buys us more time before things come to a head (if they do). Most of the scenarios point to organised debt forgiveness as a means of dealing with the problem. Private asset seizure is something that is being mentioned everywhere. Raw commodities stored at secure locations may be your only source of safety if things look bad and you are a private citizen

- if you want a resilient economy you need to maintain a level playing field, a flexible workforce, and possibly limit the size and influence of major companies in your economy

- I don't get it. Heaps of countries have adequate blocking technology to help deal with this if they deem it illegal. Deploy it correctly and your rioting problem is over with...

- as stated previously, I've come to the conclusion that a lot of financial instruments are useless. They effectively provide a means of making money under any conditions. If we remove these instruments from play then I think that it may be possible that we may return to less speculative markets that depend more on fundamentals

- anyone can create something of value. The issue is whether it is negligible versus tangible value. This will also determine your business model

- you may know that there is a bubble but as China and local experiences have demonstrated, popping it gracefully is far from easy. Moreover, by the time you figure out there's a bubble it may often be too late. Too many people may have too many vested interests

- theory helps but you won't figure out how market economies work without first hand experience
Categories: thinktime

Craige McWhirter: How To Delete a Cinder Snapshot with a Status of error or error_deleting With Ceph Block Storage

Mon 29th Jun 2015 18:06

When deleting a volume snapshot in OpenStack you may sometimes get an error message stating that Cinder was unable to delete the snapshot.

There are a number of reasons why a snapshot may be reported by Ceph as unable to be deleted, however the most common reason in my experience has been that a Cinder client connection has not yet been closed, possibly because a client crashed.

If you were to look at the snapshots in Cinder, the status is usually error or error_deleting:

% cinder snapshot-list
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+
| ID                                   | Volume ID                            | Status         | Display Name                                                     | Size |
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+
| 07d75992-bf3f-4c9c-ab4e-efccdfc2fe02 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error          | snappy:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-26T14:00:02Z | 40   |
| 2db84ec7-6e1a-41f8-9dc9-1dc14e6ecef0 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error_deleting | snappy:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-05-18T00:00:01Z | 40   |
| 47fbbfe8-643c-4711-a066-36f247632339 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | available      | snappy:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-29T03:00:14Z | 40   |
| 52c43ec8-e713-4f87-b329-3c681a3d31f2 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error_deleting | snappy:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-24T14:00:02Z | 40   |
| a595180f-d5c5-4c4b-a18c-ca56561f36cc | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error          | snappy:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-25T14:00:02Z | 40   |
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+

When you check Ceph you may find the following snapshot list:

# rbd snap ls my.pool.cinder.block/volume-3004d6e9-7934-4c95-b3ee-35a69f236e46
SNAPID NAME                                          SIZE
  2069 snapshot-2db84ec7-6e1a-41f8-9dc9-1dc14e6ecef0 40960 MB
  2526 snapshot-52c43ec8-e713-4f87-b329-3c681a3d31f2 40960 MB
  2558 snapshot-47fbbfe8-643c-4711-a066-36f247632339 40960 MB

The astute will notice that there are only 3 snapshots listed in Ceph yet 5 listed in Cinder. We can immediately exclude 47fbbfe8, which is available in both Cinder and Ceph, so there are no issues there.

You will also notice that the snapshots with the status error are not in Ceph while the two with error_deleting are. My take on this is that for the error status, Ceph deleted the snapshot but Cinder never received the message confirming the successful deletion, whereas for the error_deleting status, Cinder was unsuccessful in offloading the deletion request to Ceph in the first place.
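That split can be sketched with a quick comparison of the two ID lists (a purely illustrative helper using shortened IDs, not part of Cinder or Ceph; in practice the lists would come from cinder snapshot-list and rbd snap ls):

```shell
# Hypothetical helper: compare the errant snapshot IDs Cinder reports
# with the snapshot IDs Ceph still holds (IDs shortened for brevity).
printf '%s\n' 07d75992 2db84ec7 52c43ec8 a595180f | sort > cinder_err.txt
printf '%s\n' 2db84ec7 47fbbfe8 52c43ec8 | sort > ceph.txt

# In Cinder only (status 'error'): already gone from Ceph, so these
# just need to be marked deleted in the Cinder database.
comm -23 cinder_err.txt ceph.txt

# In both (status 'error_deleting'): reset to 'available' in the
# database, then delete properly via Cinder so Ceph cleans up too.
comm -12 cinder_err.txt ceph.txt
```

On larger deployments a comparison like this saves eyeballing the two listings by hand.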

Each status will need to be handled separately. I'm going to start with the error_deleting snapshots, which are still present in both Cinder and Ceph.

In MariaDB, set the status from error_deleting to available:

MariaDB [cinder]> update snapshots set status='available' where id = '2db84ec7-6e1a-41f8-9dc9-1dc14e6ecef0';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [cinder]> update snapshots set status='available' where id = '52c43ec8-e713-4f87-b329-3c681a3d31f2';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Check in Cinder that the status of these snapshots has been updated successfully:

% cinder snapshot-list
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+
| ID                                   | Volume ID                            | Status         | Display Name                                                     | Size |
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+
| 07d75992-bf3f-4c9c-ab4e-efccdfc2fe02 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error          | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-26T14:00:02Z | 40   |
| 2db84ec7-6e1a-41f8-9dc9-1dc14e6ecef0 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | available      | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-05-18T00:00:01Z | 40   |
| 47fbbfe8-643c-4711-a066-36f247632339 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | available      | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-29T03:00:14Z | 40   |
| 52c43ec8-e713-4f87-b329-3c681a3d31f2 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | available      | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-24T14:00:02Z | 40   |
| a595180f-d5c5-4c4b-a18c-ca56561f36cc | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error          | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-25T14:00:02Z | 40   |
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+

Delete the newly available snapshots from Cinder:

% cinder snapshot-delete 2db84ec7-6e1a-41f8-9dc9-1dc14e6ecef0
% cinder snapshot-delete 52c43ec8-e713-4f87-b329-3c681a3d31f2

Then check the results in Cinder and Ceph:

% cinder snapshot-list
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+
| ID                                   | Volume ID                            | Status         | Display Name                                                     | Size |
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+
| 07d75992-bf3f-4c9c-ab4e-efccdfc2fe02 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error          | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-26T14:00:02Z | 40   |
| 47fbbfe8-643c-4711-a066-36f247632339 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | available      | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-29T03:00:14Z | 40   |
| a595180f-d5c5-4c4b-a18c-ca56561f36cc | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | error          | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-25T14:00:02Z | 40   |
+--------------------------------------+--------------------------------------+----------------+------------------------------------------------------------------+------+

# rbd snap ls my.pool.cinder.block/volume-3004d6e9-7934-4c95-b3ee-35a69f236e46
SNAPID NAME                                          SIZE
  2558 snapshot-47fbbfe8-643c-4711-a066-36f247632339 40960 MB

So we are done with Ceph now, as the error snapshots do not exist there. As they only exist in Cinder, we need to mark them as deleted in the Cinder database:

MariaDB [cinder]> update snapshots set status='deleted', deleted='1' where id = '07d75992-bf3f-4c9c-ab4e-efccdfc2fe02';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [cinder]> update snapshots set status='deleted', deleted='1' where id = 'a595180f-d5c5-4c4b-a18c-ca56561f36cc';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Now check the status in Cinder:

% cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+------------------------------------------------------------------+------+
| ID                                   | Volume ID                            | Status    | Display Name                                                     | Size |
+--------------------------------------+--------------------------------------+-----------+------------------------------------------------------------------+------+
| 47fbbfe8-643c-4711-a066-36f247632339 | 3004d6e9-7934-4c95-b3ee-35a69f236e46 | available | tuttle:3004d6e9-7934-4c95-b3ee-35a69f236e46:2015-06-29T03:00:14Z | 40   |
+--------------------------------------+--------------------------------------+-----------+------------------------------------------------------------------+------+

Now your errant Cinder snapshots have been removed.

Enjoy :-)

Categories: thinktime

Sridhar Dhanapalan: Twitter posts: 2015-06-22 to 2015-06-28

Mon 29th Jun 2015 00:06
Categories: thinktime

Russell Coker: RAID Pain

Sun 28th Jun 2015 21:06

One of my clients has a NAS device. Last week they tried to do what should have been a routine RAID operation: they added a new, larger disk as a hot-spare and told the RAID array to replace one of the active disks with it. The aim was to replace the disks one at a time to grow the array. But one of the other disks had an error during the rebuild and things fell apart.

I was called in after the NAS had been rebooted when it was refusing to recognise the RAID. The first thing that occurred to me is that maybe RAID-5 isn’t a good choice for the RAID. While it’s theoretically possible for a RAID rebuild to not fail in such a situation (the data that couldn’t be read from the disk with an error could have been regenerated from the disk that was being replaced) it seems that the RAID implementation in question couldn’t do it. As the NAS is running Linux I presume that at least older versions of Linux have the same problem. Of course if you have a RAID array that has 7 disks running RAID-6 with a hot-spare then you only get the capacity of 4 disks. But RAID-6 with no hot-spare should be at least as reliable as RAID-5 with a hot-spare.

Whenever you recover from disk problems the first thing you want to do is make a read-only copy of the data. Then you can't make things worse. This is a problem when you are dealing with 7 disks; fortunately they were only 3TB disks and each had only 2TB in use. So I found some space on a ZFS pool and bought a few 6TB disks which I formatted as BTRFS filesystems. For this task I only wanted filesystems that support snapshots, so I could work on snapshots rather than the original copy.

I expect that at some future time I will be called in when an array of 6+ disks of the largest available size fails. This will be a more difficult problem to solve as I don’t own any system that can handle so many disks.

I copied a few of the disks to a ZFS filesystem on a Dell PowerEdge T110 running kernel 3.2.68. Unfortunately that system seems to have a problem with USB, when copying from 4 disks at once each disk was reading about 10MB/s and when copying from 3 disks each disk was reading about 13MB/s. It seems that the system has an aggregate USB bandwidth of 40MB/s – slightly greater than USB 2.0 speed. This made the process take longer than expected.

One of the disks had a read error, this was presumably the cause of the original RAID failure. dd has the option conv=noerror to make it continue after a read error. This initially seemed good but the resulting file was smaller than the source partition. It seems that conv=noerror doesn’t seek the output file to maintain input and output alignment. If I had a hard drive filled with plain ASCII that MIGHT even be useful, but for a filesystem image it’s worse than useless. The only option was to repeatedly run dd with matching skip and seek options incrementing by 1K until it had passed the section with errors.
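For what it's worth, GNU dd can keep offsets aligned itself: adding sync to conv pads every short or unreadable input block with NULs up to the block size. A sketch against a scratch file (not the actual recovery command; note a large bs would zero-pad more data around a mid-file error, so a small block size is safer, and GNU ddrescue is purpose-built for this kind of imaging anyway):

```shell
# Demonstrate conv=noerror,sync on a scratch file: the 1000-byte input
# doesn't divide evenly into 512-byte blocks, and 'sync' pads the short
# final block with NULs so input and output offsets stay aligned.
head -c 1000 /dev/zero > scratch.img
dd if=scratch.img of=scratch.out bs=512 conv=noerror,sync 2>/dev/null
stat -c %s scratch.out   # → 1024 (two full 512-byte blocks)
```

On a failing disk the same combination skips the bad block but still advances the output by a full block, avoiding the manual skip/seek dance.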

for n in /dev/loop[0-6] ; do echo $n ; mdadm --examine -v -v --scan $n | grep Events ; done

Once I had all the images I had to assemble them. The Linux Software RAID didn’t like the array because not all the devices had the same event count. The way Linux Software RAID (and probably most RAID implementations) work is that each member of the array has an event counter that is incremented when disks are added, removed, and when data is written. If there is an error then after a reboot only disks with matching event counts will be used. The above command shows the Events count for all the disks.
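As a toy illustration of that rule (this is the idea only, not mdadm's actual logic; device names and counts are made up): members whose event count lags the newest one are excluded from automatic assembly.

```shell
# Toy model of the event-count check: list each member with its Events
# value, then keep only the members matching the newest count.
cat > events.txt <<'EOF'
/dev/loop0 104
/dev/loop1 104
/dev/loop2 98
EOF
max=$(awk '{print $2}' events.txt | sort -n | tail -n 1)
awk -v m="$max" '$2 == m {print $1}' events.txt   # loop0 and loop1 survive
```

Here /dev/loop2 would be kicked out, which is exactly what happened to the stale members of the real array.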

Fortunately different event numbers aren’t going to stop us. After assembling the array (which failed to run) I ran “mdadm -R /dev/md1” which kicked some members out. I then added them back manually and forced the array to run. Unfortunately attempts to write to the array failed (presumably due to mismatched event counts).

Now my next problem is that I can make a 10TB degraded RAID-5 array which is read-only but I can’t mount the XFS filesystem because XFS wants to replay the journal. So my next step is to buy another 2*6TB disks to make a RAID-0 array to contain an image of that XFS filesystem.

Finally backups are a really good thing…

Related posts:

  1. RAID and Bus Bandwidth As correctly pointed out by cmot [1] my previous post...
  2. Some RAID Issues I just read an interesting paper titled An Analysis of...
  3. Reliability of RAID ZDNet has an insightful article by Robin Harris predicting the...
Categories: thinktime

Joshua Hesketh: adventures

Sat 27th Jun 2015 15:06

Over the past few months I started to notice occasional issues when cloning repositories (particularly nova) from

It would fail with something like

git clone -vvv git:// .
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

The problem would occur sporadically during our 3rd party CI runs causing them to fail. Initially these went somewhat ignored as rechecks on the jobs would succeed and the world would be shiny again. However, as they became more prominent the issue needed to be addressed.

When a patch merges in gerrit it is replicated out to 5 different cgit backends (git0[1-5]). These are then balanced by two HAProxy frontends which sit behind a simple DNS round-robin.

                +-------------------+
                |   (DNS Lookup)    |
                +--+-------------+--+
                   |             |
              (A records, round robin)
                   |             |
    +--------------v---+   +-----v------------+
    | HAProxy frontend |   | HAProxy frontend |
    +--------+---------+   +---------+--------+
             |                       |
             +-----------+-----------+
                         |
          +--------------v--------------+
          |     (source algorithm)      |
          |  backends git01 ... git05   |
          |      (HAProxy backend)      |
          +-----------------------------+
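In configuration terms that balancing boils down to something like the following haproxy.cfg fragment (purely illustrative: the bind port, server names and addresses are placeholders, not the real config):

```
frontend git_http
    bind *:443
    default_backend git_servers

backend git_servers
    balance source                       # pin each client IP to one backend
    server git01 git01.example.org:443 check
    server git02 git02.example.org:443 check
    # ... through git05
```

With balance source, repeated requests from the same client land on the same cgit backend, which matters for consistency while replication is in flight.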

Reproducing the problem was difficult. At first I was unable to reproduce locally, or even on an isolated turbo-hipster run. Since the problem appeared to be specific to our 3rd party tests (little evidence of it in 1st party runs) I started by adding extra debugging output to git.

We were originally cloning repositories via the git:// protocol. The debugging information was unfortunately limited and provided no useful diagnosis. Switching to https allowed for more curl output (when using GIT_CURL_VERBOSE=1 and GIT_TRACE=1) but this in itself just created noise. It actually took me a few days to remember that the servers are running arbitrary code anyway (a side effect of testing) and therefore cloning over the potentially insecure http protocol didn't introduce any further risk.

Over http we got a little more information, but still nothing that was conclusive at this point:

git clone -vvv .
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header

After a bit it became apparent that the problems occurred mostly during high (patch) traffic times, that is, when a lot of tests needed to be queued. This led me to think that either the network turbo-hipster was on was flaky when doing multiple git clones in parallel, or the git servers themselves were. The lack of similar upstream failures initially led me to suspect the former. In order to reproduce the problem I decided to use Ansible to do multiple clones of repositories and see if that would uncover it. If needed I would then have extended this to orchestrate other parts of turbo-hipster, in case the problem was a symptom of something else.

Firstly I needed to clone from a bunch of different servers at once to simulate the network failures more closely (rather than, for example, doing multiple clones on one machine or from one IP in containers). To simplify this I decided to learn some Ansible and launch a bunch of nodes on Rackspace (instead of doing it by hand).

Using the pyrax module I put together a crude playbook to launch a bunch of servers. There are likely much neater and better ways of doing this, but it suited my needs. The playbook takes care of placing appropriate sshkeys so I could continue to use them later.

---
- name: Create VMs
  hosts: localhost
  vars:
    ssh_known_hosts_command: "ssh-keyscan -H -T 10"
    ssh_known_hosts_file: "/root/.ssh/known_hosts"
  tasks:
    - name: Provision a set of instances
      local_action:
        module: rax
        name: "josh-testing-ansible"
        flavor: "4"
        image: "Ubuntu 12.04 LTS (Precise Pangolin) (PVHVM)"
        region: "DFW"
        count: "15"
        group: "raxhosts"
        wait: yes
      register: raxcreate

    - name: Add the instances we created (by public IP) to the group 'raxhosts'
      local_action:
        module: add_host
        hostname: "{{ }}"
        ansible_ssh_host: "{{ item.rax_accessipv4 }}"
        ansible_ssh_pass: "{{ item.rax_adminpass }}"
        groupname: raxhosts
      with_items: raxcreate.success
      when: raxcreate.action == 'create'

    - name: Sleep to give time for the instances to start ssh
      # there is almost certainly a better way of doing this
      pause: seconds=30

    - name: Scan the host key
      shell: "{{ ssh_known_hosts_command }} {{ item.rax_accessipv4 }} >> {{ ssh_known_hosts_file }}"
      with_items: raxcreate.success
      when: raxcreate.action == 'create'

- name: Set up sshkeys
  hosts: raxhosts
  tasks:
    - name: Push root's pubkey
      authorized_key: user=root key="{{ lookup('file', '/root/.ssh/') }}"

From here I can use Ansible to work on those servers using the rax inventory. This allows me to address any nodes within my tenant and then log into them with the seeded sshkey.

The next step of course was to run tests. Firstly I just wanted to reproduce the issue, so the playbook crudely sets up an environment in which it can simply clone nova multiple times.

---
- name: Prepare servers for git testing
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Install git
      apt: name=git state=present update_cache=yes
    - name: remove nova if it is already cloned
      shell: 'rm -rf nova'

- name: Clone nova and monitor tcpdump
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Clone nova
      shell: "git clone"

By default Ansible runs with 5 forked processes, meaning it works on 5 servers at a time. We want to exercise git heavily (in the same way turbo-hipster does), so we use the --forks parameter to run the clone on all the servers at once. The plan was to keep launching servers until the error reared its head under the load.

To my surprise this happened with very few nodes (fewer than 15, which I kept as my minimum for testing). To confirm, I also ran the tests after launching further nodes to see it fail at 50 and 100 concurrent clones. It turned out that the more I cloned, the higher the failure rate.

Now that I had the problem reproducing, it was time to do some debugging. I modified the playbook to capture tcpdump information during the clone. Initially git was cloning over IPv6 so I turned that off on the nodes to force IPv4 (just in case it was a v6 issue, but the problem did present itself on both networks). I also locked to one IP rather than randomly hitting both front ends.

---
- name: Prepare servers for git testing
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Install git
      apt: name=git state=present update_cache=yes
    - name: remove nova if it is already cloned
      shell: 'rm -rf nova'

- name: Clone nova and monitor tcpdump
  hosts: josh-testing-ansible*
  serial: "100%"
  vars:
    cap_file: tcpdump_{{ ansible_hostname }}_{{ ansible_date_time['epoch'] }}.cap
  tasks:
    - name: Disable ipv6 1/3
      sysctl: name="net.ipv6.conf.all.disable_ipv6" value=1 sysctl_set=yes
    - name: Disable ipv6 2/3
      sysctl: name="net.ipv6.conf.default.disable_ipv6" value=1 sysctl_set=yes
    - name: Disable ipv6 3/3
      sysctl: name="net.ipv6.conf.lo.disable_ipv6" value=1 sysctl_set=yes
    - name: Restart networking
      service: name=networking state=restarted
    - name: Lock git.o.o to one host
      lineinfile: dest=/etc/hosts line='' state=present
    - name: start tcpdump
      command: "/usr/sbin/tcpdump -i eth0 -nnvvS -w /tmp/{{ cap_file }}"
      async: 6000000
      poll: 0
    - name: Clone nova
      shell: "git clone"
      #shell: "git clone"
      ignore_errors: yes
    - name: kill tcpdump
      command: "/usr/bin/pkill tcpdump"
    - name: compress capture file
      command: "gzip {{ cap_file }} chdir=/tmp"
    - name: grab captured file
      fetch: src=/tmp/{{ cap_file }}.gz dest=/var/www/ flat=yes

This gave us a bunch of compressed capture files that I was then able to seek the help of my colleagues to debug (a particular thanks to Angus Lees). The results from an early run can be seen here:

Gus determined that the problem was due to an RST packet coming from the source at roughly 60 seconds. This indicated it was likely we were hitting a timeout at the server or a firewall during the git-upload-pack phase of the clone.

The solution turned out to be rather straightforward. The git-upload-pack had simply grown too large and would time out depending on the load on the servers. There were timeouts in the apache config as well as in the HAProxy config for both frontend and backend responsiveness. The relevant patches can be found at and
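In HAProxy terms the shape of that fix is just longer timeouts (an illustrative fragment with example values, not the merged configuration; Apache's Timeout directive needed the same kind of bump):

```
defaults
    timeout client 10m    # slow clients still receiving a large pack
    timeout server 10m    # git-upload-pack generating the pack under load
```

The trade-off is that genuinely stuck connections now linger longer before being reaped, which is usually acceptable for a git service.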

While upping the timeouts avoids the problem, certain projects are clearly pushing the infrastructure to its limits. As such a few changes were made by the infrastructure team (in particular James Blair) to improve responsiveness.

Firstly, the frontend is now a higher-performance (30GB) instance, a large step up from the (8GB) instances used as frontends previously. Moving to a single frontend additionally meant the HAProxy algorithm could be changed to leastconn to help balance connections better:

          +--------------------+
          | (HAProxy frontend) |
          +----------+---------+
                     |
     +---------------v---------------+
     |    (leastconn algorithm)      |
     |   backends git01 ... git05    |
     |      (HAProxy backend)        |
     +-------------------------------+
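The algorithm change itself is essentially a one-line edit in the backend definition (again an illustrative fragment, not the merged config):

```
backend git_servers
    balance leastconn    # route each new connection to the backend with
                         # the fewest active connections, rather than
                         # pinning by source IP
```

With a single frontend there is no longer a need for source-IP pinning to keep two frontends consistent, so the least-loaded backend can simply take each new clone.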

All that was left was to see if things had improved. I re-ran the test across 15, 30 and then 45 servers. These were all able to clone nova reliably where they had previously been failing. I then upped it to 100 servers, where the cloning began to fail again.

Post-fix logs for those interested:

At this point, however, I'm basically performing a Distributed Denial of Service attack against git. As such, while the servers aren't immune to a DDoS, the problem appears to be fixed.

Categories: thinktime