thinktime
Colin Charles: Thanks Oracle for fixing the GPL man page issue
Timing is everything. I wrote about how MySQL man pages were silently relicensed away from the GPL. It was picked up by a lot of sites: Hacker News, Slashdot, LWN, and probably more. That led to a bug report in Debian (#712730) to complain that MySQL is no longer compliant with the Debian Free Software Guidelines (DFSG). That prompted Norvald Ryeng who’s active in Debian (thanks Oracle!) to file MySQL bug #69512. Almost immediately Oracle said it was a bug, where Yngve Svedsen pointed to the buildsystem: “This is indeed a bug, where the build system erroneously and silently started pulling in man pages with the wrong set of copyright headers.” That then prompted Tomas Ulin to write about how The MySQL Man Pages ARE Available under the GPL. Case closed, many on Hacker News attributed it to Hanlon’s razor. Most news sites updated it with the bug, and The H also wrote an article: Oracle bug accidentally removes GPL licence from MySQL man pages.
We learned about this issue from MariaDB Jira and spent some time looking at it. We looked at the MySQL source tarballs, and looked at 5.5.30/5.5.31/5.5.32. This issue is present in a release since 18 April 2013 (5.5.31) and a subsequent release on 3 June 2013 (5.5.32). What is clear is that this also affects 5.1, 5.6, and 5.7. This has been an issue for about two months.
So this issue is written off as a bug. Great. It’s fixed because it was noticed. It was noticed not because it was reported in the bugs system, but because there was a huge amount of traffic around it. While Tomas might say, “Reporting a bug is always a good way to communicate with us,” I doubt this would have been fixed in record time any other way. Also, I don’t need to rehash all the issues with the public bugs system.
I’m not about to start conspiracy theories here because that isn’t my goal. Our frame of mind since last week’s RHEL Software Collections news has been focused on documentation as well. Sheeri Cabral, an Oracle ACE Director, has had a rather interesting conversation on Twitter about our documentation. man pages aside, we’re improving documentation tremendously, and have over 2,700 articles in the Knowledgebase.
One thing is for sure with Oracle as steward for MySQL: the public perception of Oracle isn’t at its best and generally no one assumed this to be an accident.
Now let’s focus on something celebratory and positive: MySQL (NDB) Cluster 7.3 is now a GA. I’m excited by the node.js connector and the auto-installer. Can’t wait to give it a try. Congratulations all round to the Cluster Team at Oracle.
Sonia Hamilton: refactoring in Go – rather pleasant actually…
I’ve just finished refactoring a large Go program, and the process was rather…. pleasant.
Static typing catches all those obscure errors I wouldn’t think about in a scripting language (Python, Perl, Ruby, etc). My process is:
- type :make in vim (I have a dummy Makefile in my Go project just for vim)
- vim jumps cursor to error (vim quickfix list)
- “oh, I shouldn’t do that” – fix (type type type)
- start again
Finish rather sooner than expected, run tests, smile in knowledge program is working properly.
Update
To quickly set up the make command for Go, type this in a Vim window:
    :setlocal makeprg=go\ build\ \.
Or even better, configure vim via your ~/.vimrc, for example:
    autocmd BufRead *_test.go setlocal makeprg=go\ test\ \.
    autocmd BufRead *.go setlocal makeprg=go\ test\ \./..
Thanks Martin for the comment!
Sonia Hamilton: gsnmpgo – SNMP for Golang using gsnmp
Work on gsnmpgo has halted. Making the gsnmp C library multithreaded was proving too time consuming. Use http://github.com/soniah/gosnmp instead.
Previously… I recently released gsnmpgo – a Go/CGo snmp library using gsnmp. Pull requests welcome!
From the gsnmpgo documentation, here’s an example of usage:
    // do an snmp get; RFC 4088 is used for uris
    uri := `snmp://public@192.168.1.10//(1.3.6.1.2.1.1.1.0)`
    params := gsnmpgo.NewDefaultParams(uri)
    results, err := gsnmpgo.Query(params)
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    // check your results
    gsnmpgo.Dump(results)
    // turn on debugging
    gsnmpgo.Debug = true
Sonia Hamilton: Bokken and Sword Training
The guys training with wooden swords (Bokken) and metal swords.
Sonia Hamilton: git – delete local tracking branches
(Just a summary of Stack Overflow “How do you Remove an Invalid Remote Branch Reference from Git?”).
To delete a local tracking branch (without deleting the remote branch), do:
    git branch -rd remote/branch
And of course to delete the remote branch:
    git push remote :branch
Occasionally a gc will help, but usually shouldn’t be used:
    git gc --prune=now
So, when would you want to use this? Let’s say the repository you’re tracking has a lot of branches (eg the Linux Kernel). You start tracking branch “foo”, do some work with it, merge some of it into your branch “bar”, then push “bar” up to the remote repository. Or, you’ve got a whole lot of dev branches you’ve pushed to your backup repository and also merged into your master branch.
In either case you’ve got a collection of tracking branches you don’t want to see anymore, so clean them up:
    % git branch -r
    soniah/dev.a
    soniah/dev.b
    soniah/dev.c
    soniah/dev.d
    soniah/master
    % git branch -rd soniah/dev.a
    Deleted remote branch soniah/dev.a (was deadbeef).
    % git branch -rd soniah/dev.b
    Deleted remote branch soniah/dev.b (was deadbeef).
    # now the list of remote branches is cleaner:
    % git branch -r
    soniah/dev.c
    soniah/dev.d
    soniah/master
Sonia Hamilton: GoSnmp – SNMP for GoLang
Today I released soniah/gosnmp – an update of alouca/gosnmp.
Many, many thanks to Andreas Louca for writing alouca/gosnmp. The major difference between his version and soniah/gosnmp is that the latter has tests written. (However the code could do with refactoring). The tests were used to find and correct errors in the following SNMP BER Types:
- Counter32
- Gauge32
- Counter64
- OctetString
- ObjectIdentifier
- IpAddress
Also, this version contains functions for treating the returned snmp values as *big.Int (convenient, as SNMP can return int32, uint32, and uint64 values).
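To illustrate why *big.Int is convenient here, this is a minimal sketch of widening the three integer types into one. The helper name toBigInt is hypothetical and is not part of the gosnmp API; only the standard math/big calls are real.

```go
package main

import (
	"fmt"
	"math/big"
)

// toBigInt is a hypothetical helper (not gosnmp's API): it widens the
// integer types an SNMP agent can return into a single *big.Int.
func toBigInt(value interface{}) (*big.Int, error) {
	switch v := value.(type) {
	case int32:
		return big.NewInt(int64(v)), nil
	case uint32:
		return big.NewInt(int64(v)), nil
	case uint64:
		// A uint64 can exceed the int64 range, so use SetUint64
		// rather than big.NewInt.
		return new(big.Int).SetUint64(v), nil
	default:
		return nil, fmt.Errorf("unsupported type %T", value)
	}
}

func main() {
	// The largest Counter64 value does not fit in an int64.
	n, err := toBigInt(uint64(18446744073709551615))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(n.String())
}
```

The caller then only has to handle one numeric type, whatever BER type the agent returned.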
Sonia Hamilton: git pull -f (git force pull)
Git has a “force push” option (git push -f remote branch), but it doesn’t have a “force pull” option (like git pull -f remote branch).
This works:
    % git fetch remote branch
    % git reset --hard FETCH_HEAD
    % git clean -df
Or, as a function for your bash/zsh config file:
    gpuf () {
        # git pull -f $1
        remote=${1:?"need remote to force pull from"}
        current_branch=$(git symbolic-ref -q HEAD)
        current_branch=${current_branch##refs/heads/}
        current_branch=${current_branch:-HEAD}
        if [ "$current_branch" = 'HEAD' ] ; then
            echo
            echo "On a detached head. Exiting..."
            # return rather than exit: exit would close the interactive shell
            return 1
        fi
        git fetch "$remote" "$current_branch"
        git reset --hard FETCH_HEAD
        git clean -df
    }
Sonia Hamilton: Golang – building with Makefile and Jenkins
I’ve recently been working on a large Go project, and one of the deliverables was that the project be buildable using Jenkins. I was unfamiliar with Jenkins, and there didn’t seem to be any documentation around on how to build Go executables.
Project Structure
First of all an aside on project structure. For my first project I had a GOPATH of ~/go and the following directory structure:
    ~/go
    |--bin
    |--pkg
    |--src
But as I wrote more Go projects, it made more sense to separate each project into its own directory structure:
    ~/go
    |--project1
       |--bin
       |--pkg
       |--src
    |--project2
       |--bin
       |--pkg
       |--src
With this directory structure I set GOPATH on the command line or in a Makefile:
    % cd ~/go/project2/src/github.com/soniah/gosnmp
    % GOPATH=~/go/project2 go build
Makefile
For smaller projects you can just produce a binary using go run or go build. But a Makefile will be needed for larger projects, as they will have other deliverables besides a binary – for example manpages or an operating system installer like a .deb for Ubuntu/Debian.
    GOROOT := /usr/lib/go
    GOPATH := /var/lib/jenkins/workspace/go/project2

    myhostname := $(shell hostname)
    ifeq (${myhostname}, laptop)
      GOPATH := /home/sonia/go/project2
    else ifeq (${myhostname}, testmachine)
      GOPATH := /home/u1234/go/project2
      GOROOT := /usr/local/go
    endif

    build: build-stamp
    build-stamp: file1.go file2.go file3.go
    	# always format code
    	GOPATH=$(GOPATH) go fmt $^
    	# binary
    	GOPATH=$(GOPATH) go build -o project2 -v $^
    	# docs
    	markdown README.mkd > README.html
    	help2man --no-info --include=help2man.roff --name "Project2" ./project2 > project2.roff
    	man -Tps -l project2.roff > project2-man.ps
    	ps2pdf project2-man.ps project2-man.pdf
    	# mark as done
    	touch $@
Jenkins
With a working Makefile, building under Jenkins will now be easier. The objective is to have Jenkins automatically build a new binary/package whenever a developer pushes to one of the git/mercurial/bzr repos that make up the various components of your project.
However if your project contains multiple components, you’ll soon come across a problem. For example Project2 was using github.com/droundy/goopt and github.com/mattn/go-sqlite3. To see the problem, create a New Job using Build a free-style software project. Under Source Code Management, choose something like Git. Under the second Advanced button, you’ll need to change the option Local subdirectory for repo (optional) to point to the path of a component’s repo. But this setting is a global setting for all git repos – so the build won’t work as you add a second and third component.
The solution is to install to Jenkins the Multiple SCMs Plugin. Then in your Jenkins Job you’ll be able to set the local subdirectory for each component. For example in Project2:
- the goopt local subdirectory was set to project2/src/github.com/droundy/goopt
- the go-sqlite3 local subdirectory was set to project2/src/github.com/mattn/go-sqlite3
Here are some other useful settings for building Go projects on Jenkins:
- if you’re building 32 and 64 bit binaries (on different build servers), use the setting Restrict where this project can be run with something like “32bit&&precise&&ubuntu”
- separate out your Go code from other languages – Advanced Project Options, Use custom workspace, “/var/lib/jenkins/workspace/go”
Sonia Hamilton: Golang – profiling libraries and GoSNMP for SNMP
There is already a great article on Profiling Go Programs. However that article only discusses how to profile a standalone binary – what about a library?
For example, I’ve been working on the GoSNMP SNMP library, here’s how I profiled it (it wasn’t obvious):
    # produce cpu profiling information from the tests - this part was well documented
    % go test -cpuprofile cpu.out
    # compile the test binary to pkg.test but do not run it (from `go help test`)
    # this part wasn't obvious
    % go test -c
    # now run pprof using `go test -c` output
    # using gosnmp.test - this part wasn't obvious
    % go tool pprof gosnmp.test cpu.out
Doing a memory profile was similar:
    % go test -memprofile mem.out
    % go test -c
    % go tool pprof gosnmp.test mem.out
Sonia Hamilton: Ubuntu – HowTo Recover Encrypted Home Directory
There are many pages out there discussing how to recover an Ubuntu encrypted home directory (see also below). These are merely notes for my future reference; they need tidying and there may be errors/mis-attributions in them.
Start by booting from an Ubuntu Live CD.
Passwords
Three different “passwords” are referred to when recovering:
* boot password ie the password used when your laptop is first booted and the partitions are decrypted
* user password ie your unix account password
* mount password – will look something like f0bddb4c533fddb2c89e890098ed65d1. The one that you didn’t write down when prompted to do so… See “Recovering Your Mount Passphrase” https://help.ubuntu.com/community/EncryptedPrivateDirectory#Recovering_Your_Mount_Passphrase
If you selected the default Ubuntu encryption setup, the partitions will be laid out like this:
    /dev/sda1 * 2048 499711 248832 83 Linux
    /dev/sda2 501758 976771071 488134657 5 Extended
    /dev/sda5 501760 976771071 488134656 83 Linux
- /dev/sda1 contains /boot ie kernel and grub
- /dev/sda5 is an encrypted partition (crypto_LUKS) that contains LVM. The Logical Volumes will be for /root, /home and swap. /home will be encrypted with a second level of encryption if you chose “encrypt home directory” during installation.
* confirm /dev/sda5 is the correct partition [1]:
    cryptsetup -v luksDump /dev/sda5
* mount the encrypted partition containing the LVM volumes:
    cryptsetup -v luksOpen /dev/sda5 sda5_crypt
    vgdisplay
    (you may need to rename the volume group using vgrename if it conflicts with an existing one. A good motivation for using different VG names on each machine)
    lvdisplay | less
    mkdir /mnt/home
    mount -t ext4 /dev/vg/home /mnt/home
Mounting Encrypted Home, LUKS vs eCryptfs
The partition was encrypted with LUKS, and /home will be encrypted with a second level of encryption (eCryptfs) if you chose “encrypt home directory” during installation.
Note for future installs: you’d only want to have both if multiple people were using the same machine. Otherwise use only LUKS especially for laptops; eCryptfs is an extra hurdle during recovery and doesn’t give extra security. Also, using only LUKS is more secure than only encrypting your home directory using eCryptfs:
* it will encrypt other things beside /home eg swap, /tmp
* you’ll only type in your long LUKS passphrase occasionally (ie at reboot), whereas the eCryptfs password will be typed in every time you login or unlock the screen ie will be more vulnerable to shoulder-surfing, and more likely to be too short
However, eCryptfs does have some advantages (http://www.privacydusk.com/tag/ecryptfs-vs-luks/):
* All the cryptographic metadata is stored in the header of the file. This means that the encrypted file can be copied and moved from one location to another not leaving any metadata behind
* Files can be encrypted with multiple keys so that multiple different users can have access to encrypted but shared files. You can have different files encrypted by different users and each user can access only his files
* “remount” /mnt/home on home:
umount /mnt/home
mount -t ext4 /dev/vg/home /home
# add a user with the same name as the broken system
adduser --no-create-home sonia
su sonia
ecryptfs-mount-private
The Ubuntu documentation on EncryptedPrivateDirectory has lots of information [4]. These commands are copied from there, in case the page moves or disappears.
(((
sudo ecryptfs-add-passphrase --fnek
Passphrase: (Enter the mount passphrase you recorded when you setup the mount–this passphrase is different from your login passphrase.)
You should now get two lines looking like this:
Inserted auth tok with sig [9986ad986f986af7] into the user session keyring
Inserted auth tok with sig [76a9f69af69a86fa] into the user session keyring (write down the second value in the square brackets)
mkdir /mnt/Private
mount -t ecryptfs /mnt/home/sonia/.Private /mnt/Private
Selection: 3 (use a passphrase key type)
Passphrase: (Enter the mount passphrase you recorded when you setup the mount–this passphrase is different from your login passphrase.)
Selection: aes (use the aes cipher)
Selection: 16 (use a 16 byte key)
Enable plaintext passthrough: n
Enable filename encryption: y (This and the following options only apply if you are using filename encryption)
Filename Encryption Key (FNEK) Signature: (the value you wrote down from the second line above)
)))
[1] http://blog.miketoscano.com/?p=72
[2] http://goshawknest.wordpress.com/2010/04/16/how-to-recover-crypted-home-directory-in-ubuntu/
[3] https://help.ubuntu.com/community/EncryptedPrivateDirectory
[4] https://help.ubuntu.com/community/EncryptedPrivateDirectory#Recovering_Your_Data_Manually
Sonia Hamilton: git bisect run – example
Git bisect is a great tool for finding bugs in a program. But many examples show manual uses of git bisect – here’s an example of automating the process using git bisect run.
tl;dr
Using git bisect run is easy if you’ve made small atomic commits and you have good tests. run makes a large debug easier (compared to manually doing git bisect good and git bisect bad) – you’re less likely to make errors due to boredom. And run means you can use an iterative process – use rebase to split bad commits then just run again.
Example
So I had an elusive bug in a long running process (an snmp poller, calculator and aggregator for a large network). I had a point where the program was good, but I’d added more features since good and now results were bad. The first step was to write a shell script to be called from git bisect run:
    % cat bisect.sh
    #!/bin/bash
    # copy this to ~ before running with `git bisect run ~/bisect.sh`
    cp ~/Makefile .
    make clean
    # make modifies manpage output, so stash after build
    if make &> /dev/null ; then
        git stash
        git stash clear
    else
        git stash
        git stash clear
        exit 125
    fi
    sudo cat /var/tmp/empty > /var/log/abc/abc-poller.log
    sudo ./abc-poller --tmp --once -d 2 -c 150 || exit 125
    echo "=== poller finished"
    percent=`godir=/var/tmp/data/abcmon/poll_queue/new ~/checker | \
        tail -1 | awk '{print $5}' | awk -F. '{print $1}'`
    echo "=== percent is $percent"
    (( percent < 5 ))
Things that make writing the test script easier:
- first get it working outside of git bisect run – usually means echoing results along the way
- already having a test suite that produces a quantified pass/fail output. In my case I had already written the checker program, whose last line of output contained a “percentage failure” figure
Next step was having an abbreviated log of commits to refer to:
    % git log --oneline
    f00f232 sql.go - better debugging # bad
    9780d44 dummy .gitignore, so out dir preserved
    0de0796 Makefile for nsch1abcs01
    a2f6c96 defaults - 20 workers, udp 15
    b8ee3d9 GOMAXPROCS()
    04f21ba start v0.0.2
    dbc6a60 Makefile: Jenkins as default for env vars
    557a5e3 more work on stats
    ef6a453 remove excessive debugging
    ccc4644 remove file buffering - wasn't writing..???
    98bf4b1 stats write failing
    9a9682d move type queue_t struct
    467aeed buffered writes for queue file
    148a8cc stats: + device_run, device_ok
    c61be42 done chan *Stats_t; calculate_value() bool
    0e41461 debugging - print out device_id as %5s
    7cbe167 default workers 5000, correct stop/start commands
    544e8d7 gather statistics
    6990cf2 rename data chan to device_id
    5e8562b rename sql -> sqlconn; global var
    fd52d89 remove dead code
    4d7b9b1 device_for() - err if count != 1
    6ceb305 deb: fail if version main.go isn't same changelog
    5ce1855 deb: rules producing abc-poller_0.0.1_amd64.deb
    db5bff4 deb: cleaned up Makefile, roffs
    76674c2 JSON -> SQL; version 0.0.1
    4cf198e deb: basic removal of 64 references
    d980f26 deb: rename 64 to vanilla
    b933570 deb: remove 32 bit stuff
    970bd82 current debug level is 2; adjust output
    70f8921 Revert "deb build - don't init, cron while testing"
    855ce00 default debug is 2; misc tidy
    f3d80e1 remove timeoutOpt - no longer used
    48ced38 misc tidys before release
    a3fe13c runonceOpt, revert cycling code # good
    24cd998 use passed in udpOpt
    5ca1830 remove stash/sender.old.go
    06b9566 remove gsnmpgo; use gosnmp
    bffd430 rules: add note about "too many open files"
Mark bad and good, start the run, go and have a coffee :-)
    % git bisect start f00f232 a3fe13c
    % git bisect run ~/bisect.sh
    # lots of output
I get the result that the ominously named 557a5e3 more work on stats is the first bad commit – I remember it as one of those large “kitchen sink” commits done at the end of the day. So “first rule of fightclub git” remembered – always do small atomic commits.
I have a useful shell function gri() – I used that to interactively rebase and break up 557a5e3 into many small commits:
    gri () {
        git rebase -i HEAD~${1:-7}
    }
After rebasing git log looked like this – notice the many small commits named “bisect1” etc:
    382b3ee defaults - 20 workers, udp 15
    11db314 GOMAXPROCS()
    4b547ea start v0.0.2
    c92feef Makefile: Jenkins as default for env vars
    8fcc595 bisect6: calc/noncalc
    ad3a18f bisect5: tweak debug msgs
    311bb8d bisect4: mv stats init
    957c234 bisect3: remove stats from send_gosnmp()
    3c01d62 bisect2: use Add(); Calcs/NonCalcs
    947cb27 bisect1: move Stats_t; Add()
    ef6a453 remove excessive debugging
    ccc4644 remove file buffering - wasn't writing..???
    98bf4b1 stats write failing
And here’s the real win of writing bisect.sh – you can just keep rebasing and running until you’ve narrowed down the bad code to a few lines:
    === poller finished
    === percent is 49
    947cb27fd57642dc545ee23090d7ae8fd8b14b3f is the first bad commit
    commit 947cb27fd57642dc545ee23090d7ae8fd8b14b3f
    Author: Sonia Hamilton <sonia@snowfrog.net>
    Date: Thu Mar 14 10:02:42 2013 +1100

        bisect1: move Stats_t; Add()
I do another interactive rebase, fix the logic error, and then HEAD is good.
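For anyone who would rather write the check in Go than in shell, here is a rough sketch of the exit-code decisions bisect.sh makes. git bisect run treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. The function name bisectExitCode and its parameters are made up for illustration; a real checker would pass the result to os.Exit.

```go
package main

import "fmt"

const (
	exitGood = 0   // git bisect run: commit is good
	exitBad  = 1   // any code 1-127 other than 125 marks the commit bad
	exitSkip = 125 // commit cannot be tested, so skip it
)

// bisectExitCode (hypothetical name) mirrors the decisions in bisect.sh:
// skip when the build or the poller run fails, otherwise report good only
// if the failure percentage is under 5.
func bisectExitCode(buildOK, runOK bool, percentFailed int) int {
	if !buildOK || !runOK {
		return exitSkip
	}
	if percentFailed < 5 {
		return exitGood
	}
	return exitBad
}

func main() {
	// 49 is the failure percentage reported for the first bad commit.
	fmt.Println("exit code:", bisectExitCode(true, true, 49))
}
```

Returning 125 for a broken build matters: it stops an untestable commit from being blamed as the first bad one.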
Quid pro quo (you can't play ping pong by yourself)
Julien Goodwin: On programming languages
Languages I've used in the last three months:
- C++
- Java
- Go
- Python
- bash
- Javascript & HTML (Including several templating languages)
In the two years since I started there's also:
- perl
- tcsh
- PHP
- SLAX (An alternate syntax version of XSLT)
These end up being a fairly unsurprising mix of standard sysadmin, web and systems programmer fare, with the real outliers being Go, the new C-ish systems language created at Google (several of the people working on the language sit just on the other side of a wall from me), and tcsh & SLAX which come from working with Juniper's JunOS which is built on FreeBSD with an XML-based configuration.
The confrontation waiting to happen
Matt Palmer: Thought for the day
When the Syrian Electronic Army hacked The Onion’s twitter account, what did they do to cause panic and mayhem? Post real news stories?
Andrew Pollock: [life] City2South run report
Yesterday I ran in the City2South. It was a beautiful day for it, and I really enjoyed the run. Thanks to generous support of my donors, I raised $252 for the Heart Foundation.
My official time was 1:20:41, which I'm really happy with. I'd only ever run 14km on the preceding Tuesday, and I ran that in 1:32:57, so to do this run 12 minutes faster, on a completely different course felt like quite the accomplishment. I also ran personal best times for 5K and 10K. It's hard to believe that the guy who came first ran it in 44 minutes.
The course was really nice, except for running up Highgate Hill at kilometre 12. That was a bit harsh, but I managed to run all the way up it, nonetheless. It only hit me this afternoon when I was replaying the course in my mind just how far I ran. My normal 10K course doesn't feel all that long because it loops back on itself a lot, so it's deceptive how much ground I cover.
From a technical perspective, the race was done very well. I liked that they had a Facebook app that in real time posted updates when I crossed the start, 5K, 10K and finish lines, and the official results were online by the time I got home. That said, as I write, the website is throwing all sorts of errors when I try to download my official finishing certificate, or see my photos and finish line video.
All things considered, it was a pretty nice way to spend a Sunday morning. I was up at 5am to be on the 5:44am ferry from Hawthorne, and back home again by 10am.
I'm very keen to try running a half marathon now, but my next run is the 10km Bridge to Brisbane in September. That one will be more interesting because presumably it involves running up the Gateway Bridge, and I'll be pushing Zoe in a jogging stroller. I'm not expecting any personal best times for that one.
Michael Still: We all know that the LCA2014 CFP is open, right?
So, if you're interested in speaking at linux.conf.au 2014, in Perth between 6 and 10 January 2014 you should hit up those CFPs now!
The thermostat and the frying pan
David Rowe: Not Activating Mt Remarkable
Last Saturday I had my first Summits on The Air (SOTA) attempt on top of Mount Remarkable here in South Australia.
As a first step on Friday I registered my SOTA attempt on the Sotawatch web site.
On Saturday morning I started by testing my FT-817 and Alexloop magnetic loop antenna at our camp. While tuning up I managed to talk to a VK2 (portable in VK5) who was a few hundred km away in the Flinders Ranges. Good test.
I then hiked for a few hours to get to the top of Mt Remarkable, set up my radio and antenna, and called CQ on 40m and 20m. Alas, I made no contacts. However it was so nice to experience S0 noise on 40 and 20m, so much different to my urban S9 hash experience on those bands. I couldn’t hear much activity on 40m but could hear many international stations on 20m. They just couldn’t hear me!
The members of the SOTA Australia Yahoo Group have been most helpful with many suggestions on how I can do better next time. In particular I can “self spot” using a smart phone app like sotagoat or a web site. I’ll certainly give it another go in future.
Some pictures of my little adventure:
In the last picture the magnetic loop is just behind my head, the FT-817 just visible above the white note book. I use a 1m dowel as the antenna mast which doubles as a walking stick for the hike. A large part of this walk is over paths covered with large rubble, as shown in the pictures above. I was told this is from ancient volcanic activity. The rubble moves a bit under your feet, making for slow going. There is a light plane crash about 2/3 of the way up – the alloy remains of the plane still shiny after 30 years. The walk was about 6 hours return for me from the caravan park at the base of Mt Remarkable. However I am a slow walker, and had a sore knee from a bike crash a few days before!
Andrew Cowie: Strong eventual consistency
Most people will have seen the Call Me Maybe blog posts about data loss in the face of network partition. A few times in that series the author discusses “CRDT” as an alternative approach to the concurrency problem.
Midway through the last post in the series is what is almost an off-the-cuff comment, but I think it’s everything:
“Consistency is a property of your data, not of your nodes.”
We tend to get overwhelmed with replication configurations, high-availability solutions, sharding strategies, and worrying about how a given database will react under various failure modes.
And yet, the essential truth is that we're so busy worrying about what's stored on disk that we can forget that we don't care about the consistency of what's on disk. We need to care about the consistency of our data. It's easy for a misbehaving program to write garbage, but not to worry! We're absolutely certain that garbage is consistently replicated across the cluster. Yeah, well done there.
So the much bigger challenge in high-availability distributed systems is making sure we have sane rules for propagating changes so that we can have a safe view of our data.
About 10 years ago I was working with a Java based object-oriented database (which is a grandiose name for what was as much a disk-backed datastore as anything else, but if you're morbidly curious about what sort of API such a beast would have, you can read about db4o in a series of posts I wrote about it). It was surprisingly easy to use, and came along at a time when I was prepared to do just about anything to escape the object-relational mapping hell.
They got significant adoption in embedded devices where zero-administration is a necessity and developers don't want to deal with the machinery of a full scale RDBMS just to store e.g. configuration parameters. But surprise, it wasn't long before users started asking for replication features. Now, usually when you hear that term you think of master/slave replication being done at database engine level in a high-availability setup. In this case, however, they had disconnected devices re-establishing connectivity to enterprise datastores, and because of that you had to cope with significant conflicts when it came time to synchronize.
Because the data model was articulated in terms of Java code (to a naive first approximation, you were just storing Java objects), you had the data model living in the same place as the application code, domain layer, and validation logic. This meant that when it came time to cope with those conflicts, the natural place to do that was in the same Java code. This was interesting, because for just about every other database engine out there data is opaque. Oh, sure, RDBMS have types (though that there are people who think VARCHAR(256) actually tells you anything useful remains a source of wonder; alas, I digress), but if you have a high availability configuration and you've allowed concurrent activity during a network partition, then you have to deal with diverged replicas and thus have to merge them. The database doesn't know what to do; how could it? No: consistency is a property of your data, not the datastore; the rules to decide how to synchronize are a business decision, so where better to put it than in the business logic?
Peter Miller suggests the example of booking flights: multiple passengers can end up allocated the same seat on an oversold flight, but the decision about who gets which seat happens at check-in and conflict resolution is a business one made by the airline staff, not the database.
Throughout the Jepsen posts, you'll see occasional mention of "CRDTs" as an alternative to the problems of attempting to achieve simultaneous write safety in a distributed system. Finding out just what a CRDT is took a bit more doing than I would have expected; hence wanting to write this post.
Convergent and Commutative Replicated Data Types
It's easy to have Consistency when you impose synchronous access to your data. But the locks needed to give that property don't scale to distributed systems; you need to have data that can cope with delay. The idea of self-healing systems has been around for a while, but there hasn't been much formal study of what data types meet these requirements. If you're at all interested, I'd encourage you to have a read of "A comprehensive study of Convergent and Commutative Replicated Data Types" by Shapiro, Preguiça, Baquero, and Zawirski.
http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf
They use set notation and a form of pseudocode to describe the different data types, which makes the read a bit more serious than it needs to be, but having had my head buried in this paper for a few days I can say the effort has paid off. They articulate a set of conditions under which merges can be handled safely. For a state based system, the requirement is for the datatype to be a join semilattice; if it is, then they show the replicas will converge. For an operation based one (aka command pattern to us programmer types), the requirement is for manipulations of the datatype to be commutative, and if so, ditto. [They also show these are equivalent, which is handy].
Here's a schematic illustration of a state-based convergent replicated data type:
The idea being that if you have a merge function, then it doesn't matter where a state change is made; it will eventually make its way to all replicas.
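To make the merge idea concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the state-based types the paper covers. The Go type and function names are my own and not taken from the paper or any library. Each replica increments only its own slot, and merging takes the per-replica maximum, which is commutative, associative, and idempotent.

```go
package main

import "fmt"

// GCounter is a grow-only counter with one slot per replica ID.
type GCounter map[string]uint64

// Increment records one more event on the named replica's own slot.
func (c GCounter) Increment(replica string) { c[replica]++ }

// Value is the sum over all replica slots.
func (c GCounter) Value() uint64 {
	var total uint64
	for _, n := range c {
		total += n
	}
	return total
}

// Merge returns the per-replica maximum of a and b. Taking the maximum is
// the join of the semilattice: order and repetition of merges do not matter.
func Merge(a, b GCounter) GCounter {
	out := GCounter{}
	for id, n := range a {
		out[id] = n
	}
	for id, n := range b {
		if n > out[id] {
			out[id] = n
		}
	}
	return out
}

func main() {
	a, b := GCounter{}, GCounter{}
	a.Increment("a")
	a.Increment("a")
	b.Increment("b")
	// Both merge orders give the same total.
	fmt.Println(Merge(a, b).Value(), Merge(b, a).Value())
}
```

Because a stale or duplicated state can never lower a slot, replicas can exchange states in any order and still converge.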
Which raises the topic of eventual consistency. Anyone who has worked with Amazon S3 has discovered (the hard way, inevitably) that mutating an existing value has wildly undefined behaviour as to when other readers will see that change. CRDTs, on the other hand, exhibit "strong eventual consistency" (or perhaps better "strong eventual convergence", as Murat Demirbas put his analysis of the topic), whereby the propagation behaviour is well defined.
At first you'd think that this would seriously cramp your style, but the real contribution of the paper is they then explore around a bit and examine a number of different datatypes that meet these requirements.
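As one illustration of what such a datatype can look like, here is a sketch of a two-phase set (2P-Set), which the paper describes: elements can be added and later removed, but a removed element can never come back. The Go names below are my own invention, and this is a simplified sketch rather than the paper's specification.

```go
package main

import "fmt"

// TwoPSet is a two-phase set: a grow-only set of added elements plus a
// grow-only set of removed elements (tombstones).
type TwoPSet struct {
	added   map[string]bool
	removed map[string]bool
}

// NewTwoPSet returns an empty set.
func NewTwoPSet() *TwoPSet {
	return &TwoPSet{added: map[string]bool{}, removed: map[string]bool{}}
}

// Add inserts an element.
func (s *TwoPSet) Add(e string) { s.added[e] = true }

// Remove tombstones an element, but only if this replica has seen it added.
func (s *TwoPSet) Remove(e string) {
	if s.added[e] {
		s.removed[e] = true
	}
}

// Contains reports whether e was added and has not been removed.
func (s *TwoPSet) Contains(e string) bool { return s.added[e] && !s.removed[e] }

// MergeSets unions both halves; set union is commutative, associative,
// and idempotent, so replicas converge regardless of merge order.
func MergeSets(a, b *TwoPSet) *TwoPSet {
	out := NewTwoPSet()
	for _, src := range []*TwoPSet{a, b} {
		for e := range src.added {
			out.added[e] = true
		}
		for e := range src.removed {
			out.removed[e] = true
		}
	}
	return out
}

func main() {
	a, b := NewTwoPSet(), NewTwoPSet()
	a.Add("x")
	a.Add("y")
	b.Add("x")
	b.Remove("x") // the removal wins after merging
	m := MergeSets(a, b)
	fmt.Println(m.Contains("x"), m.Contains("y"))
}
```

The price of convergence is visible in the design: the tombstone set only grows, and "remove wins" is baked into the type rather than decided per conflict.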
The paper also includes an impressive reference list & discussion of prior art in the space, so it's worth a read. There's also "Conflict-free Replicated Data Types" by the same authors which formalizes SEC http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/CRDTs_SSS-2011.pdf
Back to the effect of network partitions on data safety:
What about Ceph?
Good question.
What I would be interested in now is how Ceph's various inter-related pieces hold up in the face of the sort of aggressive network partition testing conducted in the Jepsen survey. Reading a recent blog article about how the Ceph monitor services have re-implemented their use of Paxos, it struck me as being extraordinarily complicated. "One Paxos to rule them all"? Oh dear.
I'm doing a back-of-the-envelope examination but I think I already know the answer: you're not going to get a write acknowledged until it is durably stored, which is Consistency. Ceph is a complex system, and parts of it can be offline when others are continuing to provide service. So you'd have to break it down to the provision of a single piece of mutable data before you could study the Availability of the system properly. I'd love to find someone who would like to help us do a real analysis using the Jepsen techniques; it would be interesting to see.
But this all reminds us why we're interested in CRDTs in the first place: systems where you can build synchronous communication (or an external appearance thereof care of the use of consensus protocols internally) to achieve Consistency are in essence limited to highly controlled clusters in an individual data center. Most real world systems involve components distributed across geographic, temporal, and logical distances, and that means you must take into account the limitations of the speed of information propagation. While most people immediately think about the light-speed problem, it applies just as much to any distributed environment; and in any real world information system we need to serve clients concurrently, and that means the technique of using a CRDT where possible might very well be worth the effort.
AfC