You are here

Planet Linux Australia

Subscribe to Planet Linux Australia feed
Planet Linux Australia - http://planet.linux.org.au
Updated: 3 weeks 19 hours ago

Chris Smart: Git hook to help with OpenStack development

Mon 30th Jan 2017 15:01

I wrote a small Git hook which may be useful in helping OpenStack devs run tests (and any script they like) before a commit is made (see Superuser magazine article).

This way we can save everyone time in the review process by fixing simple issues before they break in the check-pipeline.

Installation is easy (see the GitHub page) and all prompts default to no, so that the dev can easily just hit Enter to skip and continue (but still be reminded).

Categories: thinktime

sthbrx - a POWER technical blog: Installing Centos 7.2 on IBM Power System's S822LC for High Performance Computing (Minksy) with USB device

Mon 30th Jan 2017 08:01
Introduction

If you are installing Linux on your IBM Power System's S822LC server then the instructions in this article will help you to start and run your system. These instructions are specific to installing CentOS 7 on an IBM Power System S822LC for High Performance Computing (Minsky), but also work for RHEL 7 - just swap CentOS for RHEL.

Prerequisites

Before you power on the system, ensure that you have the following items:

  • Ethernet cables;
  • USB storage device of 7G or greater;
  • An installed ethernet network with a DHCP server;
  • Access to the DHCP server's logs;
  • Power cords and outlet for your system;
  • PC or notebook that has IPMItool level 1.8.15 or greater; and
  • a VNC client.

Download CentOS ISO file from the Centos Mirror. Select the "Everything" ISO file.

Note: You must use the 1611 release (dated 2016-12-22) or later due to Linux Kernel support for the server hardware.

Step 1: Preparing to power on your system

Follow these steps to prepare your system:

  1. If your system belongs in a rack, install your system into that rack. For instructions, see IBM POWER8 Systems information.
  2. Connect an Ethernet cable to the left embedded Ethernet port next to the serial port on the back of your system and the other end to your network. This Ethernet port is used for the BMC/IPMI interface.
  3. Connect another Enternet cable to the right Ethernet port for network connection for the operating system.
  4. Connect the power cords to the system and plug them into the outlets.

At this point, your firmware is booting.

Step 2: Determining the BMC firmware IP address

To determine the IP address of the BMC, examine the latest DHCP server logs for the network connected to the server. The IP address will be requested approximately 2 minutes after being powered on.

It is possible to set the BMC to a static IP address by following the IBM documentation on IPMI.

Step 3: Connecting to the BMC firmware with IPMItool

After you have a network connection set up for your BMC firmware, you can connect using Intelligent Platform Management Interface (IPMI). IPMI is the default console to use when connecting to the Open Power Abstraction Layer (OPAL) firmware.

Use the default authentication for servers over IPMI is:

  • Default user: ADMIN
  • Default password: admin

To power on your server from a PC or notebook that is running Linux®, follow these steps:

Open a terminal program on your PC or notebook with Activate Serial-Over-Lan using IPMI. Use other steps here as needed.

For the following impitool commands, server_ip_address is the IP address of the BMC from Step 2, and ipmi_user and ipmi_password are the default user ID and password for IPMI.

Power On using IPMI

If your server is not powered on, run the following command to power the server on:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password chassis power on Activate Serial-Over-Lan using IPMI

Activate your IPMI console by running this command:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password sol activate

After powering on your system, the Petitboot interface loads. If you do not interrupt the boot process by pressing any key within 10 seconds, Petitboot automatically boots the first option. At this point the IPMI console will be connected to the Operating Systems serial. If you get to this stage accidently you can deactivate and reboot as per the following two commands.

Deactivate Serial-Over-Lan using IPMI

If you need to power off or reboot your system, deactivate the console by running this command:

ipmitool -I lanplus -H server_ip_address -U user-name -P ipmi_password sol deactivate Reboot using IPMI

If you need to reboot the system, run this command:

ipmitool -I lanplus -H server_ip_address -U user-name -P ipmi_password chassis power reset Step 4: Creating a USB device and booting

At this point, your IPMI console should be contain a Petitboot bootloader menu as illustrated below and you are ready to install Centos 7 on your server.

Use one of the following USB devices:

  • USB attached DVD player with a single USB cable to stay under 1.0 Amps, or
  • 7 GB (or more) 2.0 (or later) USB flash drive.

Follow the following instructions:

  1. To create the bootable USB device, follow the instructions in the CentOS wiki Host to Set Up a USB to Install CentOS.
  2. Insert your bootable USB device into the front USB port. CentOS AltArch installer will automatically appear as a boot option on the Petitboot main screen. If the USB device does not appear select Rescan devices. If your device is not detected, you might have to try a different type.
  3. Arrow up to select the CentOS boot option. Press e (Edit) to open the Petitboot Option Editor window
  4. Move the cursor to the Boot arguments section and to include the following information: ro inst.stage2=hd:LABEL=CentOS_7_ppc64le:/ console=hvc0 ip=dhcp (if using RHEL the LABEL will be similar to RHEL-7.3\x20Server.ppc64le:/)

Notes about the boot arguments:

  • ip=dhcp to ensure network is started for VNC installation.
  • console hvc0 is needed as this is not the default.
  • inst.stage2 is needed as the boot process won't automatically find the stage2 install on the install disk.
  • append inst.proxy=URL where URL is the proxy URL if installing in a network that requires a proxy to connect externally.

You can find additional options at Anaconda Boot Options.

  1. Select OK to save your options and return to the Main menu
  2. On the Petitboot main screen, select the CentOS AltArch option and then press Enter.
Step 5: Complete your installation

After you select to boot the CentOS installer, the installer wizard walks you through the steps.

  1. If the CentOS installer was able to obtain a network address via DHCP, it will present an option to enable the VNC. If no option is presented check your network cables.
  2. Select the Start VNC option and it will provide an OS server IP adress. Note that this will be different to the BMC address previously optained.
  3. Run a VNC client program on your PC or notebook and connect to the OS server IP address.

During the install over VNC, there are a couple of consoles active. To switch between them in the ipmitool terminal, press ctrl-b and then between 1-4 as indicated.

Using the VNC client program:

  1. Select "Install Destination"
  2. Select a device from "Local Standard Disks"
  3. Select "Full disk summary and boot device"
  4. Select the device again from "Selected Disks" with the Boot enabled
  5. Select "Do not install boot loader" from device. which results in .

Without disabling boot loader, the installer complains about an invalid stage1 device. I suspect it needs a manual Prep partition of 10M to make the installer happy.

If you have a local Centos repository you can set this by selecting "Install Source" - the directories at this url should look like CentOS's Install Source for ppc64le.

Step 6: Before reboot and using the IPMI Serial-Over-LAN

Before reboot, generate the grub.cfg file as Petitboot uses this to generate its boot menu:

  1. Using the ipmitool's shell (ctrl-b 2):
  2. Enter the following commands to generate a grub.cfg file
chroot /mnt/sysimage rm /etc/grub.d/30_os-prober grub2-mkconfig -o /boot/grub2/grub.cfg exit

/etc/grub.d/30_os-prober is removed as Petitboot probes the other devices anyway so including it would create lots of duplicate menu items.

The last step is to restart your system.

Note: While your system is restarting, remove the USB device.

After the system restarts, Petitboot displays the option to boot CentOS 7.2. Select this option and press Enter.

Conclusion

After you have booted CentOS, your server is ready to go! For more information, see the following resources:

Categories: thinktime

Tim Serong: Random Test Subject

Mon 30th Jan 2017 01:01

Almost every time I fly, it seems like I get pulled aside for the random explosives trace detection test. I always assumed it was because I usually look like a crazy mountain man (see photo). But, if you google around for “airport random explosives test”, you’ll find forum posts from security staff assuring everyone they’re not doing profiling, and even a helpful FAQ from Newcastle Airport (“Why are you always chosen for ‘explosive testing’?“) which says the process is “as the officer finishes screening one person, they are required to ask the next person walking through screening to undertake the ETD test”.

So maybe it’s just bad luck. Except possibly for that time at Hobart airport last week, where I was seeing off a colleague after linux.conf.au 2017. As far as I could tell, we were the only two people approaching security, and my colleague was in front. He was waved through to the regular security screening, and I was asked over for an explosives test, to which I replied “you’re most welcome to test me if you like, but I’m not actually going through security into departures”. The poor guy looked a bit nonplussed at this, then moved on to the next traveler who’d since appeared in line behind us.

What to do about this in future? Obviously, I need a new t-shirt, with text something like one of these:

If anyone else would like a t-shirt along these lines, the images above conveniently link to my Redbubble store. Or, if you’d rather DIY, there’s PNGs here, here and here (CC-BY-SA as usual, and no, they’re not broken, it’s white text on a transparent background).

Finally, for some real randomness, check out Keith Packard’s ChaosKey To Production presentation. I’m not actually affiliated with Keith, but the ChaosKey sure looks nifty.

Categories: thinktime

Clinton Roy: South Coast Track Report

Sun 29th Jan 2017 23:01

Please note this is a work in progress

I had previously stated my intention to walk the South Coast Track. I have now completed this walk and now want a space where I can collect all my thoughts.

Photos: Google Photos album

The sections I’m referring to here come straight from the guide book. Due to the walking weather and tides all being in our favour, we managed to do the walk in six days. We flew in late on the first day and did not finish section one of the walk, on the second day we finished section one and then completed section two and three. On day three it was just the Ironbound range. On day four it was just section five. Day five we completed section six and the tiny section seven. Day six was section eight and day seven was cockle creak (TODO something’s not adding up here)

The hardest day, not surprisingly, was day three where we tackled the Ironbound range, 900m up, then down. The surprising bit was how easy the ascent was and how god damn hard the descent was. The guide book says there are three rest camps on the descent, with one just below the peak, a perfect spot for lunch. Either this camp is hidden (e.g. you have to look behind you) or it’s overgrown, as we all missed it. This meant we ended up skipping lunch and were slipping down the wed, muddy awful descent side for hours. When we came across the mid rest camp stop, because we’d been walking for so long, everyone assumed we were at the lower camp stop and that we were therefore only an hour or so away from camp. Another three hours later or so we actually came across the lower camp site, and the by that time all sense of proportion was lost and I was starting to get worried that somehow we’d gotten lost and were not on the right trail and that we’d run out of light. In the end I got into camp about an hour before sundown (approx eight) and B&R got in about half an hour before sundown. I was utterly exhausted, got some water, pitched the tent, collapsed in it and fell asleep. Woke up close to midnight, realised I hadn’t had any lunch or dinner, still wasn’t actually feeling hungry. I forced myself to eat a hot meal, then collapsed in bed again.

TODO: very easy to follow trail.
TODO: just about everything worked.
TODO: spork
TODO: solar panel
TODO: not eating properly
TODO: needing more warmth

I could not have asked for better walking companions, Richard and Bec.


Filed under: camping, Uncategorized
Categories: thinktime

Julien Goodwin: Charging my ThinkPad X1 Gen4 using USB-C

Sun 29th Jan 2017 01:01
As with many massive time-sucking rabbit holes in my life, this one starts with one of my silly ideas getting egged on by some of my colleagues in London (who know full well who they are), but for a nice change, this is something I can talk about.

I have a rather excessive number of laptops, at the moment my three main ones are a rather ancient Lenovo T430 (personal), a Lenovo X1 Gen4, and a Chromebook Pixel 2 (both work).

At the start of last year I had a T430s in place of the X1, and was planning on replacing both it and my personal ThinkPad mid-year. However both of those older laptops used Lenovo's long-held (back to the IBM days) barrel charger, which lead to me having a heap of them in various locations at home and work, but all the newer machines switched to their newer rectangular "slim" style power connector and while adapters exist, I decided to go in a different direction.

One of the less-touted features of USB-C is USB-PD[1], which allows devices to be fed up to 100W of power, and can do so while using the port for data (or the other great feature of USB-C, alternate modes, such as DisplayPort, great for docks), which is starting to be used as a way to charge laptops, such as the Chromebook Pixel 2, various models of the Apple MacBook line, and more.

Instead of buying a heap of slim-style Lenovo chargers, or a load of adapters (which would inevitably disappear over time) I decided to bridge towards the future by making an adapter to allow me to charge slim-type ThinkPads (at least the smaller ones, not the portable workstations which demand 120W or more).

After doing some research on what USB-PD platforms were available at the time I settled on the TI TPS65986 chip, which, with only an external flash chip, would do all that I needed.

Devkits were ordered to experiment with, and prove the concept, which they did very quickly, so I started on building the circuit, since just reusing the devkit boards would lead to an adapter larger than would be sensible. As the TI chip is a many-pin BGA, and breaking it out on 2-layers would probably be too hard for my meager PCB design skills, I needed a 4-layer board, so I decided to use KiCad for the project.

It took me about a week of evenings to get the schematic fully sorted, with much of the time spent reading the chip datasheet, or digging through the devkit schematic to see what they did there for some cases that weren't clear, then almost a month for the actual PCB layout, with much of the time being sucked up learning a tool that was brand new to me, and also fairly obtuse.

By mid-June I had a PCB which should (but, spoiler, wouldn't) work, however as mentioned the TI chip is a 96-ball 3x3mm BGA, something I had no hope of manually placing for reflow, and of course, no hope of hand soldering, so I would need to get these manufactured commercially. Luckily there are several options for small scale assembly at very reasonable prices, and I decided to try a new company still (at the time of ordering) in closed trials, PCB.NG, they have a nice simple procedure to upload board files, and a slightly custom pick & place file that includes references to the exact component I want by Digikey[link] part number. Best of all the pricing was completely reasonable, with a first test run of six boards only costing my US$30 each.

Late in June I recieved a mail from PCB.NG telling me that they'd built my boards, but that I had made a mistake with the footprint I'd used for the USB-C connector and they were posting my boards along with the connectors. As I'd had them ship the order to California (at the time they didn't seem to offer international shipping) it took a while for them to arrive in Sydney, courtesy a coworker.

I tried to modify a connector by removing all through hole board locks, keeping just the surface mount pins, however I was unsuccessful, and that's where the project stalled until mid-October when I was in California myself, and was able to get help from a coworker who can perform miracles of surface mount soldering (while they were working on my board they were also dead-bug mounting a BGA). Sadly while I now had a board I could test it simply dropped off my priority list for months.

At the start of January another of my colleagues (a US-based teammate of the London rabble-rousers) asked for a status update, which prompted me to get off my butt and perform the testing. The next day I added some reinforcement to the connector which was only really held on by the surface mount pins, and was highly likely to rip off the board, so I covered it in epoxy. Then I knocked up some USB A plug/socket to bare wires test adapters using some stuff from the junk bin we have at the office maker space for just this sort of occasion (the socket was actually a front panel USB port from an old IBM x-series server). With some trepidation I plugged the board into my newly built & tested adapter, and powered the board from a lab supply set to limit current in case I'd any shorts in the board. It all came up straight away, and even lit the LEDs I'd added for some user feedback.

Next was to load a firmware for the chip. I'd previously used TI's tool to create a firmware image, and after some messing around with the SPI flash programmer I'd purchased managed to get the board programmed. However the behaviour of the board didn't change with (what I thought was) real firmware, I used an oscilloscope to verify the flash was being read, and a twinkie to sniff the PD negotiation, which confirmed that no request for 20v was being sent. This was where I finished that day.

Over the weekend that followed I dug into what I'd seen and determined that either I'd killed the SPI MISO port (the programmer I used was 5v, not 3.3v), or I just had bad firmware and the chip had some good defaults. I created a new firmware image from scratch, and loaded that.

Sure enough it worked first try. Once I confirmed 20v was coming from the output ports I attached it to my recently acquired HP 6051A DC load where it happily sank 45W for a while, then I attached the cable part of a Lenovo barrel to slim adapter and plugged it into my X1 where it started charging right away.

At linux.conf.au last week I gave (part of) a hardware miniconf talk about USB-C & USB-PD, which open source hardware folk might be interested in. Over the last few days while visiting my dad down in Gippsland I made the edits to fix the footprint and sent a new rev to the manufacturer for some new experiments.

Of course at CES Lenovo announced that this years ThinkPads would feature USB-C ports and allow charging through them, and due to laziness I never got around to replacing my T430, so I'm planning to order a T470 as soon as they're available, making my adapter obsolete.





Rough timeline:
  • April 21st 2016, decide to start working on the project
  • April 28th, devkits arrive
  • May 8th, schematic largely complete, work starts on PCB layout
  • June 14th, order sent to CM
  • ~July 6th, CM ships order to me (to California, then hand carried to me by a coworker)
  • early August, boards arrive from California
  • Somewhere here I try, and fail, to reflow a modified connector onto a board
  • October 13th, California cowoker helps to (successfully) reflow a USB-C connector onto a board for testing
  • January 6th 2017, finally got around to reinforce the connector with epoxy and started testing, try loading firmware but no dice
  • January 10th, redo firmware, it works, test on DC load, then modify a ThinkPad-slim adapater and test on a real ThinkPad
  • January 25/26th, fixed USB-C connector footprint, made one more minor tweak, sent order for rev2 to CM, then some back & forth over some tolerance issues they're now stricter on.


1: There's a previous variant of USB-PD that works on the older A/B connector, but, as far as I'm aware, was never implemented in any notable products.
Categories: thinktime

Michael Still: Recording performance information from short lived processes with prometheus

Sat 28th Jan 2017 17:01
Now that I'm recording basic statistics about the behavior of my machines, I now want to start tracking some statistics from various scripts I have lying around in cron jobs. In order to make myself sound smarter, I'm going to call these short lived scripts "ephemeral scripts" throughout this document. You're welcome.

The promethean way of doing this is to have a relay process. Prometheus really wants to know where to find web servers to learn things from, and my ephemeral scripts are both not permanently around and also not running web servers. Luckily, prometheus has a thing called the pushgateway which is designed to handle this situation. I can run just one of these, and then have all my little scripts just tell it things to add to its metrics. Then prometheus regularly scrapes this one process and learns things about those scripts. Its like a game of Telephone, but for processes really.

First off, let's get the pushgateway running. This is basically the same as the node_exporter from last time:

$ wget https://github.com/prometheus/pushgateway/releases/download/v0.3.1/pushgateway-0.3.1.linux-386.tar.gz $ tar xvzf pushgateway-0.3.1.linux-386.tar.gz $ cd pushgateway-0.3.1.linux-386 $ ./pushgateway

Let's assume once again that we're all adults and did something nicer than that involving configuration management and init scripts.

The pushgateway implements a relatively simple HTTP protocol to add values to the metrics that it reports. Note that the values wont change once set until you change them again, they're not garbage collected or aged out or anything fancy. Here's a trivial example of adding a value to the pushgateway:

echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job

This is stolen straight from the pushgateway README of course. The above command will have the pushgateway start to report a metric called "some_metric" with the value "3.14", for a job called "some_job". In other words, we'll get this in the pushgateway metrics URL:

# TYPE some_metric untyped some_metric{instance="",job="some_job"} 3.14

You can see that this isn't perfect because the metric is untyped (what types exist? we haven't covered that yet!), and has these confusing instance and job labels. One tangent at a time, so let's explain instances and jobs first.

On jobs and instances

Prometheus is built for a universe a little bit unlike my home lab. Specifically, it expects there to be groups of processes doing a thing instead of just one. This is especially true because it doesn't really expect things like the pushgateway to be proxying your metrics for you because there is an assumption that every process will be running its own metrics server. This leads to some warts, which I'll explain in a second. Let's start by explaining jobs and instances.

For a moment, assume that we're running the world's most popular wordpress site. The basic architecture for our site is web frontends which run wordpress, and database servers which store the content that wordpress is going to render. When we first started our site it was all easy, as they could both be on the same machine or cloud instance. As we grew, we were first forced to split apart the frontend and the database into separate instances, and then forced to scale those two independently -- perhaps we have reasonable database performance so we ended up with more web frontends than we did database servers.

So, we go from something like this:



To an architecture which looks a bit like this:



Now, in prometheus (i.e. google) terms, there are three jobs here. We have web frontends, database masters (the top one which is getting all the writes), and database slaves (the bottom one which everyone is reading from). For one of the jobs, the frontends, there is more than one instance of the job. To put that into pictures:



So, the topmost frontend job would be job="fe" and instance="0". Google also had a cool way to lookup jobs and instances via DNS, but that's a story for another day.

To harp on a point here, all of these processes would be running a web server exporting metrics in google land -- that means that prometheus would know that its monitoring a frontend job because it would be listed in the configuration file as such. You can see this in the configuration file from the previous post. Here's the relevant snippet again:

- job_name: 'node' static_configs: - targets: ['molokai:9100', 'dell:9100', 'eeebox:9100']

The job "node" runs on three targets (instances), named "molokai:9100", "dell:9100", and "eeebox:9100".

However, we live in the ghetto for these ephemeral scripts and want to use the pushgateway for more than one such script, so we have to tell lies via the pushgateway. So for my simple emphemeral script, we'll tell the pushgateway that the job is the script name and the instance can be an empty string. If we don't do that, then prometheus will think that the metric relates to the pushgateway process itself, instead of the ephemeral process.

We tell the pushgateway what job and instance to use like this:

echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/frontend/instance/0

Now we'll get this at the metrics URL:

# TYPE some_metric untyped some_metric{instance="",job="some_job"} 3.14 some_metric{instance="0",job="frontend"} 3.14

The first metric there is from our previous attempt (remember when I said that values are never cleared out?), and the second one is from our second attempt. To clear out values you'll need to restart the pushgateway process. For simple ephemeral scripts, I think its ok to leave the instance empty, and just set a job name -- as long as that job name is globally unique.

We also need to tell prometheus to believe our lies about the job and instance for things reported by the pushgateway. The scrape configuration for the pushgateway therefore ends up looking like this:

- job_name: 'pushgateway' honor_labels: true static_configs: - targets: ['molokai:9091']

Note the honor_labels there, that's the believing the lies bit.

There is one thing to remember here before we can move on. Job names are being blindly trusted from our reporting. So, its now up to us to keep job names unique. So if we export a metric on every machine, we might want to keep the job name specific to the machine. That said, it really depends on what you're trying to do -- so just pay attention when picking job and instance names.

On metric types

Prometheus supports a couple of different types for the metrics which are exported. For now we'll discuss two, and we'll cover the third later. The types are:

  • Gauge: a value which goes up and down over time, like the fuel gauge in your car. Non-motoring examples would include the amount of free disk space on a given partition, the amount of CPU in use, and so forth.
  • Counter: a value which always increases. This might be something like the number of bytes sent by a network card -- the value only resets when the network card is reset (probably by a reboot). These only-increasing types are valuable because its easier to do maths on them in the monitoring system.
  • Histograms: a set of values broken into buckets. For example, the response time for a given web page would probably be reported as a histogram. We'll discuss histograms in more detail in a later post.


I don't really want to dig too deeply into the value types right now, apart from explaining that our previous examples haven't specified a type for the metrics being provided, and that this is undesirable. For now we just need to decide if the value goes up and down (a gauge) or just up (a counter). You can read more about prometheus types at https://prometheus.io/docs/concepts/metric_types/ if you want to.

A typed example

So now we can go back and do the same thing as before, but we can do it with typing like adults would. Let's assume that the value of pi is a gauge, and goes up and down depending on the vagaries of space time. Let's also show that we can add a second metric at the same time because we're fancy like that. We'd therefore need to end up doing something like (again heavily based on the contents of the README):

cat <<EOF | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/frontend/instance/0 # TYPE some_metric gauge # HELP approximate value of pi in the current space time continuum some_metric 3.14 # TYPE another_metric counter # HELP another_metric Just an example. another_metric 2398 EOF

And we'd end up with values like this in the pushgateway metrics URL:

# TYPE some_metric gauge some_metric{instance="0",job="frontend"} 3.14 # HELP another_metric Just an example. # TYPE another_metric counter another_metric{instance="0",job="frontend"} 2398

A tangible example

So that's a lot of talking. Let's deploy this in my home lab for something actually useful. The node_exporter does not report any SMART health details for disks, and that's probably a thing I'd want to alert on. So I wrote this simple script:

#!/bin/bash hostname=`hostname | cut -f 1 -d "."` for disk in /dev/sd[a-z] do disk=`basename $disk` # Is this a USB thumb drive? if [ `/usr/sbin/smartctl -H /dev/$disk | grep -c "Unknown USB bridge"` -gt 0 ] then result=1 else result=`/usr/sbin/smartctl -H /dev/$disk | grep -c "overall-health self-assessment test result: PASSED"` fi cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/$hostname/instance/$disk # TYPE smart_health_passed gauge # HELP whether or not a disk passed a "smartctl -H /dev/sdX" smart_health_passed $result EOF done

Now, that's not perfect and I am sure that I'll re-write this in python later, but it is actually quite useful already. It will report if a SMART health check failed, and now I could write an alerting rule which looks for disks with a health value of 0 and send myself an email to go to the hard disk shop. Once your pushgateways are being scraped by prometheus, you'll end up with something like this in the console:



I'll explain how to turn this into alerting later.

Tags for this post: prometheus monitoring ephemeral_script pushgateway
Related posts: A pythonic example of recording metrics about ephemeral scripts with prometheus; Basic prometheus setup; Mona Lisa Overdrive; The Diamond Age ; Buying Time; The System of the World

Comment
Categories: thinktime

Michael Still: Basic prometheus setup

Fri 27th Jan 2017 17:01
I've been playing with prometheus for monitoring. It feels quite familiar to me because its based on an internal google technology called borgmon, but I suspect that means it feels really weird to everyone else.

The first thing to realize is that everything at google is a web server. Your short lived tool that copies some files around probably runs a web server. All of these web servers have built in URLs which report the progress and status of the task at hand. Prometheus is built to: scrape those web servers; aggregate the data; store the data into a time series database; and then perform dashboarding, trending and alerting on that data.

The most basic example is to just export metrics for each machine on my home network. This is the easiest first step, because we don't need to build any software to do this. First off, let's install node_exporter on each machine. node_exporter is the tool which runs a web server to export metrics for each node. Everything in prometheus land is written in go, which is new to me. However, it does make running node exporter easy -- just grab the relevant binary from https://prometheus.io/download/, untar, and run. Let's do it in a command line script example thing:

$ wget https://github.com/prometheus/node_exporter/releases/download/v0.14.0-rc.1/node_exporter-0.14.0-rc.1.linux-386.tar.gz $ tar xvzf node_exporter-0.14.0-rc.1.linux-386.tar.gz $ cd node_exporter-0.14.0-rc.1.linux-386 $ ./node_exporter

That's all it takes to run the node_exporter. This runs a web server at port 9100, which exposes the following metrics:

$ curl -s http://localhost:9100/metrics | grep filesystem_free | grep 'mountpoint="/data"' node_filesystem_free{device="/dev/mapper/raidvg-srvlv",fstype="xfs",mountpoint="/data"} 6.811044864e+11

Here you can see that the system I'm running on is exporting a filesystem_free value for the filesystem mounted at /data. There's a lot more than that exported, and I'd encourage you to poke around at that URL a little before continuing on.

So that's lovely, but we really want to record that over time. So let's assume that you have one of those running on each of your machines, and that you have it setup to start on boot. I'll leave the details of that out of this post, but let's just say I used my existing puppet infrastructure.

Now we need the central process which collects and records the values. That's the actual prometheus binary. Installation is again trivial:

$ wget https://github.com/prometheus/prometheus/releases/download/v1.5.0/prometheus-1.5.0.linux-386.tar.gz $ tar xvzf prometheus-1.5.0.linux-386.tar.gz $ cd prometheus-1.5.0.linux-386

Now we need to move some things around to install this nicely. I did the puppet equivalent of:

  • Moving the prometheus file to /usr/bin
  • Creating an /etc/prometheus directory and moving console_libraries and consoles into it
  • Creating a /etc/prometheus/prometheus.yml config file, more on the contents on this one in a second
  • And creating an empty data directory, in my case at /data/prometheus


The config file needs to list all of your machines. I am sure this could be generated with puppet templating or something like that, but for now here's my simple hard coded one:

# my global config global: scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. # scrape_timeout is set to the global default (10s). # Attach these labels to any time series or alerts when communicating with # external systems (federation, remote storage, Alertmanager). external_labels: monitor: 'stillhq' # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. rule_files: # - "first.rules" # - "second.rules" # A scrape configuration containing exactly one endpoint to scrape: # Here it's Prometheus itself. scrape_configs: # The job name is added as a label `job=` to any timeseries scraped from this config. - job_name: 'prometheus' # metrics_path defaults to '/metrics' # scheme defaults to 'http'. static_configs: - targets: ['molokai:9090'] - job_name: 'node' static_configs: - targets: ['molokai:9100', 'dell:9100', 'eeebox:9100']

Here you can see that I want to scrape each of my web servers which exports metrics every 15 seconds, and I also want to calculate values (such as firing alerts) every 15 seconds too. This might not scale if you have bajillions of processes or machines to monitor. I also label all of my values as coming from my domain, so that if I ever aggregate these values with another prometheus from somewhere else the origin will be clear.

The other interesting bit for now is the scrape configuration. This lists the metrics exporters to monitor. In this case its prometheus itself (molokai:9090), and then each of my machines in the home lab (molokai, dell, and eeebox -- all on port 9100). Remember, port 9090 is the prometheus binary itself and port 9100 is that node_exporter binary we now have running on all of our machines.

Now if we start prometheus, it will do its thing. There is some configuration which needs to be passed on the command line here (instead of in the configration file), so my command line looks like this:

/usr/bin/prometheus -config.file=/etc/prometheus/prometheus.yml \ -web.console.libraries=/etc/prometheus/console_libraries \ -web.console.templates=/etc/prometheus/consoles \ -storage.local.path=/data/prometheus

Prometheus also presents an interactive user interface on port 9090, which is handy. Here's an example of it graphing the load average on each of my machines (it was something which caused a nice jaggy line):



You can see here that the user interface has a drop down for selecting values that are known, and that the key at the bottom tells you things about each time series in the graph. So for example, if we added {instance="eeebox:9100"} to the end of the value in the text box at the top, then we'd be filtering for values with that label set, and would as a result only show one value in the graph (the one for eeebox).

If you're interested in very simple dashboarding of basic system metrics, that's actually all you need to do. In my next post about prometheus I'm going to show how to write your own binary which exports values to be graphed. In my case, the temperature outside my house.

Tags for this post: prometheus monitoring node_exporter
Related posts: Recording performance information from short lived processes with prometheus; A pythonic example of recording metrics about ephemeral scripts with prometheus; Mona Lisa Overdrive; The Diamond Age ; Buying Time; The System of the World

Comment
Categories: thinktime

Binh Nguyen: Life in the Philippines, Duterte Background, and More

Fri 27th Jan 2017 05:01
- interesting history. Aetan people were original inhabitants of Philippines. Now a minority group. Malaysian and Indonesian people settled there thereafter. Then Chinese and Muslims came. Finally, Spanish and US colonialism and now relative independence... Two main wars that changed it's history. That between it's colonisers (US and Spain) and between itself and the US which helped them
Categories: thinktime

Lev Lafayette: Data Centre Preparation: A Summary

Thu 26th Jan 2017 19:01

When installing a new system (whether HPC, cloud, or just a bunch of servers, disks, etc), they must be housed. Certainly this can be without any specialist environment, especially if one is building a small test cluster; for example with half-a-dozen old but homogeneous systems, each connected with 100BASE-TX ethernet to a switch etc.

read more

Categories: thinktime

Binh Nguyen: Does the Military-Industrial Complex Make Sense?, Random Thoughts, and More

Thu 26th Jan 2017 04:01
Given the importance of the Military-Industrial Complex now plays in many economies I thought it'd be worthwhile to take a look at whether it actually makes sense? - the military industrial complex doesn't work the way that a lot of people think. It's actually linked in to the way the financial system works and the way the world panned out after the World Wars. Since the US/West came out
Categories: thinktime

Tridge on UAVs: ArduPilot on the CL84 TiltWing VTOL

Wed 25th Jan 2017 18:01

ArduPilot now supports TiltWing VTOL aircraft! First test flights done earlier today in Canberra

We did the first test flights of the Unique Models CL84 TiltWing VTOL with ArduPilot today. Support for tilt-wing aircraft is in the 3.7.1 stable release, with enhancements to support the tilt mechanism of the CL84 added for the upcoming 3.8.0 release.

This CL84 model operates as a tricopter in VTOL flight, and as a normal aileron/elevator plane in forward flight. The unusual thing from the point of view of ArduPilot is that the tilt mechanism uses a retract style servo, which means it can be commanded to be fully up or fully down, but you can't ask it to hold any angle in between. That makes for some interesting challenges in the VTOL transition code.

For the 3.8.0 release there is a new parameter Q_TILT_TYPE that controls whether the tilt mechanism for tiltrotors and tiltwings is a continuous servo or a binary (retract style) servo. In either case the Q_TILT_RATE parameter sets the rate at which the servo changes angle (in degrees/second).

This aircraft has been previously tested in hover by Greg Covey (see http://discuss.ardupilot.org/t/tiltrotor-support-for-plane/8805) but has not previously been tested with automatic transitions.
Many thanks to Grant, Peter, James and Jack from CanberraUAV for their assistance with testing the CL84 today and the videos and photos!

Categories: thinktime

Lev Lafayette: Keeping The Build Directory in EasyBuild and Paraview Plugins

Tue 24th Jan 2017 15:01

By default, EasyBuild will delete the build directory of an successful installation and will save failures of attempted installs for diagnostic purposes. In some cases however, one may want to save the build directory. This can be useful, for example, for diagnosis of *successful* builds. Another example is the installation of plugins for applications such as Paraview, which *require* access to the the successful buildir.

read more

Categories: thinktime

Lev Lafayette: The Provision of HPC Resources to Top Universities

Tue 24th Jan 2017 13:01

Recently, the University of Melbourne is ranked #1 in Australia and #33 in the world, according to the Times Higher Education World University Rankings of 2015-2016 [1]. The rankings are based on a balanced metric between citations, industry income, international outlook, research, and teaching.

read more

Categories: thinktime

Binh Nguyen: Saving Capitalist Democracy 2, Random Thoughts, and More

Tue 24th Jan 2017 01:01
Obviously, this next part is a continuation of one of my others posts:  http://dtbnguyen.blogspot.com/2016/11/saving-capitalist-democracy-random.html - just get better at doing things. If counties such as Cuba can deliver good healthcare on limited resources surely more prosperous countries can as well? http://www.smh.com.au/lifestyle/health-and-wellbeing/
Categories: thinktime

Ben Martin: OHC2017 zero to firmware in < 2 hours

Mon 23rd Jan 2017 23:01
I thought I'd make some modifications along the way in the build, so I really couldn't do a head to head with the build time I had heard about (a lowish number of minutes). The on/off switch being where it was didn't fit my plans so I made that an off boarder and also moved the battery to off board so that I might use the space below the screen for something, perhaps where the stylus lives in the case.


I did manage to go from opening the packet to firmware environment setup, built, and uploaded in less than 2 hours total. No bridges, no hassles, cable shrinks around the place and 90 degree headers across the bottom of the board for future tinkering.

This is going to look extremely stylish in a CNCed hardwood case. My current plan is to turn it into a smart remote control. Rotary encoder for volume, maybe modal so that the desired "program" can be selected quickly from a list without needing to flick or page through things.

Categories: thinktime

Michael Still: Gods of Metal

Mon 23rd Jan 2017 23:01



ISBN: 9780141982267
LibraryThing
In this follow-up to Command and Control, Schlosser explores the conscientious objectors and protestors who have sought to highlight not just the immorality of nuclear weapons, but the hilariously insecure state the US government stores them in. In all seriousness, we are talking grannies with heart conditions being able to break in.

My only real objection to this book is that is more of a pamphlet than a book, and feels a bit like things that didn't make it into the main book. That said, it is well worth the read.

Tags for this post: book eric_schlosser nuclear weapons safety protest
Related posts: Command and Control; Random linkage; Fast Food Nation; Starfish Prime; Why you should stand away from the car when the cop tells you to; Random fact for the day Comment Recommend a book
Categories: thinktime

Binh Nguyen: Explaining Prophets 3, Random Thoughts, and More

Sat 21st Jan 2017 04:01
Obvious continuation of my other two posts: http://dtbnguyen.blogspot.com/2016/12/explaining-prophets-fake-news-and-more_26.html http://dtbnguyen.blogspot.com/2017/01/explaining-prophets-2-what-is-liberal.html - many of the world's top scientists seemed to be have possessed the same abilities with regards to 'pre-cognition' and 'lucid dreaming'. Like the other scientists I mentioned in the
Categories: thinktime

Simon Lyall: Linux.conf.au 2017 – Friday – Closing

Sat 21st Jan 2017 01:01

Code of Consult and Safety

  • Badge
    • Putting prefered pronoun
    • Emoji
  • Free Childcare
    • Sponsored by Github
    • Approx 10 kids
  • Assistance Grants
  • Attendees
    • Breakdown by gender etc
    • Roughly 25% of attendees and speakers not men
  • More numbers
    • 104 Matrix chat users
    • 554 attendees
    • 2900 coffee cups
    • Network claimed to 7.5Gb/s
    • 1.6 TB over the week, 200Mb/s max
    • 30 Session Chairs
    • 12 Miniconfs
    • 491 Proposals (130 more than the others)
    • 6 Tutorials, 75 talks, 80 speakers
    • 4 Keynote speakers
    • 21 Sponsors

Linux.conf.au 2018 – Sydney

  • A little bit of history repeating
  • 2001, 2007, 2018
  • Venue is UTS
  • 5 minutes to food, train station
  • https://lca2018.org
  • @lca2018 on twitter
  • Looking for a few extra helpers

Raffle

  • In support of Outreachy
  • 3 interns funded

Final Bit

  • Thanks to team members

 

 

Share

Categories: thinktime

David Rowe: Balloon Meets Gum Tree

Fri 20th Jan 2017 17:01

Today I attended the launch of Horus 38, a high altitude ballon flight carrying 4 payloads, one of which was the latest version of the SSDV system Mark and I have been working on.

Since the last launch, Mark and I have put a lot of work into carefully integrating a rate 0.8 LDPC code developed by Bill, VK5DSP. The coded 115 kbit/s system is now working error free on the bench down to -112dBm, and can transfer a new hi-res image in just a few seconds. With a tx power of 50mW, we estimate a line of site range of 100km. We are now out-performing commercial FSK telemetry chip sets using our open source system.

However disaster struck soon after launch at Mt Barker High School oval. High winds blew the payloads into a tree and three of them were chopped off, leaving the balloon and a lone payload to continue into the stratosphere. One of the payloads that hit the tree was our SSDV, tumbling into a neighboring back yard. Oh well, we’ll have another try in December.

Now I’ve been playing a lot of Kerbal Space Program lately. It’s got me thinking about vectors, for example in Kerbal I learned how to land two space craft at exactly the same point on the Mun (Moon) using vectors and some high school equations of motion. I’ve also taken up sailing – more vectors involved in how sails propel a ship.

The high altitude balloon consists of a latex, helium filled weather balloon a few meters in diameters. Strung out beneath that on 50m of fishing line are a series of “payloads”, our electronic gizmos in little foam boxes. The physical distance helps avoid interference between the radios in each box.

While the balloon was held near the ground, it was keeled over at an angle:

It’s tethered, and not moving, but is acted on by the force of the lift from the helium and drag from the wind. These forces pivot the balloon around an arc with a radius of the tether. If these forces were equal the balloon would be at 45 degrees. Today it was lower, perhaps 30 degrees.

When the balloon is released, it is accelerated by the wind until it reaches a horizontal velocity that matches the wind speed. The payloads will also reach wind speed and eventually hang vertically under the balloon due to the force of gravity. Likewise the lift accelerates the balloon upwards. This is balanced by drag to reach a vertical velocity (the ascent rate). The horizontal and vertical velocity components will vary over time, but lets assume they are roughly constant over the duration of our launch.

Now today the wind speed was 40 km/hr, just over 10 m/s. Mark suggested a typical balloon ascent rate of 5 m/s. The high school oval was 100m wide, so the balloon would take 100/10 = 10s to traverse the oval from one side to the gum tree. In 10 seconds the balloon would rise 5×10 = 50m, approximately the length of the payload string. Our gum tree, however, rises to a height of 30m, and reached out to snag the lower 3 payloads…..

Categories: thinktime

Pages