Planet Linux Australia
This year, the Kernel Summit is divided into two components:
- An invitation-only maintainer summit of 30 people in total, and
- An open kernel summit technical track which is open to all attendees of OSS Europe.
The security session is part of the latter. The preliminary agenda for the kernel summit technical track was announced by Ted Ts’o here:
There is also a preliminary agenda for the security session, here:
Currently, the agenda includes an update from Kees Cook on the Kernel Self Protection Project, and an update from Jarkko Sakkinen on TPM support. I’ll provide a summary of the recent Linux Security Summit, depending on available time, perhaps focusing on security namespacing issues.
This agenda is subject to change and if you have any topics to propose, please send an email to the ksummit-discuss list.
Two months ago, Troy Hunt, the security professional behind Have I been pwned?, released an incredibly comprehensive password list in the hope that it would allow web developers to steer their users away from passwords that have been compromised in past breaches.
While the list released by HIBP is hashed, the plaintext passwords are out there and one should assume that password crackers have access to them. So if you use a password on that list, you can be fairly confident that it's very easy to guess or crack your password.
I wanted to check my active passwords against that list to check whether or not any of them are compromised and should be changed immediately. This meant that I needed to download the list and do these lookups locally since it's not a good idea to send your current passwords to this third-party service.
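A local check like this amounts to SHA-1-hashing each candidate password and searching the downloaded list for the digest. Here's a minimal sketch; the file name is an assumption (use whatever you downloaded), and it tolerates both plain-hash lines and the `HASH:COUNT` format used by some releases of the list:

```python
import hashlib

def is_pwned(password, hash_file='pwned-passwords-1.0.txt'):
    """Check a password against a local copy of the HIBP SHA-1 hash list.

    Assumes one uppercase SHA-1 hex digest per line, optionally suffixed
    with a prevalence count (HASH:COUNT). Reads line by line since the
    full list is far too large to load into memory.
    """
    digest = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
    with open(hash_file) as f:
        for line in f:
            if line.split(':')[0].strip() == digest:
                return True
    return False
```

A linear scan is slow on the full list; sorting the file and binary-searching it (or using `grep` on the sorted file) is the usual speed-up, but the logic is the same.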
PLEASE NOTE NEW LOCATION
Tuesday, November 7, 2017
6:30 PM to 8:30 PM
Mail Exchange Hotel
688 Bourke St, Melbourne VIC 3000
Food and drinks will be available on premises.
Linux Users of Victoria is a subcommittee of Linux Australia.
Eight years ago I installed a pack of 36 Lithium cells in my EV. After about 50,000km and several near-death battery pack experiences (over discharge) the range decreased beneath a useful level so I have just purchased a new pack.
Same sort of cells, CALB 100AH, 3.2V per cell (80km range). The pack was about AUD$6,000 delivered and took an afternoon to install. I’ve adjusted my Zivan NG3 to cut out at an average of 3.6 v/cell (129.6V), and still have the BMS system that will drop out the charger if any one cell exceeds 4.1V.
The original pack was rated at 10 years (3000 cycles) and given the abuse we subjected it to I’m quite pleased it lasted 8 years. I don’t have a fail-safe battery management system like a modern factory EV so we occasionally drove the car when dead flat. While I could normally pick this problem quickly from the instrumentation my teenage children tended to just blissfully drive on. Oh well, this is an experimental hobby, and mistakes will be made. The Wright brothers broke a few wings……
I just took the car with its new battery pack for a 25km test drive and all seems well. The battery voltage is about 118V at rest, and 114V when cruising at 60 km/hr. It’s not dropping beneath 110V during acceleration, much better than the old pack which would sag beneath 100V. I guess the internal resistance of the new cells is much lower.
I plan to keep driving my little home-brew EV until I can buy a commercial EV with a > 200km range here in Australia for about $30k, which I estimate will happen around 2020.
It’s nice to have my little EV back on the road.
I returned to cycling a couple of weeks ago and I am taking part in the MS Sydney to the Gong Ride - The Ride to Fight Multiple Sclerosis.
Though it will be huge fun and a great challenge to ride over 80km along the Sydney coast, this is a fundraising event and the entry fee only covers event staging costs. Every dollar you DONATE will go directly to ensuring the thousands of Australians with multiple sclerosis are able to receive the support and care they need to live well.
Please DONATE now to support my ride and change the lives of Australians living with multiple sclerosis.
Thank you for your support.
Sanjeev Sharma – When DevOps met SRE: From Apollo 13 to Google SRE
- Author of two DevOps books
- Apollo 13
- Who were the real heroes? The guys back at mission control. The astronauts just had to keep breathing and not die
- Best Practice for Incident management
- Consider Alternatives
- Change it around
- Big Hurdles to adoption of DevOps in Enterprise
- Literature is Only looking at one delivery platform at a time
- Big enterprise have hundreds of platforms with completely different technologies, maturity levels, speeds. All interdependent
- He divides platforms into:
- Industrialised Core – Value High, Risk Low, MTBF
- Agile/Innovation Edge – Value Low, Risk High, Rapid change and delivery, MTTR
- Need normal distribution curve of platforms across this range
- Need to be able to maintain products at both ends in one IT organisation
- 6 capabilities needed in IT Organisation
- Planning and architecture.
- Your Delivery pipeline will be as fast as the slowest delivery pipeline it is dependent on
- Modernizing to Microservices based architecture: Refactoring code and data and defining the APIs
- Application Deployment Automation and Environment Orchestration
- Devs are paid to code, not to maintain deployment and config scripts
- Ops must provide an env that requires devs to write zero setup scripts
- Test Service and Environment Virtualisation
- If you are doing 2-week sprints but it takes 3 weeks to get a test server, how long are your sprints really?
- Release Management
- No good if 99% of software works but last 1% is vital for the business function
- Operational Readiness for SRE
- Shift from MTBF to MTTR
- MTTR = Mean time to detect + Mean time to Triage + Mean time to restore
- + Mean time to pass blame
- Antifragile Systems
- Things that are neither fragile nor robust, but rather thrive on chaos
- Cattle not pets
- Servers may go red, but services are always green
- DevOps: “Everybody is responsible for delivery to production”
- SRE: “(Everybody) is responsible for delivering Continuous Business Value”
Marcus Bristol (Pushpay) – Moving fast without crashing
- Low tolerance for errors in production due to being in finance
- Deploy twice per day
- Just Culture – Balance safety and accountability
- What rule?
- Who did it?
- How bad was the breach?
- Who gets to decide?
- Example of Retributive Culture
- KPIs reflect incidents.
- If more than 10% of deploys are bad then it affects bonuses
- Reduced number of deploys
- Restorative Culture
- Blameless post-mortem
- Can give a detailed account of what happened without fear of retribution
- Happens after every incident or near-incident
- Written Down in Wiki Page
- So everybody has the chance to have a say
- Summary, Timeline, impact assessment, discussion, Mitigations
- Mitigations become highest-priority work items
- Our Process
- Feature Flags
- Lots of small PRs
- Code Review
- Testers paired to devs so bugs can be fixed as soon as found
- Automated testing
- Pollination (reviews of code between teams)
- Posts to Slack when feature flag has been changed
- Nags about feature flags that seem to be hanging around in QA
- Nags about Flags that have been good in prod for 30+ days
- Every merge
- PRs awaiting reviews for long time (days)
- Missing post-mortem mitigations
- Status of builds in build farm
- When deploy has been made
- Health of API
- Answer queries on team member list
- Creates a ship train of PRs into a build; users can tell the bot to deploy it to each environment
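The flag-nagging part of a bot like this is easy to sketch. Everything below (field names, thresholds, message text) is invented for illustration and is not Pushpay's actual tooling:

```python
import datetime

# Hypothetical feature-flag records; the fields are assumptions.
flags = [
    {'name': 'new-checkout', 'state': 'on-in-prod',
     'since': datetime.date(2017, 9, 1)},
    {'name': 'beta-reports', 'state': 'qa',
     'since': datetime.date(2017, 11, 1)},
]

def stale_flags(flags, today, prod_days=30, qa_days=14):
    """Return nag messages for flags that have been sitting around,
    mirroring the Slack nags described above."""
    nags = []
    for f in flags:
        age = (today - f['since']).days
        if f['state'] == 'on-in-prod' and age >= prod_days:
            nags.append('%s has been good in prod for %d days - remove it?'
                        % (f['name'], age))
        elif f['state'] == 'qa' and age >= qa_days:
            nags.append('%s has been hanging around in QA for %d days'
                        % (f['name'], age))
    return nags
```

A real bot would post each message to Slack on a timer rather than returning a list, but the stale-flag logic is the whole trick.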
Michael Coté – Not actually a DevOps Talk
- Goal: deliver value weekly, reliably, with small patches
- Management must be the first to fail and transform
- Standardize on a platform: special snowflakes are slow, expensive and error-prone (see his slide, a good list of stuff that should be standardized)
- Ramping up: “Pilot low-risk apps, and ramp-up”
- Pair programming/working
- Half the advantage is people spend less time on reddit “research”
- Don’t go to meetings
- Automate compliance: have what you do automatically logged and generate compliance docs rather than building them manually.
- Crafting Your Cloud-Native Strategy
Sajeewa Dayaratne – DevOps in an Embedded World
- Challenges on Embedded
- Hardware – resource constrained
- Debugging – OS bugs, Hardware Bugs, UFO Bugs – Oscilloscopes and JTAG connectors are your friend.
- Environment – Thermal, Moisture, Power consumption
- Deploy to product – multi-month cycle, hard or impossible to send updates to ships at sea.
- Principles of DevOps apply equally to embedded
- High Frequency
- Reduce overheads
- Improve defect resolution
- Reduce response times
- Small Sonar, Navigation for medium boats, Displays for sail (e.g. America’s Cup), Navigation displays for large ships
- Dev around world, factory in Mexico
- 5 million lines of code
- 61 Hardware Products supported – Increasing steadily, very long lifetimes for hardware
- Complex network of products – lots of products on boat all connected, different versions of software and hardware on the same boat
- Old codebase
- Backward compatible with old hardware
- Needs to support new hardware
- Desire new features on all products
- What does this mean
- Defects were found too late
- Very high cost of bugs found late
- Software stabilization taking longer
- Manual test couldn’t keep up
- Cost increasing, including opportunity cost
- Does CI/CD provide answer?
- But will it work here?
- Case Study from HP. Large-Scale Agile Development by Gary Gruver
- Our Plan
- Improve tools and architecture
- Build Speeds
- Automated testing
- Code quality control
- Previous VCS
- Proprietary tool with limited support and upgrades
- Limited integration
- Lack of CI support
- No code review capability
- Move to git
- Code reviews
- Integrated CI
- Supported by tools
- Had a configurable codebase already
- Fairly common hardware platform (only 9 variations)
- Had runtime feature flags
- Cyclic dependencies – 1.5 years to clean these up
- Singletons – cut down
- Promote unit testability – worked on
- Many branches – long lived – mega merges
- Went to a single Branch model, feature flags, smaller batch sizes, testing focused on single branch
- Improve build speed
- Started at 8 hours to build the Linux platform, 2 hours for each app, 14+ hours to build and package a release
- Increase speed
- Parallel Builds
- What we did
- 4-5 hours down to 1 hour
- Test automation
- Existing tests were mock-ups of the hardware, so not typical
- Started with micro-test
- Unit testing (simulator)
- Unit testing (real hardware)
- Build Tools
- Software tools (n2k simulator, remote control)
- Hardware tools (mimic real-world data, repurpose existing stuff)
- UI Test Automation
- Build or Buy
- Functional testing vs API testing
- HW Test tools
- Took 6 hours to do full test on hardware.
- Commit -> pull request
- Automated Build / Unit Tests
- Daily QA Build
- Configuration as code
- Code Quality tools
- Simulate more hardware
- Increase analytics and reporting
- Fully simulated test env for dev (so the devs don’t need the hardware)
- Scale – From internal infrastructure to the cloud
- Grow the team
- Lessons Learnt
- Collect Data
- Get Executive Buy in
- Change your tools and processes if needed
- Test automation is the key
- Invest in HW
- Focus on good software design for Everything
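The single-branch model above leans on runtime feature flags to keep unfinished work dark. A minimal sketch of the idea (flag names invented; real embedded code would more likely read flags from device configuration than a dict):

```python
# Hypothetical runtime feature flags: unfinished features ship on the
# single branch but stay switched off until they are ready.
FLAGS = {'new-sonar-ui': False, 'fast-render': True}

def flag_enabled(name, default=False):
    """Look up a flag, falling back to a default for unknown names."""
    return FLAGS.get(name, default)

def render_screen():
    """Pick a code path based on the flag, so both old and new
    implementations coexist on the single branch."""
    if flag_enabled('new-sonar-ui'):
        return 'new renderer'
    return 'old renderer'
```

The point is that merging half-finished code stops being risky: the flag, not the branch, controls what customers see.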
This was done in Fusion360 which allows bringing in STL files of things like phones and cutting those out of another body. It took a while to work out the ball end toolpath but I finally worked out how to get something that worked reasonably well. The chomps in the side allow fingers to securely lift the phone off the charger.
It will be interesting to play with sliced objects in wood. Layering 3D cuts to build up objects that are 10cm (or about 4 layers) tall.
Mirror, mirror, on the wall: testing Conway’s Law in open source communities – Lindsay Holmwood
- The map between the organisational structure and the technical structure.
- Easy to find who owns something, don’t have to keep two maps in your head
- Needs flexibility of the organisation structure in order to support flexibility in a technical design
- Conway’s “Law” is really just an adage
- Complexity frequently takes the form of hierarchy
- Organisations that mirror perform badly in rapidly changing and innovative environments
Metrics that Matter – Alison Polton-Simon (Thoughtworks)
- Metrics Mania – Lots of focus on it everywhere ( fitbits, google analytics, etc)
- How to help teams improve CD process
- Define CD
- Software consistently in a deployable state
- Get fast, automated feedback
- Do push-button deployments
- Identifying metrics that mattered
- Talked to people
- Contextual observation
- Rapid prototyping
- Pilot offering
- 4 big metrics
- Deploy ready builds
- Cycle time
- Mean time between failures
- Mean time to recover
- Number of Deploy-ready builds
- How many builds are ready for production?
- Routine commits
- Testing you can trust
- Product + Development collaboration
- Cycle Time
- Time it takes to go from a commit to a deploy
- Efficient testing (test subset first, faster testing)
- Appropriate parallelization (lots of build agents)
- Optimise build resources
- Case Study
- Monolithic Codebase
- Hand-rolled build system
- Unreliable environments ( tests and builds fail at random )
- Validating a Pull Request can take 8 hours
- Coupled code: isolated teams
- Wide range of maturity in testing (some no test, some 95% coverage)
- No understanding of the build system
- Releases routinely delayed (10 months!) or done “under the radar”
- Focus in case study
- Reducing cycle time, increasing reliability
- Extracted services from monolith
- Pipelines configured as code
- Build infrastructure provisioned as docker and ansible
- Cycle time for one team 4-5h -> 1:23
- Deploy ready builds 1 per 3-8 weeks -> weekly
- Mean time between failures
- Quick feedback early on
- Robust validation
- Strong local builds
- Should not be done by reducing number of releases
- Mean time to recover
- How long back to green?
- Monitoring of production
- Automated rollback process
- Informative logging
- Case Study 2
- 1.27 million lines of code
- High cyclomatic complexity
- Tightly coupled
- Long-running but frequently failing testing
- Isolated teams
- Pipeline run duration 10h -> 15m
- MTTR Never -> 50 hours
- Cycle time 18d -> 10d
- Created a dashboard for the metrics
- Meaningless Metrics
- The company will build whatever the CEO decides to measure
- Lines of code produced
- Number of Bugs resolved. – real life duplicates Dilbert
- Developers Hours / Story Points
- Lack of team buy-in
- Easy to game
- Unintended consequences
- Measuring inputs, not impacts
- Make your own metrics
- Map your path to production
- Highlights pain points
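The four big metrics above all fall out of a simple build/deploy event log. A toy sketch (all timestamps and field names are invented for illustration, not Thoughtworks' actual data model):

```python
import datetime

# Toy event log: one record per build, with commit and deploy times.
builds = [
    {'commit': datetime.datetime(2017, 11, 1, 9, 0),
     'deploy': datetime.datetime(2017, 11, 1, 13, 30), 'green': True},
    {'commit': datetime.datetime(2017, 11, 2, 10, 0),
     'deploy': datetime.datetime(2017, 11, 2, 11, 23), 'green': True},
    {'commit': datetime.datetime(2017, 11, 3, 9, 0),
     'deploy': None, 'green': False},
]

def cycle_times_hours(builds):
    """Cycle time: commit to deploy, for builds that actually shipped."""
    return [(b['deploy'] - b['commit']).total_seconds() / 3600
            for b in builds if b['deploy'] is not None]

def deploy_ready_count(builds):
    """Deploy-ready builds: passed validation, could go to production."""
    return sum(1 for b in builds if b['green'])
```

MTBF and MTTR come from the same kind of log, just keyed on failure and recovery events instead of commits and deploys.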
Using Bots to Scale incident Management – Anthony Angell (Xero)
- Who we are
- Single Team
- Just a platform Operations team
- SRE team is formed
- Ops teams plus performance Engineering team
- Incident Management
- In Bad old days – 600 people on a single chat channel
- Created Framework
- what do incidents look like, post mortems, best practices,
- How to make incident management easy for others?
- ChatOps (Based on Hubot)
- Automated tour guide
- Multiple integrations – anything with Rest API
- Reducing time to restore
- Release register – API hook to when changes are made
- Issue report form
- how many users & location
- when started
- anyone working on it already
- Anything else to add.
- Chat Bot for incident
- Populates a form and pushes to the production channel, creates a PagerDuty alert
- Creates new slack channel for incident
- Can automatically update status page from chat and page senior managers
- Can Create “status updates” which record things (eg “restarted server”), or “yammer updates” which get pushed to social media team
- Creates a task list automatically for the incident
- Page people from within chat
- At the end: Gives time incident lasted, archives channel
- Post Mortem
- More integrations
- Report card
- Change tracking
- Incident / Alert portal
- High Availability – dockerisation
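The core incident flow above (create a channel, record status updates, report the duration at the end) is easy to sketch. This is not Xero's actual bot; the channel naming and fields are assumptions for illustration:

```python
import datetime

class IncidentBot:
    """Minimal sketch of a chat-driven incident: one object per incident,
    a dedicated channel, a log of status updates, and a closing summary."""

    def __init__(self, name, started):
        self.name = name
        self.started = started
        self.channel = '#incident-%s' % name  # naming scheme is invented
        self.updates = []

    def status_update(self, when, text):
        # e.g. "restarted server" -- these feed the post-mortem timeline.
        self.updates.append((when, text))

    def close(self, when):
        # Report how long the incident lasted and archive the channel.
        duration = when - self.started
        return 'Incident %s lasted %s; archiving %s' % (
            self.name, duration, self.channel)

bot = IncidentBot('db-outage', datetime.datetime(2017, 11, 7, 6, 0))
bot.status_update(datetime.datetime(2017, 11, 7, 6, 10), 'restarted server')
print(bot.close(datetime.datetime(2017, 11, 7, 7, 30)))
```

A real implementation hangs Slack, PagerDuty, and status-page API calls off these methods; the value is that the timeline writes itself as the incident unfolds.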
DevSecOps – Anthony Rees
“When Anthrax and Public Enemy came together, It was like Developers and Operations coming together”
- Everybody is trying to get things out fast, sometimes we forget about security
- Structural efficiency and optimised flow
- Compliance putting roadblock in flow of pipeline
- Even worse scanning in production after deployment
- Compliance guys using Excel, Security using shell scripts, Developers and Operations using code
- Chef security compliance language – InSpec
- Insert Sales stuff here
- Lots of pre-written configs available
Immutable SQL Server Clusters – John Bowker (from Xero)
- Pet Based infrastructure
- Not in cloud, weeks to deploy new server
- Hard to update base infrastructure code
- 110 Prod Servers (2 regions).
- 1.9PB of Disk
- Octopus Deploy: SQL Schemas, Also server configs
- Half of team in NZ, Half in Denver
- Data Engineers, Infrastructure Engineers, Team Lead, Product Owner
- Where we were – The Burning Platform
- Changed mid-Migration from dedicated instances to dedicated Hosts in AWS
- Big saving on software licensing
- Already had Clustered HA
- Existing automation
- 6-day weeks for the team, 15 hours/day coverage due to the multiple team locations
- Migration had to have no downtime
- Went with node swaps in cluster
- Split team. Half doing migration, half creating code/system for the node swaps
- We learnt
- Dedicated hosts are cheap
- Dedicated host automation not so good for Windows
- Discovery service not so good.
- Syncing data took up to 24h due to large dataset
- Powershell debugging is hard (moving away from powershell a bit, but powershell has lots of SQL server stuff built in)
- AWS services can timeout, allow for this.
- Things we Built
- Lots of Step Templates in Octopus Deploy
- Metadata Store for SQL servers – Dynamite (Python, Lambda, Flask, DynamoDB) – hope to open source
- Lots of PowerShell Modules
- Node Swaps going forward
- Working towards making this completely automated
- New AMI -> Node swap onto that
- Avoid upgrade in place or running on old version
The 2017 Linux Security Summit (LSS) was held last month in Los Angeles over the 14th and 15th of September. It was co-located with Open Source Summit North America (OSSNA) and the Linux Plumbers Conference (LPC).
Once again we were fortunate to have general logistics managed by the Linux Foundation, allowing the program committee to focus on organizing technical content. We had a record number of submissions this year and accepted approximately one third of them. Attendance was very strong, with ~160 attendees — another record for the event.
On the day prior to LSS, attendees were able to access a day of LPC, which featured two tracks with a security focus:
Realtime notes were made of these microconfs via etherpad:
I was particularly interested in the topic of better integrating LSM with containers, as there is an increasingly common requirement for nesting of security policies, where each container may run its own apparently independent security policy, and also a potentially independent security model. I proposed the approach of introducing a security namespace, where all security interfaces within the kernel are namespaced, including LSM. It would potentially solve the container use-cases, and also the full LSM stacking case championed by Casey Schaufler (which would allow entirely arbitrary stacking of security modules).
This would be a very challenging project, to say the least, and one which is further complicated by containers not being a first class citizen of the kernel. This leads to security policy boundaries clashing with semantic functional boundaries e.g. what does it mean from a security policy POV when you have namespaced filesystems but not networking?
Discussion turned to the idea that it is up to the vendor/user to configure containers in a way which makes sense for them, and similarly, they would also need to ensure that they configure security policy in a manner appropriate to that configuration. I would say this means that semantic responsibility is pushed to the user with the kernel largely remaining a set of composable mechanisms, in relation to containers and security policy. This provides a great deal of flexibility, but requires those building systems to take a great deal of care in their design.
There are still many issues to resolve, both upstream and at the distro/user level, and I expect this to be an active area of Linux security development for some time. There were some excellent followup discussions in this area, including an approach which constrains the problem space. (Stay tuned)!
A highlight of the TPMs session was an update on the TPM 2.0 software stack, by Philip Tricca and Jarkko Sakkinen. The slides may be downloaded here. We should see a vastly improved experience over TPM 1.x with v2.0 hardware capabilities, and the new software stack. I suppose the next challenge will be TPMs in the post-quantum era?
There were further technical discussions on TPMs and container security during subsequent days at LSS. Bringing the two conference groups together here made for a very productive event overall.
This year, due to the overlap with LPC, we unfortunately did not have any LWN coverage. There are, however, excellent writeups available from attendees:
- The 2017 Linux Security Summit by Paul Moore; and
- 2017 Linux Security Summit (Day 1) and (Day 2) by Tyler Hicks.
There were many awesome talks.
The CII Best Practices Badge presentation by David Wheeler was an unexpected highlight for me. CII refers to the Linux Foundation’s Core Infrastructure Initiative , a preemptive security effort for Open Source. The Best Practices Badge Program is a secure development maturity model designed to allow open source projects to improve their security in an evolving and measurable manner. There’s been very impressive engagement with the project from across open source, and I believe this is a critically important effort for security.
During Dan Cashman’s talk on SELinux policy modularization in Android O, an interesting data point came up:
Interesting data from the talk: 44% of Android kernel vulns blocked by SELinux due to attack surface reduction. https://t.co/FnU544B3XP
— James Morris (@xjamesmorris) September 15, 2017
We of course expect to see application vulnerability mitigations arising from Mandatory Access Control (MAC) policies (SELinux, Smack, and AppArmor), but if you look closely this refers to kernel vulnerabilities. So what is happening here? It turns out that a side effect of MAC policies, particularly those implemented in tightly-defined environments such as Android, is a reduction in kernel attack surface. It is generally more difficult to reach such kernel vulnerabilities when you have MAC security policies. This is a side-effect of MAC, not a primary design goal, but nevertheless appears to be very effective in practice!
Another highlight for me was the update on the Kernel Self Protection Project lead by Kees, which is now approaching its 2nd anniversary, and continues the important work of hardening the mainline Linux kernel itself against attack. I would like to also acknowledge the essential and original research performed in this area by grsecurity/PaX, from which this mainline work draws.
From a new development point of view, I’m thrilled to see the progress being made by Mickaël Salaün, on Landlock LSM, which provides unprivileged sandboxing via seccomp and LSM. This is a novel approach which will allow applications to define and propagate their own sandbox policies. Similar concepts are available in other OSs such as OSX (seatbelt) and BSD (pledge). The great thing about Landlock is its consolidation of two existing Linux kernel security interfaces: LSM and Seccomp. This ensures re-use of existing mechanisms, and aids usability by utilizing already familiar concepts for Linux users.
— LinuxSecuritySummit (@LinuxSecSummit) September 14, 2017
Overall I found it to be an incredibly productive event, with many new and interesting ideas arising and lots of great collaboration in the hallway, lunch, and dinner tracks.
Slides from LSS may be found linked to the schedule abstracts.
We did not have a video sponsor for the event this year, and we’ll work on that again for next year’s summit. We have discussed holding LSS again next year in conjunction with OSSNA, which is expected to be in Vancouver in August.
We are also investigating a European LSS in addition to the main summit for 2018 and beyond, as a way to help engage more widely with Linux security folk. Stay tuned for official announcements on these!
Thanks once again to the awesome event staff at LF, especially Jillian Hall, who ensured everything ran smoothly. Thanks also to the program committee who review, discuss, and vote on every proposal, ensuring that we have the best content for the event, and who work on technical planning for many months prior to the event. And of course thanks to the presenters and attendees, without whom there would literally and figuratively be no event :)
See you in 2018!
Since forking the Mon project to etbemon I’ve been spending a lot of time working on the monitor scripts. Actually monitoring something is usually quite easy; deciding what to monitor tends to be the hard part. The process monitoring script ps.monitor is the one I’m about to redesign.
Here are some of my ideas for monitoring processes. Please comment if you have any suggestions for how to do things better.
For people who don’t use mon, the monitor scripts return 0 if everything is OK and 1 if there’s a problem, along with using stdout to display an error message. While I’m not aware of anyone hooking mon scripts into a different monitoring system, that would be easy to do. One thing I plan to work on in the future is interoperability between mon and other systems such as Nagios.

Basic Monitoring

ps.monitor tor:1-1 master:1-2 auditd:1-1 cron:1-5 rsyslogd:1-1 dbus-daemon:1- sshd:1- watchdog:1-2
I’m currently planning some sort of rewrite of the process monitoring script. The current functionality is to have a list of process names on the command line with minimum and maximum numbers for the instances of the process in question. The above is a sample of the configuration of the monitor. There are some limitations to this, the “master” process in this instance refers to the main process of Postfix, but other daemons use the same process name (it’s one of those names that’s wrong because it’s so obvious). One obvious solution to this is to give the option of specifying the full path so that /usr/lib/postfix/sbin/master can be differentiated from all the other programs named master.
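Parsing the "name:min-max" specs shown above is simple; here's a sketch of the counting logic (not the real ps.monitor, and the error-message wording is invented). An empty maximum, as in "sshd:1-", means no upper bound:

```python
import collections
import subprocess

def parse_spec(spec):
    """Parse a 'name:min-max' spec like 'cron:1-5' or 'sshd:1-'.

    rsplit on ':' so process names containing '-' (e.g. dbus-daemon)
    still parse correctly."""
    name, counts = spec.rsplit(':', 1)
    lo, _, hi = counts.partition('-')
    return name, int(lo), (int(hi) if hi else None)

def check_processes(specs):
    """Return mon-style error messages; an empty list means all OK."""
    out = subprocess.check_output(['ps', '-e', '-o', 'comm=']).decode()
    counts = collections.Counter(line.strip() for line in out.splitlines())
    errors = []
    for spec in specs:
        name, lo, hi = parse_spec(spec)
        n = counts.get(name, 0)
        if n < lo or (hi is not None and n > hi):
            errors.append('%s: %d instances, wanted %d-%s'
                          % (name, n, lo, '' if hi is None else hi))
    return errors
```

Extending the spec syntax to carry a full path, a UID, or an SE Linux context (as discussed below) would mean adding fields to the spec and switching the `ps` invocation to output those columns.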
The next issue is processes that may run on behalf of multiple users. With sshd there is a single process to accept new connections running as root and a process running under the UID of each logged in user. So the number of sshd processes running as root will be one greater than the number of root login sessions. This means that if a sysadmin logs in directly as root via ssh (which is controversial and not the topic of this post – merely something that people do which I have to support) and the master process then crashes (or the sysadmin stops it either accidentally or deliberately) there won’t be an alert about the missing process. Of course the correct thing to do is to have a monitor talk to port 22 and look for the string “SSH-2.0-OpenSSH_”. Sometimes there are multiple instances of a daemon running under different UIDs that need to be monitored separately. So obviously we need the ability to monitor processes by UID.
In many cases process monitoring can be replaced by monitoring of service ports. So if something is listening on port 25 then it probably means that the Postfix “master” process is running, regardless of what other “master” processes there are. But for my use I find it handy to have multiple monitors: if I get a Jabber message about being unable to send mail to a server, immediately followed by a Jabber message from that server saying that “master” isn’t running, I don’t need to fully wake up to know where the problem is.

SE Linux
One feature that I want is monitoring SE Linux contexts of processes in the same way as monitoring UIDs. While I’m not interested in writing tests for other security systems I would be happy to include code that other people write. So whatever I do I want to make it flexible enough to work with multiple security systems.

Transient Processes
Most daemons have a second process of the same name running during the startup process. This means if you monitor for exactly 1 instance of a process you may get an alert about 2 processes running when “logrotate” or something similar restarts the daemon. Also you may get an alert about 0 instances if the check happens to run at exactly the wrong time during the restart. My current way of dealing with this on my servers is to not alert until the second failure event with the “alertafter 2” directive. The “failure_interval” directive allows specifying the time between checks when the monitor is in a failed state, setting that to a low value means that waiting for a second failure result doesn’t delay the notification much.
To deal with this I’ve been thinking of making the ps.monitor script automatically check again after a specified delay. I think that solving the problem with a single parameter to the monitor script is better than using 2 configuration directives to mon to work around it.

CPU Use
Mon currently has a loadavg.monitor script that checks the load average. But that won’t catch the case of a single process using too much CPU time but not enough to raise the system load average. Also it won’t catch the case of a CPU hungry process going quiet (e.g. when the SETI at Home server goes down) while another process goes into an infinite loop. One way of addressing this would be to have the ps.monitor script take yet another configuration option to monitor CPU use, but this might get confusing. Another option would be to have a separate script that alerts on any process that uses more than a specified percentage of CPU time over its lifetime or over the last few seconds, unless it’s in a whitelist of processes and users who are exempt from such checks. Probably every regular user would be exempt from such checks because you never know when they will run a file compression program. Also there is a short list of daemons that are excluded (like BOINC) and system processes (like gzip, which is run from several cron jobs).

Monitoring for Exclusion
A common programming mistake is to call setuid() before setgid() which means that the program doesn’t have permission to call setgid(). If return codes aren’t checked (and people who make such rookie mistakes tend not to check return codes) then the process keeps elevated permissions. Checking for processes running as GID 0 but not UID 0 would be handy. As an aside a quick examination of a Debian/Testing workstation didn’t show any obvious way that a process with GID 0 could gain elevated privileges, but that could change with one chmod 770 command.
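The GID 0 / non-UID 0 check is straightforward to prototype on Linux by walking /proc; this is a sketch (the real test would live in a mon monitor script, and a production version should probably also look at the effective and saved IDs rather than just the real ones checked here):

```python
import os

def gid0_not_uid0():
    """Find processes running with real GID 0 but not real UID 0 by
    reading /proc/PID/status (Linux-specific)."""
    if not os.path.isdir('/proc'):
        return []  # not a Linux /proc filesystem
    suspects = []
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/status' % pid) as f:
                fields = dict(line.split(':', 1) for line in f if ':' in line)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited mid-scan, or is hidden from us
        uid = int(fields['Uid'].split()[0])  # first value is the real UID
        gid = int(fields['Gid'].split()[0])
        if gid == 0 and uid != 0:
            suspects.append(int(pid))
    return suspects
```

On a clean system this should usually return an empty list, so anything it finds is worth a look.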
On a SE Linux system there should be only one process running with the domain init_t. Currently that doesn’t happen in Stretch systems running daemons such as mysqld and tor due to policy not matching the recent functionality of systemd as requested by daemon service files. Such issues will keep occurring so we need automated tests for them.
Automated tests for configuration errors that might impact system security is a bigger issue, I’ll probably write a separate blog post about it.
The problem is that Python mocks are magical. It's an object where you can call any method name, and the mock will happily pretend it has that method and return None. You can then later ask what "methods" were called on the mock.
However, you use the same mock object later to make assertions about what was called. Herein is the problem -- the mock object doesn't know if you're the code under test, or the code that's making assertions. So, if you fat finger the assertion in your test code, the assertion will just quietly map to a non-existent method which returns None, and your code will pass.
Here's an example:
from unittest import mock


class foo(object):
    def dummy(self, a, b):
        return a + b


@mock.patch.object(foo, 'dummy')
def call_dummy(mock_dummy):
    f = foo()
    f.dummy(1, 2)

    print('Asserting a call should work if the call was made')
    f.dummy.assert_has_calls([mock.call(1, 2)])
    print('Assertion for expected call passed')

    print('Asserting a call should raise an exception if the call wasn\'t made')
    mock_worked = False
    try:
        f.dummy.assert_has_calls([mock.call(3, 4)])
    except AssertionError as e:
        mock_worked = True
        print('Expected failure, %s' % e)
    if not mock_worked:
        print('*** Assertion should have failed ***')

    print('Asserting a call where the assertion has a typo should fail, but '
          'doesn\'t')
    mock_worked = False
    try:
        f.dummy.typo_assert_has_calls([mock.call(3, 4)])
    except AssertionError as e:
        mock_worked = True
        print('Expected failure, %s' % e)
    if not mock_worked:
        print('*** Assertion should have failed ***')
        print(f.dummy.mock_calls)


if __name__ == '__main__':
    call_dummy()
If I run that code, I get this:
$ python3 mock_assert_errors.py
Asserting a call should work if the call was made
Assertion for expected call passed
Asserting a call should raise an exception if the call wasn't made
Expected failure, Calls not found.
Expected: [call(3, 4)]
Actual: [call(1, 2)]
Asserting a call where the assertion has a typo should fail, but doesn't
*** Assertion should have failed ***
[call(1, 2), call.typo_assert_has_calls([call(3, 4)])]
So, we should have been told that typo_assert_has_calls isn't a thing, but we didn't notice because it silently failed. I discovered this when I noticed an assertion with a (smaller than this) typo in its call in a code review yesterday.
I don't really have a solution to this right now (I'm home sick and not thinking straight), but it would be interesting to see what other people think.
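One partial defence I'm aware of is mock.seal() (added in Python 3.7, so not yet an option everywhere): once the code under test has run, sealing the mock stops it from creating new child attributes, so a typo'd assertion name raises AttributeError instead of passing silently. Real assertion methods like assert_has_calls still work, since they're actual methods rather than auto-created children:

```python
from unittest import mock

m = mock.MagicMock()
m(1, 2)

# Seal once the code under test has finished: any attribute that wasn't
# already touched now raises AttributeError instead of quietly returning
# a new child mock.
mock.seal(m)

m.assert_has_calls([mock.call(1, 2)])  # real assertion, still fine

try:
    m.typo_assert_has_calls([mock.call(3, 4)])
except AttributeError as e:
    print('Typo caught: %s' % e)
```

It doesn't help with typos in the arguments, only in the method name, but that covers the case above.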
Tags for this post: python unittest.mock mock testing
Related posts: Implementing SCP with paramiko; Packet capture in python; A pythonic example of recording metrics about ephemeral scripts with prometheus; mbot: new hotness in Google Talk bots; Starfish Prime; Calculating a SSH host key with paramiko