Pulse – Application Heartbeat Monitor
Pulse is a new library I wrote in a weekend because I couldn’t find anything out there which performs this simple service. And I was tired of getting calls from GoGet on my day off (from my job as CTO of Fleetcutter).
I needed to monitor a somewhat fragile, legacy Windows application server which already had a heartbeat database record it updates every minute. Great, but nothing was monitoring the heartbeat record, so no cigar there. I also have a bunch of other servers written in different languages hosted on different platforms. I also have a few separate support companies (customer service, managed hosting and 24-hr IT support and application triage) so I’d like to notify each of specific levels/types of problems, and each has their own management infrastructure and contact preferences.
Therefore I wanted something that was simple and reliable but extendable and flexible.
So I wrote Pulse: https://bitbucket.org/scipilot/pulse
I believe the architecture is a good example of Dependency Injection, the Strategy Pattern, the Repository Pattern, IoC Container (flexible), and automated testing (reliable). In fact, during initial development I wrote all the interfaces first, then the test platform, then began to fill in the functionality while writing the tests, in a loose form of TDD. I like the idea that the code was initially run solely via the test platform (i.e. no dummy manually run top-level-scripts). This meant that when I eventually deployed the system it pretty much worked first time! The main issues discovered on deployment were (apart from one major foobar) deployment-related things like the autoload location and the need to ‘park’ work-in-progress configurations.
In fact, in its first day of operation it caught a major outage seconds after the cause and several minutes before any humans noticed. But… I ignored the notification emails thinking they were false positives!
But now I’m out of the loop, I’m fully off-duty on an island in the Indian Ocean. Now my 24hr support has a new pair of ears stethoscoped directly into the butterflies that beat in the heart of our veteran application.
Pulse is a standalone component which provides simple application/service heartbeat monitoring.
Simply put: it checks your application is still alive.
- A timestamp is updated by your service or the daemon, and a monitor daemon regularly checks that the heartbeats are current.
- If a service fails to register a heartbeat within the configured thresholds, notifications are triggered.
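As a rough illustration of the threshold check described above, here is a minimal sketch. It assumes thresholds are expressed in seconds and compared with `>=`. The function name `classifyHeartbeat()` and its parameters are hypothetical, not Pulse's actual API.

```php
<?php
declare(strict_types=1);

/**
 * Classify a heartbeat by its age. Hypothetical helper: the threshold
 * parameters (in seconds) are assumptions, not Pulse's real config keys.
 */
function classifyHeartbeat(int $lastBeat, int $now, int $alertAfter, int $alarmAfter): string
{
    $age = $now - $lastBeat;
    if ($age >= $alarmAfter) {
        return 'ALARM'; // no beat for longer than the alarm threshold
    }
    if ($age >= $alertAfter) {
        return 'ALERT'; // late, but not yet considered down
    }
    return 'OK';
}

$now   = 1700000600;
$fresh = classifyHeartbeat($now - 30, $now, 60, 300);
$late  = classifyHeartbeat($now - 90, $now, 60, 300);
$dead  = classifyHeartbeat($now - 600, $now, 60, 300);

echo "$fresh, $late, $dead", PHP_EOL;
```

Checking the alarm threshold first means a very stale heartbeat is never reported as a mere alert.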
Client service/applications can have one or multiple heartbeats monitored – called Pulses.
All components are customisable: notifiers, config, log, storage (via dependency-injection in an IoC container). The notifications are the most likely to need customisation, e.g. sending alerts to your own dashboard API. If you customised the storage engine, you could perhaps integrate saving heartbeats from another source.
e.g. 1 – Integrated Application Heartbeat Update and Monitor Service Sending Email Notifications
This example shows a PHP application using the Pulse class to store regular heartbeats. The monitor service is using the email notifier and regularly checks the heartbeat database.
Note that if the heartbeat database is a simple file (serialised, SQLite) then the monitor service would need to be installed locally.
e.g. 2 – Multiple Application Heartbeats and Multiple Notification Types
This example demonstrates scaling the application to monitor multiple applications and use different notification strategies to send different levels of alerts, or notifications from the different applications, to different support staff.
e.g. 3 – Remote Probe Ability for Legacy or non-PHP Apps
This example shows the Probe feature querying a remote application’s status, without integrating into it.
There are three states your service can be in:
- OK – All is well. Nothing to report.
- ALERT – Service degraded. Things are getting bumpy, but you might not want to call out the troops yet…
- ALARM – System down. Urgent panic, wake up the boss.
You only get a notification when the service changes state. That includes one notification when it returns to OK.
You don’t have to use the ALERT state, but it gives you a bit of extra granularity to triage any flapping. Often a system will have a ‘grey area’ where you don’t want to really panic everyone, but perhaps you should keep an eye on things. This ALERT state can therefore be targeted at a different response team (see below).
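To make the "notify only on a change of state" rule concrete, here is a small sketch. `notificationsFor()` is a hypothetical helper written for this post, not something taken from the Pulse source. It assumes the monitor remembers the previous state of each Pulse and starts from OK.

```php
<?php
declare(strict_types=1);

/**
 * Given the states observed on successive checks, return only those that
 * would trigger a notification, i.e. those that differ from the previous check.
 * Hypothetical helper; assumes monitoring starts in the OK state.
 */
function notificationsFor(array $observed, string $initial = 'OK'): array
{
    $sent = [];
    $previous = $initial;
    foreach ($observed as $state) {
        if ($state !== $previous) {
            $sent[] = $state; // a real monitor would call its notifier here
        }
        $previous = $state;
    }
    return $sent;
}

// A service degrades, goes down, then recovers.
$observed = ['OK', 'ALERT', 'ALERT', 'ALARM', 'ALARM', 'OK', 'OK'];
$sent = notificationsFor($observed);

echo implode(' -> ', $sent), PHP_EOL;
```

Seven checks produce three notifications here: one each for the degradation, the outage and the recovery.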
Standard composer installation:
> composer require scipilot/pulse
or more likely you will fork it and require your own.
There are two halves to the system:
1. Heartbeat Update
This can be done in one of two ways:
a) “Locally”: Each service can require the component and call Pulse->beat() regularly. Of course this only works for PHP services, and ones you can modify by integrating with this component.
b) “Remotely”: The Pulse daemon can poll your system using “Scopes” to update the heartbeats. This is the most unintrusive method, and can interface with existing database timestamps for example.
Scopes allow you to implement custom probes into other systems to extract heartbeats. Currently there is only one: a DatabaseRecordScope which queries an SQL DB for a single timestamp.
The IoC model allows you to easily implement another interface to query an API, top, PID file etc.
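To give a feel for what another Scope might look like, here is a sketch of a probe that treats a file's modification time as the heartbeat. The interface name `HeartbeatScope`, its `listen()` signature and the `FileMtimeScope` class are all assumptions made for illustration. Check the real Scope interface in the repository before building on this.

```php
<?php
declare(strict_types=1);

// Hypothetical contract: Pulse's real Scope interface may differ.
interface HeartbeatScope
{
    /** Return the last heartbeat as a Unix timestamp, or null if none can be read. */
    public function listen(): ?int;
}

// Illustrative probe: treats a file's modification time as the heartbeat.
final class FileMtimeScope implements HeartbeatScope
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function listen(): ?int
    {
        if (!is_file($this->path)) {
            return null;
        }
        clearstatcache(true, $this->path); // avoid a stale cached mtime
        $mtime = filemtime($this->path);
        return $mtime === false ? null : $mtime;
    }
}

// Simulate a remote app that touches a PID-style file when it is alive.
$path = tempnam(sys_get_temp_dir(), 'pulse');
touch($path, 1700000000);
$scope = new FileMtimeScope($path);
$beat = $scope->listen();
unlink($path);
$missing = $scope->listen();

var_dump($beat, $missing);
```

Returning null for a missing file lets the caller decide whether "no heartbeat at all" counts as an alarm.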
2. Pulse Monitoring
The monitoring daemon must be run in a supervised process (or it can be polled by your own scheduler).
Both halves must share the same config. By default this is stored as files in the storage folder, so the same installation must be used. If you need to separate the daemon from your services, you would need to implement another Storage strategy (i.e. implement a Storage class and inject it into a modified container).
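As a sketch of the Storage strategy idea, the code below defines a made-up storage contract, one implementation, and a stand-in container that receives it by injection. `HeartbeatStore`, `JsonFileStore` and `MiniContainer` are hypothetical names. The real Storage class and container in Pulse may be shaped quite differently. The file-based store is only for illustration and is not safe for concurrent writers.

```php
<?php
declare(strict_types=1);

// Hypothetical contract; the real Storage class in Pulse may look different.
interface HeartbeatStore
{
    public function save(string $pulseId, int $timestamp): void;
    public function load(string $pulseId): ?int;
}

// Illustrative strategy: all heartbeats kept in a single JSON file.
final class JsonFileStore implements HeartbeatStore
{
    private string $file;

    public function __construct(string $file)
    {
        $this->file = $file;
    }

    public function save(string $pulseId, int $timestamp): void
    {
        $all = $this->readAll();
        $all[$pulseId] = $timestamp;
        file_put_contents($this->file, json_encode($all), LOCK_EX);
    }

    public function load(string $pulseId): ?int
    {
        return $this->readAll()[$pulseId] ?? null;
    }

    private function readAll(): array
    {
        if (!is_file($this->file)) {
            return [];
        }
        $data = json_decode((string) file_get_contents($this->file), true);
        return is_array($data) ? $data : [];
    }
}

// Stand-in for the IoC container: the store is injected, not constructed inside.
final class MiniContainer
{
    public HeartbeatStore $storage;

    public function __construct(HeartbeatStore $storage)
    {
        $this->storage = $storage;
    }
}

$file = tempnam(sys_get_temp_dir(), 'pulse');
$app = new MiniContainer(new JsonFileStore($file));
$app->storage->save('legacy-app', 1700000000);
$loaded  = $app->storage->load('legacy-app');
$unknown = $app->storage->load('no-such-pulse');
unlink($file);

var_dump($loaded, $unknown);
```

Because the container only depends on the interface, swapping in a database or API backed store would not touch the code that calls `save()` and `load()`.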
- Copy the sample config to config.json
- Edit your email address in the default ‘to’.
php /bin/daemon.php & php /bin/example_service.php &
- Check your inbox!
Also see the Unit Tests for usages (although I’ve always hated reading people say that).
Each Pulse is configured with alert and alarm thresholds, a friendly name and a unique ID.
Notifications also require some configuration (see below).
See storage/config-SAMPLE.json for a bootstrap config.
You can either hand-craft a JSON config file, or programmatically create Scope objects, which can persist their own configuration for re-load later. See the
Initially there is only an EmailNotify implementation. It requires the following configuration:
notify.EmailNotify.default.email.to
notify.EmailNotify.default.email.subject - default subject
notify.EmailNotify.default.email.body - with placeholders: _TYPE_, _LEVEL_, _NAME_, _MESSAGE_
Body template e.g.:
"Heads up! Alert type: _TYPE_, at level: _LEVEL_, was received for service _NAME_. \n Details are as follows: _MESSAGE_"
Optionally you can customise any specific messages for both monitoring alerts, and internal errors:
notify.EmailNotify.<TYPE>.<LEVEL>.email.to - specific alert overrides
notify.EmailNotify.<TYPE>.<LEVEL>.email.subject
notify.EmailNotify.<TYPE>.<LEVEL>.email.body - with placeholders: _TYPE_, _LEVEL_, _MESSAGE_
e.g. you might want outage alarms sent to L1 Support pagers, but internal warnings sent to developers or devops.
See ‘INotify’ for these constants.
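The sketch below shows one way the override keys and body placeholders could fit together: look for a type- and level-specific entry, fall back to the default, then fill in the template. It assumes a flat array keyed by the dotted names (the real config file may be nested JSON). `emailSetting()` and `renderBody()` are hypothetical helpers, and the `PULSE`, `ALARM` and `ALERT` values stand in for the real INotify constants.

```php
<?php
declare(strict_types=1);

/** Look up a specific email setting, falling back to the default entry. */
function emailSetting(array $config, string $type, string $level, string $field): ?string
{
    $specific = "notify.EmailNotify.{$type}.{$level}.email.{$field}";
    $default  = "notify.EmailNotify.default.email.{$field}";

    return $config[$specific] ?? $config[$default] ?? null;
}

/** Fill the _TYPE_, _LEVEL_, _NAME_ and _MESSAGE_ placeholders in a template. */
function renderBody(string $template, string $type, string $level, string $name, string $message): string
{
    // strtr() with an array makes a single pass, so placeholder-like text
    // inside a substituted value is not expanded again.
    return strtr($template, [
        '_TYPE_'    => $type,
        '_LEVEL_'   => $level,
        '_NAME_'    => $name,
        '_MESSAGE_' => $message,
    ]);
}

// Illustrative config: addresses and TYPE/LEVEL values are made up.
$config = [
    'notify.EmailNotify.default.email.to'     => 'devops@example.com',
    'notify.EmailNotify.default.email.body'   => 'Alert type: _TYPE_, level: _LEVEL_, service: _NAME_. Details: _MESSAGE_',
    'notify.EmailNotify.PULSE.ALARM.email.to' => 'l1-pager@example.com',
];

$alarmTo = emailSetting($config, 'PULSE', 'ALARM', 'to'); // specific override
$alertTo = emailSetting($config, 'PULSE', 'ALERT', 'to'); // falls back to default
$body = renderBody(
    (string) emailSetting($config, 'PULSE', 'ALARM', 'body'),
    'PULSE',
    'ALARM',
    'Legacy App Server',
    'No heartbeat for 5 minutes'
);

echo "$alarmTo\n$alertTo\n$body", PHP_EOL;
```

With this shape, only the settings that differ need a specific entry. Everything else inherits from the default.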
The system logs INFO by default into the
This will just be service status changes, and any system errors (e.g. notification send failures). They will be pretty small (if your services are stable), but you might want to rotate them anyway.
You can change the level of detail up to debug or down to just warnings by calling app->log->setVerbosity(). (There is no config option for this at present.)
- IoC App Container
  - Config DI
  - Log DI
  - Storage DI
  - Notification DI
- PulseRegistry
  - Pulse->beat
- ScopeRegistry
  - Scope->Listen
- Monitor Daemon->run:
  - Listen to Scopes
  - Scan Pulses
- Feed API (TODO!)
The Feed API has not yet been implemented. The idea is to allow remote applications to both register Beats and fetch Pulse status.
There are extensive (complete?) PHPUnit tests which are mostly standalone, except for some email and database configurations required for full-system testing.
To run the DB-oriented tests you will need the example service database installed:
CREATE DATABASE pulse_test;
USE pulse_test;
CREATE TABLE `system` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `last_seen` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO `system` (`id`, `last_seen`) VALUES ('1', NOW());
CREATE USER 'pulse_test'@'localhost' IDENTIFIED BY 'pulse_test';
GRANT SELECT, UPDATE ON `pulse\_test`.* TO 'pulse_test'@'localhost';
FLUSH PRIVILEGES;
See storage/config-UNITTEST.json for more details or to customise it.
This library is licensed under the LGPL v3.
Simply put: you can use this software pretty much as you wish as long as you retain this licence, make the source available and don’t blame me for anything.
I’d also really like to see any changes / fixes / suggestions – thanks!