Pulse APM

Pulse – Application Heartbeat Monitor

Pulse is a new library I wrote in a weekend because I couldn’t find anything out there which performs this simple service. And I was tired of getting calls from GoGet on my day off (from my job as CTO of Fleetcutter).

I needed to monitor a somewhat fragile legacy Windows application server, which already updates a heartbeat database record every minute. Great, but nothing was monitoring that record, so no cigar there. I also have a bunch of other servers written in different languages and hosted on different platforms, and a few separate support companies (customer service, managed hosting, and 24-hr IT support and application triage). I'd like to notify each of them about specific levels/types of problem, and each has its own management infrastructure and contact preferences.

Therefore I wanted something that was simple and reliable but extendable and flexible.

So I wrote Pulse: https://bitbucket.org/scipilot/pulse

I believe the architecture is a good example of Dependency Injection, the Strategy Pattern, the Repository Pattern, an IoC Container (flexible), and automated testing (reliable). In fact, during initial development I wrote all the interfaces first, then the test platform, then began to fill in the functionality while writing the tests, in a loose form of TDD. I like the idea that the code was initially run solely via the test platform (i.e. no dummy, manually run top-level scripts). This meant that when I eventually deployed the system it pretty much worked first time! Apart from one major foobar, the main issues discovered on deployment were deployment-related things like the autoload location and the need to 'park' work-in-progress configurations.

In fact, in its first day of operation it caught a major outage seconds after the cause and several minutes before any humans noticed. But… I ignored the notification emails thinking they were false positives!

But now I'm out of the loop: I'm fully off-duty on an island in the Indian Ocean, and my 24-hr support has a new pair of ears stethoscoped directly into the butterflies that beat in the heart of our veteran application.

Overview

Pulse is a standalone component which provides simple application/service heartbeat monitoring.

Simply put: it checks that your application is still alive.

  1. A timestamp is updated by your service or the daemon, and a monitor daemon regularly checks that the heartbeats are current.
  2. If a service fails to register a heartbeat within the configured thresholds, notifications are triggered.
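To make the two halves concrete, here is a minimal PHP sketch of the idea. It is not Pulse's own code: the class name FileHeartbeatSketch and the single-JSON-file layout ({"pulseId": unixTimestamp}) are assumptions for illustration. The service side records a timestamp, and the monitor side measures how old that timestamp is.

```php
<?php
// Hypothetical sketch of the heartbeat mechanism; NOT the Pulse API.
// Assumption: all heartbeats live in one JSON file shaped {"pulseId": unixTimestamp}.
final class FileHeartbeatSketch
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    /** Service side: record "alive at $now" (defaults to the current time). */
    public function beat(string $pulseId, ?int $now = null): void
    {
        $beats = $this->load();
        $beats[$pulseId] = $now ?? time();
        // LOCK_EX serialises the write itself; a production store would also
        // lock around the whole read-modify-write cycle.
        file_put_contents($this->path, json_encode($beats), LOCK_EX);
    }

    /** Monitor side: seconds since the last beat, or null if never seen. */
    public function age(string $pulseId, ?int $now = null): ?int
    {
        $beats = $this->load();
        if (!isset($beats[$pulseId])) {
            return null;
        }
        return ($now ?? time()) - (int) $beats[$pulseId];
    }

    /** Monitor side: has the pulse beaten within the last $maxAgeSeconds? */
    public function isCurrent(string $pulseId, int $maxAgeSeconds, ?int $now = null): bool
    {
        $age = $this->age($pulseId, $now);
        return $age !== null && $age <= $maxAgeSeconds;
    }

    private function load(): array
    {
        if (!is_file($this->path)) {
            return [];
        }
        $data = json_decode((string) file_get_contents($this->path), true);
        return is_array($data) ? $data : [];
    }
}
```

A service would call beat('legacy-app') every minute, and the monitor would call isCurrent('legacy-app', 120) on each pass; the optional $now parameters exist only to make the sketch easy to test.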

Client services/applications can each have one or more heartbeats monitored; these are called Pulses.

All components are customisable via dependency injection in an IoC container: notifiers, config, log and storage. The notifiers are the most likely to need customisation, e.g. sending alerts to your own dashboard API. By customising the storage engine, you could perhaps ingest heartbeats saved by another source.

e.g. 1 – Integrated Application Heartbeat Update and Monitor Service Sending Email Notifications

This example shows a PHP application using the Pulse class to store regular heartbeats. The monitor service uses the email notifier and regularly checks the heartbeat database.

[Diagram: example 1]

Note: if the heartbeat database is a simple file (serialised, SQLite), the monitor service needs to be installed locally, on the same host as the application.

e.g. 2 – Multiple Application Heartbeats and Multiple Notification Types

This example demonstrates scaling the system to monitor multiple applications, using different notification strategies to send different levels of alert, or notifications from different applications, to different support staff.

[Diagram: example 2]

e.g. 3 – Remote Probe Ability for Legacy or non-PHP Apps

This example shows the Probe feature querying a remote application’s status, without integrating into it.

[Diagram: example 3]

Alarms

There are three states your service can be in:

  1. OK – All is well. Nothing to report.
  2. ALERT – Service degraded. Things are getting bumpy, but you might not want to call out the troops yet…
  3. ALARM – System down. Urgent panic, wake up the boss.

You only get a notification when the service changes state, so when it returns to OK you get a single recovery notification.

You don't have to use the ALERT state, but it gives you a bit of extra granularity to triage any flapping. Often a system will have a 'grey area' where you don't really want to panic everyone, but perhaps you should keep an eye on things. The ALERT state can therefore be targeted at a different response team (see below).
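The state rules above can be sketched as follows. This is an illustration under assumptions, not Pulse's implementation: the state is chosen by comparing the heartbeat's age with the pulse's alert and alarm thresholds (each Pulse has both, per the Config section), and a notifier callback fires only when the state changes. PulseStateSketch and its method names are hypothetical.

```php
<?php
// Hypothetical sketch of OK / ALERT / ALARM transitions; not Pulse's own code.
// Assumption: age >= alarm threshold => ALARM, age >= alert threshold => ALERT.
final class PulseStateSketch
{
    const OK = 'OK';
    const ALERT = 'ALERT';
    const ALARM = 'ALARM';

    private int $alertAfter;          // seconds without a beat before ALERT
    private int $alarmAfter;          // seconds without a beat before ALARM
    private string $state = self::OK;
    private $notify;                  // callable(string $from, string $to): void

    public function __construct(int $alertAfter, int $alarmAfter, callable $notify)
    {
        if ($alarmAfter < $alertAfter) {
            throw new InvalidArgumentException('Alarm threshold must be >= alert threshold');
        }
        $this->alertAfter = $alertAfter;
        $this->alarmAfter = $alarmAfter;
        $this->notify = $notify;
    }

    /** Maps a heartbeat age (seconds) onto one of the three states. */
    public static function classify(int $age, int $alertAfter, int $alarmAfter): string
    {
        if ($age >= $alarmAfter) {
            return self::ALARM;
        }
        return $age >= $alertAfter ? self::ALERT : self::OK;
    }

    /** One monitor check: notifies only if the state differs from last time. */
    public function check(int $ageSeconds): string
    {
        $new = self::classify($ageSeconds, $this->alertAfter, $this->alarmAfter);
        if ($new !== $this->state) {
            ($this->notify)($this->state, $new);
            $this->state = $new;
        }
        return $new;
    }
}
```

With thresholds of 60 and 300 seconds, feeding ages of 10, 90, 120, 400 and 5 seconds produces exactly three notifications (OK to ALERT, ALERT to ALARM, and ALARM back to OK); the second ALERT reading is silent.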

Installation

Standard composer installation:

> composer require scipilot/pulse

or, more likely, you will fork it and require your own fork.

Usage

There are two halves to the system:

1. Heartbeat Update

This can be done in one of two ways:

a) "Locally": Each service can require the component and call Pulse->beat() regularly. Of course, this only works for PHP services that you can modify to integrate this component.

b) "Remotely": The Pulse daemon can poll your system using "Scopes" to update the heartbeats. This is the least intrusive method, and can interface with existing database timestamps, for example.

Scopes allow you to implement custom probes into other systems to extract heartbeats. Currently there is only one: a DatabaseRecordScope which queries an SQL DB for a single timestamp.

The IoC model makes it easy to add another Scope implementation that queries an API, top, a PID file, etc.
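The Scope idea is a Strategy pattern: each probe knows one way of extracting a "last seen" timestamp from an external system. The PHP sketch below is hypothetical (TimestampScopeSketch and both classes are illustrative names, not Pulse's Scope API): one probe reads a single timestamp via PDO, in the spirit of DatabaseRecordScope, and the other treats a file's modification time as the heartbeat. It assumes the database returns a datetime string that strtotime() can parse in PHP's default timezone.

```php
<?php
// Hypothetical Strategy-pattern sketch of Scopes; not Pulse's real classes.
interface TimestampScopeSketch
{
    /** Unix timestamp of the most recent sign of life, or null if unknown. */
    public function lastSeen(): ?int;
}

/** Reads one datetime column via PDO, like the DatabaseRecordScope idea. */
final class PdoRecordScopeSketch implements TimestampScopeSketch
{
    private PDO $pdo;
    private string $sql;   // query expected to return one row, one column

    public function __construct(PDO $pdo, string $sql)
    {
        $this->pdo = $pdo;
        $this->sql = $sql;
    }

    public function lastSeen(): ?int
    {
        $stmt = $this->pdo->query($this->sql);
        if ($stmt === false) {
            return null;                       // query failed (silent error mode)
        }
        $value = $stmt->fetchColumn();         // false when no row matched
        if ($value === false || $value === null) {
            return null;
        }
        $ts = strtotime((string) $value);      // assumption: parseable datetime string
        return $ts === false ? null : $ts;
    }
}

/** Alternative probe: a file's modification time is the heartbeat. */
final class FileMtimeScopeSketch implements TimestampScopeSketch
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function lastSeen(): ?int
    {
        clearstatcache(true, $this->path);     // avoid PHP's cached stat results
        $mtime = @filemtime($this->path);      // false if the file is missing
        return $mtime === false ? null : $mtime;
    }
}
```

The monitor only depends on the interface, so adding an API or PID-file probe means writing one more class rather than changing the daemon.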

2. Pulse Monitoring

The monitoring daemon must run as a supervised process (or be invoked periodically by your own scheduler).
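The two ways of running the monitor can be sketched like this: runOnce() performs a single scan, which suits an external scheduler such as cron, while run() loops under a supervisor. MonitorLoopSketch, its probes (callables returning a last-seen Unix timestamp or null) and the single $maxAge threshold are simplifying assumptions, not Pulse's daemon.

```php
<?php
// Hypothetical monitor sketch; not Pulse's daemon. Each probe is a callable
// returning the pulse's last-seen Unix timestamp, or null if unknown.
final class MonitorLoopSketch
{
    private array $probes;   // pulse name => callable(): ?int
    private int $maxAge;     // seconds before a pulse counts as stale

    public function __construct(array $probes, int $maxAge)
    {
        $this->probes = $probes;
        $this->maxAge = $maxAge;
    }

    /** One scan (e.g. one cron invocation): names of missing or stale pulses. */
    public function runOnce(int $now): array
    {
        $stale = [];
        foreach ($this->probes as $name => $probe) {
            $lastSeen = $probe();
            if ($lastSeen === null || $now - $lastSeen > $this->maxAge) {
                $stale[] = $name;
            }
        }
        return $stale;
    }

    /**
     * Long-running loop for a supervised process. $onPass receives each scan's
     * result; $maxPasses = null means "run until killed".
     */
    public function run(int $intervalSeconds, callable $onPass, ?int $maxPasses = null): void
    {
        for ($pass = 1; $maxPasses === null || $pass <= $maxPasses; $pass++) {
            $onPass($this->runOnce(time()));
            if ($maxPasses === null || $pass < $maxPasses) {
                sleep($intervalSeconds);
            }
        }
    }
}
```

Keeping the scan in runOnce() means the same logic serves both deployment styles; only the outer loop differs.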

Both halves must share the same config and storage. By default these are files in the storage folder, so both must use the same installation. If you need to separate the daemon from your services, you will need to implement another Storage strategy (i.e. implement a Storage class and inject it into a modified container).
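Swapping the Storage strategy amounts to binding a different implementation under the same container key. The sketch below is a hypothetical miniature IoC container, not Pulse's container class: the 'storage' key, the HeartbeatStorageSketch interface and the in-memory implementation are all illustrative assumptions.

```php
<?php
// Hypothetical miniature IoC container; not Pulse's container class.
interface HeartbeatStorageSketch
{
    public function save(string $pulseId, int $timestamp): void;
    public function lastBeat(string $pulseId): ?int;
}

/** Trivial storage strategy, useful for tests; not shared between processes. */
final class InMemoryStorageSketch implements HeartbeatStorageSketch
{
    private array $beats = [];

    public function save(string $pulseId, int $timestamp): void
    {
        $this->beats[$pulseId] = $timestamp;
    }

    public function lastBeat(string $pulseId): ?int
    {
        return $this->beats[$pulseId] ?? null;
    }
}

final class ContainerSketch
{
    private array $factories = [];   // id => callable(ContainerSketch): object
    private array $instances = [];   // id => shared instance, built lazily

    public function bind(string $id, callable $factory): void
    {
        $this->factories[$id] = $factory;
        unset($this->instances[$id]);   // rebinding discards any cached instance
    }

    /** Builds the service on first use, then returns the same shared instance. */
    public function get(string $id): object
    {
        if (!isset($this->instances[$id])) {
            if (!isset($this->factories[$id])) {
                throw new RuntimeException("No binding for '$id'");
            }
            $this->instances[$id] = ($this->factories[$id])($this);
        }
        return $this->instances[$id];
    }
}
```

A shared implementation (say, one backed by a central database) would be bound the same way in both the daemon's and the services' containers, which is what would let them run on different hosts.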

Usage Examples

See /bin/example_service.php and /bin/daemon.php respectively.

  1. Copy the sample config (storage/config-SAMPLE.json) to config.json.
  2. Edit your email address in the default ‘to’.
  3. From the package root, run: php bin/daemon.php & php bin/example_service.php &
  4. Check your inbox!

Also see the Unit Tests for usages (although I’ve always hated reading people say that).

Config

Each Pulse is configured with alert and alarm thresholds, a friendly name and a unique ID.

Notifications also require some configuration (see below).

See storage/config-SAMPLE.json for a bootstrap config.

You can either hand-craft a JSON config file, or programmatically create Pulse or Scope objects, which can persist their own configuration for reloading later. See the PulseRegistry and ScopeRegistry.

Notifications

Email Notifications

Initially there is only an EmailNotify implementation. It requires the following configuration:

notify.EmailNotify.default.email.to
notify.EmailNotify.default.email.subject            - default subject
notify.EmailNotify.default.email.body               - with placeholders: _TYPE_, _LEVEL_, _NAME_, _MESSAGE_

Body template e.g.:

"Heads up! Alert type: _TYPE_, at level: _LEVEL_, was received for service _NAME_. \n Details are as follows: _MESSAGE_"
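Filling in the body template could look like the following sketch; renderNotificationSketch() is a hypothetical helper, not necessarily how EmailNotify does it. Using strtr() with an array replaces every placeholder in a single pass, so text substituted for one placeholder (for example a message that happens to contain _NAME_) is never expanded again.

```php
<?php
// Hypothetical helper; EmailNotify's real substitution code may differ.
function renderNotificationSketch(string $template, array $values): string
{
    // strtr() with an array does one pass, trying longer keys first, and
    // never re-scans replaced text for further placeholders.
    return strtr($template, [
        '_TYPE_'    => (string) ($values['type'] ?? ''),
        '_LEVEL_'   => (string) ($values['level'] ?? ''),
        '_NAME_'    => (string) ($values['name'] ?? ''),
        '_MESSAGE_' => (string) ($values['message'] ?? ''),
    ]);
}
```

Missing values are rendered as empty strings here; a stricter version might reject templates that reference a value it was not given.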

Optionally, you can customise specific messages for both monitoring alerts and internal errors:

notify.EmailNotify.<TYPE>.<LEVEL>.email.to          - specific alert overrides
notify.EmailNotify.<TYPE>.<LEVEL>.email.subject
notify.EmailNotify.<TYPE>.<LEVEL>.email.body        - with placeholders: _TYPE_, _LEVEL_, _MESSAGE_

e.g. you might want outage alarms sent to L1 Support pagers, but internal warnings sent to developers or devops.

See ‘INotify’ for these constants.
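Resolving a setting so that a <TYPE>.<LEVEL> override takes precedence over default might look like this sketch. It assumes the JSON config decodes into nested arrays that follow the dotted key paths above; configGetSketch() and emailSettingSketch() are hypothetical helpers, and the 'monitor'/'alarm' values used when calling them are placeholders rather than the real INotify constants.

```php
<?php
// Hypothetical helpers. Assumption: the JSON config is decoded with
// json_decode($json, true) into nested arrays mirroring the dotted keys.

/** Walks a dotted path such as "notify.EmailNotify.default.email.to". */
function configGetSketch(array $config, string $dottedKey)
{
    $node = $config;
    foreach (explode('.', $dottedKey) as $part) {
        if (!is_array($node) || !array_key_exists($part, $node)) {
            return null;   // path not present
        }
        $node = $node[$part];
    }
    return $node;
}

/** Specific <TYPE>.<LEVEL> override first, then the "default" block. */
function emailSettingSketch(array $config, string $type, string $level, string $field)
{
    $specific = configGetSketch($config, "notify.EmailNotify.$type.$level.email.$field");
    return $specific ?? configGetSketch($config, "notify.EmailNotify.default.email.$field");
}
```

With only `to` overridden for a given type and level, the override's recipient is used while the subject and body still fall back to the default block, which matches the pager-versus-devops split described above.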

Logs

By default the system logs at INFO level into the storage folder.

This will just be service status changes, and any system errors (e.g. notification send failures). They will be pretty small (if your services are stable), but you might want to rotate them anyway.

You can raise the level of detail to debug, or lower it to just warnings, by calling app->log->setVerbosity(). (There is currently no config option for this.)

Architecture

- IoC App Container
    - Config DI
    - Log DI
    - Storage DI
    - Notification DI
- PulseRegistry
    - Pulse->beat
- ScopeRegistry
    - Scope->Listen
- Monitor Daemon->run:
    - Listen to Scopes
    - Scan Pulses
- Feed API (TODO!)

The Feed API has not yet been implemented. The idea is to allow remote applications to both register Beats and fetch Pulse status.

Roadmap

  • NewRelicNotify
  • SMSNotify
  • DatabaseStorage
  • RestNotify
  • StatusPageIONotify
  • Alternate language service bindings (Python, JavaScript)

Tests

There are extensive (complete?) PHPUnit tests which are mostly standalone, except for some email and database configurations required for full-system testing.

To run the DB-oriented tests you will need the example service database installed:

CREATE DATABASE pulse_test;
USE pulse_test;
CREATE TABLE `system` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `last_seen` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO `system` (`id`, `last_seen`) VALUES (1, NOW());

CREATE USER 'pulse_test'@'localhost' IDENTIFIED BY 'pulse_test';
GRANT SELECT, UPDATE ON `pulse\_test`.* TO 'pulse_test'@'localhost';
FLUSH PRIVILEGES;

See storage/config-UNITTEST.json for more details or to customise it.

Licence

This library is licenced under the LGPL v3.

Simply put: you can use this software pretty much as you wish as long as you retain this licence, make the source available and don’t blame me for anything.

I’d also really like to see any changes / fixes / suggestions – thanks!

https://bitbucket.org/scipilot/pulse
