Monday 27 February 2017

Rule-Based Fault Management for Environmental Monitoring IoT system

Final report


Description of the project


Thidea of the project was to build a fault management solution for distributed IoT systems. Fault managment, along with security, is among the most important management features of IoT networks. The solution had to demonstrate three main fault management functions:
  • detect faults in IoT system from symptons (events and messages, containing raw data about faults);
  • isolate and diagnose the causes of faults;
  • apply the fault recovery procedures.
The rule-based reasoning technic had to be used for the implementation of these functions.
The environmental monitoring IoT system was selected as an example of the diagnosable system.


The current status


The following is already done:
  • the hardware part of environmental monitoring system created;
  • the sensor node software fully implemented;
  • the Raspberry Pi based field IoT gateway configured to work as the Wireless Access Point;
  • Mosquitto MQTT broker installed and configured on the IoT gateway component;
  • the IoT gateway software partially implemented (this work is in progress, see GitHub repo with the sources);
  • one DigitalOcean 1Gb CentOS 7 droplet created;
  • Docker and Docker Compose installed on the droplet;
  • Docker daemon secured with TLS;
  • Eclipse Hono v0.5-M3 deployed on the droplet and tested.
Well, there is still much to be done :)


Changes in the components of the solution


Current changes are related to the implementation of the sensor node and field gateway components:



The sensor node component's ESP-12 modules replaced with SparkFun ESP8266 Things. The Raspberry Pi based IoT gateway now configured as the WAP for sensor nodes and connected to the router using LAN connection.
I initially considered Kura for the IoT gateway, but after some research think it's a bit overcomplicated for the case of the project. I needed to implement bidirectional MQTT - AMQP bridge to connect Mosquiotto broker, which running locally on Raspberry PI and Eclipse Hono AMQP telemetry endpoint, running on the DigitalOcean droplet. And as I can see there is no direct way to implement it with Kura. I would have to develop a bundle from scratch to support request-response scenario for the local Mosquitto. Also Kura's CloudService is EDC specific and it seems like it doesn't fit Hono API. So I decided to implement custom IoT micro-gateway using Apache Camel and run it as Linux service on IoT gateway component. In this case it will be possible to connect to Hono using Hono API through AMQP protocol.


Lessons learned


As it is seen now, the declared scope of the project turned out to be too broad for the stated time frame. And, unfortunately, working full-time, it was not always possible to fully engage in the Challenge.

Saturday 18 February 2017

Rule-Based Fault Management for Environmental Monitoring IoT system

Sensor node software


In this post I'm showing the software part of the sensors. The code is implemented with Arduino IDE. Libs used:
  • ESP - provides ESP8266 specific functions;
  • Wire - provides I2C protocol support;
  • ESP8266WiFi - WiFi related functions;
  • EEPROM - allows to work with persistent data storadge;
  • PubSubClient - a client library for MQTT support.
The software is devided into two parts: a) BME280 I2C driver lib and b) Arduino sketch with the sensor functionality. Next I'm describing the implementation of each part.

BME280 driver


The BME280 sensor driver is implemented as Arduino library. The BME280 datasheet was the main reference for the implementation. The library functional spec is following:
  • I2C interface support only;
  • Forced mode support only; 
  • allows to dynamically set the BME280 I2C address;
  • allows to dynamically set SDA and SCL pins fot I2C interface;
  • oversampling isn't used (acceptable for the environmental measurements);
  • IIR filter isn't used (also acceptable for the environmental measurements);
The public API of the library includes two constructors and three methods:
The UML diagram depicts the library usage workflow:

The diagram is self-explanatory. One thing can be mentioned here that according to BME280 datasheet (see 3.3.3. Forced mode) the Forced mode has to be selected again for each subsequent measurement cycle. For this purpose the last statement of readAll() method sets the sensor mode on each method call (5.4.5 Register 0xF4 "ctrl_meas"): 
The BME280 ADC output values for temperature, pressure and humidity are compensated using formulas from the datasheet (4.2.3 Compensation formulas).
You can find sources for the BME280 driver in the GitHub repo.


Arduino sketch



The project's sensor node software is implemented as an Arduino sketch. The main features that was implemented in the sketch:
  • wireless WiFi communication channel with the field IoT gateway;
  • MQTT client, provides connectivity with MQTT broker which runs on a field IoT gateway;
  • bidirectional communication with BME280 sensor;
  • the use of a persistent storage for the current state of the sensor node during restarts;
  • three modes of  operation:
    • active;
    • power save;
    • suspended.
  • one-way message exchange pattern implementation over MQTT for environmental telemetry data messages;
  • request-reply message exchange pattern implementation over MQTT for control messages, sended from the field gateway to the sensor node.

Startup

The UML sequence diagram depicts the startup process after powering the sensor node:

The programm starts with reading the MAC address as it is used as a MQTT client name and included in MQTT topic tree structure. Then BME280 is initialized using I2C protocol. Then the EEPROM memory allocated - 5 bytes total (4 bytes for the sleep period value in seconds + 1 byte for the current mode id). After that WiFi connection is established with a WiFi access point, based on the Raspberry Pi. The same Raspberry PI is running local Mosquitto MQTT broker instance and the field IoT gateway software. Next steps are for MQTT  PubSub client callback function setup, MQTT topic names initialization and the sensor node current operating mode processing.
The following topics are configured by the sensor node:


Topic name templateTopic name exampleTopic typePurpose
env/<MAC_address>/statusenv/5ccf7f2f1d04/statusPublishSensor node state messages.
Possible values: 'off', 'sleeping', 'on'
env/<MAC_address>/temperatureenv/5ccf7f2f1d04/temperaturePublishTelemetry data messages: temperature in DegC
env/<MAC_address>/pressureenv/5ccf7f2f1d04/pressurePublishTelemetry data messages: atmospheric pressure in hPa
env/<MAC_address>/humidityenv/5ccf7f2f1d04/humidityPublishTelemetry data messages: humidity in %RH
env/<MAC_address>/requestenv/5ccf7f2f1d04/requestSubscribeCommand messages: request
env/<MAC_address>/requestenv/5ccf7f2f1d04/replyPublishCommand
messages: reply

The preprocessMode() function implements a conditional workflow, that depends on the current sensor node mode. Three operational modes are supported:

According to SparkFun's ESP8266 Thing Hookup Guide XPD pin has to be connected to DTR pin to enable the sleep capability.

The UML activity diagram represents the details of the workflow:

Two interesting points here:
1. As the Arduino Client for MQTT only supports Clean Sessions (see for example this note), the command messages that are addressed to the sensor node, can only be sent while the node is connected to the local MQTT broker, i.e. the node never receives the command message if it was sent while the node is disconnected from the broker in the deep sleep state. So as a workaround, before sending the sensor node to deep sleep in PWR_SAVE or SUSPENDED mode, the node waits for an incoming command message at a fixed interval of 5 seconds and then runs PubSub client's loop() method to process incomming messages.
2. Here is the strange thing: I've never managed to successfully process incoming request message if the loop() is called once. In this case the callback function is never called. I was able to fix it by inserting the second loop() call:


First integration test

I ran integration tests using the following setup:





I created several gifs to visualize the tests.
This gif reflects the following scenario:
1. The sensor node is configured to connect to the Mosquitto MQTT broker which runs on a field IoT gateway, based on Raspbery PI;
2. mqtt-spy utility is configured to connect to the same MQTT broker;
3. Four MQTT subscribtions are created to the topics in mqtt-spy:
    env/5ccf7f2f1dc8/status
  env/5ccf7f2f1dc8/temperature
  env/5ccf7f2f1dc8/humidity
  env/5ccf7f2f1dc8/pressure
4. The sensor node run in PWR_SAVE mode and initially is in the deep sleep state;
5. The payload of a message in the env/5ccf7f2f1dc8/status topic contains 'off' string value which corresponds the payload of a retain message of the sensor node;
6. When the power save period ends, the sensor node establishes wireless MQTT connection with the broker;
7. The node publishes new message to env/5ccf7f2f1dc8/status with a payload, containing 'on' string indicating that the status of the node is changed.
8. The node reads new environmental data from BME280 and publishes three new messages to env/5ccf7f2f1dc8/temperature, env/5ccf7f2f1dc8/humidity and env/5ccf7f2f1dc8/pressure topics (see Preprocess Mode Activity Diagram).
9. The node publishes new message to env/5ccf7f2f1dc8/status with a payload, containing 'sleeping' string indicating that the status of the node is changed.
10. The node is sent to the deep sleep state.
11. The broker disconnects the network connection as it doesn't receive any packets from the sensor node within one and a half times the Keep Alive time period.
12. The broker publishes the last will message of the node to env/5ccf7f2f1dc8/status with a payload, containing 'off' string.


Commands and command messages

The sensor node supports request-reply message exchange for command messages. The current implementation supports three commands:
  • getsensid - returns the BME280 sensor identifier in hex string format;
  • getbattery - returns the supply voltage (VCC) value;
  • setmode - sets the operational mode of the node.
For a command messages very lightweight application level protocol was created. The payload of the command message is formatted as a CSV-string.

Request command message format:
<msg_id>,<command_name>,<param 1>,<param 2>,...,<param n>
where
  msg_id - unique identifier of the message (required),
  command_name - predefined command name (required),
  param 1, param 2, param n - the command input parameters (optional). 

Reply command message format:
<correl_id>,<status>,<payload>
where
  correl_id - correlation identifier, must match the msg_id  value of the corresponding request message (required),
  status - command execution status. Only one possible value 200 is supported in the current implementation (required),
  payload - command output payload (optional). 

 Examples:

  Request command message payload: 0001,getbattery
  Reply command message payload: 0001,200,3153

  Request command message payload: 0002,setmode,1
  Reply command message payload: 0001,200

Command messages are handled by the message callback function of MQTT PubSub client. The callback function dispatches the command from the message to the corresponding command handler function:



You can find sources of the sketch in the GitHub repo.




Wednesday 25 January 2017


Rule-Based Fault Management for Environmental Monitoring IoT system

Hardware setup

For the hardware part of the system I'm using two DIY environmental sensors where each sensor consists of two parts: the external part, mounted on the outside of the window and the internal part, mounted in the apartment of the apartment building.

The internal part

The internal part is a ESP8266 module powered with LiPo 3.7V 700mAh battery. After submitting the proposal I started to work on its hardware components. And my initial development configuration looked like this:

But since my project won the Gift Certificate I've decided to apply it and after a little waiting got this goodies from SparkFun:

and modified the prototyping setup:

This is a final configuration that I used to develop the sensor driver software. I'll describe the sensor software development in the next post.

SparkFun ESP8266 Thing has onboard voltage regulator and charger for 3.7V LiPo batteries. It is a great addition for solutions with autonomous power supply.

The components are mounted on a small solderable breadboard with 2x10-pin stackable headers and 4-pin break away header for the sensor ribbon cable on it:

Assembled boards and LiPo batteries are placed in a protective enclosure which made of latticed PVC-plates. The enclosure has a separate section for each board.The back panel is removed to provide access to the ESP8266 Thing power switch and Micro USB connector.

The external part

For the external part I made two simple sensor holders. The holder allows to place the BME280 sensor at a distance from the wall of the building to slightly reduce the impact of the wall proximity on the temperature measurement. Also it provides partial protection from the weather conditions.

Simple raw materials:

And it has not done without hot glue :)



Unfortunately I burned one SparkFun BME280 sensor breakout board, so I had to go with 4-pin BME280, which I bought on eBay earlier last year. Here is bottom-up view of covers with sensors inside:
I attached sensor holders to the window and put the fully assembled device on a sill:

Done.

Btw, it's true: there is a lot of snow in winter in Russia :)

Sunday 25 December 2016

Rule-Based Fault Management for Environmental Monitoring IoT system


Fault Management is an important part of the IoT Management area in general. And several approaches exist to fault detection and diagnosis in information systems. For a 3.0 edition of the Challenge I’m working on a Fault Management solution for a distributed Environmental Monitoring IoT system. The solution is based on a Rule-Based principles of errors (symptoms) detection and faults isolation and diagnose.

Components of the system

The solution will be implemented and deployed as a distributed system, composed of several components. Here is short description of the main components.
Component view of the solution

1. Sensor component, consisting of two identical battery powered WiFi enabled IoT sensors based on ESP8266 and BME280. The component provides temperature, humidity a barometric pressure values using push/pull communication schema with its field gateway via MQTT protocol.

2. IoT Gateway component, based on Raspberry Pi component, running Raspbian OS. RPI has a wireless WiFi USB dongle connected. Eclipse Kura will be installed on the device and used as IoT gateway implementation. The gateway communicates with IoT sensors via MQTT protocol. Also SNMP protocol agent for Raspbian OS will be deployed to enable RPi to receive SNMP commands and send SNMP trap events.

The following components will be deployed on two or more cloud CentOS 7 instances, provided by Vscale. Some components will be run inside Docker containers.

3. Connectivity component, based on Eclipse Hono. The component will provide bidirectional communication channel for the IoT gateway and its cloud backend. Two types of communication protocols for interaction will be used: SNMP and MQTT. Telemetry data from sensors and control commands from the backend will be transmitted via MQTT. SNMP will be used for receiving TRAP and INFORM events from the OS components of the IoT gateway, and for sending GET requests from the backend. For SNMP I’m going to develop SNMP Protocol Adapter for Hono. So from the gateway side there will be two separate data flows: 
sensor readings and monitoring, error events and alarms.

4. Fault Management component, based on JBoss Drools. The aim of this component is to receive symptom events (errors, alarms), detect errors, isolate and diagnose the causes of faults and apply recover activities. For this purpose a set of rules will be created for decision making and complex event processing. The component will be able to send control messages to check the status of other components and to send commands to components as a recovery procedure (trying to restart the failed component for example). The component will be deployed as a self-contained decision service, which communicates with Hono and Data storage components via Apache Camel routes.

5. Data Storage component, implemented via Redis data structure store. Two instances of the data storage will be used: one as the temporary Environmental telemetry data storage and second as the Fault Management database, which persists symptom events, notifications and alerts data.

6. Integration сomponent, based on Apache Camel. This component will implement the business scenario of the solution by receiving Environmental telemetry data (temperature, humidity and barometric pressure values) from the Connectivity component and transmitting this data to the local geoinformation SaaS service - Public Monitoring Project (narodmon.ru) to display sensor readings on the world map.

7. UI component. Will be implemented as a standalone web application. The component will deliver real-time fault and the system status data to the user, providing online visualization and notification functionality using WebSocket protocol.

8. UI client component implemented as HTML5 web client application. The component will display data, provided by the UI component web app.

Sample Use Cases

Use Case 1: Two Environmental data sensors deployed in the field. One sensor is in active mode, periodically sending data readings to its field gateway. Another sensor is a reserve and is in standby mode. The first sensor stops functioning due to the power problems, for example. The Fault Management system detects the situation in which readings data cease to flow from the gateway. The system inits fault isolation procedure by sending status request control message to the malfunctioning sensor. As the sensor doesn’t respond, the system executes recovery activity by sending wake command message to the second sensor. The second sensor switches to the active mode thus continuing the operation of the overall system.

Use Case 2: The Fault Management system receives multiple SNMP trap events with the information on large memory usage by OS of the IoT field gateway. The system sends SNMP GET request to retrieve the OS performance data. The response confirms the bad OS performance. The system executes the recovery procedure by sending the reset command message to the field gateway OS.

At this time I'm working on the Sensor Component hardware and software part.

Friday 26 February 2016

Hi! Only a few hours left before the end of the Challenge, so in this post I'm going to wrap up the current status of my project. Still I hope this post is not the last one :)

The idea of the project


The idea was to build a smart toy with a main purpose to help babies to develop crawling activity. So what has been done so far?

  Robot chassis

The robot chassis prototype has been built. For the basis of the chassis I'm using Boe-Bot Parts Kit from Parallax:



I've made slight modifications that allow me to mount the Raspberry Pi A+ on the Aluminum Chassis:


Also I've replaced Parallax standard servos from the kit with SM-S4303R continuous rotation servos. SM-S4303R has slightly higher rotation speed. The robot power source is 4xAA NiMh rechargeable battery pack:

 
Since the output voltage of the pack equals 4.8 V and does not provide enough power for Raspberry Pi, servos and other electronic parts, I'm using MT3608 DC-DC Step Up Power Apply Module to increase voltage to 5 V. Also my experiments have shown that Raspberry Pi is very sensitive to current spikes produced by servos and freezes every time when they start to rotate. To fix this I'm decided to use separate circuit to power Raspberry Pi. As a result power circuit contains two MT3608, connected in parallel to the power source:
 
 

Robot configuration highlights

The robot is connected to the wireless router via Wi-Fi wireless USB Adapter Dongle  and configured to use static IP address inside a LAN.
To control the servos of the robot I'm using ServoBlaster driver software. Pins 15 and 16 of the Raspberry Pi header are configured to control servos. So, /dev/servoblaster-cfg configuration file content:

    p1pins=15,16
    p5pins=

    Servo mapping:
         0 on P1-15          GPIO-22
         1 on P1-16          GPIO-23

Robot software

The project sources can be found at https://github.com/sergevas/bcbot. It is a maven-based multi-module project. The module bcbot-robot includes functionality intended to deploy to the Raspberry Pi. Current version of the module includes implementation of several Apache Camel routes and Californium CoAP resourses. The software runs on Raspberry Pi as a standalone Java application. The RobotMain class boots entire application. The class has several dependences:
 Main is used to boot up Camel Context, CoapServer is used to initialize embedded Californium CoAP Server instance.

Robot CoAP resources

Several CoAP resources were implemented and can be found in xyz.sergevas.iot.bcbot.robot.coap package. Current implementation contains hierarchy of the CoAP resources with MoveResource  class as a root resource for the robot motion control. To configure CoAP server org.apache.camel.main.MainListener was implemented:

public static class RobotManagementEvent extends MainListenerSupport {
      
        @Override
        public void afterStart(MainSupport main) {
            try {
                ROBOT_COAP_SERVER.add(new RobotResource("robot", main.getCamelContexts().get(0)));
                LOG.debug("Starting the robot CoAP server...");
                ROBOT_COAP_SERVER.start();
                LOG.debug("The robot CoAP server started...");
            } catch (Exception e) {
                LOG.error("Unable to configure and start the robot CoAP server...", e);
                throw new RuntimeException("Unable to configure and start the robot CoAP server...",
                    e);
            }
        }
      
        @Override
        public void afterStop(MainSupport main) {
            super.afterStop(main);
            LOG.debug("Stopping the robot CoAP server...");
            ROBOT_COAP_SERVER.stop();
        }
    }

Servo control

To control the robot servos Apache Camel routes were implemented in the class ServoRoute. Exec Camel component interacts with the Raspberry Pi file system in order to update /dev/servoblaster file thus controlling the servo speed and rotation direction.  Here is the example of URI that executes echo command with Linux command interpreter:

to("exec:sh?args=-c%20%22echo%20{{servo.right.pin}}={{servo.right.speed.slow.fw}}%20%3E%20"
+ "/dev/servoblaster%20;%20echo%20{{servo.left.pin}}={{servo.left.speed.slow.fw}}%20%3E%20"
+ "/dev/servoblaster%22")

All constant values moved to the properties file. After substitution of values:
exec:sh?args=-c "echo 0=162 > /dev/servoblaster ; echo 0=140 > /dev/servoblaster"  
The routes are called from CoAP resources through the proxy service that is created using Camel ProxyBuilder class:
public class BackwardResource extends CoapResource {
 
private static final Logger LOG = Logger.getLogger(ForwardResource.class);
 
 private RobotMove robotMove;

 public BackwardResource(String name, CamelContext camelContext) throws Exception {
  super(name);
  robotMove = new ProxyBuilder(camelContext).endpoint(ROBOT_MOVE_BACKWARD_ROUTE)
                    .build(RobotMove.class);
 }
 
 @Override
 public void handlePUT(CoapExchange exchange) {
  LOG.debug(format("Start handlde PUT for CoapExchange with text [%s]",
                    exchange.getRequestText()));
  String robotSpeedMode = exchange.getRequestText();
  LOG.debug(
          format("Start Calling Camel route to move the robot backward with speed mode [%s]",
              robotSpeedMode));
  robotMove.move(robotSpeedMode);
  LOG.debug("End calling Camel route...");
  exchange.respond(CHANGED);
 }
}

Deployment model

The project is built and deployed in one click as an executable fat jar using maven Shade Plugin. Linux service wrapper was created to run the robot software on the Raspberry Pi. The service .sh file is created during the deploy maven phase from the Apache Velocity template  every time the build process runs.

Demo

 I have recorded a short video that demonstrates how the robot can be controlled remotely using Copper CoAP user agent for Firefox. You can see the robot resources: 

Currently only PUT CoAP methods are supported by the resources.
As a message body, resources expect a string with the following possible values:
"SLOW", "MEDIUM" and "FAST". These values represent the robot speed modes. The values are propagated as an input message payload to Camel routes and used in content based router EIP implementation:
from(ROBOT_MOVE_FORWARD_ROUTE)
    .choice()
        .when(bodyAs(String.class).isEqualToIgnoreCase(SERVO_SPEED_MODE_SLOW))
            .to("exec:sh?args=-c%20%22echo%20{{servo.right.pin}}={{servo.right.speed.slow.fw}}%20%3E%20/dev/servoblaster"
                + "%20;%20echo%20{{servo.left.pin}}={{servo.left.speed.slow.fw}} %20%3E%20/dev/servoblaster%22")
        .when(bodyAs(String.class).isEqualToIgnoreCase(SERVO_SPEED_MODE_MEDIUM))
            .to("exec:sh?args=-c%20%22echo%20{{servo.right.pin}}={{servo.right.speed.medium.fw}}%20%3E%20/dev/servoblaster"
                + "%20;%20echo%20{{servo.left.pin}}={{servo.left.speed.medium.fw}}%20%3E%20/dev/servoblaster%22")"
        .when(bodyAs(String.class).isEqualToIgnoreCase(SERVO_SPEED_MODE_FAST))
            .to("exec:sh?args=-c%20%22echo%20{{servo.right.pin}}={{servo.right.speed.fast.fw}}%20%3E%20/dev/servoblaster"
                + "%20;%20echo%20{{servo.left.pin}}={{servo.left.speed.fast.fw}}%20%3E%20/dev/servoblaster%22")
    .end().to("log:xyz.sergevas.iot.bcbot?level=DEBUG&showHeaders=true");

Feasibility Test

And here is a small test that can be considered successful :). The efforts of my six-month son encouraged me to continue my work.