Intel® Secure Device Onboard (Intel® SDO) Scales Devices to IoT Platforms

Build an IoT analytics solution with big data tools

The Internet of things seems futuristic, but real systems are delivering real analytics value today. Here’s some real-world IoT advice from the field

With all the hype around the Internet of things, you have a right to be skeptical that the reality could ever match the promise. I would feel that way myself — if it weren’t for some recent firsthand experience.

I recently had the opportunity to work on a project that involved applying IoT technologies to medical devices and pharmaceuticals in a way that could have a profound impact on health care. Seeing the possibilities afforded by “predictive health care” opened my eyes to the value of IoT more than any other project I’ve been associated with.

Of course, the primary value in an IoT system is in the ability to perform analytics on the acquired data and extract useful insights, though make no mistake — building a pipeline for performing scalable analytics with the volume and velocity of data associated with IoT systems is no walk in the park. To help you avoid some of the difficulties we encountered, allow me to share a few observations on how to develop an ideal IoT analytics stack.

Acquiring and storing your data

Myriad protocols enable the receipt of events from IoT devices, especially at the lower levels of the stack. For our purposes, it doesn’t matter whether your device connects to the network using Bluetooth, cellular, Wi-Fi, or a hardware connection, only that it can send a message to a broker of some sort using a defined protocol.

One of the most popular and widely supported protocols for IoT applications is MQTT (Message Queuing Telemetry Transport). Plenty of alternatives exist as well, including CoAP (Constrained Application Protocol), XMPP, and others.

Given its ubiquity and wide support, along with the availability of numerous open source client and broker applications, I tend to recommend starting with MQTT unless you have compelling reasons to choose otherwise. Mosquitto is one of the best-known and most widely used open source MQTT brokers, and it’s a solid choice for your applications. The fact that it’s open source is especially valuable if you’re building a proof of concept on a small budget and want to avoid the expense of proprietary systems.
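To make the publishing side concrete, here is a minimal sketch. The `devices/<id>/<metric>` topic layout and the payload field names are my own illustrative conventions, not anything MQTT mandates; the commented-out publish call assumes the paho-mqtt client library and a local Mosquitto broker on the default port.

```python
import json
import time

def make_event(device_id, metric, value):
    """Build an MQTT topic and JSON payload for one device reading.

    The devices/<id>/<metric> topic layout is an illustrative
    convention, not something the MQTT spec requires.
    """
    topic = "devices/{}/{}".format(device_id, metric)
    payload = json.dumps({
        "device": device_id,
        "metric": metric,
        "value": value,
        "ts": time.time(),  # send time; the receiving side can add receipt time
    })
    return topic, payload

# To actually publish (assumes the paho-mqtt package and a local
# Mosquitto broker listening on the default port 1883):
#
#   import paho.mqtt.publish as publish
#   topic, payload = make_event("pump-7", "temperature", 71.3)
#   publish.single(topic, payload, hostname="localhost")
```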

Regardless of which protocol you choose, you will eventually have messages in hand, representing events or observations from your connected devices. Once a message is received by a broker such as Mosquitto, you can hand that message to the analytics system. A best practice is to store the original source data before performing any transformations or munging. This becomes very valuable when debugging issues in the transform step itself — or if you need to replay a sequence of messages for end-to-end testing or historical analysis.
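As a sketch of that practice, the handler below persists the untouched payload, plus receipt metadata, before any munging happens. The `store` argument is a stand-in for whatever persistence layer you use (a Couchbase bucket, an HDFS writer); here it can be anything with an `append` method.

```python
import time

def archive_raw(raw_bytes, topic, store):
    """Persist the original message exactly as received, before any
    transformation, so it can be replayed or inspected later."""
    record = {
        "topic": topic,
        "received_at": time.time(),
        # Keep the payload verbatim; undecodable bytes are replaced
        # rather than dropped so the record stays faithful to the wire.
        "raw": raw_bytes.decode("utf-8", errors="replace"),
    }
    store.append(record)
    return record
```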

For storing IoT data, you have several options. In some projects I’ve used Hadoop and Hive, but lately I’ve been working with NoSQL document databases like Couchbase with great success. Couchbase offers a nice combination of high-throughput, low-latency characteristics. It’s also a schema-less document database that supports high data volume along with the flexibility to add new event types easily. Writing data directly to HDFS is a viable option, too, particularly if you intend to use Hadoop and batch-oriented analysis as part of your analytics workflow.

For writing source data to a persistent store, you can either attach custom code directly to the message broker at the IoT protocol level (for example, the Mosquitto broker if using MQTT) or push messages to an intermediate broker such as Apache Kafka and use different Kafka consumers to move messages into different parts of your system. One proven pattern is to push messages to Kafka and run two consumer groups on the topic: one writes the raw data to your persistence store, while the other moves the data into a real-time stream-processing engine like Apache Storm.
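The fan-out semantics behind that pattern are easy to illustrate. The toy dispatcher below is not Kafka; it just shows the behavior you are relying on: every message on the topic is delivered to each registered group, so the raw-archival path and the stream-processing path each see the full event stream (Kafka gives you this by running the two consumers under distinct group IDs on one topic).

```python
class MiniFanOut:
    """Toy stand-in for one Kafka topic with multiple consumer groups:
    each group registered here receives every published message."""

    def __init__(self):
        self.groups = {}

    def subscribe(self, group_id, handler):
        self.groups[group_id] = handler

    def publish(self, message):
        # Every group gets its own copy of every message.
        for handler in self.groups.values():
            handler(message)

fanout = MiniFanOut()
raw_store, processed = [], []
fanout.subscribe("raw-archiver", raw_store.append)               # persistence path
fanout.subscribe("storm-feeder", lambda m: processed.append(m.upper()))  # processing path
fanout.publish("sensor reading")
```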

If you aren’t using Kafka and are using Storm, you can also simply wire a bolt into your topology that does nothing but write messages out to the persistent store. If you are using MQTT and Mosquitto, a convenient way to tie things together is to have your message delivered directly to an Apache Storm topology via the MQTT spout.

Preprocessing and transformations

Data from devices in its raw form is not necessarily suited for analytics. Values may be missing, requiring an enrichment step, or representations may need transformation (often true for date and timestamp fields).
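A typical transform of that kind is sketched below: coerce a device-supplied timestamp string into epoch seconds and stamp the record with a transform version. The field names and timestamp format are assumptions for illustration.

```python
from datetime import datetime, timezone

def normalize_event(event):
    """Preprocessing step: parse the device's ISO-style timestamp into
    epoch seconds (UTC assumed) and tag the record so downstream code
    knows which version of the transform produced it."""
    out = dict(event)  # leave the raw input untouched
    parsed = datetime.strptime(event["ts"], "%Y-%m-%dT%H:%M:%S")
    out["epoch"] = parsed.replace(tzinfo=timezone.utc).timestamp()
    out["transform_version"] = 1
    return out
```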

This means you’ll frequently need a preprocessing step to manage enrichments and transformations. Again, there are multiple ways to structure this, but another best practice I’ve observed is the need to store the transformed data alongside the raw source data.

Now, you might think: “Why do that when I can always just transform it again if I need to replay something?” As it turns out, transformations and enrichments can be expensive operations and may add significant latency to the overall workflow. It’s best to avoid the need to rerun the transformations if you rerun a sequence of events.

Transformations can be handled several ways. If you are focused on batch mode analysis and are writing data to HDFS as your primary workflow, then Pig — possibly using custom user-defined functions — works well for this purpose. Be aware, however, that while Pig does the job, it’s not exactly designed to have low-latency characteristics. Running multiple Pig jobs in sequence will add a lot of latency to the workflow. A better option, even if you aren’t looking for “real-time analysis” per se, might be using Storm for only the preprocessing phase of the workflow.

Analytics for business insights

Once your data has been transformed into a suitable state and stored for future use, you can start dealing with analytics.

Apache Storm is explicitly designed for handling continuous streams of data in a scalable fashion, which is exactly what IoT systems tend to deliver. Storm excels at managing high-volume streams and performing operations over them, like event correlation, rolling metric calculations, and aggregate statistics. Of course, Storm also leaves the door open for you to implement any algorithm that may be required.

Our experience to date has been that Storm is an extremely good fit for working with streaming IoT data. Let’s look at how it can work as a key element of your analytics pipeline.

In Storm, by default “topologies” run forever, performing any calculation that you can code over a boundless stream of messages. Topologies can consist of any number of processing steps, aka bolts, which distribute over nodes in a cluster; Storm manages the message distribution for you. Bolts can maintain state as needed to perform “sliding window” calculations and other kinds of rolling metrics. A given bolt can also be stateless if it needs to look at only one event at a time (for example, a threshold trigger).
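The two kinds of bolt state described above can be sketched in plain Python. Real Storm bolts are typically written in Java against the Storm API; this only shows the logic each bolt would carry.

```python
from collections import deque

class SlidingMean:
    """Stateful bolt logic: hold the last `size` readings and report a
    rolling mean over that sliding window."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old readings fall off automatically

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

def over_threshold(value, limit=100.0):
    """Stateless bolt logic: a threshold trigger needs only the current
    event, no history."""
    return value > limit
```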

The calculated metrics in your Storm topology can then be used to suit your business requirements as you see fit. Some values may trigger a real-time notification using email or XMPP or update a real-time dashboard. Other values may be discarded immediately, while some may need to be stored in a persistent store. Depending on your application, you may actually find it makes sense to keep more than you throw away, even for “intermediate” values.

Why? Simply put, you have to “reap” data from any stateful bolts in a Storm topology eventually, unless you have infinite RAM and/or swap space available. You may eventually need to perform a “meta analysis” on those calculated, intermediate values. If you store them, you can achieve this without the need to replay the entire time window from the original source events.

How should you store the results of Storm calculations? To start with, understand that you can do anything in your bolts, including writing to a database. Defining a Storm topology that writes calculated metrics to a persistent store is as simple as adding code to the various bolts that connects to your database and pushes the resulting values to the store. Better still, to follow the separation-of-concerns principle, add a new bolt downstream of the one that performs the calculations and give it sole responsibility for managing persistence.
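A sketch of that split, with the calculation and the persistence as separate "bolts" (plain functions here; the names and the squared-value metric are invented for illustration):

```python
def metrics_bolt(event, emit):
    """Computes a derived metric and hands it downstream via `emit`;
    it knows nothing about storage."""
    emit({"device": event["device"], "value_squared": event["value"] ** 2})

def persistence_bolt(metric, store):
    """Downstream bolt whose sole responsibility is persistence. `store`
    stands in for a database client; here, anything with append()."""
    store.append(metric)

# Wiring: the metrics bolt emits straight into the persistence bolt.
results = []
metrics_bolt({"device": "pump-7", "value": 3},
             lambda m: persistence_bolt(m, results))
```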

IoT analytics pipeline

Storm topologies are extremely flexible, giving you the ability to have any bolt send its output to any number of subsequent bolts. If you want to store the source data coming into the topology, this is as easy as wiring a persistence bolt directly to the spout (or spouts) in question. Since spouts can send data to multiple bolts, you can both store the source events and forward them to any number of subsequent processing steps.

For storing these results, you can use any database, but as noted above, we’ve found that Couchbase works well in these applications. The key point in choosing a database: You want to complement Storm — which has no native query/search facility and can hold only a limited amount of data in RAM — with a system that provides strong query and retrieval capabilities. Whatever database you choose, once your calculated metrics are stored, it should be straightforward to use the database's native query facilities for generating reports. From there, you can use Tableau, BIRT, Pentaho, JasperReports, or similar tools to create any required reports or visualizations.

Storing data in this way also opens up the possibility of performing additional analytics at a later time using the tool of your choice. If one of your bolts pushes data into HDFS, you can employ the entire swath of Hadoop-based tools for subsequent processing and analysis.

Building analytics solutions that can handle the scale of IoT systems isn’t easy, but the right technology stack makes the challenge less daunting. Choose wisely and you’ll be on your way to developing an analytics system that delivers valuable business insights from data generated by a swarm of IoT devices.




Best Corporate Security Blog



Other nominees:

McAfee Blog

CloudFlare Blog

SecureWorks Blog

Solutionary Minds Blog

Kaspersky Lab Securelist Blog

Veracode Blog

Trend Micro Blog

Naked Security Blog

Best Security Podcast

Other nominees:

Liquidmatrix Security Digest

EuroTrashSecurity

SANS Internet Storm Center

Southern Fried Security

Risky Business

Sophos Security Chet Chat

And the winner is:

Paul Dotcom

The Most Educational Security Blog

Other nominees:

BH Consulting’s Security Watch Blog

Security Uncorked Blog

Dr. Kees Leune’s Blog

Securosis Blog

Critical Watch Blog

The Security Skeptic Blog

The New School of Information Security Blog

And the winner is:

Krebs On Security

The Most Entertaining Security Blog

Other nominees:

Packet Pushers Blog

Securosis Blog

Errata Security Blog

Naked Security Blog

Uncommon Sense Security Blog

PSilvas Blog

And the winner is:

J4VV4D’s Blog

The Blog That Best Represents The Security Industry

Other nominees:

SpiderLabs Anterior Blog

1 Raindrop Blog

Naked Security Blog

The Firewall (Forbes) Blog

Threat Level (Wired) Blog

Securosis Blog

Michael Peters Blog

And the winner is:

Krebs On Security Blog

The Single Best Blog Post or Podcast Of The Year

Other nominees:

The Epic Hacking of Mat Honan and Our Identity Challenge

Application Security Debt and Application Interest Rates

Why XSS is serious business (and why Tesco needs to pay attention)

Levelling up in the real world

Secure Business Growth, Corporate Responsibility with Ben Tomhave

And the winner is:

Meet The Hackers Who Sell Spies The Tools To Crack Your PC (And Get Paid Six-Figure Fees)

The Security Bloggers Hall Of Fame

The other nominees are:

Richard Bejtlich

Gunnar Peterson

Naked Security Blog

Wendy Nather

And the winner is:

Jack Daniel
