Home

Awesome

Deep Learning UDF for KSQL / ksqlDB for Streaming Anomaly Detection of MQTT IoT Sensor Data

I built a KSQL UDF for sensor analytics. It leverages the new API features of KSQL to build UDF / UDAF functions easily with Java to do continuous stream processing on incoming events. If you want to build your own UDF, please check out this blog post for a detailed "how to" and potential issues during development and testing: How to Build a UDF and/or UDAF in KSQL 5.0.

Use Case: Connected Cars - Real Time Streaming Analytics using Deep Learning

Continuously process millions of events from connected devices (sensors of cars in this example):

Architecture: Sensor Data via Confluent MQTT Proxy to Kafka Cluster for KSQL Processing and Real Time Analytics

This project focuses on the ingestion of data into Kafka via MQTT and processing of data via KSQL:

Live Demo Video - MQTT with Kafka Connect and MQTT Proxy

If you want to see Apache Kafka / MQTT integration in a video, please check out the following 15min recording showing a demo my two Github examples:

Apache Kafka + MQTT Integration

Source Code

Here is the full source code for the Anomaly Detection KSQL UDF.

It is pretty easy to develop UDFs. Just implement the function in one Java method within a UDF class:

            @Udf(description = "apply analytic model to sensor input")
            public String anomaly
                (@UdfParameter("sensorinput") String sensorinput)
            { "YOUR BUSINESS LOGIC" }

How to run it?

Requirements

The code is developed and tested on Mac and Linux operating systems. As Kafka does not support and work well on Windows, this is not tested at all.

Step-by-step demo

Install Confluent Platform and Mosquitto (or any other MQTT Client).

Then follow these steps to deploy the UDF, create MQTT events and process them via KSQL leveraging the analytic model.

Other information and projects related to Kafka and MQTT

If you want to find more details about Kafka + MQTT integration, take a look at my slides from Kafka Summit 2018 in San Francisco: IoT Integration with MQTT and Apache Kafka. The video recording is available on the website of Kafka Summit for free: Kafka MQTT Integration - Video Recording.

To see the other part (integration with sink applications like Elasticsearch / Grafana), please take a look at the project "KSQL for streaming IoT data", which shows how to realize the integration with ElasticSearch via Kafka Connect.

The Github project Apache Kafka + Kafka Connect + MQTT Connector + Sensor Data also integrates with MQTT devices. Though, this project uses Confluent's MQTT Connector for Kafka Connect, i.e. a different approach where you use a MQTT Broker in between the devices and Kafka.

If you want to see a more powerful and complete demo, check out Streaming Machine Learning at Scale from 100000 IoT Devices with HiveMQ, Apache Kafka and TensorFLow.