{"id":865,"date":"2018-08-03T06:30:02","date_gmt":"2018-08-03T06:30:02","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/03\/scalable-iot-ml-platform-with-apache-kafka-deep-learning-mqtt\/"},"modified":"2018-08-03T06:30:02","modified_gmt":"2018-08-03T06:30:02","slug":"scalable-iot-ml-platform-with-apache-kafka-deep-learning-mqtt","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/03\/scalable-iot-ml-platform-with-apache-kafka-deep-learning-mqtt\/","title":{"rendered":"Scalable IoT ML Platform with Apache Kafka + Deep Learning + MQTT"},"content":{"rendered":"<p>Author: Kai Waehner<\/p>\n<div>\n<p>I built a scenario for a hybrid machine learning infrastructure leveraging <a href=\"https:\/\/kafka.apache.org\/\">Apache Kafka<\/a> as scalable central nervous system. The public cloud is used for training analytic models at extreme scale (e.g. using <a href=\"https:\/\/www.tensorflow.org\/\" target=\"_blank\" rel=\"noopener\">TensorFlow<\/a> and TPUs on Google Cloud Platform (GCP) via <a href=\"https:\/\/cloud.google.com\/ml-engine\/\" target=\"_blank\" rel=\"noopener\">Google ML Engine<\/a>. The predictions (i.e. model inference) are executed on premise at the edge in a local Kafka infrastructure (e.g. leveraging Kafka Streams or KSQL for streaming analytics).<\/p>\n<p>This post focuses on the on premise deployment. I created a <a href=\"https:\/\/github.com\/kaiwaehner\/ksql-udf-deep-learning-mqtt-iot\" target=\"_blank\" rel=\"noopener\">Github project with a KSQL UDF for sensor analytics<\/a>. It leverages the new\u00a0<a href=\"https:\/\/docs.confluent.io\/current\/ksql\/docs\/udf.html\" rel=\"nofollow\">API features of KSQL to build UDF \/ UDAF functions easily with Java<\/a>\u00a0to do continuous stream processing on incoming events.<\/p>\n<h2><a id=\"user-content-use-case-connected-cars---real-time-streaming-analytics-using-deep-learning\" class=\"anchor\" href=\"https:\/\/github.com\/kaiwaehner\/ksql-udf-deep-learning-mqtt-iot#use-case-connected-cars---real-time-streaming-analytics-using-deep-learning\" name=\"user-content-use-case-connected-cars---real-time-streaming-analytics-using-deep-learning\"><\/a>Use Case: Connected Cars &#8211; Real Time Streaming Analytics using Deep Learning<\/h2>\n<p>Continuously process millions of events from connected devices (sensors of cars in this example):<\/p>\n<p><a href=\"http:\/\/www.kai-waehner.de\/blog\/wp-content\/uploads\/2018\/08\/Connected_Cars_IoT_Deep_Learning.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1329 alignnone\" src=\"http:\/\/www.kai-waehner.de\/blog\/wp-content\/uploads\/2018\/08\/Connected_Cars_IoT_Deep_Learning-300x135.png\" alt=\"Connected_Cars_IoT_Deep_Learning\" width=\"300\" height=\"135\"><\/a><\/p>\n<p>I built different analytic models for this. They are trained on public cloud leveraging TensorFlow, H2O and Google ML Engine. Model creation is not focus of this example. The final model is ready for production already and can be deployed for doing predictions in real time.<\/p>\n<p>Model serving can be done via a model server or natively embedded into the stream processing application. See the <a href=\"https:\/\/dzone.com\/articles\/model-serving-stream-processing-vs-rpc-rest-with-j\" target=\"_blank\" rel=\"noopener\">trade-offs of RPC vs. Stream Processing for model deployment and a &#8220;TensorFlow + gRPC + Kafka Streams&#8221; example here<\/a>.<\/p>\n<h2>Demo: Model Inference at the Edge with MQTT, Kafka and KSQL<\/h2>\n<p>The Github project <a href=\"https:\/\/github.com\/kaiwaehner\/ksql-udf-deep-learning-mqtt-iot\" target=\"_blank\" rel=\"noopener\">generates car sensor data, forwards it via Confluent MQTT Proxy to Kafka cluster for KSQL processing and real time analytics<\/a>.<\/p>\n<p>This project focuses on the ingestion of data into Kafka via <a href=\"http:\/\/mqtt.org\/\" target=\"_blank\" rel=\"noopener\">MQTT<\/a> and processing of data via KSQL: <a href=\"http:\/\/www.kai-waehner.de\/blog\/wp-content\/uploads\/2018\/08\/MQTT_Proxy_Confluent_Cloud.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1330 alignnone\" src=\"http:\/\/www.kai-waehner.de\/blog\/wp-content\/uploads\/2018\/08\/MQTT_Proxy_Confluent_Cloud-300x66.png\" alt=\"MQTT_Proxy_Confluent_Cloud\" width=\"300\" height=\"66\"><\/a><\/p>\n<p>A great benefit of <a href=\"https:\/\/www.confluent.io\/confluent-mqtt-proxy\/\" target=\"_blank\" rel=\"noopener\">Confluent MQTT Proxy<\/a> is simplicity for realizing IoT scenarios without the need for a MQTT Broker. You can forward messages directly from the MQTT devices to Kafka via the MQTT Proxy. This reduces efforts and costs significantly. This is a perfect solution if you &#8220;just&#8221; want to communicate between Kafka and MQTT devices.<\/p>\n<p>If you want to see the other part of the story (integration with sink applications like Elasticsearch \/ Grafana), please take a look at the Github project &#8220;<a href=\"https:\/\/github.com\/kaiwaehner\/ksql-fork-with-deep-learning-function\">KSQL for streaming IoT data<\/a>&#8220;. This realizes the integration with ElasticSearch and Grafana via Kafka Connect and the Elastic connector.<\/p>\n<h2>KSQL UDF &#8211; <a id=\"user-content-source-code\" class=\"anchor\" href=\"https:\/\/github.com\/kaiwaehner\/ksql-udf-deep-learning-mqtt-iot#source-code\" name=\"user-content-source-code\"><\/a>Source Code<\/h2>\n<p>It is pretty easy to develop UDFs. Just implement the function in one Java method within a UDF class:<\/p>\n<pre><code>            @Udf(description = \"apply analytic model to sensor input\")             public String anomaly(String sensorinput){ \"YOUR LOGIC\" } <\/code><\/pre>\n<p>Here is the full source code for the\u00a0<a href=\"https:\/\/github.com\/kaiwaehner\/ksql-udf-deep-learning-mqtt-iot\/blob\/master\/src\/main\/java\/com\/github\/megachucky\/kafka\/streams\/machinelearning\/Anomaly.java\" target=\"_blank\" rel=\"noopener\">Anomaly Detection KSQL UDF<\/a>.<\/p>\n<h2>How to run the demo with Apache Kafka and MQTT Proxy?<\/h2>\n<p>All steps to execute the demo are describe in the Github project.<\/p>\n<p>You just need to <a href=\"https:\/\/www.confluent.io\/download\/\" rel=\"nofollow\">install Confluent Platform<\/a>\u00a0and then follow these steps to\u00a0<a href=\"https:\/\/github.com\/kaiwaehner\/ksql-udf-deep-learning-mqtt-iot\/blob\/master\/live-demo.adoc\">deploy the UDF, create MQTT events and process them via KSQL leveraging the analytic model<\/a>.<\/p>\n<p>I use <a href=\"https:\/\/mosquitto.org\/download\/\" target=\"_blank\" rel=\"noopener\">Mosquitto to generate MQTT messages<\/a>. Of course, you can use any other MQTT client, too. That is the great benefit of an open and standardized protocol.<\/p>\n<h2>Hybrid Cloud Architecture for Apache Kafka and Machine Learning<\/h2>\n<p>If you want to learn more about the concepts behind a scalable, vendor-agnostic Machine Learning infrastructure, take a look at my presentation on Slideshare or watch the recording of the corresponding Confluent webinar &#8220;<a href=\"https:\/\/videos.confluent.io\/watch\/u9xQvGWrzEV7Ky4m6uidiW?\" target=\"_blank\" rel=\"noopener\">Unleashing Apache Kafka and TensorFlow in the Cloud<\/a>&#8220;.<\/p>\n<p><a href=\"https:\/\/www.slideshare.net\/KaiWaehner\/unleashing-apache-kafka-and-tensorflow-in-the-cloud-108325306\/KaiWaehner\/unleashing-apache-kafka-and-tensorflow-in-the-cloud-108325306\">https:\/\/www.slideshare.net\/KaiWaehner\/unleashing-apache-kafka-and-tensorflow-in-the-cloud-108325306\/KaiWaehner\/unleashing-apache-kafka-and-tensorflow-in-the-cloud-108325306<\/a><\/p>\n<p>\u00a0<\/p>\n<p>Please share any feedback! Do you like it, or not? Any other thoughts?<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:748379\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Kai Waehner I built a scenario for a hybrid machine learning infrastructure leveraging Apache Kafka as scalable central nervous system. The public cloud is [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2018\/08\/03\/scalable-iot-ml-platform-with-apache-kafka-deep-learning-mqtt\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":866,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/865"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=865"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/865\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/866"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=865"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=865"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=865"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}