IoT application development isn’t fundamentally different from a typical backend development process, but the approach is. Keep in mind that we’re talking about physical “things”, and that changes a lot. Here are my notes and best practices on developing an IoT architecture as an IoT Solutions Architect. I’ll give examples for each layer, both for on-premise setups and for AWS or other public clouds.

You need to build the “architecture” first, and it should cover the whole application, because an IoT solution has layers such as physical devices, mobile users, data processing, real-time data pipelines and automated reporting, which typical web applications don’t deal with. Once you’ve architected the solution and covered the scenarios involving physical things, you can still develop the IoT application’s backend the way you already know. In this guide, we’ll have a look at protocols, APIs, databases and data processors.

Protocols

First, you need to be curious about the hardware environment of your IoT devices. Will you collect data from tiny, battery-powered sensors or from industrial gateways? These differences will determine the telemetry protocol between the backend and the devices. If you’ll receive time-series sensor data frequently, start with MQTT. If you’re in a really harsh environment in terms of battery and system resources, take a look at CoAP.
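
To make the telemetry idea concrete, here is a minimal sketch of how a single sensor reading could be turned into an MQTT topic and a compact payload. The topic layout (`devices/<id>/telemetry/<sensor>`), the field names `ts` and `v`, and the function name are my own assumptions for illustration, not part of any standard. Publishing itself would be done with whatever MQTT client library you pick.

```python
import json
import time


def build_telemetry(device_id, sensor, value, ts=None):
    """Return (topic, payload_bytes) for one sensor reading.

    The topic layout and the "ts"/"v" field names are hypothetical.
    """
    if ts is None:
        ts = int(time.time())
    topic = f"devices/{device_id}/telemetry/{sensor}"
    # Compact separators keep the payload small for constrained devices.
    payload = json.dumps({"ts": ts, "v": value}, separators=(",", ":"))
    return topic, payload.encode("utf-8")


topic, payload = build_telemetry("sensor-001", "temperature", 21.5, ts=1700000000)
print(topic, payload)
```

Keeping one topic per device and sensor makes it easy for the backend to subscribe to exactly the streams it needs.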

IoT Protocols on AWS

On AWS, you can go with AWS IoT Core, not only for tiny sensor devices but also for industrial gateways and the like. IoT Core provides MQTT with SSL/TLS enabled, MQTT over WebSockets and also HTTPS device APIs.

IoT Protocols for on-premises

For an on-premise solution, have a look at Mosquitto or emqtt. You need to build a service as robust and fault-tolerant as AWS IoT Core. To secure your MQTT pipeline, you need to manage your SSL/TLS certificates, root CAs, renewals, expirations and so on by yourself. To build a fail-over scenario for the MQTT brokers you manage, have a look at using nginx for proxying and load balancing the brokers. Note that you’ll have to consider tons of things while load balancing, such as keeping session persistence by checking status bytes in the MQTT data sent and received.
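
Since certificate expiration becomes your own problem on-premise, a small check like the following can be run periodically. It is only a sketch: the 30-day threshold and the function names are assumptions of mine. The date string uses the `notAfter` format found in the dictionary returned by Python’s `ssl.SSLSocket.getpeercert()`.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter date.

    not_after looks like "Jan  1 00:00:00 2030 GMT".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # cert_time_to_seconds converts the certificate date to epoch seconds.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after, now=None, threshold_days=30):
    """True when the certificate should be renewed (threshold is an assumption)."""
    return days_until_expiry(not_after, now) <= threshold_days


check_time = datetime(2029, 12, 22, tzinfo=timezone.utc)
print(days_until_expiry("Jan  1 00:00:00 2030 GMT", now=check_time))
```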


APIs

Every API starts with CRUD, but CRUD operations alone aren’t sufficient. In IoT there will be operation-based endpoints, such as activating/deactivating devices or requesting tokens, because there are physical “devices” involved. Build these APIs as close as possible to the OpenAPI specification. For example, if you write your API definitions in Swagger, you can import them into almost every cloud provider.
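
As an illustration of CRUD plus operation-based endpoints, here is a sketch of an OpenAPI 3 definition built as a plain Python dictionary and dumped to JSON. The paths, the `deviceId` parameter and the API title are hypothetical names I chose for the example; a real definition would also describe request and response schemas.

```python
import json

# Path parameter shared by all per-device endpoints (hypothetical name).
DEVICE_ID = {
    "name": "deviceId",
    "in": "path",
    "required": True,
    "schema": {"type": "string"},
}


def operation(summary):
    """Smallest useful operation object: a summary and one response."""
    return {"summary": summary, "responses": {"200": {"description": "OK"}}}


def device_path(**operations):
    """Path item for a per-device endpoint, declaring the path parameter."""
    return {"parameters": [DEVICE_ID], **operations}


spec = {
    "openapi": "3.0.3",
    "info": {"title": "Device API (hypothetical)", "version": "0.1.0"},
    "paths": {
        # Plain CRUD endpoints.
        "/devices": {
            "get": operation("List devices"),
            "post": operation("Register a device"),
        },
        "/devices/{deviceId}": device_path(
            get=operation("Read a device"),
            put=operation("Update a device"),
            delete=operation("Delete a device"),
        ),
        # Operation-based endpoints that exist because devices are physical.
        "/devices/{deviceId}/activate": device_path(post=operation("Activate")),
        "/devices/{deviceId}/deactivate": device_path(post=operation("Deactivate")),
        "/devices/{deviceId}/token": device_path(post=operation("Request a token")),
    },
}

print(json.dumps(spec, indent=2))
```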

IoT API Development on AWS

On AWS, AWS IoT Core provides HTTPS device and data APIs. You can also build APIs that talk to your IoT ecosystem using the API Gateway and Lambda services. AWS API Gateway lets you import and export your API in Swagger format. By using Lambda, your team can benefit from different languages and a serverless architecture that scales with demand.
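
Here is a minimal sketch of what a Lambda function behind API Gateway could look like for a device activation endpoint, assuming the Lambda proxy integration (the event carries `pathParameters`, and the function returns `statusCode` and `body`). The in-memory `DEVICES` dictionary and the `deviceId` parameter name are hypothetical; a real function would read and write a database.

```python
import json

# Hypothetical in-memory store standing in for a real device database.
DEVICES = {"sensor-001": {"active": False}}


def handler(event, context):
    """Activate a device; shaped for API Gateway's Lambda proxy integration."""
    # pathParameters can be missing or None when no parameter is present.
    device_id = (event.get("pathParameters") or {}).get("deviceId")
    device = DEVICES.get(device_id)
    if device is None:
        return {
            "statusCode": 404,
            "body": json.dumps({"error": "device not found"}),
        }
    device["active"] = True
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"deviceId": device_id, "active": True}),
    }


print(handler({"pathParameters": {"deviceId": "sensor-001"}}, None))
```

Because the handler is a plain function, you can call it locally with a hand-written event before deploying anything.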

IoT API Development for on-premises

For on-premise solutions, you can prepare documentation and build your API using Swagger and its tools. Also consider LoopBack if you’re familiar with Node.js; it has a huge community. You can still deploy Express, Laravel, Flask or Django as backend services. Be careful about async I/O and data consistency. You can even think about deploying your own serverless platform on-premise; have a look at Apache OpenWhisk and Kubeless. This sounds totally reasonable nowadays. Be ready for the world of containers, since these platforms actually run every serverless function in a container.


Database

Don’t only think about how you will store that huge amount of data; also think about how you will query and retrieve it. If I want to look at a single parameter of a device over a whole year, you shouldn’t push the raw data to the front end. You need to change the granularity of the data on the fly, I mean dynamically. Look at time-series databases if you don’t run heavy analytics queries on the data.
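
Changing the granularity on the fly can be sketched as follows: pick a bucket width from the requested time range, then average the raw points per bucket. The function names and the 500-point limit are assumptions for illustration; a time-series database would normally do this aggregation for you inside the query.

```python
from collections import defaultdict


def pick_bucket(range_seconds, max_points=500):
    """Bucket width in seconds so a range yields at most about max_points rows."""
    return max(1, -(-range_seconds // max_points))  # ceiling division


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
print(downsample(raw, 60))
print(pick_bucket(365 * 24 * 3600))
```

With this approach, a one-year view and a one-hour view of the same parameter return a similar number of points to the front end.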

IoT Databases and Data Storage on AWS

On AWS, the AWS Timestream database is on the way. It’ll be another time-series database that is a perfect fit for IoT applications. You can consider using PostgreSQL on AWS RDS for analytics-focused device data. You can develop some Lambda functions and put them behind an API Gateway. That way, you have a fault-tolerant and robust database with scalable data APIs. For transactional records and logs, I use DynamoDB. Read the DynamoDB documentation and best practices really, really carefully. I spent almost a week on this, but I ended up with the best and most scalable device and inventory database I’ve ever built.

IoT Databases and Data Storage for on-premises

On-premise options are really complex. Where to start? Well, if you need something close to NoSQL, have a look at InfluxDB. It’s open source, but replication and clustering are paid features. If you need time-series features on top of a relational database, have a look at TimescaleDB, which is actually an extension for PostgreSQL. You can go with commercial ones like Oracle or MSSQL, but they’re clearly not a perfect fit for IoT scenarios. MongoDB is the best choice if you go with NoSQL. Be careful at this point: please don’t select MongoDB or NoSQL just because of its popularity. If you run heavy analytics queries on your data, MongoDB may have to scan your whole table. So, you should tune your table (actually a collection) schema there. Who said NoSQL databases and tables have no schema?


Data Processing

Is historical data enough for you? What about real-time decisions? Or maybe, when device data shows a given behaviour under some condition, you’ll need near-real-time access to the data about that behaviour. Databases are not the best place to do this kind of thing.
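
To show what such a near-real-time decision looks like outside a database, here is a small sketch of a condition evaluated over a sliding time window as readings arrive. The class name, the 60-second window and the threshold are assumptions for illustration, and the sketch assumes readings arrive in timestamp order. Stream processors offer the same idea with far more robustness.

```python
from collections import deque


class WindowRule:
    """Fire when the average of the last window_seconds exceeds threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.readings = deque()  # (timestamp, value), oldest first

    def update(self, ts, value):
        """Add one reading and return True if the rule fires."""
        self.readings.append((ts, value))
        # Drop readings that have fallen out of the time window.
        while self.readings[0][0] <= ts - self.window_seconds:
            self.readings.popleft()
        average = sum(v for _, v in self.readings) / len(self.readings)
        return average > self.threshold


rule = WindowRule(window_seconds=60, threshold=30.0)
for ts, temperature in [(0, 25.0), (30, 40.0), (100, 20.0)]:
    print(ts, rule.update(ts, temperature))
```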

IoT Data Processing on AWS

If you dig into this, be careful. Learn and use just what you need; otherwise you’ll get totally lost. On AWS, if you need basic real-time, condition-based rules with actions linked to them, just use AWS IoT Rules. If you need somewhat more complex ones, such as time windowing or connecting different data pipelines, have a look at Kinesis. I’m not going any further from this point because the possibilities are endless.

IoT Data Processing for on-premises

If you work on-premise, setting up data processing will not be as simple as setting up a REST API server. Brace yourself and first have a look at Apache Kafka, Apache Spark and Apache Cassandra. If you aren’t lost enough yet, have a look at WSO2 Stream Processor. If you’re using the TICK stack (InfluxDB and its buddies) and you don’t want to dig too deep, you have Kapacitor to do that kind of stuff.

Please share and discuss your ideas and thoughts about backend development for IoT in the comments section.