You may think the amount of data created today is huge, but it’s peanuts compared to the volume, variety and velocity of data that will be generated by connected vehicles and IoT-enabled supply chains. Patrick Callaghan, solutions architect, DataStax, shares some insights into how we will cope.

Each connected device constantly produces sensor data that provides insight into what is going on right now. Multiply that by the number of vehicles involved, and you can be facing a constant stream of updates from thousands of trucks, buses and cars. Shipping containers can also generate continual updates on their location, status and issues. It all adds up to a big data problem – and this is just based on what we track today.

Many global logistics companies are looking to increase the number of data points they track for each parcel or consignment, from the 20 to 25 tracked today to more than 100 in the future. This will be essential to remain competitive in the supply chain and logistics sectors: more data that can be acted on in real time should help businesses deliver better service and prevent problems.

Dealing with the data deluge

This huge growth in data can be problematic. We already create more data today than ever before, and many organisations struggle to build an IT infrastructure that can cope with it. Offerings like public cloud can help, but are they the right option for IoT data?

How data enters our organisations, and what we do with it in our databases, is a good starting point. Databases store data for various purposes – acting as transaction systems that carry out instructions, storing records long term, and feeding analytics that provide insights. For IoT, data is created by thousands of sensors, and the order of those data points can be essential in figuring out how to scale data operations.
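To make that concrete, here is a minimal sketch of what a time-ordered sensor table might look like. It assumes Apache Cassandra and its Python driver purely for illustration – the article does not prescribe a specific database – and the keyspace, table and sensor names are hypothetical. Partitioning by sensor and clustering by timestamp keeps each device’s readings in order while spreading the write load across nodes.

```python
# Minimal sketch: time-ordered IoT readings. Assumes Apache Cassandra
# and its Python driver (pip install cassandra-driver); all names
# below are hypothetical examples.
from datetime import datetime, timezone

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # contact point is an assumption
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# Partition by sensor, cluster by time: each device's readings stay
# ordered together, and different sensors land on different nodes.
session.execute("""
    CREATE TABLE IF NOT EXISTS telemetry.readings (
        sensor_id    text,
        reading_time timestamp,
        location     text,
        temperature  double,
        PRIMARY KEY ((sensor_id), reading_time)
    ) WITH CLUSTERING ORDER BY (reading_time DESC)
""")

insert = session.prepare("""
    INSERT INTO telemetry.readings (sensor_id, reading_time, location, temperature)
    VALUES (?, ?, ?, ?)
""")
session.execute(insert, ("truck-0042", datetime.now(timezone.utc), "rotterdam", 4.2))
```

With a layout like this, each new reading is an append in time order for its sensor, which is what lets write throughput grow roughly linearly as devices are added.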

Here a cloud database can help, as public cloud services can scale quickly to meet demand, but there are challenges to overcome. The most important is trust. Many supply chain and logistics companies – particularly those that operate internationally – run critical data sets and applications. Simply lifting and shifting those applications to cloud providers will be difficult, if not impossible.

Alongside this, the data itself is incredibly valuable, and although cloud providers are likely to design and build security around their IT more effectively than individual businesses can, firms still need control over how their data is stored, accessed and used.

Many retail, supply chain and logistics companies are feeling pressure from digital natives like Amazon, which might also be the provider of their public cloud infrastructure. For this reason, hybrid cloud deployments – which combine public and private clouds – are proving most popular with supply chain and logistics companies. Gartner estimates that around 90% of companies will adopt hybrid infrastructure management approaches by 2020, while the overall spend on cloud compute will reach $68.4 billion (€60.6 billion) in 2020.


Complexity is a big issue too: logistics companies have extremely complicated networks of applications running their businesses and creating data internally, on top of the influx of IoT data. Bringing these two sets of data together has huge potential, but readying the data for aggregation is easier said than done.

While cloud-based compute and storage services can scale up rapidly to cope with all this information, the same can’t be said for databases. Most established databases are built on technologies designed for single instances or small clusters in one location, not for the cloud.

Distributing data

Companies adopting cloud computing for new IoT services need more than cloud-based hosting for their databases – they need databases that are architected to run natively in the cloud. This means that they must be fully distributed and able to run across any number of nodes without a traditional ‘master’ node.
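As a rough illustration of what ‘no master node’ means in practice – again assuming Cassandra’s Python driver, with hypothetical addresses and table names – any node in the cluster can coordinate a request, and consistency is tuned per statement rather than enforced by a primary:

```python
# Sketch: in a masterless cluster any contact point can coordinate a
# write, and consistency is tuned per statement rather than routed
# through a primary. Addresses and names below are hypothetical.
from datetime import datetime, timezone

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Three peer nodes - none of them is a 'master'.
cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
session = cluster.connect("telemetry")

# LOCAL_QUORUM: a majority of replicas in the local data centre must
# acknowledge the write, so no single node is a point of failure.
insert = SimpleStatement(
    "INSERT INTO readings (sensor_id, reading_time, location, temperature) "
    "VALUES (%s, %s, %s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
session.execute(insert, ("container-0007", datetime.now(timezone.utc), "antwerp", 6.8))
```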

Similarly, a cloud database has to handle data being created by billions of different sensors, devices and people all the time, across different geographies and channels. This means being able to scale up and meet demand levels quickly and predictably.

For hybrid cloud deployments, being able to run across multiple systems means being independent of the underlying cloud, whether that is an internal private cloud or a public cloud. Ideally, the database should be able to run across multiple public clouds to take advantage of locations and services that are closest to customers. Running across multiple locations and being active everywhere – regardless of cloud service or who provides the host infrastructure – ensures it can cope with the huge influx of data.
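Sketching how that ‘active everywhere’ property can be expressed, under the same illustrative Cassandra assumption as above: replicas are declared per data centre, so one data set can live simultaneously in an on-premises cluster and a public cloud region. The data centre names below are hypothetical labels.

```python
# Sketch: replicas declared per data centre, so the same keyspace is
# active in a private DC and a public-cloud region at the same time.
# DC names ('onprem_dc1', 'cloud_eu_west') are hypothetical labels
# matching the cluster's topology configuration.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

# Prefer coordinators in the nearest data centre for low latency.
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="onprem_dc1")
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Three replicas on premises and three in the public cloud: either
# side can serve reads and writes, keeping the data set active
# everywhere regardless of which infrastructure hosts it.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'onprem_dc1': 3,
        'cloud_eu_west': 3
    }
""")
```

Adding capacity or a new region is then a matter of adding nodes and adjusting the replication map, rather than re-architecting around a primary.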

Looking at Gartner’s research on the leading supply chain companies in Europe, there is a real emphasis on collaboration and experimentation around new digital technologies. Cloud databases can help these companies deliver those kinds of services around IoT data, regardless of how large the data set becomes.


The author of this blog is Patrick Callaghan, solutions architect, DataStax

Comment on this article below or via Twitter: @IoTNow_OR @jcIoTnow