In the world of smart homes, buildings and cities, we collect far more data than we use. Considerable time and effort goes into designing programs that sift through this mass of data for useful nuggets of information that can be turned into actionable intelligence. Then we spend significant sums building the storage capacity to hold all that raw data.
Wouldn’t it be better to reduce this data at the source, filtering out unwanted information before it slows down the process and adds cost? No, according to many experts, including Hank Weghorst, Chief Technology Officer at Avention:
“Data is currency, and it has tremendous value beyond its initial application. Even the seemingly unusable or unimportant by-product data your company stores – often known as data exhaust – can have significant value, if you know what to do with it”.
There are innumerable ways to analyze the same data set. What is not immediately useful now could be invaluable in the future. Data you cannot process today will likely be easier to process tomorrow. And what may never be helpful to you might be exactly what someone else needs and is willing to pay for.
This is not news to cloud computing companies. In fact, it seems as if this excess data, or data exhaust, is the foundation upon which the cloud computing industry was built. Compared with in-house storage, cloud computing vendors offer customers simpler manageability, quicker implementation, regular updates and the promise of greater security, all at a fraction of the price. These advantages are not a lucky coincidence.
The new era of cloud-based vendors realized there was more to cloud computing than simply providing data storage. They realized that by analyzing cumulative, cross-company data exhaust, they would create far greater intelligence than a single customer database could ever achieve. Furthermore, by adding public data and additional data sources, combining them with modern data science techniques and machine learning algorithms, they could take this intelligence to unprecedented levels.
Seeing this opportunity, cloud-computing vendors such as Google ensured that they retained the rights to use anonymized, cross-customer data exhaust. Once you have the ability to mine data for intelligence, the more data you have, the more material you have to mine. Not all data sets are equally useful, but with computing economies of scale, quantity of data really pays off. Remember our article on AI in Smart Buildings back in May:
“In recent years Google has bought 14 AI and robotics companies. Considering “search” still makes up 80% of Google’s revenues it would be logical to assume that Google is improving its AI portfolio to improve its search capabilities. However when you consider the development of AI you realise that Google is actually using search to make its AI better.
Every search and subsequent click on a given result is training Google AI to know the “right” result for any given query. Then when you consider the 3.5 billion searches taking place on Google everyday, you begin to get an idea of the potential speed of upcoming AI developments”.
A mass of cross-customer data does not come without challenges, however. Each data set may have a different structure, size and format, making harmonization difficult. Moreover, the harmonization process itself can change a data set in ways that reduce its potential for analysis (see paragraph 4). To make the most of all this distributed data, the raw, unaltered data must be stored in a way that easily and productively allows for its use in a broad variety of expected and yet-to-be-imagined ways. Enter NoSQL:
“Whereas we used to live in a comfortable existence of relational databases, with neat and tidy rows and columns of data, today's world is a morass of unstructured or semi-structured data. It was always thus, of course, but we lacked the data infrastructure to process it. With NoSQL databases like Apache Hadoop, Apache Spark, and more, we finally have the right tools at the right price point (free and open source) to tackle our data,” explains veteran technology columnist, Matt Asay.
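To make the schema-on-read idea concrete, here is a minimal sketch, using only Python's standard library and entirely hypothetical field names, of how raw records from different sources can be stored unaltered in their original shapes, with a common field extracted only at analysis time rather than forced into one rigid schema up front:

```python
import json

# Hypothetical raw "data exhaust" from three different sources.
# Each record keeps its original shape; nothing is normalized on write.
raw_events = [
    '{"sensor_id": "t-101", "temp_c": 21.5, "ts": 1700000000}',
    '{"device": {"id": "t-102"}, "readings": {"temperature": 22.1}}',
    '{"sensor_id": "t-103", "temp_f": 71.6}',
]

def extract_temp_c(record):
    """Pull a Celsius temperature out of whichever shape the record uses."""
    if "temp_c" in record:
        return record["temp_c"]
    if "readings" in record and "temperature" in record["readings"]:
        return record["readings"]["temperature"]
    if "temp_f" in record:
        # Convert Fahrenheit to Celsius at read time.
        return round((record["temp_f"] - 32) * 5 / 9, 1)
    return None

# The "schema" is applied only when we read, so a new analysis
# tomorrow can interpret the same raw records differently.
temps = [extract_temp_c(json.loads(r)) for r in raw_events]
print(temps)  # [21.5, 22.1, 22.0]
```

The design choice this illustrates is exactly the one the article argues for: because the raw records are never altered, a future analysis with different questions can reinterpret the same data, which is impossible once records have been flattened into a single relational schema.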
The vast network of sensors and machines interpreting our world, and the immense, continually growing data streams they produce, are essentially useless without a ‘brain’ to file and structure the data and extract intelligence from it. “The part that seems to me ‘intelligent’ is the ability to find structure in data,” said Naveen Rao, co-founder and CEO of deep learning firm Nervana.
NoSQL provides the extra flexibility needed to make the most of our increasingly unstructured world, while cloud computing companies are the ones with access to the most data, and therefore the most intelligence.