
The Small Data Counter-Revolution & the Democratization of the Data Age

Since the emergence of the data age, we have been taught the four Vs of good datasets: volume, velocity, variety, and veracity. These are the qualities that have given rise to the big data companies that have come to dominate the growing use of machine learning (ML) and artificial intelligence (AI). Those with the resources to achieve velocity, variety, and veracity across great volumes of data have been able to lead the data race through sheer problem-solving capacity. Recently, however, new approaches have emerged that disrupt the core principle of volume and give rise to a small data counter-revolution.

The first is transfer learning, which reuses elements of big data models pre-trained for other tasks to tackle a new problem where data may be difficult, or impossible, to collect. Google’s open-source BERT model, for example, has some 340 million parameters in its large version, while OpenAI’s closed GPT-3 is orders of magnitude bigger, with 175 billion parameters; both were developed to tackle the problem of understanding natural human language. It turns out, however, that the knowledge captured in such models can be very effective for training more specific language tasks from only a handful of new examples, through techniques like few-shot learning.
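To make the idea of transfer plus few-shot learning more concrete, here is a minimal toy sketch in Python. It assumes a frozen "pre-trained" representation (a small, invented table of word vectors standing in for a model like BERT) and reuses it unchanged; only a tiny classifier is fitted to four labelled examples, by averaging each label's embeddings into a prototype and picking the nearest prototype. The names `PRETRAINED_VECTORS`, `embed`, `fit_prototypes`, and `classify`, and all the numbers, are hypothetical and for illustration only.

```python
import math

# Hypothetical stand-in for a large pre-trained model: a frozen table of
# 2-D word vectors. Real systems would use embeddings from a model such as
# BERT; these words and values are invented for illustration.
PRETRAINED_VECTORS = {
    "heating": (0.9, 0.1), "boiler": (0.8, 0.2), "radiator": (0.85, 0.1),
    "cold": (0.7, 0.3), "light": (0.1, 0.9), "lamp": (0.2, 0.85),
    "dark": (0.15, 0.8), "bulb": (0.1, 0.95),
}

def embed(text):
    """Average the frozen vectors of known words; unknown words are ignored."""
    vecs = [PRETRAINED_VECTORS[w] for w in text.lower().split() if w in PRETRAINED_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def fit_prototypes(examples):
    """Few-shot 'training': one mean embedding (prototype) per label.
    Only this small step uses the new task's data; the vectors stay frozen."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(embed(text))
    return {label: tuple(sum(d) / len(vs) for d in zip(*vs)) for label, vs in by_label.items()}

def classify(text, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    v = embed(text)
    return min(prototypes, key=lambda label: math.dist(v, prototypes[label]))

# Two examples per label: a hypothetical building-support "support set".
support_set = [
    ("the boiler is off", "hvac"),
    ("radiator feels cold", "hvac"),
    ("lamp is broken", "lighting"),
    ("office too dark", "lighting"),
]
protos = fit_prototypes(support_set)
print(classify("heating not working", protos))  # hvac
```

The design choice is deliberate: because the heavy lifting lives in the reused representation, the task-specific part needs only a few examples and no gradient training at all.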

“Taking essentially the entire internet as its tangential domain, GPT-3 quickly becomes proficient at these novel tasks by building on a powerful foundation of knowledge, in the same way Albert Einstein wouldn’t need much practice to become a master at checkers,” says Jiang Chen, VP of Machine Learning, Moveworks. “And although GPT-3 is not open source, applying similar few-shot learning techniques will enable new ML use cases in the enterprise — ones for which training data is almost nonexistent.”

While big data is necessary to solve broad problems like natural language processing, small data models can draw on just the parts of that big data they need to solve specific issues, such as building automation support. Rather than learn to understand all language, small data systems can focus on company- and task-specific language through few-shot, one-shot, and even zero-shot machine learning techniques.
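Zero-shot learning can sound paradoxical, so here is one common way it can work, sketched in Python under stated assumptions: instead of labelled examples, each category is described in words, and a new message is assigned to the category whose description it most resembles in a frozen embedding space. The word vectors, label descriptions, and function names below are invented for illustration and do not reflect any particular product.

```python
import math

# Hypothetical frozen word vectors standing in for a pre-trained model
# (values invented for illustration only).
PRETRAINED_VECTORS = {
    "heating": (0.9, 0.1), "warm": (0.8, 0.2), "boiler": (0.85, 0.15),
    "light": (0.1, 0.9), "lamp": (0.2, 0.85), "bright": (0.15, 0.9),
}

def embed(text):
    """Average the vectors of known words; unknown words are ignored."""
    vecs = [PRETRAINED_VECTORS[w] for w in text.lower().split() if w in PRETRAINED_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(d) / len(vecs) for d in zip(*vecs))

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def zero_shot_classify(text, label_descriptions):
    """Pick the label whose description is most similar to the text.
    No labelled training examples are used at all."""
    v = embed(text)
    return max(label_descriptions, key=lambda lab: cosine(v, embed(label_descriptions[lab])))

# Categories defined only by short, hypothetical descriptions.
labels = {"hvac": "heating boiler warm", "lighting": "light lamp bright"}
print(zero_shot_classify("boiler not warm enough", labels))  # hvac
```

Note that `math.dist` is not needed here; similarity, not distance, is the comparison, which keeps the result insensitive to how many words a message contains.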

“For example, if we have a problem of categorizing bird species from photos, some rare species of birds may lack enough pictures to be used in the training images. Consequently, if we have a classifier for bird images, with the insufficient amount of the dataset, we’ll treat it as a few-shot or low-shot machine learning problem,” explains Dr. Michael J. Garbade, Founder of Education Ecosystem. “If we have only one image of a bird, this would be a one-shot ML problem. In extreme cases, where we do not have every class label in the training, and we end up with 0 training samples in some categories, it would be a zero-shot ML problem.”
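Dr. Garbade’s bird example can be expressed as a simple rule over how many labelled images exist for the rarest class. The Python sketch below is one interpretation of that terminology, not a standard definition: the cut-off `few_shot_max=5` is an assumption (there is no universal threshold), and the function name and species counts are hypothetical.

```python
from collections import Counter

def shot_regime(labels, classes, few_shot_max=5):
    """Name the learning regime from the rarest class's sample count.

    labels: one label per training image.
    classes: every class the model is expected to recognise.
    few_shot_max: assumed upper bound for calling a class "few-shot".
    """
    counts = Counter(labels)
    fewest = min(counts.get(c, 0) for c in classes)
    if fewest == 0:
        return "zero-shot"   # some class has no training images at all
    if fewest == 1:
        return "one-shot"    # some class has exactly one image
    if fewest <= few_shot_max:
        return "few-shot"    # some class has only a handful of images
    return "standard supervised"

species = ["sparrow", "robin", "kakapo"]
common = ["sparrow"] * 50 + ["robin"] * 40
print(shot_regime(common + ["kakapo"] * 3, species))  # few-shot
print(shot_regime(common + ["kakapo"], species))      # one-shot
print(shot_regime(common, species))                   # zero-shot
```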

The ability of smaller companies to build on large open-source pre-trained models, like BERT, and train ML systems that effectively tackle specific challenges changes the whole data landscape. Previously, only companies with the resources to collect, store, and process huge volumes of data could create the most effective ML systems; now even small companies can create effective solutions, albeit to more specific problems. Small companies have small data, but they are learning to gather and apply it better, and they are also learning to share.

Another approach disrupting the traditional data hierarchy is collective learning, where many smaller companies working on the same problem combine their data through a third party to develop their models beyond what their limited resources would allow alone. More recently, the company- and task-specific qualities of transfer learning models have been brought together with collective learning approaches, and the resulting data is shedding new light on, for example, the patterns of language used for the same task across different companies.
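The pooling idea behind collective learning can be sketched in a few lines of Python. This assumes the simplest possible arrangement, a hypothetical third party that merges each company’s labelled examples and trains one shared nearest-centroid model; real services may work quite differently (for instance, with privacy safeguards). The company names, feature vectors, and labels are invented toy data, not results.

```python
import math
from collections import defaultdict

def pool(contributions):
    """Hypothetical third-party broker: merge every company's labelled examples."""
    return [example for examples in contributions.values() for example in examples]

def train_centroids(examples):
    """Nearest-centroid model: the mean feature vector for each label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {label: tuple(sum(d) / len(vs) for d in zip(*vs)) for label, vs in grouped.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(features, model[label]))

# Invented toy data: 2-D feature vectors for support tickets at three companies.
contributions = {
    "company_a": [((0.9, 0.1), "hvac"), ((0.8, 0.2), "hvac")],
    "company_b": [((0.1, 0.9), "lighting")],
    "company_c": [((0.5, 0.5), "access"), ((0.85, 0.15), "hvac")],
}

solo = train_centroids(contributions["company_a"])  # has only seen "hvac"
shared = train_centroids(pool(contributions))        # has seen all three labels
print(sorted(solo), sorted(shared))  # ['hvac'] ['access', 'hvac', 'lighting']
print(predict(shared, (0.15, 0.85)))  # lighting
```

In this toy setup, company A alone could never recognise a lighting request, while the pooled model can, which is the kind of gain the collective approach aims for.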

“The combination of transfer learning and collective learning, among other techniques, is quickly redrawing the limits of enterprise ML,” says Chen. “For example, pooling together multiple customers’ data can significantly improve the accuracy of models designed to understand the way their employees communicate. Well beyond understanding language, of course, we’re witnessing the emergence of a new kind of workplace — one powered by machine learning on small data.”

Small data does not replace big data; it simply provides a platform for smaller companies to better serve their customers with ML and AI services, thereby democratizing the data business landscape. However, the evolution of small data through transfer learning, collective learning, and their combination is also creating new insight that big data may not have discovered alone. Big data companies will no doubt be investigating the potential of small data processes to support a wide range of goals, and they have the resources to rule small data in a big way too.
