Why Good Data Is Essential to AI Success
Last month, The AI Journal conducted an industry survey to assess how COVID-19 was impacting the AI industry. We were delighted to participate and to have the opportunity to benchmark ourselves. The resulting study (AI in a Post-COVID World) was released last week. You can download it from HERE.
The survey provides an excellent window into the concrete challenges companies face when attempting to implement advanced technology amid legacy systems and pandemic-related disruptions. The reality is, not surprisingly, far more complicated than AI industry evangelists suggest.
Since BCMstrategy, Inc. is creating alternative data that will be deployed in ML/AI applications, we were particularly curious to benchmark ourselves against other companies on the innovation frontier. Have a look at Section 2, which focused on the kinds of barriers companies perceive when attempting to implement ML/AI applications.
If you want to worry about the ML/AI industry, however, you need to focus on the terribly small percentage of survey respondents (28%) who said the scarcity of data was a significant impediment to ML/AI adoption.
The relationship between data and ML/AI is much deeper than many people appreciate. We are awash in data in our digital world. Data will only become more plentiful in the near future as IoT devices (particularly autonomous vehicles and industrial robots) throw off data at scale and as vast amounts of unstructured (verbal) data are converted to structured formats.
People who believe that insufficient data exists are at risk of fundamentally misunderstanding how to use the data they have and how to deploy ML/AI systems appropriately, as noted in the quotes above. COVID-19 complicates the situation considerably because, as we noted HERE (Twin Data Deficits) and HERE (Why Pandemic Era Deficits Matter) this past summer, the pandemic creates real breaks in time series data and may permanently alter behavior patterns. Mountains of data rendered obsolete by the pandemic cannot be made relevant by ML/AI systems.
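To make the idea of a break in time series data concrete, here is a minimal sketch of one way to gauge how far a series moved after a suspected break point. It is an illustration under stated assumptions, not our production method: the function name mean_shift, the sample values, and the break position are all hypothetical, and a real analysis would likely use a formal structural-break test.

```python
from statistics import mean, stdev


def mean_shift(series, break_index):
    """Return the post-break change in mean, in units of pre-break standard deviation.

    `break_index` is the position of the first post-break observation.
    This helper is hypothetical and exists only for illustration.
    """
    before = series[:break_index]
    after = series[break_index:]
    if len(before) < 2 or not after:
        raise ValueError("need at least two pre-break points and one post-break point")
    spread = stdev(before)  # sample standard deviation of the pre-break segment
    if spread == 0:
        raise ValueError("pre-break segment has zero variance")
    return (mean(after) - mean(before)) / spread


# Made-up monthly values; the break is placed at index 6 purely for illustration.
monthly = [100, 102, 98, 101, 99, 100, 60, 62, 58, 61]
shift = mean_shift(monthly, 6)
print(f"Post-break mean moved by {shift:.1f} pre-break standard deviations")
```

A large shift like the one in this made-up series suggests that a model trained only on the earlier segment would be describing a world that no longer exists, which is the risk described above.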
Using data responsibly and appropriately within the ML/AI context is far from easy at the operational level. The temptation to take shortcuts is irresistible to many.
Many race to cram as much data as possible from any source into ML and AI applications in order to proclaim they are the first to deploy this advanced technology. The problem, however, is that throwing any data into ML and AI tools significantly increases the risk that process automation generates incorrect or flawed outcomes. Since most of these systems incorporate convoluted and opaque processing methodologies without audit trails, verifying HOW an outcome was generated becomes impossible.
We believe that if the process cannot be verified, at a minimum one must be very finicky about the composition of the training data. No one wants to be the next Tay. (Tay was an AI-powered chatbot deployed by Microsoft onto Twitter. The company had to shut it down because within 16 hours the AI had learned from Twitter how to generate racist, offensive, uninformed, and generally inappropriate commentary. You can read more about the details from this TechCrunch article.)
"The key thing to remember is that AI is only a tool, how it is being used and in which governance framework it will be applied will determine its usefulness for individuals, businesses and societies as a whole."
We couldn't agree more. Great care in curating training data today may seem boring to many, but it is the crucial ingredient in generating reliable, relevant, and responsible outcomes.
BCMstrategy, Inc. is a technology company that is bringing the data revolution to policy intelligence by using patented processes to generate structured data from the public policy process. Access to the data is available on a subscription basis at www.policyscope.io.