We recently caught up with two text analytics experts ahead of our Text Analytics Summit in San Francisco to get their thoughts on the growing commercial importance of text and big data analytics, and where the next wave of gains will be realized.
Principal Scientist, Data Mining Systems & Health Management,
Tom H. C. Anderson,
Founder and Managing Partner, Anderson Analytics
What needs to happen in the market, in the process, and in the translation of results in order for businesses (end users) to trust and depend on text analytics as part of everyday business practice and budget allocation?
Ashok: My first reaction is that historically speaking, for roughly the last 100 years, people have been thinking about how to use numeric data to solve problems, and statistics was built on this: averages and standard deviations. This is how people SHOULD think. But one third of all business leaders make decisions without numeric data. Text is a step beyond this, because much of the mathematical machinery is hard to apply to text due to semantics. And there are many types of data that are easy for the human mind to process but not the machine mind. So we need improvements in semantic technologies that can help mark, tag, and see the underlying meaning in large text corpora. There are large companies that do that, but from what I’ve seen, their tools don’t lend themselves to easy analysis. You can tag your text and get something out of it, but it’s still up to the human to make the connections. So if we start developing techniques for that overlap, it would be a great help. You can imagine this enabling very large-scale analysis, automatically discovering topics and looking for trends.
Tom: Well, I think about all the effort and cost currently being devoted to collecting and storing this kind of data. Right now there is a lot of buzz around “social media monitoring”, but most companies have a large amount of text coming in from people they know are their real customers.
So I would throw the question back and ask: what is the ROI of collecting and storing combinations of structured and unstructured information from sources such as customer call centers (call logs, email complaints and suggestions), customer satisfaction surveys, brand tracking surveys, etc., if you’re not going to bother leveraging that information for insights?
The good news about text analytics is that, as long as your competitors aren’t doing it yet, the information advantage you can gain by leveraging it is even greater.
Our challenge as text analytics software developers has been to make the software as powerful and easy to use as possible. However, software is just half the battle. Equally important is allocating analysts’ time to think about how best to leverage the data and tools.
When the first calculator came out, no one expected the calculator to create presentations ready for the C-Suite without an investment in an analyst. A proper return-on-investment calculation for text analytics needs to include a human time investment factor as well.
How far away are we from these technologies reaching that level of reliability?
Ashok: We’re getting closer. We know reasoning technologies are still a ways out. IBM’s work on automated query and analysis is notable. But as far as semantic reasoning and understanding go, we are still a few years out, maybe 5-7 years.
Tom: Depending on whom you talk to, we are already there. Comparing human analysis of unstructured data to text analytics software in isolation is like comparing apples to cows. Because output from text analytics software is 100% consistent, it is far more reliable than human analysis, and for that reason it lends itself to statistical/mathematical analysis in a way that human text coding does not. Of course, as I mentioned earlier, a proper text analytics effort requires machine and human to work in concert.
How do we need to think differently – to ask questions differently – to encourage development towards greater ROI?
Ashok: You need a specific business problem in mind. So let’s call it a vertical application that targets specific needs of the business community. To digress for a moment, look at Hadoop. It solves a real business problem, which is why it is seeing so much traction. And the distribution is open source.
Tom: Approach research involving text data the same way you approach analysis of structured data. Too often we do not think enough about the data we have available and about who would benefit from analyzing it. Ask internal clients: What are you struggling with? What assumptions are you making? Does it make sense to leverage text analytics to explore, quantify, prove, and model these assumptions? Often the answer is yes; sometimes it is no. It depends on both the objective and the data in question.
Within the cycle of, say, 5-7 years, what will the market be clamoring for; what demand will developers and engineers be striving to meet?
Ashok: My reaction is that it depends on the way problems are (being) solved. If they are solved in a vacuum, by technologists without a larger business problem in mind, we still will not see a large degree of adoption. I say we should start with a specific problem and then build the tools and technology to solve that specific problem. I can understand the hesitancy toward technologies at large.
Tom: It’s hard for me to imagine all the ways text analytics can be used; I just know there will be many. My company’s expertise has been in the field of consumer insights/marketing research, therefore our software was designed for specific types of text data with specific types of analysis in mind.
To think that text analytics could or should only be used one way or that there will only be one dominant solution out there is sort of crazy to me. Almost like saying that math could only be applied one way to one profession. What is clear to me is that there will be many different kinds of implementation for very different purposes.
And looking ahead to the next 3 years, what progress would you like to see to make data and text analysis more of a household name?
Ashok: I’d like to see applications that allow rapid, simultaneous ingestion of text and numeric data, so that we can easily look at the two in combination, along with algorithms that can bring these two sources together easily. I think it could happen in 18 months for a specific vertical. It’s all just data in the end. We need to give people the ability to analyze as fast as possible, with data stored in ways that automated systems can access and act on in real time.
Tom: It could happen overnight. Again, if we are talking about a household name, I’m assuming you mean something that could be used or sold B2C rather than B2B. I think it will happen on the web first, for the obvious reason that this is where people create and use the most data. Web search like Google, PC applications including email, and social media could all analyze what we discuss and, based on that data, show us how we differ from our peers and suggest what we might be interested in. Those are the obvious ways I believe this could happen rather quickly.
And what if we’re looking at a specific vertical?
Ashok: If we successfully address specific vertical problems, we can generate results with repercussions that solve problems beyond them, but we first have to show an appreciable ROI on a real-life business problem. In this way I differ from a lot of technologists.
Tom: As I noted earlier, I think this is the key, and it is what I’ve been saying for years. The best way to achieve significant gains in the ROI of text analytics at this point is by incorporating industry/vertical/domain expertise. Text analytics is absolutely not something that will be owned by any one company or domain. Just as with math, there are simply too many opportunities, and too many ways it should be combined with the expertise in each vertical.
So yes, expect to see many different things in many different verticals.