
What Can Text Analytics and Silicon Valley Learn From Each Other?

By Seth Grimes, Text Analytics Summit Founding Chair

Business markets are global, yet the Bay Area stands out as a source and consumer of innovative technologies and, in particular, as a pace-setter for the online and social worlds. With the Text Analytics Summit coming to San Jose, I reached out to a few west-coasters who are making Valley text analytics news: Nitin Indurkhya, principal research scientist at eBay Research Labs; YY Lee, COO of FirstRain; and Michael Osofsky, co-founder and chief innovation officer at NetBase.

I posed a couple of questions to our three experts, designed to explore the role of text analytics in online/social innovation and the charge toward productization of text analytics being led by Bay Area companies.  The two questions concern both sides of the text analytics coin:

  1. What does Silicon Valley need to learn about text analytics? That is, how can online & social innovators most benefit from the technology?
  2. Noting that most text analytics companies originated away from Silicon Valley, what can text analytics providers and users learn from Silicon Valley, from the Valley's way of doing business?

 

Below are the three experts’ responses to the questions, with minimal editing:

First, Mr. Osofsky recognizes the distinction between technologies and business-focused software solutions, as well as the benefits to be gained in bridging the two worlds.

Michael Osofsky: As demand for text analytics features in software grows, the professionals from the text-analytics community and the techies of the Silicon Valley are going to have the opportunity -- and requirement -- to learn a lot from each other.

Silicon Valley software companies seeking to integrate text analytics into their products need to learn that there is a fundamental trade-off between what's known as precision and recall in text analytics. When you optimize for precision, you end up missing some of the data you'd like to have (low recall). When you optimize for recall, you get that data, but it is noisy (low precision). To navigate through this trade off, software development managers need to allocate plenty of time for experimentation.
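(Editor's illustration, not part of Mr. Osofsky's response.) The trade-off can be made concrete with a small sketch. It assumes a hypothetical extractor that assigns each candidate a confidence score, and it uses made-up scores and labels purely for demonstration. Raising the acceptance threshold favors precision; lowering it favors recall.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when every item scoring >= threshold is accepted."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)      # correct hits
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)  # noise accepted
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)  # relevant items missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical confidence scores and ground-truth labels (illustrative only).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, True, False, False]

strict = precision_recall(scores, labels, 0.85)  # optimized for precision
loose = precision_recall(scores, labels, 0.35)   # optimized for recall

print("strict threshold: precision=%.2f recall=%.2f" % strict)
print("loose threshold:  precision=%.2f recall=%.2f" % loose)
```

With this toy data, the strict threshold accepts only correct items but misses half of the relevant ones, while the loose threshold finds all of them at the cost of accepting some noise. Choosing the operating point for a real product is the kind of experimentation the paragraph above describes.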

On the flip side, text analytics professionals can learn from Silicon Valley how rapidly software can be developed, launched, and evolved. The culture here is fast paced and people take risks. A popular software development methodology called ‘agile’ emphasizes iteration over trying to develop a master plan. It requires customer intimacy to understand requirements and to facilitate trial-and-error.

What both Silicon Valley and text analytics professionals are learning is how tricky it is to get these projects right. The knowledge of what's needed and what's possible is quite ‘sticky,’ according to Eric von Hippel, Professor of Technological Innovation in the MIT Sloan School of Management. New strategies must be adopted to ‘unstick’ that information, ‘marry’ it, and produce great innovations that endure.

 

Mr. Indurkhya similarly focuses on business and technology dynamics.

Nitin Indurkhya: The Valley's way of doing business can be summarized as ‘rapidly failing forward.’ Fail often and fail quickly. The lessons learned should be folded into the ‘next big thing.’ Most successful Valley companies never start with a technology but rather with a problem. Text analytics is not a problem. Customer Service call analysis is a problem. Chatter about a company on the Twitter feeds is a problem.

Just about every business is being redefined around hybrid analytics - structured and unstructured. Many companies have an acceptable level of grip over structured data analytics – but unstructured data is altogether another story. Besides handling domain-specific text, considerable innovation could be achieved by incorporating general-purpose information extracted via Web-scale document analysis.

 

Nitin’s and Michael’s responses are a great lead-in to YY Lee’s.  FirstRain built itself from the ground up to pursue Web-scale information harvesting. FirstRain seeks to ‘unstick’ and ‘marry’ information to provide up-to-the-moment information about businesses and executives.

YY Lee: When you say ‘Silicon Valley companies,’ what comes to mind for me is less about geography or funding style than a connotation of an end-product company. You mentioned eBay, and of course we can't overlook Google, but we also think a lot about companies like Netflix and Kayak.com that have to push the envelope of analytics in interesting ways to provide very specific types of user information experiences.

YY includes FirstRain in this grouping.  She offers lessons FirstRain learned from experience as a product/end-user focused company whose solutions are built around a substantial core of text analytics & semantics IP.  Take the following observations as lessons from which the text-analytics world can learn a lot about “productization”:

“If you are focused on a "whole-product" result... there are substantial and fundamentally different analytics needs for the following aspects of your application:

  1. Input (assuming this is from human users)
  2. Internal processing / pattern-matching / analytical algorithms
  3. Presentation (back to human users)

I can't stress enough how critical each of these pieces is, and how different the fundamental technologies need to be to effectively drive each of these layers -- although the techniques all fall broadly under “text analytics.”

For example, when you're trying to do pattern discovery among millions of data points (as we do for business and Web content daily), you often do need very complex and abstract methodologies internally within your processing pipeline to extract any real insight and added value from the data.  However, outputs of complex methodologies (and worse-yet, sometimes black-box techniques) are usually not readily consumable by business or consumer users. So we find it necessary to actually invest almost as much IP and effort in presentation-layer analytics that turn results from the analytical pipeline into something that has higher-level meaning, takes into account human perceptions, and has done the translations to be instantly meaningful to end users.

We (FirstRain) have occasionally worked with some advisors / firms / experts who have helped us add different styles of analytics into our system, and this has been extremely helpful. But every technique has its strengths and weaknesses. And along the way, we've learned that a sole focus and/or expertise on any particular set of techniques can actually distract you from the product problem you are trying to solve.

For us, the art has been actually segmenting and adjusting the problem-definitions so that you can apply the best technologies at the right stages, and also implementing the techniques so that they are actually self-adaptive to the real content that is running through our system. So whatever techniques we have adopted, whether statistical / heuristic / fingerprint-based / etc., we've always found that they require careful tuning, placement and targeting to be a value-added layer across our content corpus. Because we are a product company that has been systematically building & evolving a system that addresses a very broad content set over years, this is the skill that we've really honed.”

The key takeaways that I see in these responses involve problem and product focus, agility, and the desirability of pulling and integrating information from multiple sources with the application of a variety of analytical techniques, in order to achieve technical and business goals.  There’s no “Do X, Y, and Z” formula here, but there is definitely a sense of the rewards that are possible if text analytics is done right.

 

Seth Grimes is the founding chair of the Text Analytics Summit and principal consultant at Washington, DC-based Alta Plana Corporation. He can be reached by email at grimes@altaplana.com or by phone at 301-270-0795. Follow Seth on Twitter: @SethGrimes.

 