Text
Analytics News have commissioned a series of exclusive interview
sessions with the leading minds from the Text Mining Technology
industry.
Joining us for the very
first USA interview are Andrei Broder, VP Emerging Search Technology,
Yahoo! Inc., and Seth Grimes, President, Alta Plana.
Text Analytics News ask:
What are the challenges for text mining technologies? Where are
the most vital applications to be found? And where is text analytics
headed next?
TA News: What are the top 3 challenges for text mining technologies
from your point of view?
Andrei Broder: I have been involved with text mining for quite some
time and have worked in various roles. To answer this question, I
would like to put my end-user hat on and tell you the challenges
from the Yahoo! perspective.
The first challenge comes from traditional
text data mining being just that: text-focused. However, there is
a lot more context around the text. There are a lot of offerings
in the form of packages, but the challenge grows because we
face a lot more context in the enterprise and in other complex
situations. Challenge number 1 is how to mine the context around
the text as well as the text itself.
Challenge number 2 comes from an adversarial
perspective. Increasingly, many people try to disguise their messages
in order to get around spam filters. This is a very serious challenge
for us at Yahoo!. There is a huge cost associated with spam, including
what it costs individual companies each year. We need to be able to
distinguish legitimate messages from those that are not. The challenge
is: how do we really know what spam is these days?
The 3rd challenge for us is the sheer
scale of the content, i.e. the size of the web. Scaling the analysis
is very difficult: just how much time can one spend on mining
the entire web? You can imagine the challenge here.
Seth Grimes: Text analytics has really started to get the attention
it deserves. After all, given the value locked in textual data sources,
the potential payoff is huge. The top challenges mix technical and
market aspects.
Start with a challenge that is shared
by business intelligence and data mining: How do you make text mining
more accessible, easier to do? The answer is the same as for business
intelligence and other analytical technologies: focus on usability,
create domain-specific interfaces and models, and build the technology
into everyday, line-of-business applications. Researchers and vendors
are working on all this, and they need to keep it up.
Next challenge: deepen text analytics'
value. Search is a killer app in the enterprise and on the Web,
but search falls far short in giving users what they really want.
Users don't want document hits, they want answers to questions.
Text analytics -- natural language processing, information extraction,
disambiguation, etc. -- can produce those answers.
Challenge number three is to deliver
integrated analytics. Businesses love to talk about 360-degree views.
They've wanted for years to link disparate data on customers, partners,
and channels with demographics and reference information to create
a unified, all-encompassing enterprise view. Now we can add text
to the mix.
TA News: Which applications do you think are most important?
Andrei Broder: Well, this is very much industry-specific. You know,
each industry has its own set of applications which it deems
the “most important”.
From my perspective, I would say that
Web advertising is an industry worth around $17bn, so applications
within this field are hugely important.
Advertising by context of interactions,
and really understanding the text on the page and the opportunities
it holds for advertisers, is important. Of course, the validation
of the approach is simple: people click on the most successful ads,
and the technology tells us this.
Seth Grimes: There are two senses to this question. You're asking,
perhaps, about the business domains and types of information to
which text analytics can be most fruitfully applied, and you're
asking what text-analytics functions deliver the most value. Let's
tackle the second aspect first.
Text analytics is most important where
it gives the most "lift" to conventional computing, and
that's in its ability to discern machine-processable meaning in
documents. Stuff like sentiment extraction: that's really exciting.
The most important application domains?
I'd nominate two. The first is biomedicine and drug discovery. Text
analytics offers immense potential to do good by speeding new pharmaceuticals
and therapies to market. The second is intelligence and counterterrorism.
Text mining is key in these efforts to make the world a safer place,
and they have really fuelled development of the technology.
But that's text analytics to date.
Now we're seeing rapidly accelerating uptake in marketing applications,
for CRM, for reputation management, in mining social media. Measuring
importance in monetary terms, those areas have the biggest growth
potential. They'll be huge.
TA News: What's next for text? Please give us your predictions for
the text analytics market over the next 12 months.
Andrei Broder: I think we will see much of the same growth we have seen
up to now. One question that may be asked more and more is: how
come the industry is not consolidating? There may be some interesting
activity here.
I think it's fair to say that we may
see the emergence of a common set of technologies that cut across
industries. If the industry vendors find more synergy across fields of
application, I think we could see some real cross-industry text
mining technologies emerge.
Overall, I believe that integration
across the fields of application will reach a tipping point,
but of course I cannot predict an exact date! One thing is for sure:
it will be like a snowball effect. Once people begin to converge
technologies, more will follow.
Seth Grimes: Vendor consolidation and vendor emergence. It's hard
for smaller players to catch fire. Technology isn't enough. You
need deft marketing and adept positioning. Look for small fry to
be swallowed by companies selling content-management solutions,
BI tools, or enterprise applications. Informatica's $55 million
2006 acquisition of Itemfield points the way. At the same time,
academic and government-sponsored research is producing great tools
that are a natural for commercialization. Some of them will emerge
as intriguing, new niche entrants to the text-analytics market.
And we'll see vendors increasingly
answering key challenges: delivering accessible, deeper, and better
integrated text analytics. The coming year will boast some exciting
developments.
TA News: Thank you, Andrei and Seth.