Contextual Searching

Searching for the context of a message using various techniques and algorithms. Everything from Vector Search to Latent Semantic Indexing (LSI) to Contextual Network Graphs (CNG) will be discussed.

Monday, January 30, 2006

Finding the needle in a haystack

What's this all about? Why do we need a way to search raw text for information? What sort of information can we expect to glean from raw text?

These are but a few of the questions raised about LSI and other contextual search techniques. The simplest answer is that we can search the plethora of websites more effectively if we can find them based on context rather than raw keywords. Regional differences in how we explain a topic are only a small part of the reason. We all phrase things differently, and because languages are living, changing things, new ways of saying the same thing emerge daily. This is where contextual search can shine. What we are looking for is the "best" fit for our search, so that we can spend less time researching a topic and more time learning about it.

A quick example: I want to research a possible investment in bio-tech stocks. I pick a few major contenders I have read about and begin looking on the Internet to find out more about them. I find a few good articles, but I have to read through thousands of bad ones to find the good ones. Now, if I could only take the several thousand articles from a more general search engine (like Google or MSN) and further "sort" them based on their overall theme, the cream would rise to the top. This is what a Contextual Network Graph does, given the proper tuning. So, what we want is a quick search and retrieve from popular search engines, and then we want to process those documents in some way that causes the sentences, paragraphs, etc. with the most content to bubble to the surface. We might even want to apply constraints or further search criteria to limit what we get back. By doing this we manage to read far fewer documents to find the "good" one we want. That is the proverbial needle in the haystack.
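To make the "sort by overall theme" idea a little more concrete, here is a minimal sketch in Python. It is not a Contextual Network Graph. It swaps in a much simpler vector-space approach: each document becomes a TF-IDF vector, and documents are ranked by cosine similarity to the average vector of the whole set. The function names (tokenize, tfidf_vectors, cosine, rank_by_theme) and the smoothed IDF formula are my own assumptions for illustration, and the retrieval step from a search engine is left out entirely.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for c in counts for term in c)
    n = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption of this sketch), always positive.
        vectors.append({
            term: tf * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_theme(docs):
    """Return (score, document) pairs, best match to the overall theme first.

    The "theme" here is simply the centroid (average) of all document vectors.
    """
    vectors = tfidf_vectors(docs)
    centroid = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] += weight / len(vectors)
    scored = [(cosine(vec, centroid), doc) for vec, doc in zip(vectors, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling rank_by_theme on a list of article texts returns them with the most "on theme" ones first, so an off-topic article mixed into a batch about bio-tech stocks should sink toward the bottom. Averaging works only when most of the retrieved documents are already on topic, which is the same reason the approach above starts from a general search engine's results.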
