Inverse Word Lookup
Posted: June 4, 2006
The idea is pretty simple. I want to be able to type in a few phrases about a word (without the word itself) and be given the best-matching single words or phrases. One can almost do this with a web search engine, or by intelligent use of a thesaurus (if the term is in a dictionary). I want something that can be applied to any field, however, particularly fields with a lot of their own jargon.
Some examples:
Unix Man Pages
I like Linux as much as the next guy, but there is almost always a steep learning curve to doing anything new. The command line that makes *nix systems so powerful also makes them difficult to learn, particularly when the "documentation" can only be found through sometimes painful web searching. For example, let's say you install a new hard drive and it isn't recognized automatically, or worse, doesn't seem to work well with your particular application. What if you've never configured a hard drive before? Why, just edit your /etc/fstab file and then tweak things using the hdparm command, of course! That's not at all obvious, though, so you can expect to spend a long time with Google before you even learn that the fstab file and the hdparm command are what control the hard drive's configuration, at which point you can actually start learning how to use them.
I would like to create a small application where the user just types some keywords or sentences, like "hard drive, chunk size tweaking, declaring file system properties, etc." The application would then return the man pages for a few commands that are likely relevant. I believe GNOME, and possibly KDE as well, has a standard for documenting its applications, so GUI applications that conform to those standards could be indexed and searched in the same manner. The underlying algorithm would be a pretty basic search engine (you wouldn't even need to worry about markup or links), so I'm sure a very sensible implementation could be built with existing open source information retrieval libraries.
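As a rough illustration of what "a pretty basic search engine" could mean here, the Python sketch below ranks documents by TF-IDF weighted cosine similarity, a standard information-retrieval scoring scheme. It is not tied to any particular library; the `TfidfIndex` class and the one-line page summaries are hypothetical stand-ins, and a real index would presumably be built from the installed man pages (for example, from each page's one-line NAME description).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """A tiny in-memory TF-IDF search index over short documents (a sketch)."""

    def __init__(self, docs):
        # docs maps a name (e.g. a command) to its description text.
        counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
        n = len(counts)
        # Document frequency: how many descriptions contain each term.
        df = Counter()
        for c in counts.values():
            df.update(c.keys())
        # Smoothed inverse document frequency: rarer terms get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = {name: self._vectorize(c) for name, c in counts.items()}

    def _vectorize(self, counts):
        """Turn term counts into a unit-length TF-IDF vector, dropping unknown terms."""
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query, k=3):
        """Return up to k (name, score) pairs ranked by cosine similarity."""
        q = self._vectorize(Counter(tokenize(query)))
        results = []
        for name, vec in self.vectors.items():
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                results.append((name, score))
        results.sort(key=lambda pair: pair[1], reverse=True)
        return results[:k]


# Hypothetical, paraphrased one-line summaries standing in for real man pages.
pages = {
    "fstab": "static information about file systems, mount points and mount options",
    "hdparm": "get or set hard drive parameters such as caching and power management",
    "mkfs": "build a file system on a hard drive partition",
    "ls": "list directory contents",
    "grep": "print lines that match a pattern",
}

index = TfidfIndex(pages)
results = index.search("hard drive power management")
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Weighting by inverse document frequency means words that appear in many descriptions contribute less, so distinctive terms like "power" or "management" dominate the ranking. A more serious version would also want stemming, so that "drives" matches "drive".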
Statistics
I believe two things very strongly about statistics. One is that statistical modeling techniques are under-utilized across the board, as people tend to prefer predicting things by fully understanding a system and then simulating it. Statistical, observation-based models simply rely on making some observations and deducing what will likely happen next. The statistical methods would usually be good enough for any practical application, but full-scale understanding makes for better journal articles, so the full-understanding approaches win out in terms of mindshare.
But that's not my point here. The other thing I believe, which may partially explain the first, is that few people exploit the best statistical techniques because of the horribly misguided convention of naming statistical methods after the people who developed them. Open the table of contents of any statistics textbook and it will be painfully apparent what I'm talking about. A chapter title such as "Savitzky-Golay Filters" tells the reader absolutely nothing about whether the chapter is applicable to the problem they're trying to solve. Also, would it have been so bad if the bell-shaped distribution were called "The Bell-Shaped Distribution" and not "The Gaussian Distribution"?
Unlike with man pages, there is no simple corpus of documents, each relating to a single word or phrase, that can be indexed and searched. There are some online textbooks that might be a good place to start, but entire textbook chapters aren't necessarily ideal as search units. I think it would take a knowledgeable person sitting down, writing a three-sentence description of as many statistical methods as possible, and collecting them in a database. I consider myself "semi-knowledgeable" about statistics, so I might be able to get a pretty good start, but probably not what I'd consider a production system.
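Assuming such descriptions existed, storing and searching them needs nothing fancy. The sketch below uses Python's built-in `sqlite3` module with an in-memory table; the table layout, the `build_db` and `lookup` helpers, and the sample descriptions are all hypothetical, and the scoring is a crude keyword count rather than a real ranking function.

```python
import sqlite3

# Hypothetical sample entries; the real work would be writing descriptions
# like these for as many methods as possible.
METHODS = [
    ("Savitzky-Golay filter",
     "Smooths noisy data by fitting a low-degree polynomial to each sliding "
     "window of points using least squares."),
    ("Gaussian distribution",
     "The bell-shaped continuous probability distribution, described by its "
     "mean and standard deviation."),
    ("Kalman filter",
     "Recursively estimates the hidden state of a linear system from a series "
     "of noisy measurements."),
    ("Kolmogorov-Smirnov test",
     "Tests whether a sample comes from a given distribution by comparing "
     "cumulative distribution functions."),
]


def build_db(entries):
    """Create an in-memory database with one row per statistical method."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE methods (name TEXT PRIMARY KEY, description TEXT)")
    conn.executemany("INSERT INTO methods VALUES (?, ?)", entries)
    return conn


def lookup(conn, keywords, k=3):
    """Rank methods by how many of the keywords appear in their description.

    In SQLite each LIKE comparison evaluates to 1 or 0 (and is case-insensitive
    for ASCII by default), so adding them up counts the matched keywords.
    """
    if not keywords:
        return []
    score = " + ".join(["(description LIKE ?)"] * len(keywords))
    sql = (
        f"SELECT name, hits FROM (SELECT name, {score} AS hits FROM methods) "
        "WHERE hits > 0 ORDER BY hits DESC, name LIMIT ?"
    )
    # Parameters bind left to right: one pattern per keyword, then the limit.
    params = [f"%{kw}%" for kw in keywords] + [k]
    return conn.execute(sql, params).fetchall()


conn = build_db(METHODS)
print(lookup(conn, ["noisy", "polynomial", "window"]))
```

Counting substring matches is deliberately simple; the point is that a plain-language query such as "noisy", "polynomial", "window" can surface a method whose name alone gives no hint of what it does.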
Better Dictionary
This one is pretty obvious: just attach a simple search engine to a dictionary. Dictionary websites and desktop applications always have a single text-entry box into which you type the word you want to look up. Why can't they just add another box that does a generic search over the definitions (like the search function on practically any decent-sized website on earth)? Maybe it has been done, but I haven't seen it.
Like the man pages application, this would be extremely easy to build. There is even a pretty good, free dictionary available for download at www.dict.org. There is also a small application called kdict that queries the dict.org database, and it could easily be extended with the functionality I want, so an entirely new application wouldn't even be necessary.
There are two real-life use cases I frequently run into that this could really help with:
- Tip of My Tongue. There's a word, I know it, I'm trying to think of it, it's like some words I can think of, but the one I need escapes me.
- Find the Right Word. Naming things and coming up with things like product tag-lines require exactly the right word. I'm always trying to boil things down into the right handful of words that encompass a large array of concepts. People need a tool to help with this. Anyone who's spent an hour with a dictionary or thesaurus looking for the best word can relate. Too often, a thesaurus just doesn't cut it.
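Both use cases amount to a reverse dictionary: search the definitions and return the headwords. A minimal sketch in Python, assuming the definitions have already been loaded into a mapping from headword to definition text (reading the dict.org data files is not shown); the stop list, the helper names, and the sample definitions below are hypothetical.

```python
import re
from collections import Counter, defaultdict

# A deliberately short, hypothetical stop list of words that carry little meaning.
STOP_WORDS = {"a", "an", "the", "of", "to", "or", "and", "in", "for", "that", "with", "by"}

# Hypothetical, paraphrased sample definitions standing in for a real dictionary.
DEFINITIONS = {
    "ephemeral": "lasting for a very short time",
    "laconic": "using very few words",
    "transient": "lasting only for a short time; passing quickly",
    "ubiquitous": "present or found everywhere",
    "verbose": "using more words than needed",
}


def terms(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_reverse_index(definitions):
    """Map each definition word to the set of headwords whose definition uses it."""
    index = defaultdict(set)
    for headword, text in definitions.items():
        for w in terms(text):
            index[w].add(headword)
    return index


def reverse_lookup(index, description, k=5):
    """Rank headwords by how many distinct description words their definitions share."""
    hits = Counter()
    for w in set(terms(description)):
        # .get avoids inserting empty entries into the defaultdict.
        for headword in index.get(w, ()):
            hits[headword] += 1
    # Sort by match count (descending), then alphabetically for stable output.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))[:k]


index = build_reverse_index(DEFINITIONS)
print(reverse_lookup(index, "using very few words"))
```

For a tip-of-my-tongue query like "using very few words", the definition of "laconic" shares the most terms and ranks first. Weighting rare words more heavily than common ones would be an obvious next refinement.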
Copyright 2007 Peter Groves. This text may be reproduced only in its entirety in any medium without royalty provided this copyright notice is included.