The power in popping the hood

The search engineering team at Naukri labs has a tradition of not only using open source software but also digging deep into it, both to understand it better and, ultimately, to control it. In other words, we take advantage of the “open” in open source. As you might expect, our core search and intelligence capabilities rely heavily on various open source projects. Yet given the complexity of our requirements as a domain-leading portal, none of the tools, frameworks or APIs out there give us out-of-the-box, configure-and-deploy capabilities. At the same time, pushing back on product requirements because of the limitations of a technology platform is a dangerous situation to be in. If you find yourself doing this frequently, be very afraid. Enter platform development.
A cool example of how we use open source to our advantage is the set of enhancements we have made to the relevance calculation core underlying all of iFind. We use an API that provides extensive full-text search and ranking functionality, and we need to extend it further to support the primitives and control that we require.
To get that control we must delve deep inside and around sweeping portions of the API: extending, decorating and often effectively replacing significant portions of the default code. Documentation goes only so far; it sidesteps the cases where such sweeping changes are needed by labelling them “expert features” or “research areas”. Expert Schmexpert. So we read the code in these areas, which its authors have effectively marked “tread carefully”, as if we (who are not experts) had written it ourselves, and that gives great insight into how we can twist arms, minds and code to squeeze out function and performance.
Full-text search engines typically store their listings in an inverted index, which maps each term to the documents that contain it. This representation is especially suitable for matching documents to queries composed of terms. Conceptually, both queries and documents are treated as vectors living together in a single high-dimensional space, and the cosine similarity between a query vector and a document vector gives one standard way of computing a relevance score. We need to do better. We need not only the capability to calculate and factor in measures such as proximity, relative proximity, explicit boosts and various other custom measures, but also to expose those measures as atoms that can be used in our custom sorting, filtering and grouping framework. That is because we are a platform, not a one-off search engine application. All of this requires understanding our tools by looking at their internals, so that we can dismantle them, enhance them, and then wrap them inside our control framework.
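To make the idea concrete, here is a minimal sketch in Python of a positional inverted index, a TF-IDF cosine score, and a proximity measure exposed as a separate atom for custom sorting. It is an illustration only, not our platform code or the API we build on: the function names, the particular TF-IDF weighting and the sample documents are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Positional inverted index: term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def cosine_scores(index, n_docs, query):
    """TF-IDF cosine similarity between the query and each matching document.
    The weighting (tf * log(1 + N/df)) is one common choice, assumed here."""
    idf = {t: math.log(1 + n_docs / len(p)) for t, p in index.items()}
    # Squared length of every document vector, accumulated term by term.
    doc_norm = defaultdict(float)
    for t, postings in index.items():
        for d, positions in postings.items():
            doc_norm[d] += (len(positions) * idf[t]) ** 2
    q_tf = Counter(t for t in query.lower().split() if t in index)
    q_norm = math.sqrt(sum((tf * idf[t]) ** 2 for t, tf in q_tf.items()))
    # Dot products only touch the posting lists of the query terms.
    dots = defaultdict(float)
    for t, tf in q_tf.items():
        for d, positions in index[t].items():
            dots[d] += (tf * idf[t]) * (len(positions) * idf[t])
    return {d: dot / (q_norm * math.sqrt(doc_norm[d])) for d, dot in dots.items()}

def min_proximity(index, doc_id, query):
    """Smallest positional gap between two distinct query terms in a document;
    math.inf when fewer than two distinct query terms occur in it."""
    terms = [t for t in set(query.lower().split()) if doc_id in index.get(t, {})]
    best = math.inf
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            gap = min(abs(x - y) for x in index[a][doc_id] for y in index[b][doc_id])
            best = min(best, gap)
    return best

def score_atoms(index, n_docs, query):
    """Expose each measure separately so callers can sort, filter or group on it."""
    cos = cosine_scores(index, n_docs, query)
    return {d: {"cosine": c, "proximity": min_proximity(index, d, query)}
            for d, c in cos.items()}

# Hypothetical sample data.
docs = {
    1: "senior java developer",
    2: "java and python developer with senior team lead experience",
    3: "sales manager",
}
index = build_index(docs)
atoms = score_atoms(index, len(docs), "senior java")

# A custom sort built from the atoms: closest terms first, then higher cosine.
ranked = sorted(atoms, key=lambda d: (atoms[d]["proximity"], -atoms[d]["cosine"]))
print(ranked)
```

Because the measures come back as separate atoms rather than being folded into one opaque score, the caller decides how to combine them. Here document 1 ranks ahead of document 2 because its query terms sit next to each other, and document 3 never appears because it matches no query term.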
The result is a search platform that is easier to customize in ways that the world considers esoteric, advanced or both. That gives us an edge: quick turnaround on features that would seem impossible, complex or advanced if we didn’t have a handle on the internals. It imparts a true sense of power over the machine, and makes the workday here in the search platform team exciting.
