
Tip-toe through Naukri, India’s No.1 job site #OneDayTale


We love open source and hence we are very open to transparency. This blog post began as an idea to walk you through how a user is served on a typical day by your beloved Naukri.com.

Naukri may look like a black box that simply gives you results, but under the hood it is far more than what anyone might think.

Under the hood

Under-the-hood working of a car (GIF).

Something like the GIF below is what we are continuously doing to serve you better.

Car engine internal working (GIF).

We are just like that one cost-effective mission which recently launched 104 satellites into space.

Why?

We are efficient in terms of:

  1. Resources,
  2. User experience,
  3. Great page load times and 99.9% uptime,
  4. Serving a big market, that too with complete customer satisfaction,
  5. And yes, we are profitable.

Okay, let me share some basic statistics.

In a typical day @naukri,

  • 188,276,144 daily page views (5,087,757,432 monthly page views)
  • 48.1+ million job seekers available for recruiters
  • 40+ million mails
  • 150+ million service requests on average
  • 21+ million API hits
  • 1+ million resume access searches
  • 2+ million views of jobseeker profiles
  • 1.7+ billion job searches
  • 10,000 job seeker registrations / 250,000 modifications
  • 600 new company registrations
  • 7,500 new jobs
  • 1.5+ million applies on jobs
  • 13,500 app installations
  • 49 story points burned down
  • 16+ TB SQL data size
  • 2.5-3K+ queries per second (QPS) on average across databases
  • 14 Elasticsearch nodes serving real-time indexing

How does Naukri match better?

Overview of Teams and Tech Stack @Naukri

Teams:

We are a proud tech group composed of 18 small teams handling isolated, smaller verticals with a mix of technologies. Each team has its own setup to handle its scale and user volume. We love open source and adopt technologies based on our requirements, feasibility studies, and scalability & performance analysis.

Tech Stack:

We mostly use a mix of dedicated and virtual servers to host our website. We use the cloud to run monitoring scripts against the website from different geographic locations.

We majorly use LAMP (Linux, Apache, MySQL, PHP) to power India's No. 1* job site. Of course, we are using a multi-layered MVC architecture powered by the Symfony PHP framework.
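
To give a flavour of that layering, here is a minimal, hypothetical Symfony-style controller; the route, class and field names are ours for illustration only, not Naukri's actual code.

    <?php
    // Hypothetical Symfony controller, for illustration only.
    namespace App\Controller;

    use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
    use Symfony\Component\HttpFoundation\JsonResponse;
    use Symfony\Component\HttpFoundation\Request;
    use Symfony\Component\Routing\Annotation\Route;

    class JobSearchController extends AbstractController
    {
        /**
         * @Route("/jobs/search", methods={"GET"})
         */
        public function search(Request $request): JsonResponse
        {
            // Read the search criteria from the query string.
            $keyword  = $request->query->get('keyword', '');
            $location = $request->query->get('location', '');

            // A real controller would delegate to a model/service layer that
            // talks to MySQL, Elasticsearch and the caches; we just echo back.
            return new JsonResponse([
                'keyword'  => $keyword,
                'location' => $location,
                'results'  => [],
            ]);
        }
    }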

To store our users' data we use MySQL, and for caching we use Memcached/Redis as per the requirements.
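
A minimal cache-aside sketch of that read path, assuming Memcached and PDO (the connection details, table and key names are hypothetical):

    <?php
    // Cache-aside sketch (illustrative only): try Memcached first, fall back
    // to MySQL on a miss, then populate the cache for the next request.
    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);

    $pdo = new PDO('mysql:host=127.0.0.1;dbname=demo', 'app_user', 'secret');

    function getUserProfile(int $userId, Memcached $memcached, PDO $pdo): array
    {
        $key = "user_profile_{$userId}";

        $profile = $memcached->get($key);
        if ($profile !== false) {
            return $profile;                         // cache hit
        }

        // Cache miss: read from MySQL and cache the row for 10 minutes.
        $stmt = $pdo->prepare('SELECT * FROM user_profiles WHERE user_id = ?');
        $stmt->execute([$userId]);
        $profile = $stmt->fetch(PDO::FETCH_ASSOC) ?: [];

        $memcached->set($key, $profile, 600);
        return $profile;
    }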

In addition, we use the following platforms and frameworks.

  • Symfony
  • ElasticSearch
  • RabbitMQ
  • Node.js
  • AngularJS
  • MongoDB (NoSQL database)
  • Jenkins (for Continuous Integration / Continuous Delivery)
  • PHPUnit (writing robust code with green lights; see the test sketch after this list)
  • JUnit
  • HAProxy
  • Puppet (For deployments)
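
To give a flavour of the PHPUnit item above, here is a tiny, self-contained test sketch; the helper function under test is purely hypothetical, not Naukri's actual code.

    <?php
    use PHPUnit\Framework\TestCase;

    // Hypothetical helper under test: normalises an experience range such as
    // "2-5" into a [min, max] pair.
    function parseExperienceRange(string $range): array
    {
        [$min, $max] = array_map('intval', explode('-', $range));
        return ['min' => $min, 'max' => $max];
    }

    class ExperienceRangeTest extends TestCase
    {
        public function testParsesASimpleRange(): void
        {
            $this->assertSame(['min' => 2, 'max' => 5], parseExperienceRange('2-5'));
        }
    }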

What do we have on offer?

We offer services like:

  1. Searching for your next job,
  2. Creating a standout seeker profile,
  3. Premium profiles that appear at the top in recruiter search,
  4. Seamless application to jobs,
  5. Resume creation and a resume quality score,
  6. Job recommendation mailers,
  7. Maintaining different profiles,
  8. A featured CV creation service (Fast Forward),
  9. Naukri as a service to apply to jobs (Apply with Naukri),
  10. and many more.

Don’t forget the services we offer to recruiters, like:

  1. Recruiter profiles,
  2. Posting different types of jobs,
  3. Searching candidates with Resdex (based on skill set, location, notice period, etc.),
  4. Contacting many candidates at once,
  5. Sending recruiter emails,
  6. Managing candidate responses with CareerSiteManager,
  7. Customized career sites,
  8. Filtering relevant candidates for the recruiter (Personalized Application Filters),
  9. Featured job listings,
  10. Letting recruiters hire through their employees’ network (ReferralRecruit),
  11. …and many more features.

After serving the above two categories, there is a third category of user: Naukri itself.

We must stay relevant to stay in the market, and deliver good ROI for our users so that they stay with us as well.

We have an analytics team set up to keep track of trends in the market. They provide us valuable information about which locations we should work on more, which new features would bring more users on board, and so on.

They work in Python and Hadoop to process humongous amounts of data in a parallel and distributed fashion.

We strive hard to match better. We are constantly coming up with more and more products and implementing techniques to automate the hiring process even better.

Our target is to be the one-stop shop for job seekers and recruiters.

Let us walk step by step through how a user request is served.

As soon as you land on Naukri.com:
Our servers roll up their sleeves and fetch the most relevant results as per your skill set, experience, location and many more criteria.

 

Under the hood

Servers

We have about 250 servers for Naukri, including backup and analytics servers. We have divided the servers among different small applications, and each set of servers handles the load for its application independently. This gives us the flexibility to scale individual applications, along with fault tolerance.

Load balancer

The more the traffic, the more the chaos. A traffic cop is what we need to keep things in control all the time. For us, 100% availability is the key, and compromising the health of our servers is the last thing we would opt for.

Load balancing is thus what we rely on to logically and appropriately distribute the incoming traffic across the available servers. A group of X dedicated hardware load balancers is what we have in place.

You can learn more about load balancers at https://youtu.be/52bsMq_pzeY

Any change in user visit patterns that induces a rush in traffic is sophisticatedly directed to the servers performing at par, with ample headroom to handle the load.

Picture it like a pool of resources (servers) consuming and completing tasks as per their potential, and returning healthy and ready to consume another fresh task.

Proxy servers

Yes, we do claim to be transparent about our processes, but not at the cost of security. Proxy servers are what we use to hide our IP addresses and serve under the hood. This is us just playing safe to avoid any system misuse.

Clients from different locations can seamlessly access our data without facing location or company level access restrictions.

Regular caching at proxy servers also helps us to serve our clients faster.

Database

Databases follow master-slave and master-master replication strategies to facilitate high availability and failover. We have created independent databases for the applications. Depending upon the app's scale, we use sharding, which is set up as per some (confidential) logic for our clients/users.

Sharding

Though we are undoubtedly elated knowing the rate at which our data is growing, there is a huge responsibility that comes along. No matter how much data we compute on, Naukri does not want its end users to feel any lag in content access.

Our data driven business demands that we keep the content logically separated for quick, independent, and inexpensive retrieval. Using this “shared nothing horizontal scaling” approach, we divide our database into smaller chunks and distribute them over different servers.

In contrast to replication, there is no data in common that the different servers share. Smaller databases are thus easier to manage, cost effective, and fast. In order to handle independent failures, we strongly rely on automated backups, shard redundancy, and hardware redundancy.
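
Our exact shard-routing logic is confidential, but the general idea can be sketched roughly like this; the shard hosts and the simple modulo rule below are purely illustrative.

    <?php
    // Illustrative shard routing only; the real routing rule is confidential.
    // Each shard is an independent MySQL instance holding a slice of the data.
    $shards = [
        ['dsn' => 'mysql:host=shard1.internal;dbname=profiles', 'user' => 'app', 'pass' => 'secret'],
        ['dsn' => 'mysql:host=shard2.internal;dbname=profiles', 'user' => 'app', 'pass' => 'secret'],
        ['dsn' => 'mysql:host=shard3.internal;dbname=profiles', 'user' => 'app', 'pass' => 'secret'],
    ];

    function getShardFor(int $userId, array $shards): PDO
    {
        // Deterministically pick a shard from the user id.
        $conf = $shards[$userId % count($shards)];
        return new PDO($conf['dsn'], $conf['user'], $conf['pass']);
    }

    // All reads/writes for this user now go to the same, smaller database.
    $db = getShardFor(1048576, $shards);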

NoSQL

Not everything we possess is structured. Elasticsearch comes to the rescue here with its NoSQL database style. Search and analytics on such unstructured and complex data gave us interesting insights into the data that exists as free text on our websites.
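
As an illustration, a free-text search against such an index using the official elasticsearch-php client could look like the sketch below; the index, host and field names are hypothetical.

    <?php
    // Minimal full-text search sketch with the official elasticsearch-php client.
    require 'vendor/autoload.php';

    $client = Elasticsearch\ClientBuilder::create()
        ->setHosts(['es-node1.internal:9200'])
        ->build();

    $response = $client->search([
        'index' => 'jobs',
        'body'  => [
            'query' => [
                'match' => ['description' => 'php developer bangalore'],
            ],
        ],
    ]);

    foreach ($response['hits']['hits'] as $hit) {
        echo $hit['_source']['title'] . PHP_EOL;
    }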

Not only externally, we also count on NoSQL for our internal server logging, in order to parse and make sense of bulk Nginx/Apache logs.

Meticulous analysis of everyday logs and identification of patterns help us trigger frequent system health-check reports and alerts to our teams.

Exploiting NoSQL databases thus helps us stay “on our toes” 24x7 for any technical illness.

Caching

To return records to the user quickly, we cache the most frequently requested data. To know more about the caching techniques we use, please follow the blog posts mentioned below.

  • MemCache: Why Not Memcached?
  • Redis: Redis with a distribution twist
  • ncCache: ncCache – Browser caching made easy

We want page loads to be quick, and therefore, in addition to caching, we use CDN services from Akamai to serve some static content from the nearest possible data center.

CDN (Content Delivery Network)

As per Wikipedia, a content delivery network or content distribution network (CDN) is a globally distributed network of proxy servers deployed in multiple data centers.

In other words, a CDN is a network of data centers which keep the content in sync and serve users from the nearest possible data center.

(Say your friend’s website is hosted in the United States and you are accessing it from India; a data center in India may then serve you the required files, saving the time needed to fetch them from the United States.)

To know more about how a CDN works, check below:

CDN sample working (GIF).

Centralized Java Services

We have a lot of teams, and they often need the same information.

Let me take you through an example,

(Tech Team 1) When you apply to a job on Naukri, we forward your profile to the recruiter. To do that, we need your profile information.

(Tech Team 2) Similarly, when a recruiter searches for your profile, we need to get the same information again.

So we cannot keep copies of the same information with different teams.

Why ?

  1. It increases data duplication.
  2. Any update to the data needs to be synced everywhere.
  3. Every team would have different code to access the data.

Solution

We created centralized Java services to cater to this. Now the data is centralized in one place, and these Java services are responsible for providing the data. Updates to the data are still app specific, but reads are centralized.

So all data like

  1. Dropdowns of courses, industries, roles, salary ranges, experience ranges, etc.,
  2. Company profiles,
  3. Jobs,
  4. Subscriptions to services,

etc. is served by centralized Java services developed and managed by the dedicated Services team.

You can think of it like the image below:

Java services conceptualization (GIF).

where

the gateway at the center is our services;

the two big buildings at the extreme corners are different applications;

the Nike logo and the user are our application interfaces that constantly update the data.

We have benchmarked this and found it quite efficient for retrieving the information in real time.

Apart from the centralized data provider services, we have some internal APIs that we use to handle the user requests.

Ours is an API-centric architecture, which helps us serve a lot of client interfaces with one common codebase. Our mobile app, web app and desktop website are all just interfaces running on top of the centralized code exposed via APIs.

Some teams have written their APIs in Phalcon, as it is a very high-performance, lightweight PHP framework delivered as a C extension.
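
For a taste of what that looks like, here is a minimal Phalcon Micro endpoint sketch; the route and payload are hypothetical and not one of our actual APIs.

    <?php
    // Hypothetical Phalcon Micro endpoint, for illustration only.
    use Phalcon\Di\FactoryDefault;
    use Phalcon\Http\Response;
    use Phalcon\Mvc\Micro;

    $app = new Micro(new FactoryDefault());

    $app->get('/api/jobs/{id:[0-9]+}', function ($id) {
        // A real handler would fetch the job from the centralized services.
        $response = new Response();
        $response->setJsonContent(['id' => (int) $id, 'title' => 'Sample job']);
        return $response;
    });

    $app->notFound(function () {
        return (new Response())->setStatusCode(404, 'Not Found');
    });

    $app->handle($_SERVER['REQUEST_URI'] ?? '/');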

Mailer Optimization

Naukri pushes relevant jobs, specific to your profile, into your mailbox. Either our intelligent algorithms recommend those jobs for you, or a recruiter may have sent them from the Naukri Recruiter Interface.

These mails are huge in number, and serving such a large number is also a challenge. We use queuing mechanisms and optimizations to serve them on a daily basis.

To learn more about our mailing setup, please follow the link below:

Email Sending Architecture Using Messaging Queue
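
As a rough sketch of the producer side of such a queue, here is what enqueuing a single mail job could look like with php-amqplib; the broker host, queue name and payload are hypothetical.

    <?php
    // Illustrative producer sketch: push one mail job onto a durable RabbitMQ
    // queue; separate workers consume the queue and actually send the mails.
    require 'vendor/autoload.php';

    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    $connection = new AMQPStreamConnection('rabbitmq.internal', 5672, 'guest', 'guest');
    $channel    = $connection->channel();

    // Durable queue so pending mail jobs survive a broker restart.
    $channel->queue_declare('job_alert_mails', false, true, false, false);

    $payload = json_encode([
        'to'       => 'seeker@example.com',
        'template' => 'job_recommendations',
        'job_ids'  => [101, 102, 103],
    ]);

    $message = new AMQPMessage($payload, [
        'delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT,
    ]);

    $channel->basic_publish($message, '', 'job_alert_mails');

    $channel->close();
    $connection->close();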

 

Inbound Emails for Every Web App: Angle

Apart from mailing at such a large scale, we have an in-house webhook to intercept emails and identify the intent of the sender.

For example, if someone replies to calendarinvites132222318-rf@referralrecruit.com, we intercept this email and identify it as an acceptance of an event.
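
A much simplified sketch of that intent detection is shown below; the real Angle pipeline does far more than a single regular expression.

    <?php
    // Simplified intent detection from an inbound address; illustrative only.
    function detectIntent(string $toAddress): ?array
    {
        // e.g. calendarinvites132222318-rf@referralrecruit.com
        if (preg_match('/^calendarinvites(\d+)-rf@referralrecruit\.com$/i', $toAddress, $m)) {
            return ['intent' => 'calendar_invite_reply', 'invite_id' => (int) $m[1]];
        }
        return null; // intent unknown; route to a catch-all handler
    }

    var_dump(detectIntent('calendarinvites132222318-rf@referralrecruit.com'));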

To know more about Angle, refer to:

https://www.slideshare.net/naukri-engineering/inbound-emails-for-every-web-app-angle

 

Development process:

  • Developers, Quality Assurance and FED (for a beautiful and efficient user interface) all work in-house
  • Requirement gathering from the product team
  • Grooming the requirements after analyzing current code, architecture and functionalities
  • Analyzing scope after inducing new features
  • Providing visual designs to the front end team
  • Development and Sanity without breaking existing features
  • Integration of the newly developed feature with the front-end team
  • Smooth process of development along with unit/integration testing (automated with CI/CD using Jenkins)
  • Extensive cross-browser and cross-device sanity
  • Deployed on test servers for quality assurance before go live
  • Functional and manual testing on the test server
  • Every release is then provided to our team of security experts, who check each and every input element to find vulnerabilities. This ensures that our development is hack-proof (to the best we can).
  • Deployed on the staging server after QA’s go-ahead on the test server
  • Running functional automated testing suites to check whether anything broke in some other part of the website
  • Once it is green on staging, it goes live to the real and wide target audience
  • Merging it back to the master branch in the Git repository
  • Teams follow a rotating roster, where development team members monitor logs and alerts on live
  • Impact analysis after a few weeks, and reporting the ROI of the same
  • Repeat from step 1 … infinite times 😛

In the process mentioned above, if we invent something, we definitely open source it for the community.

Love towards Open Source

We have open sourced some of our libraries and JavaScript frameworks. Some of our popular ones are mentioned below:

NewMonk: Real User Monitoring and Logging tool

Droope: jQuery dropdown plugin

ncCache: JavaScript LRU cache that also supports browsers without local storage.

MyTrend: An ingenious tool to monitor data growth trends and server space utilization

UserBehaviourTracking: Track user behaviour in real time.

Deployments

  • We deploy continuously; we make small features live and do not wait for big releases
  • Since we are organized into smaller teams, bad deployments (if any) do not break the complete website
  • It helps us test features without a major impact on the site
  • We follow agile, so we always think of the smallest feature that can go live
  • We use Capistrano for deployments and Puppet to maintain configuration changes across boxes. We use a mix and match of HAProxy and LVS for load balancing and proxying TCP/HTTP applications; a bare-bones HAProxy sketch follows this list.
    (Which is better for your application? Follow these links:
    LVS vs HAProxy and the experience shared on the Ycombinator forum)
  • We have no down time due to/during deployments
  • We then monitor our website and check for major features we offer
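
For anyone curious, a bare-bones HAProxy configuration for spreading HTTP traffic across two application boxes looks roughly like this; the hosts, names and health-check path are illustrative, not our production config.

    # Illustrative HAProxy snippet: round-robin HTTP traffic over two app servers.
    frontend www_front
        mode http
        bind *:80
        default_backend app_servers

    backend app_servers
        mode http
        balance roundrobin
        option httpchk GET /health
        server app1 10.0.0.11:80 check
        server app2 10.0.0.12:80 check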

Logging

We log each and every request that users make to our boxes, whether it was served successfully or not. We get regular notifications for any errors (404, 500) reported on live.

We also track the percentage change in traffic every 3 minutes. Any unexpected change in traffic patterns is notified to the teams very quickly.

We also keep track of statistics like the number of new registrations, new searches, unique profiles, new jobs posted, jobs deleted, and so on.

Monitoring

We are all humans, and checking each and every corner of the website manually is not possible. So we have automation suites set up for the website. Our quality assurance team has written monitoring scripts which we run on the site at fixed intervals to check if anything is not working.

These scripts are written in Perl, and soon we are planning to move them to Java.

Refer to our open-sourced Selenium code generator.

If anything breaks, we immediately trigger an alarm to the corresponding team, and a resolution is made live before users notice.

We also run this monitoring from different corners of the globe to check whether the site is functional everywhere.

Apart from that, we have very extensive exception handling which triggers error mails to the corresponding teams as soon as a user encounters an issue on the website.
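
A stripped-down sketch of that idea in plain PHP; the recipient address and log path are hypothetical, and our real pipeline is much richer than this.

    <?php
    // Stripped-down global exception handler that logs and alerts a team.
    set_exception_handler(function (Throwable $e): void {
        $summary = sprintf(
            '[%s] %s in %s:%d',
            date('c'),
            $e->getMessage(),
            $e->getFile(),
            $e->getLine()
        );

        // Append to a local log file (message_type 3 = write to destination).
        error_log($summary . PHP_EOL, 3, '/var/log/app/exceptions.log');

        // Notify the owning team immediately.
        mail('team-alerts@example.com', 'Live exception on site', $summary . "\n\n" . $e->getTraceAsString());
    });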

User Behavior Study

We want users to have a cakewalk-like experience with every action they perform on our website. This motivates us to observe and track usage patterns very meticulously, in order to improve further and stay relevant.

We keep a track of user behaviour, analyse it extensively and use it to improve user experience.

We are interested in knowing the first button you click after creating a profile with us. We really want to know which colors tempt you to click, which font size attracts you first on the page, and so on.

We are so careful about user experience that we created a tool in-house. Not only do we get a lot of insights about our users, we made it open source for the community as well.

We have implemented a JavaScript library (we call it NewMonk) to keep track of our users. Beyond devices, we also keep track of which sections are clicked or visited the most.

You can learn more about the implementation of NewMonk at

Closing words

We strive to keep end users above everything, and that is why we keep millions of users happy every day!

Recruitment is altogether an independent and big market, and extremely important for any organization to shine. We have a major role to play here, as we serve nearly 70% of this huge market.

Better employees can sail the ship to shore, others will leave the ship in the ocean.

We are constantly improving our services and want to serve our users with all aspects of recruitment using the best of technology. We love open source and hence we are open to feedback too.

For any feedback, write to info@naukri.com, rohit.sharma1@naukri.com, gupta.neha@naukri.com.

To know more about our open source libraries, follow

We tried our best swimming the ocean, but couldn’t really touch every wave. Hope you felt the waves we could cover in the limited space 🙂

With these closing words, we would like to wish you all a lot of luck for your careers ahead.

Please share the word with your aspiring friends and colleagues.

Yours beloved
Naukri.com

Crafted with ❤ in India.

#NaukriTeam

Naukri Engineering Team – Hackathon glimpse (photo).