User Data Logging Using Mongo

Torture the data, and it will confess to anything…

Why bother capturing data?

Data has become the most valuable commodity for every Internet company, which makes collecting it an essential task. The Naukrigulf team ran into exactly this need. The product team asked us for a way to log the modified and viewed timestamps for every jobseeker field, such as username, salary, photo, and work experience. In other words, we needed to capture when a jobseeker viewed a particular field and which fields they edited. The sole purpose of logging this data was to recognize jobseekers’ activity and the patterns in how they browse and modify their fields. The information derived from it could then help us understand our end users better and give them a better experience on our site.
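As a rough illustration of what logging one such event might involve, the sketch below builds a MongoDB-style upsert for a single view or edit. The `resid` key matches the indexed field that appears in the collection statistics later in this post. The `fields.<name>.viewedAt` / `modifiedAt` layout, the `buildLogUpdate` helper, and the sample values are assumptions made for illustration only, not the schema we actually deployed.

```javascript
// Hypothetical helper: builds the filter and update documents for one event.
// Every field name other than `resid` is an illustrative assumption.
function buildLogUpdate(resid, field, action, when) {
  if (action !== "viewed" && action !== "modified") {
    throw new Error("action must be 'viewed' or 'modified'");
  }
  // Dot notation targets one nested timestamp, e.g. fields.salary.viewedAt
  const key = "fields." + field + "." + action + "At";
  return {
    filter: { resid: resid },
    update: { $set: { [key]: when } },
    options: { upsert: true }, // create the jobseeker's document on first event
  };
}

// Sample values are made up for the example.
const op = buildLogUpdate(12345, "salary", "viewed", new Date("2015-01-01T10:00:00Z"));
console.log(JSON.stringify(op));
// With a real driver, these parts could be passed to something like
// collection.updateOne(op.filter, op.update, op.options).
```

Keeping one document per jobseeker and setting a single nested timestamp per event means each view or edit is one small write, which is the access pattern the tests below are meant to stress.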

So our biggest challenge was to collect this huge volume of data and process it quickly enough to be useful in real time. The first database that naturally comes to mind is MySQL. It has been with us for a very long time, it has proved trustworthy, and we are used to it. But MySQL, in spite of being the senior citizen of databases, had to pass some tests to prove itself worthy of the force.

MySQL vs. Mongo

Before running any tests, we needed to decide what to measure. To log this kind of data at a fast pace, the database had to write to disk quickly (heavy I/O load) and handle large queries at any time with very limited resources (we certainly don’t want our cool servers screaming for resources). So, after calculating the expected number of users, we estimated the number of rows.

Both databases were tested on a machine with the same configuration:

RAM: 8 GB

Processor type: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

Number of processors: 4

MySQL Tests

For a table of 1 crore (10 million) rows to write, the MySQL statistics were as follows:

Running the test with following options:
Number of threads: 100

Queries performed:
    Total read/write requests: 3166711 (17.572MB per sec.)

Test execution summary:
    total time: 180.2129s
    total number of events: 166669
    per-request statistics:
        min: 2.06ms
        avg: 108.07ms
        max: 3074.01ms

The total time was good, but not good enough for fast logging, which requires a very large number of inserts and updates on the database. We needed better disk write times; MySQL was doing well but not impressing us.
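To put the MySQL numbers in perspective, the raw totals can be turned into rates. The sketch below uses only the figures from the test output above; the variable names are ours. Dividing the totals gives roughly 17,572 read/write requests per second and about 925 events per second, which suggests the figure in parentheses in the output may be a request rate rather than megabytes. With 100 threads kept busy, that throughput also implies an average latency of about 108 ms, in line with the reported average.

```javascript
// Figures copied from the MySQL test output above.
const threads = 100;
const totalRequests = 3166711;
const totalEvents = 166669;
const totalTimeSec = 180.2129;

const requestsPerSec = totalRequests / totalTimeSec;
const eventsPerSec = totalEvents / totalTimeSec;

// With every thread busy, average latency is roughly threads / throughput.
const impliedAvgMs = (threads / eventsPerSec) * 1000;

console.log("requests/sec:", requestsPerSec.toFixed(0));
console.log("events/sec:", eventsPerSec.toFixed(1));
console.log("implied avg latency (ms):", impliedAvgMs.toFixed(1));
```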

Naturally, we had to look at what else the database market had to offer. The basic difference between Mongo and MySQL is the data model, which changes from relational to document-based. Mongo can be pretty fast because it tries to keep all of its indexes in RAM. Therefore, we decided to see how Mongo performed on the same disk-write test. We ran mongoperf, a tool shipped with MongoDB for checking disk I/O performance. We wrote to a 1000 MB file using 100 threads and recorded the write throughput.

Mongo Tests
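mongoperf reads its options as a JSON document on standard input. As a sketch of how a run like ours could be configured, the snippet below prints such a document. The thread count and file size come from the test described above; treating it as a write-only run (`w: true`, `r: false`) is our assumption here, and the script file name in the comment is hypothetical.

```javascript
// Options for mongoperf, printed as JSON so they can be piped to the tool.
const config = {
  nThreads: 100,    // number of worker threads, as in our test
  fileSizeMB: 1000, // size of the test file in MB, as in our test
  w: true,          // perform write operations
  r: false,         // assumption: no reads, write-only run
};

console.log(JSON.stringify(config));
// Possible usage (file name is hypothetical):
//   node mongoperf-config.js | mongoperf
```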

Device    rMB/s (MB read per sec)    wMB/s (MB written per sec)

sda       0.07                       279.14

Concerns with Mongo

The results were a lot better. But the remaining concern with Mongo was RAM: because of its indexes, it could eat up all available memory as the amount of data grew. To check this, we came up with another plan: create dummy data along with the required indexes and measure how much disk space and RAM it takes.

For test data containing 1 crore (10 million) documents, we recorded the following statistics:
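A dummy-data load of this kind could be sketched as follows. This is not the script we used: the `dummyBatches` generator, the `payload` field, and the batch size are all hypothetical, and only `resid` corresponds to a real indexed field. The 3000-character padding is an assumption chosen to land in the region of the average document size reported in the statistics.

```javascript
// Hypothetical generator that yields batches of dummy log documents.
function* dummyBatches(totalDocs, batchSize) {
  for (let start = 0; start < totalDocs; start += batchSize) {
    const end = Math.min(start + batchSize, totalDocs);
    const batch = [];
    for (let resid = start; resid < end; resid++) {
      // `payload` is made-up padding to give each document some bulk.
      batch.push({ resid: resid, payload: "x".repeat(3000) });
    }
    yield batch;
  }
}

let generated = 0;
for (const batch of dummyBatches(2500, 1000)) {
  generated += batch.length;
  // Against a real database, each batch would be written here, for example
  // db.logData.insertMany(batch), with db.logData.createIndex({ resid: 1 })
  // run once so the index size can be measured afterwards.
}
console.log("documents generated:", generated);
```

Generating in batches keeps memory use flat, so the same loop can be scaled up to the full 1 crore documents.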

db.logData.stats()
{
    "ns" : "testDb.logData",
    "count" : 10000000,
    "size" : 30560000000,
    "avgObjSize" : 3056,
    "storageSize" : 31556644688,
    "totalIndexSize" : 576040080,
    "indexSizes" : {
        "_id_" : 324456384,
        "resid_1" : 251583696
    },
    "ok" : 1
}

RAM space required for indexes = (324456384+251583696)/1024/1024/1024
= 0.5364791303873062 GB

Total disk space (storageSize) = 31556644688/1024/1024/1024 = 29.389415577 GB
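The same arithmetic can be reproduced directly from the stats document. The sketch below copies the relevant values from the output above and converts bytes to gigabytes by dividing by 1024 three times, as in our calculation. Note that the disk figure corresponds to `storageSize` alone; adding the indexes brings the on-disk total to roughly 29.93 GB.

```javascript
// Values copied from the db.logData.stats() output above.
const stats = {
  storageSize: 31556644688,
  indexSizes: { _id_: 324456384, resid_1: 251583696 },
};

const BYTES_PER_GB = 1024 * 1024 * 1024;

// Sum every index, so the calculation still works if more indexes are added.
const indexBytes = Object.values(stats.indexSizes).reduce((a, b) => a + b, 0);
const indexGB = indexBytes / BYTES_PER_GB;
const storageGB = stats.storageSize / BYTES_PER_GB;

console.log("index size (GB):", indexGB.toFixed(4));
console.log("storage size (GB):", storageGB.toFixed(4));
console.log("storage + indexes (GB):", (indexGB + storageGB).toFixed(4));
```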

So we were now sure that Mongo wouldn’t starve other processes of RAM, and these stats carried the day for Mongo.

Final Call

Finally, we decided to go with Mongo for storing the logging data on our servers. This kind of benchmarking really helped us choose the technology with confidence, and we look forward to digging deeper with this methodology in the future.

“In God we trust. All others must bring data.” — W. Edwards Deming

