If you have been in IT for a few years now, you have definitely come across the phrase ‘Big Data’, spreading everywhere like wildfire. Curious about the hoopla, you looked it up on Wikipedia and read about everything you could do with Hadoop and its other supposed benefits. What you read left you helpless as you got sucked into a world of Big Data, Hadoop, Hive, Pig and HDFS. Yes, Big Data can be that powerful.
Or perhaps you are new to IT, and your boss has just told you about a new project he bagged, one that will require you to join the Big Data circus. You have probably started reading why “Big Data is a must” in countless blogs and tech forums, and felt compelled to stand in awe of what you learned.
Alas! This article is not one of the many that glorify Big Data as if it were the best thing since sliced bread. Instead, it urges you to stop, step back and think about whether your company truly needs Big Data.
But my Data is “BIG”!
How big is it? Hundreds of MBs that your spreadsheet software cannot load? Use any data analysis tool to clean up that data and load it into your database without breaking a sweat.
Or does it weigh tens of GBs? You can still use the same kind of tool to load this data into your database. In the worst case, you may have to load the data manually by running a few simple queries, which is still not a problem.
Or are you saying that you have data nearing 100 GB? Turn a spare PC into a database server for the job.
Or is your data between 1 and 5 TB? Attach a large external hard drive (4 TB or more) to your server and let PostgreSQL work its magic.
Your data is more than 5 TB, you wail? Now Hadoop becomes the sensible option, because stitching together several 4 TB external hard drives makes your life unnecessarily complicated.
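To make “load it into your database” concrete, here is a minimal sketch that streams a large CSV file into a database in fixed-size batches, so only one batch sits in memory at a time. It uses Python's built-in sqlite3 module purely as a stand-in for “your database”; the function name load_csv, the batch size and the file names in the usage line are hypothetical, and the sketch assumes a well-formed CSV with a header row.

```python
import csv
import sqlite3
from itertools import islice


def load_csv(csv_path, db_path, table, batch_size=50_000):
    """Stream a CSV file into a SQLite table in fixed-size batches.

    Only one batch is held in memory at a time, so the file can be much
    larger than anything a spreadsheet would open. Returns the row count.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row holds the column names
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            # SQLite accepts untyped columns, which keeps the sketch generic.
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            total = 0
            while True:
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(
                    f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})', batch
                )
                conn.commit()  # commit per batch so progress survives a crash
                total += len(batch)
            return total
        finally:
            conn.close()
```

Usage (hypothetical file names): `rows = load_csv("sales.csv", "sales.db", "sales")`. With a server database such as PostgreSQL, its bulk-load command (COPY) would usually be faster than row inserts, but the batching idea is the same.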
Why are you so biased against Big Data anyway, you ask? What is your problem with it?
We are definitely not against Big Data per se. In fact, we have our own team that handles Big Data. We simply believe in informing our clients and prospects about the pros and cons of Big Data analysis before they jump aimlessly onto the Big Data bandwagon. Here are a few things to consider before you take this on:
1. To structure or not to structure, that is the question.
Although Hadoop can handle all kinds of data, it works best with semi-structured or unstructured data. So if you have a database full of structured data that is already well indexed, do you really want to switch to Big Data just because everybody else is?
2. Data, data everywhere.
Yes, an enormous amount of data is generated every second (never mind the actual number). But what you should ask is whether you are turning it into information. Are you gaining any knowledge at all from that data? If you adopt Hadoop as your solution, will you be able to use the data to the fullest, or will you be left with an even bigger problem?
3. Distribute your data? NO!
Hadoop comes with its own file system, HDFS (Hadoop Distributed File System). Yes, distributed. HDFS assumes that your data can be divided: it splits each file into large blocks and automatically spreads (and replicates) those blocks across many nodes. If your data needs to stay in a single place, doesn't the whole point of Hadoop become moot?
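A rough way to picture what “automatically distributes your data” means is to count blocks and copies. The sketch below assumes the Hadoop 2.x+ defaults of a 128 MiB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); real clusters often override both, and the function name hdfs_footprint is hypothetical.

```python
# Hadoop 2.x+ defaults for dfs.blocksize and dfs.replication;
# an actual cluster may configure different values.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size, block_size=DEFAULT_BLOCK_SIZE,
                   replication=DEFAULT_REPLICATION):
    """Rough estimate of how HDFS splits and copies a single file."""
    # Ceiling division: the last block only holds whatever bytes remain.
    blocks = (file_size + block_size - 1) // block_size
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,  # copies spread across DataNodes
        "raw_bytes": file_size * replication,    # disk space used cluster-wide
    }
```

Under these assumptions, a 1 GiB file becomes 8 blocks, stored as 24 block replicas that consume 3 GiB of raw disk across the cluster.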
4. SQL is actually more efficient than Hadoop. Wait, what?!
Hard to believe after everyone has put Hadoop on such a high pedestal? Well, here's the truth. If your database is properly indexed, a SQL query can jump straight to the matching rows, whereas core Hadoop (HDFS and MapReduce) has no built-in indexing and typically has to scan the whole dataset.
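To see what an index buys you, the sketch below asks SQLite (via Python's built-in sqlite3 module, used here only as a convenient stand-in for a relational database) how it would run the same lookup before and after an index exists. The table, column and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the readable detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

lookup = "SELECT * FROM orders WHERE customer_id = ?"
before = query_plan(conn, lookup, (42,))  # no index yet: full-table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, lookup, (42,))   # index lookup on customer_id

print(before)  # typically a "SCAN orders" step (wording varies by version)
print(after)   # typically "SEARCH orders USING INDEX idx_orders_customer ..."
```

The point is the change from SCAN (read every row) to SEARCH (read only the matching rows); a plain MapReduce job has no equivalent shortcut.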
5. Handles the big guns well, but struggles with small files.
HDFS excels at handling big files. But ask it to do random reads over many small files and it quickly becomes inefficient, not least because the NameNode keeps metadata for every file and every block in memory. True story.
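A back-of-the-envelope estimate shows why small files hurt. The sketch below relies on a commonly cited rule of thumb, not an exact figure, that each file, directory or block object costs roughly 150 bytes of NameNode heap, and on the 128 MiB default block size; the function name namenode_bytes is hypothetical.

```python
# Assumption: commonly quoted rule of thumb of ~150 bytes of NameNode heap
# per file, directory or block object. Real usage varies by Hadoop version.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x+ default dfs.blocksize


def namenode_bytes(num_files, file_size, block_size=BLOCK_SIZE):
    """Estimate NameNode heap for num_files files of file_size bytes each."""
    blocks_per_file = (file_size + block_size - 1) // block_size  # ceiling
    objects = num_files * (1 + blocks_per_file)  # one file entry plus its blocks
    return objects * BYTES_PER_OBJECT
```

Under these assumptions, 10 million files of 1 MiB each need about 3 GB of NameNode heap, while the same data packed into 1,000 files of 10,000 MiB each needs only about 12 MB.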
6. Security? Nope.
Security in Hadoop? Well, Hadoop does offer a security model (built on Kerberos authentication), but it is disabled by default because it adds complexity. That alone should be a deal breaker, unless you truly have no alternative to Hadoop.
So there you have it: a few reasons to think twice before you dive into the depths of Big Data and drown. Lastly, we do not mean to offend any Big Data fans out there. As we mentioned before, we even have our own team working on Big Data (and quite efficiently too, if we may say so ourselves).
Still confused about what just happened? Click here to learn more from our team.
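If you already run a cluster, one quick sanity check is whether Kerberos is switched on at all. The sketch below reads the text of a core-site.xml file and reports the hadoop.security.authentication setting, treating a missing entry as Hadoop's default of "simple", which trusts whatever user name the client reports. It checks this one property only (authorization and encryption are configured separately), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hadoop's default when hadoop.security.authentication is not set.
DEFAULT_AUTH = "simple"


def authentication_mode(core_site_xml):
    """Return the hadoop.security.authentication value from core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hadoop.security.authentication":
            return (prop.findtext("value") or DEFAULT_AUTH).strip().lower()
    return DEFAULT_AUTH  # property absent: the insecure default applies


def is_secure(core_site_xml):
    """True only if Kerberos authentication is switched on."""
    return authentication_mode(core_site_xml) == "kerberos"
```

Usage: `is_secure(open("core-site.xml").read())` returns False on an out-of-the-box configuration.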