2016-03-08 Shit Data

Lately there are only negative posts on this blog. Maybe there is a way to fix that?

Look, I've found something positive!

Just kidding. Dumb people are attacking again.

You often see people screaming terms like “big data”, usually with enormous amounts of saliva coming out of their mouths. As if a marketing term like “big data” were a good enough reason to become a brain-dead epileptic (and as if that were going to help anyone).

Interestingly, it is also associated with another problem. Programmers tend to think that problems they solve are very difficult, when in reality most of these tasks are dead simple. We are just not smart enough yet to come up with an elegant solution.

Anyway, see this amazing article about command-line tools being 235 times faster than a Hadoop cluster.

What's the lesson to learn here? A few gigabytes is NOT big data. It is not big data if it fits into your RAM or if you can store it on your HDD. Trying to use “big data” tools to process such data is just plain stupid.

And yes, most of the time you don't even need a database management system. Plain files will do the trick.
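To make the "plain files" point concrete, here is a rough sketch of the kind of job people reach for a cluster to do: tallying the values in one column of a delimited text file. The function name `count_field` and the tab-separated layout are my own assumptions for illustration, not something taken from the article. It reads one line at a time, so memory use depends on the number of distinct values, not on the file size.

```python
from collections import Counter
from typing import Iterable


def count_field(lines: Iterable[str], index: int, sep: str = "\t") -> Counter:
    """Tally the values found in one column of a delimited text stream.

    Hypothetical helper: the column layout and separator are assumptions.
    """
    counts: Counter = Counter()
    for line in lines:
        # Drop the trailing newline, then split the row into fields.
        fields = line.rstrip("\n").split(sep)
        # Rows that are too short to have the requested column are skipped.
        if index < len(fields):
            counts[fields[index]] += 1
    return counts


# A tiny in-memory sample standing in for a real file.
sample = ["alice\tlogin\n", "bob\tlogout\n", "carol\tlogin\n"]
totals = count_field(sample, 1)
```

For a real file you would pass the open file object instead of a list, something like `count_field(open("events.tsv"), 1)`, where `events.tsv` is a made-up name. An open file is already an iterator over lines, so nothing gets loaded into RAM up front.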

Be conscious and learn to use command line tools.