“So, how much experience do you have with Big Data and Hadoop?” they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I’m basically a big data neophite - I know the concepts, I’ve written code, but never at scale.
The next question they asked me. “Could you use Hadoop to do a simple group by and sum?” Of course I could, and I just told them I needed to see an example of the file format.
They handed me a flash drive with all 600MB of their data on it (not a sample, everything). For reasons I can’t understand, they were unhappy when my solution involved pandas.read_csv rather than Hadoop.
Hadoop is limiting. Hadoop allows you to run one general computation ...