Indices are Important
15 Oct 2018
- #data science
I recently joined the biggest data project I’ve ever been a part of at work. It’s not necessarily huge by modern big data standards, but the main element I’m working on has over 130 million records and takes up more than 5 GB on HDFS. The main dataset for the project is over 160 GB. This post is an attempt to document and motivate the practice of conscientiously indexing your datasets to improve preprocessing performance.
Coffee Log 26
13 Oct 2018
Welcome back to the Coffee Log. I’ve been on hiatus since nothing noteworthy was going on with the remainder of my light roast. My plan was to pick up a bag of the T Joe’s dark roast and welcome you back with a “spooky Halloween” edition (since it’s mid-October and, you know, “dark” roast?). However, those plans changed at the last minute, so I’m returning to you with the Whimsical Autumn Edition of the Coffee Log.
C Linked List Follow-up
08 Oct 2018
About a week ago I walked through my solution to comparing two linked lists in C. At the end, I hinted I might write some benchmarks to compare the two approaches discussed. Well, here they are.
Simple BST Validation
02 Oct 2018
Today, my HackerRank problem was a bit of a blast from the past. isBST (i.e., write a function that determines whether a tree is a binary search tree) has come up in more than one interview in the past.
Coffee Log 25
02 Oct 2018
Welcome to the casual late morning/early afternoon edition of the Coffee Log. I had a wild time at work last night preparing a machine learning demo for my company’s booth at The Atlantic Festival. As a result, I took the morning off and had a nice slow start to the day.