Big Data
There's a total of 2 notes.
Sat, May 9, 2020
Data structures for massive datasets
The algorithms that we use every day to manipulate data assume that all the data we need
is readily accessible. What if there's more data than can fit on a single computer, or if
scanning the data to answer a query is expensive? In those cases, we can use specialized data
structures and algorithms that estimate the answer without computing it exactly. Often,
an estimate is good enough. Examples include the count-min sketch, Bloom filters,
and reservoir sampling.
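To make the idea of "estimating" concrete, here is a minimal sketch of a count-min sketch in Go. It is an illustration rather than the code from the note: the type and function names, the FNV hash, and the double-hashing trick used to derive one column per row are all assumptions chosen for brevity. Each item increments one counter per row, and the estimate is the smallest of those counters, so collisions can inflate a count but never deflate it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMinSketch (hypothetical name) estimates item frequencies in a stream
// using a fixed grid of counters instead of one counter per distinct item.
type CountMinSketch struct {
	depth, width uint32
	counts       [][]uint32
}

// NewCountMinSketch allocates depth rows of width counters.
// Assumes depth >= 1 and width >= 1.
func NewCountMinSketch(depth, width uint32) *CountMinSketch {
	counts := make([][]uint32, depth)
	for i := range counts {
		counts[i] = make([]uint32, width)
	}
	return &CountMinSketch{depth: depth, width: width, counts: counts}
}

// index picks the column for an item in a given row. One 64-bit FNV hash is
// split into two halves and combined as h1 + row*h2 (an assumed shortcut that
// stands in for a family of independent hash functions).
func (s *CountMinSketch) index(item string, row uint32) uint32 {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	return (h1 + row*h2) % s.width
}

// Add records one occurrence of item by incrementing one counter per row.
func (s *CountMinSketch) Add(item string) {
	for row := uint32(0); row < s.depth; row++ {
		s.counts[row][s.index(item, row)]++
	}
}

// Estimate returns the smallest counter touched by item. The result is never
// below the true count, but hash collisions can push it above.
func (s *CountMinSketch) Estimate(item string) uint32 {
	lowest := s.counts[0][s.index(item, 0)]
	for row := uint32(1); row < s.depth; row++ {
		if c := s.counts[row][s.index(item, row)]; c < lowest {
			lowest = c
		}
	}
	return lowest
}

func main() {
	s := NewCountMinSketch(4, 1024)
	for _, word := range []string{"a", "b", "a", "c", "a"} {
		s.Add(word)
	}
	fmt.Println("estimated count of a:", s.Estimate("a"))
}
```

Memory use depends only on depth and width, not on the number of distinct items, which is what makes the structure attractive when the data does not fit on one machine. A wider grid lowers the chance of overestimating.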
Sat, Feb 29, 2020
Memtable & SSTable (Sorted String Table)
The pattern of batching data in memory, tracking it in a write-ahead log, and periodically flushing it to disk is ubiquitous today. Open-source examples include LevelDB, Cassandra, InfluxDB, and HBase. In this article, I implement a tiny memtable for a time-series database in Go and briefly talk about how it can be compressed into a sorted string table.
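The pattern described above can be sketched in a few lines of Go. This is not the implementation from the article: the `Point` and `Memtable` types, the text line format, and the use of in-memory buffers in place of real files are all assumptions made to keep the example self-contained. Writes go to the log first, then to memory, and a flush emits the buffered points in sorted order.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sort"
)

// Point is a hypothetical time-series sample.
type Point struct {
	Series    string
	Timestamp int64
	Value     float64
}

// Memtable (hypothetical) batches points in memory and records each write in
// a write-ahead log so the batch could be rebuilt after a crash.
type Memtable struct {
	wal    io.Writer
	points []Point
}

func NewMemtable(wal io.Writer) *Memtable {
	return &Memtable{wal: wal}
}

// Put appends the point to the log before buffering it in memory.
func (m *Memtable) Put(p Point) error {
	if _, err := fmt.Fprintf(m.wal, "%s %d %g\n", p.Series, p.Timestamp, p.Value); err != nil {
		return err
	}
	m.points = append(m.points, p)
	return nil
}

// Flush writes the buffered points to out, sorted by series and then by
// timestamp, and clears the in-memory batch.
func (m *Memtable) Flush(out io.Writer) error {
	sort.SliceStable(m.points, func(i, j int) bool {
		a, b := m.points[i], m.points[j]
		if a.Series != b.Series {
			return a.Series < b.Series
		}
		return a.Timestamp < b.Timestamp
	})
	for _, p := range m.points {
		if _, err := fmt.Fprintf(out, "%s %d %g\n", p.Series, p.Timestamp, p.Value); err != nil {
			return err
		}
	}
	m.points = m.points[:0]
	return nil
}

// demo writes three points out of order and returns the log contents and the
// flushed, sorted table. Buffers stand in for files on disk.
func demo() (walLog, table string, err error) {
	var walBuf, tableBuf bytes.Buffer
	m := NewMemtable(&walBuf)
	for _, p := range []Point{{"cpu", 20, 0.7}, {"cpu", 10, 0.5}, {"mem", 5, 0.9}} {
		if err := m.Put(p); err != nil {
			return "", "", err
		}
	}
	if err := m.Flush(&tableBuf); err != nil {
		return "", "", err
	}
	return walBuf.String(), tableBuf.String(), nil
}

func main() {
	walLog, table, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Print("log (arrival order):\n", walLog)
	fmt.Print("table (sorted order):\n", table)
}
```

The log preserves arrival order while the flushed output is sorted, which is the property that lets a sorted table be searched and merged efficiently. A real system would also sync and truncate the log after a successful flush, and would index or compress the flushed file; those steps are left out here.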