Sat, May 9, 2020
Data structures for massive datasets
The algorithms that we use every day to manipulate data assume that we have access to all
the data we need. What if there is more data than can fit on a single computer, or if
accessing the data to search it is expensive? In that case, we can use specialized data
structures that estimate a value without computing it exactly. In some cases, an estimate
is good enough. These structures and techniques include the count-min sketch, Bloom filters,
and reservoir sampling.
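As a rough illustration of the "estimate instead of compute" idea, here is a minimal count-min sketch in Python. The class name, the default width and depth, and the use of salted SHA-256 to derive one hash function per row are assumptions made for this sketch, not details taken from the post.

```python
import hashlib


class CountMinSketch:
    """Approximate item frequencies using a fixed depth x width table of counters."""

    def __init__(self, width: int = 1000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Assumption: derive an independent-looking hash per row by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the smallest row value is the tightest bound.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch(width=100, depth=4)
for word in ["apple", "banana", "apple", "cherry", "apple"]:
    cms.add(word)
print(cms.estimate("apple"))  # never less than the true count of 3
```

Because hash collisions only add to counters, the estimate can overcount but never undercount, and memory stays at width × depth counters no matter how many distinct items arrive.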
Thu, Mar 5, 2020
Bayesian Networks
A Bayesian network is a directed acyclic graph in which each node is annotated with quantitative
probability information. This article covers the definition of a Bayesian network and its graphical
representation, how to determine independence between variables, and the problem of finding the
probability distribution of a set of query variables given some observed events.
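To make the last problem concrete, here is a minimal sketch of exact inference by enumeration in Python. The five-variable burglary/alarm network and its probabilities are hypothetical, chosen only for illustration; each node stores its parents and a table giving P(node = True) for every combination of parent values.

```python
from itertools import product

# Hypothetical network: variable -> (parents, CPT mapping parent values to P(variable=True)).
NETWORK = {
    "Burglary": ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm": (("Burglary", "Earthquake"),
              {(True, True): 0.95, (True, False): 0.94,
               (False, True): 0.29, (False, False): 0.001}),
    "JohnCalls": (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls": (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}


def joint_probability(assignment):
    """Chain rule for a Bayesian network: product of P(X = x | parents(X)) over all nodes."""
    p = 1.0
    for var, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def enumerate_query(query, evidence):
    """Return P(query | evidence) by summing the joint over every hidden variable."""
    hidden = [v for v in NETWORK if v != query and v not in evidence]
    unnormalized = {}
    for q_value in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, query: q_value, **dict(zip(hidden, values))}
            total += joint_probability(assignment)
        unnormalized[q_value] = total
    z = sum(unnormalized.values())  # normalizing constant
    return {value: p / z for value, p in unnormalized.items()}


posterior = enumerate_query("Burglary", {"JohnCalls": True, "MaryCalls": True})
print(f"P(Burglary=True | j, m) = {posterior[True]:.3f}")  # 0.284
```

This naive enumeration is exponential in the number of hidden variables; algorithms such as variable elimination avoid recomputing shared subexpressions.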