The era of high-throughput sequencing, in which large amounts of DNA and RNA sequence data are generated at ever lower cost, presents interesting algorithmic problems with connections to multiple fields. In this talk, we will present one such problem. Humans have 23 pairs of homologous chromosomes, which are identical except at certain positions called single nucleotide polymorphisms (SNPs). A haplotype of an individual is the pair of sequences of SNPs on the two homologous chromosomes. Knowing the haplotypes of individuals can lead to a better understanding of the interplay between genetic variation and disease, as well as better inference of human demographic history. We discuss the problem of inferring haplotypes from high-throughput sequencing data in the form of short fragments called reads, and we give a simple formula for the number of reads needed to accurately reassemble a haplotype. The analysis leverages connections between this problem and the decoding of convolutional codes, a well-studied problem in communication theory. Finally, we will discuss an interesting connection with the problem of community detection, where communities have to be inferred from the friendship graph of users.
David Tse, professor of electrical engineering at Stanford University, received his B.A.Sc. in systems design engineering from the University of Waterloo and his M.S. and Ph.D. in electrical engineering from MIT. He is coauthor, with Pramod Viswanath, of the text “Fundamentals of Wireless Communication.” Tse is also the inventor of the proportional-fair scheduling algorithm used in all third- and fourth-generation cellular systems. He was a postdoctoral member of technical staff at AT&T Bell Laboratories from 1994 to 1995 and was on the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley from 1995 to 2014. He has received an NSERC graduate fellowship from the government of Canada, an NSF CAREER award, the Erlang Prize, numerous best paper awards, and several teaching awards. His research interests are in information theory and its applications in various fields, including wireless communication, energy, and computational biology.