[YPNG] YPNG Friday 20 April, 11:00--1:00
Sekhar Tatikonda
sekhar.tatikonda at yale.edu
Wed Apr 18 00:16:48 EDT 2018
Hi Everyone,
Yihong will continue his presentation from two weeks ago. (See email
below.)
See you at 11 in the Stat's classroom.
Regards,
sekhar
---------- Forwarded message ----------
From: Sekhar Tatikonda <sekhar.tatikonda at yale.edu>
Date: Wed, Apr 4, 2018 at 6:43 PM
Subject: YPNG Friday 6 April, 11:00--1:00
To: ypng at mailman.yale.edu
Hi Folks,
This Friday Yihong will talk about:
Title: Recovering a hidden Hamiltonian cycle via linear programming
Abstract: One of the most pressing challenges in genomics is to reconstruct
a long and contiguous DNA sequence from short DNA subsequences (contigs).
Enabled by very recent developments in sequencing technology one can now
obtain linkage information that is statistically correlated with the true
contig ordering, so that many links are observed between neighboring pairs
of contigs, while relatively few links are observed for non-neighboring
pairs.
Representing contigs as vertices and observed numbers of links between
contigs as edge weights, the problem of contig assembly reduces to a
Travelling Salesman Problem (TSP) in a weighted complete graph of size $n$
with a hidden Hamiltonian cycle corresponding to the true ordering. We
assume a simple statistical model where the edge weights on the hidden
Hamiltonian cycle are drawn independently from a distribution $P_n$, while
the remaining edge weights are drawn independently from $Q_n$. Despite the
worst-case intractability of solving the TSP, we show that a simple linear
programming (LP) relaxation, namely the fractional $2$-factor (F2F) LP,
recovers the hidden Hamiltonian cycle with probability tending to one as $n
\to \infty$ provided that $alpha_n - log n \to \infty$, where $\alpha_n
\triangleq -2 \log \int \sqrt{dP_n dQ_n}$. This condition is
information-theoretically optimal in the sense that, under mild
distributional assumptions, $\alpha_n \geq (1+o(1) \log n$ is necessary for
any algorithm to succeed regardless of the computational cost.
Departing from the usual proof techniques based on dual witness
construction, the analysis relies on the combinatorial characterization (in
particular, the half-integrality) of vertices of the F2F polytope and the
representation of extremal solutions as bicolored multi-graphs, which can
be further decomposed into simpler ``blossom-type'' structures whose
statistical deviation can be controlled.
This is joint work with Vivek Bagaria (Stanford), Jian Ding (Penn), David
Tse (Stanford), and Jiaming Xu (Purdue).
See you at 11 in the Stat's classroom.
Regards,
sekhar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.yale.edu/pipermail/ypng/attachments/20180418/af8d4e35/attachment.html>
More information about the YPNG
mailing list