<div dir="ltr"><div><div><div><div><div><br></div>Hi Everyone,<br></div>Yihong will continue his presentation from two weeks ago. (See email below.)<br></div>See you at 11 in the Stat's classroom.<br></div>Regards,<br></div>sekhar<br><div><div><div><div><div><br><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Sekhar Tatikonda</b> <span dir="ltr"><<a href="mailto:sekhar.tatikonda@yale.edu">sekhar.tatikonda@yale.edu</a>></span><br>Date: Wed, Apr 4, 2018 at 6:43 PM<br>Subject: YPNG Friday 6 April, 11:00--1:00<br>To: <a href="mailto:ypng@mailman.yale.edu">ypng@mailman.yale.edu</a><br><br><br><div dir="ltr"><div><div><br><br>Hi Folks,<br>This Friday Yihong will talk about:<br><br>Title: Recovering a hidden Hamiltonian cycle via linear programming<br><br>Abstract: One of the most pressing challenges in genomics is to reconstruct a long and contiguous DNA sequence from short DNA subsequences (contigs). Enabled by very recent developments in sequencing technology one can now obtain linkage information that is statistically correlated with the true contig ordering, so that many links are observed between neighboring pairs of contigs, while relatively few links are observed for non-neighboring pairs.<br><br>Representing contigs as vertices and observed numbers of links between contigs as edge weights, the problem of contig assembly reduces to a Travelling Salesman Problem (TSP) in a weighted complete graph of size $n$ with a hidden Hamiltonian cycle corresponding to the true ordering. We assume a simple statistical model where the edge weights on the hidden Hamiltonian cycle are drawn independently from a distribution $P_n$, while the remaining edge weights are drawn independently from $Q_n$. Despite the worst-case intractability of solving the TSP, we show that a simple linear programming (LP) relaxation, namely the fractional $2$-factor (F2F) LP, recovers the hidden Hamiltonian cycle with probability tending to one as $n \to \infty$ provided that $alpha_n - log n \to \infty$, where $\alpha_n \triangleq -2 \log \int \sqrt{dP_n dQ_n}$. This condition is information-theoretically optimal in the sense that, under mild distributional assumptions, $\alpha_n \geq (1+o(1) \log n$ is necessary for any algorithm to succeed regardless of the computational cost.<br><br>Departing from the usual proof techniques based on dual witness construction, the analysis relies on the combinatorial characterization (in particular, the half-integrality) of vertices of the F2F polytope and the representation of extremal solutions as bicolored multi-graphs, which can be further decomposed into simpler ``blossom-type'' structures whose statistical deviation can be controlled.<br><br>This is joint work with Vivek Bagaria (Stanford), Jian Ding (Penn), David Tse (Stanford), and Jiaming Xu (Purdue).<br>
<div><br></div>See you at 11 in the Stat's classroom.<br>
<br><br></div><div>Regards,<br></div><div>sekhar<br></div><div><br><br><br><br></div></div></div>
</div><br></div></div></div></div></div></div>