Sebastian Deorowicz (@sdeorowicz)'s Twitter Profile
Sebastian Deorowicz

@sdeorowicz

Data compression. Algorithms for genome sequencing compression and analysis.

ID: 1167138749710557184

Website: https://refresh-bio.github.io/
Account created: 29-08-2019 18:15:59

125 Tweets

360 Followers

31 Following

bioRxiv Bioinfo (@biorxiv_bioinfo)

Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly biorxiv.org/cgi/content/sh… #biorxiv_bioinfo

Sebastian Deorowicz (@sdeorowicz)

We've just published a new release of RECKONER, our tool for Illumina read correction. The paper also evaluates the impact of read correction in variant calling pipelines. nature.com/articles/s4159…

Andrzej Zielezinski (@a_zielezinski)

Exciting news! 🎉 Our research on ancient phages in the human gut by Piotr is now out in Nature Communications! 📚🔬 A big shoutout to @BEDutilh and Yasas Wijesekara for an amazing collaboration.

Zamin Iqbal (@zaminiqbal)

First step in a community project to provide a uniformly assembled, annotated and searchable set of bacterial genomes, our preprint on our initial release of 1.9 million genome assemblies+taxonomic estimates. (figure compares with previous 661k dataset) biorxiv.org/content/10.110…

Sebastian Deorowicz (@sdeorowicz)

For the current (and future) users: AGC 3.1 (Assembled Genome Compressor) is ready for download: github.com/refresh-bio/agc Main updates: support for ARM-based CPUs, e.g., Mac M1/M2/...; some bug fixes; some new features; speed optimizations. Bioconda package should be ready soon.

Sebastian Deorowicz (@sdeorowicz)

After a few years of development, Kmer-db v.2, our tool for finding similar sequences in large collections of genomic data (even millions of viral genomes), is ready. If interested, take a look at the GitHub repo and related paper. github.com/refresh-bio/km… biorxiv.org/content/10.110…

Sebastian Deorowicz (@sdeorowicz)

Clustering large datasets can be challenging. Fortunately, even slow methods can sprint for sparse similarity matrices. Clusty offers s-, c-link, uclust, set-cover, cd-hit, leiden. The paper shows an application for 15M+ sequences. github.com/refresh-bio/cl… biorxiv.org/content/10.110…

Andrzej Zielezinski (@a_zielezinski)

Excited to share Vclust! It's a fast and accurate tool for calculating intergenomic similarities (like ANI) and clustering virus/#phage genomes/contigs according to ICTV and MIUViG standards. 💻 Tool: github.com/refresh-bio/vc… 📄 Preprint: biorxiv.org/content/10.110… Thread! 1/6 ↓

Andrzej Zielezinski (@a_zielezinski)

When writing bioinformatics tools, I often need unique IDs for things like temp directories. So, I created a Python package for generating fun & memorable IDs like "retired-nucleotide" or "funny-malware-7ab4" covering everything from sports to science. github.com/aziele/unique-…

Sebastian Deorowicz (@sdeorowicz)

I am happy to announce that ProteStAr, our compressor of CIF/PDB files with 3D atom coordinates, is now published at Bioinformatics. With this, you can store the whole ESM Atlas or AlphaFold DB in a few files (rather than 200M+) with fast random access. doi.org/10.1093/bioinf…

Heng Li (@lh3lh3)

Pangene now published in Bioinformatics: doi.org/10.1093/bioinf…. In addition to showcasing applications (see the 17q21.31 inversion below), we also reviewed the theoretical formulation of bidirected graphs and discussed the definition and the finding of "bubbles" in such graphs.

Heng Li (@lh3lh3)

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613

Roozbeh Dehghannasiri (@roozbehdn)

Happy to share our latest paper with Marek Kokot on SPLASH2 for ultra-efficient reference-free discovery directly on raw sequencing reads out in Nature Biotechnology, supervised by Salzman Lab and Sebastian Deorowicz, and with great contributions from Tavor Baharav. nature.com/articles/s4158…

Sebastian Deorowicz (@sdeorowicz)

AGC 3.2 (Assembled Genome Compressor) has been released. Better speed, better ratio (at least for bacterial genomes), optional low-memory decompression. github.com/refresh-bio/agc

Heng Li (@lh3lh3)

The latest hifiasm can directly assemble standard Oxford Nanopore simplex R10 reads, without HERRO correction or other preprocessing, to phased contigs of contiguity comparable to HiFi assembly. Like before, you can further add ultra-long, Hi-C or trio data for better assembly.

Sebastian Deorowicz (@sdeorowicz)

Recently, our SPLASH paper (nature.com/articles/s4158…) was published in NatBiotech. Now, we release its extended version, sc-SPLASH (biorxiv.org/content/10.110…), which allows reference-free analysis of single-cell data. It was a great experience to work with our collaborators on that!

Sebastian Deorowicz (@sdeorowicz)

Vclust (the ultra-fast, high-accuracy tool for viral genome comparison & clustering) is now published: nature.com/articles/s4159… Great collaboration with Andrzej Zielezinski, Adam Gudyś, UAM guys, and Bas E. Dutilh.

Sebastian Deorowicz (@sdeorowicz)

Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: biorxiv.org/content/10.110… and GH repo: github.com/refresh-bio/FA…