Sebastian Deorowicz (@sdeorowicz) Twitter Tweets • TwiCopy

bioRxiv Bioinfo

2 years ago

Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly biorxiv.org/cgi/content/sh… #biorxiv_bioinfo

Sebastian Deorowicz

@sdeorowicz

2 years ago

We've just published a new release of RECKONER, our tool for Illumina read correction. The paper also evaluates the impact of read correction in variant calling pipelines. nature.com/articles/s4159…

Andrzej Zielezinski

@a_zielezinski

2 years ago

Exciting news! 🎉 Our research on ancient phages in the human gut by Piotr is now out in Nature Communications! 📚🔬 A big shoutout to @BEDutilh and Yasas Wijesekara for an amazing collaboration.

Zamin Iqbal

First step in a community project to provide a uniformly assembled, annotated and searchable set of bacterial genomes: our preprint on our initial release of 1.9 million genome assemblies + taxonomic estimates. (Figure compares with the previous 661k dataset.) biorxiv.org/content/10.110…

Sebastian Deorowicz

@sdeorowicz

2 years ago

For the current (and future) users: AGC 3.1 (Assembled Genome Compressor) is ready for download: github.com/refresh-bio/agc Main updates: support for ARM-based CPUs, e.g., Mac M1/M2/...; some bug fixes; some new features; speed optimizations. Bioconda package should be ready soon.

Sebastian Deorowicz

@sdeorowicz

a year ago

After a few years of development, Kmer-db v.2, our tool for finding similar sequences in large collections of genomic data (even millions of viral genomes), is ready. If interested, take a look at the GitHub repo and related paper. github.com/refresh-bio/km… biorxiv.org/content/10.110…

Sebastian Deorowicz

@sdeorowicz

a year ago

Clustering large datasets can be challenging. Fortunately, even slow methods can sprint for sparse similarity matrices. Clusty offers s-, c-link, uclust, set-cover, cd-hit, leiden. The paper shows an application for 15M+ sequences. github.com/refresh-bio/cl… biorxiv.org/content/10.110…

Andrzej Zielezinski

@a_zielezinski

a year ago

Excited to share Vclust! It's a fast and accurate tool for calculating intergenomic similarities (like ANI) and clustering virus/#phage genomes/contigs according to ICTV and MIUViG standards. 💻 Tool: github.com/refresh-bio/vc… 📄 Preprint: biorxiv.org/content/10.110… Thread! 1/6 ↓

Andrzej Zielezinski

@a_zielezinski

a year ago

When writing bioinformatics tools, I often need unique IDs for things like temp directories. So, I created a Python package for generating fun & memorable IDs like "retired-nucleotide" or "funny-malware-7ab4" covering everything from sports to science. github.com/aziele/unique-…

Sebastian Deorowicz

@sdeorowicz

a year ago

I am happy to announce that ProteStAr, our compressor of CIF/PDB files with 3D atom coordinates, is now published at Bioinformatics. With this, you can store the whole ESM Atlas or AlphaFold DB in a few files (rather than 200M+) with fast random access. doi.org/10.1093/bioinf…

Heng Li

@lh3lh3

a year ago

Pangene now published in Bioinformatics: doi.org/10.1093/bioinf…. In addition to showcasing applications (see the 17q21.31 inversion below), we also reviewed the theoretical formulation of bidirected graphs and discussed the definition and the finding of "bubbles" in such graphs.

Heng Li

@lh3lh3

a year ago

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613

Rong

@_rongl

a year ago

New paper online in Nature Biotechnology by Sebastian Deorowicz group and Salzman Lab: SPLASH2 speeds up analysis of sequence variation in massive datasets.

Roozbeh Dehghannasiri

@roozbehdn

a year ago

Happy to share our latest paper with Marek Kokot on SPLASH2 for ultra-efficient reference-free discovery directly on raw sequencing reads out in Nature Biotechnology, supervised by Salzman Lab and Sebastian Deorowicz, and with great contributions from Tavor Baharav. nature.com/articles/s4158…

Sebastian Deorowicz

@sdeorowicz

a year ago

AGC 3.2 (Assembled Genome Compressor) has been released. Better speed, better ratio (at least for bacterial genomes), optional low-memory decompression. github.com/refresh-bio/agc

Heng Li

@lh3lh3

a year ago

The latest hifiasm can directly assemble standard Oxford Nanopore simplex R10 reads, without HERRO correction or other preprocessing, to phased contigs of contiguity comparable to HiFi assembly. Like before, you can further add ultra-long, Hi-C or trio data for better assembly.

Sebastian Deorowicz

@sdeorowicz

a year ago

Recently, our SPLASH paper (nature.com/articles/s4158…) was published in NatBiotech. Now, we release its extended version, sc-SPLASH (biorxiv.org/content/10.110…), which allows reference-free analysis of single-cell data. It was a great experience to work with our collaborators on that!

Sebastian Deorowicz

@sdeorowicz

7 months ago

Vclust (the ultra-fast, high-accuracy tool for viral genome comparison & clustering) is now published: nature.com/articles/s4159… Great collaboration with Andrzej Zielezinski, Adam Gudyś, UAM guys, and Bas E. Dutilh.

Nature Methods

@naturemethods

7 months ago

Vclust generates fast and accurate estimation of average nucleotide identity (ANI) for viral genomes, scaling clustering to millions of genomes. Andrzej Zielezinski Adam Gudyś Sebastian Deorowicz Piotr UAM Poznań Politechnika Śląska Universität Jena nature.com/articles/s4159…

Sebastian Deorowicz

@sdeorowicz

5 months ago

Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: biorxiv.org/content/10.110… and GH repo: github.com/refresh-bio/FA…
