Locally Consistent Decomposition of Strings with Applications to Edit Distance Sketching

Edit distance is a similarity measure for strings that counts how many characters have to be deleted, inserted or substituted in one string to get another one. It has many applications from comparing DNA sequences to text processing. We are still in search for its most efficient algorithms and its exact computational complexity is still unknown. In this talk I will focus on sketching edit distance which could provide an ultimate algorithm for edit distance in the context of large databases of strings.

 

The goal is to compute for each string a short sketch such that from sketches of two strings we can estimate their edit distance. This has connection to embedding of metric spaces.

 

In this talk I will provide an overview of what is known about sketching edit distance, and I will present a new locally consistent decomposition of strings which has applications for sketching of edit distance and approximate pattern matching.

 

This is based on recent results with Sudatta Bhattacharya and Mike Saks.

Date

Speakers

Michal Koucký

Affiliation

Charles University