Project information
Learned Indexing for Similarity Searching
- Project Identification: GF23-07040K
- Project Period: 7/2023 - 6/2026
- Investor / Programme / Project type: Czech Science Foundation / LA Grants / Lead Agency
- MU Faculty or unit: Faculty of Informatics
- Cooperating Organization: University of Kiel (responsible person: prof. Dr. Peer Kröger)
Complex, unstructured, or high-dimensional data (e.g., multimedia data) is often stored and retrieved by modelling it as a metric space. Consequently, the only measure available for arranging the data is the pairwise similarity (distance) between data objects. Similarity searching covers a range of methods that organize such data so it can be searched efficiently. The main paradigm of similarity searching has remained largely unchanged for decades. Data objects are organized into a hierarchical structure according to their mutual distances. Representative objects called pivots are then used to reduce the number of distance computations needed to search the data.
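The following minimal Python sketch makes the pivot idea concrete. It assumes a flat pivot table (in the style of LAESA) rather than the hierarchical structures described above. `PivotTable` and `range_query` are hypothetical names used only for illustration.

At build time, each object's distances to a few pivots are stored. At query time, the triangle inequality gives the lower bound |d(q, p) - d(o, p)| <= d(q, o). Any object whose bound exceeds the query radius is therefore discarded without computing its actual distance.

```python
import math
import random


class PivotTable:
    """Flat pivot table: a simplified stand-in for hierarchical pivot-based indexes.

    Assumption (illustrative only): a flat table, not a hierarchical index.
    `dist` must be a metric, because pruning relies on the triangle inequality.
    """

    def __init__(self, objects, pivots, dist):
        self.objects = objects
        self.pivots = pivots
        self.dist = dist
        # Precompute d(o, p) for every object o and every pivot p (build-time cost).
        self.table = [[dist(o, p) for p in pivots] for o in objects]

    def range_query(self, q, r):
        """Return (indices of objects within distance r of q, distance computations used)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        computations = len(self.pivots)
        result = []
        for i, row in enumerate(self.table):
            # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p.
            lower_bound = max(abs(qp - op) for qp, op in zip(q_to_pivots, row))
            if lower_bound > r:
                continue  # pruned: d(q, o) is never computed
            computations += 1
            if self.dist(q, self.objects[i]) <= r:
                result.append(i)
        return result, computations


# Usage: 1,000 random points in the unit square, Euclidean distance, 4 pivots.
rng = random.Random(1)
objects = [(rng.random(), rng.random()) for _ in range(1000)]
index = PivotTable(objects, rng.sample(objects, 4), math.dist)
hits, cost = index.range_query((0.5, 0.5), 0.05)
print(len(hits), "hits using", cost, "distance computations instead of", len(objects))
```

The pruning is exact: every object within the radius is still returned, because the lower bound never overestimates the true distance.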
We plan to investigate an alternative to this paradigm that uses machine learning models in place of pivots, thus posing similarity search as a classification problem. We will implement our solutions using both supervised and unsupervised approaches. We will also address questions of scalability and dynamicity, and verify the applicability of the approach to metric data.
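The following minimal Python sketch illustrates what "similarity search as a classification problem" can look like. It is a hedged illustration, not the project's method, and it rests on several assumptions:

- k-means stands in for the unsupervised partitioning step.
- A multinomial logistic regression stands in for the learned model.
- The data are vectors compared with Euclidean distance.
- All function names are hypothetical.

The classifier learns to map an object to its bucket. A query is answered by visiting buckets in order of predicted probability until a candidate budget is reached, and then ranking only those candidates by exact distance.

```python
import math
import random

# Assumption (illustrative only): vector data with Euclidean distance;
# k-means and logistic regression are stand-ins, not the project's models.


def kmeans(points, k, iters=20, seed=0):
    """Unsupervised step: partition the data into k buckets (Lloyd's k-means)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a bucket becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def scores(W, b, x):
    """Linear score of x for each bucket."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_softmax(points, labels, k, epochs=200, lr=0.05):
    """Supervised step: multinomial logistic regression trained with plain SGD
    to predict each point's bucket (this model takes the place of pivots)."""
    d = len(points[0])
    W = [[0.0] * d for _ in range(k)]
    b = [0.0] * k
    for _ in range(epochs):
        for x, y in zip(points, labels):
            p = softmax(scores(W, b, x))
            for j in range(k):
                g = p[j] - (1.0 if j == y else 0.0)  # cross-entropy gradient
                b[j] -= lr * g
                for t in range(d):
                    W[j][t] -= lr * g * x[t]
    return W, b


def learned_knn(q, points, labels, W, b, k_nn=3, budget=50):
    """Visit buckets in order of predicted probability until `budget` candidates
    are collected, then rank only those candidates by exact distance."""
    probs = softmax(scores(W, b, q))
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    candidates = []
    for bucket in order:
        candidates += [i for i, lab in enumerate(labels) if lab == bucket]
        if len(candidates) >= budget:
            break
    candidates.sort(key=lambda i: math.dist(q, points[i]))
    return candidates[:k_nn]


# Usage: four Gaussian blobs in 2-D, one bucket per blob.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in blob_centers for _ in range(50)]
labels = kmeans(points, 4)
W, b = train_softmax(points, labels, 4)
print(learned_knn((9.5, 0.5), points, labels, W, b))
```

The `budget` parameter trades accuracy for speed. With a budget equal to the dataset size, the result is exact. Smaller budgets skip the buckets the model considers unlikely to contain near neighbours. General metric (non-vector) data would need a different input representation for the model.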
Sustainable Development Goals
Masaryk University is committed to the UN Sustainable Development Goals, which aim to improve the conditions and quality of life on our planet by 2030.
Total number of publications: 17
Advancing the PAM Algorithm to Semi-Supervised k-Medoids Clustering
17th International Conference on Similarity Search and Applications (SISAP), year: 2025
Scaling Learned Metric Index to 100M Datasets
3D-af-Surfer: Protein structural embeddings of AlphaFold DB
Year: 2024, type: Specialized database
3DZD: Protein structural embeddings of AlphaFold DB v4
Year: 2024, type: Specialized database
3DZD: Protein structural embeddings of ESM Atlas
Year: 2024, type: Specialized database
Year: 2024
AlphaFind — discover structure similarity across the proteome in AlphaFold DB
Year: 2024, type: Conference abstract
AlphaFind: Discover structure similarity across the entire known proteome
Year: 2024
AlphaFind: Discover structure similarity across the entire known proteome – data and model
Year: 2024, type: Specialized database