We study the problem of Protein Remote Homology Detection, which assesses the functional similarity of two proteins. We approach this as a binary multiple-instance learning (MIL) problem that aims to distinguish between homologous and non-homologous proteins. The particular MIL approach employed is based on the dissimilarity representation, in which various schemes for combining N-gram representations are considered. This approach allows us to cope with longer N-grams, capturing a richer biological context, and results in a versatile framework offering competitive performance compared to the state of the art.
- Dissimilarity representation
- Multiple-instance learning
- Protein Remote Homology Detection
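
To make the dissimilarity representation concrete, the following is a minimal sketch rather than the method used in this work. It assumes that each protein is treated as a bag of overlapping N-grams, that instances are compared by Hamming distance, and that two bags are compared by the mean of minimum instance distances. The function names, the choice of distances, and the toy sequences are all illustrative assumptions. Each protein is mapped to a vector of dissimilarities to a set of prototype proteins, on which any standard binary classifier could then be trained.

```python
from typing import List


def ngrams(seq: str, n: int) -> List[str]:
    """Return all overlapping N-grams of a sequence (the bag of instances)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length N-grams."""
    return sum(x != y for x, y in zip(a, b))


def bag_dissimilarity(bag_a: List[str], bag_b: List[str]) -> float:
    """Mean, over instances in bag_a, of the distance to the closest instance in bag_b.

    Assumed bag dissimilarity for illustration; both bags must be non-empty.
    """
    return sum(min(hamming(a, b) for b in bag_b) for a in bag_a) / len(bag_a)


def dissimilarity_vector(seq: str, prototypes: List[str], n: int) -> List[float]:
    """Represent a protein by its dissimilarities to each prototype protein."""
    bag = ngrams(seq, n)
    return [bag_dissimilarity(bag, ngrams(p, n)) for p in prototypes]


# Toy sequences for illustration only (not real proteins); each is longer than n.
prototypes = ["MKVLAAGIV", "GSHMTTPQE"]
vec = dissimilarity_vector("MKVLAAGLV", prototypes, n=3)
```

In this sketch the query is closer to the first prototype than to the second, so the first entry of `vec` is the smaller one. Combining several values of `n` could be done, for example, by concatenating the corresponding vectors, which is again an assumption rather than a scheme taken from the text.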