This post is a tutorial on how to use the distant segments kernel. String kernels were recently introduced as a more precise way to perform pairwise comparisons. First, I will define the concept of a kernel. A kernel is a function that takes two objects from an input space, maps them to a vector feature space, and returns their dot product in that space. A kernel thus associates a real number with a pair of instances. String kernels are a particular case of kernels: their input space is the set of strings. Recall that strings are sequences of symbols drawn from a given alphabet. The distant segments kernel is a string kernel. For the distant segments kernel, the feature vector associated with a string is the distribution of its distant segments. See this paper for more details. In this tutorial, command lines are shown in red. First, download the source code.

seb@ubuntu:~$ wget http://boisvert.info/software/PermutationDSKernel.cpp

This software performs the kernel matrix computation of a set of stri
Showing posts from December, 2008
I started my master's degree in September. I also have a new web site. If everything goes as planned, I should start a PhD with professor Jacques Corbeil and professor Mario Marchand at the Université Laval by next year. So far, in my first two internships, I focused on microarrays for gene expression (Ubeda et al., Rochette et al.). Last year, in my third internship, I also had the chance to develop a new method for the prediction of HIV coreceptor usage with Mario Marchand, François Laviolette and Jacques Corbeil. Our manuscript was recently accepted for publication in the journal Retrovirology (to appear). Since existing methods for this particular task all need each V3 loop (a component of HIV) to be aligned, we saw a good opportunity to apply string kernels, which compare sequences without aligning them. Recall that a kernel is a similarity measure: it maps objects to a feature space and computes a dot product there. Recall also that string kernels are the family of kernels whose input space is the set of strings.
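To make the "map to a feature space, then take a dot product" idea concrete, here is a minimal sketch of a much simpler string kernel, the k-mer spectrum kernel. It is an illustration only and is not the distant segments kernel or the method from our manuscript; the function names `kmer_counts` and `spectrum_kernel` are made up for this example. The feature vector of a string is assumed to be the count of each of its substrings of length k.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Feature map: count every substring of length k (a "k-mer") in s.
// Assumption for illustration: this is the spectrum kernel's feature map,
// not the distant segments feature map.
std::map<std::string, int> kmer_counts(const std::string& s, std::size_t k) {
    assert(k > 0);
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i + k <= s.size(); ++i) {
        ++counts[s.substr(i, k)];
    }
    return counts;
}

// Kernel value: dot product of the two k-mer count vectors.
// Only k-mers present in both strings contribute to the sum.
double spectrum_kernel(const std::string& x, const std::string& y,
                       std::size_t k) {
    const std::map<std::string, int> fx = kmer_counts(x, k);
    const std::map<std::string, int> fy = kmer_counts(y, k);
    double dot = 0.0;
    for (const auto& entry : fx) {
        const auto it = fy.find(entry.first);
        if (it != fy.end()) {
            dot += static_cast<double>(entry.second) * it->second;
        }
    }
    return dot;
}
```

For example, with k = 2, "ABAB" has counts AB:2, BA:1 and "BABA" has counts BA:2, AB:1, so `spectrum_kernel("ABAB", "BABA", 2)` is 2·1 + 1·2 = 4. No alignment of the two strings is needed, which is the property that made string kernels attractive for V3 loops.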
Jean-Philippe Vert is giving two talks in Québec this week. The first one will be on contributions of machine learning to the field of bioinformatics. His second talk will be about biological networks.

Talk 1
Some contributions of machine learning in bioinformatics
December 3, 2008, 10h30, room PLT-2744 (Pavillon Adrien-Pouliot)
see http://www.ulaval.ca/Al/interne/plan/AdrienPouliot/reference.htm

Many problems in bioinformatics can be formulated as pattern recognition problems on non-standard objects, such as strings, graphs or high-dimensional vectors with particular structure. They have triggered many original developments in machine learning recently, in particular in the way data are represented and prior knowledge is introduced in the algorithm. In this talk I will present some of these developments through several examples in microarray data analysis, virtual screening, and inference of biological networks.

Talk 2
Inferring and using biological networks
Decemb