Showing posts from 2008

The Distant Segments Kernel: a tutorial

This post is a tutorial on how to use the distant segments kernel. String kernels were recently introduced as a more precise way to perform pairwise comparisons. First, I will define the concept of kernels. A kernel is a function that takes two objects from an input space, maps them to a vector feature space, and computes their dot product there. A kernel therefore associates a real number with a pair of instances. String kernels are a particular case of kernels: their input space is the set of strings. Recall that strings are sequences of symbols drawn from a given alphabet. The distant segments kernel is a string kernel. For the distant segments kernel, the feature vector associated with a string is the distribution of its distant segments. See this paper for more details. In this tutorial, command lines are shown in red. First, download the source code. seb@ubuntu:~$ wget This software performs the kernel matrix computation of a set of stri

My research activities

I started my master's degree in September. I also have a new web site. If everything goes as planned, I should start a PhD with professor Jacques Corbeil and professor Mario Marchand at Université Laval by next year. So far, in my first two internships, I focused on microarrays for gene expression (Ubeda et al., Rochette et al.). Last year, in my third internship, I also had the chance to develop a new method for the prediction of HIV coreceptor usage with Mario Marchand, François Laviolette and Jacques Corbeil. Our manuscript was recently accepted for publication in the journal Retrovirology (to appear). Since existing methods for this particular task all need each V3 loop (a component of HIV) to be aligned, we saw a good opportunity to apply string kernels. Recall that a kernel is a similarity measure: it maps objects to a feature space and computes a dot product there. Recall also that string kernels are the family of kernels whose input space is the set of strings.
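These two definitions can be made concrete with a small Python sketch. It uses the k-mer spectrum kernel, a simpler string kernel picked only for illustration; it is not the distant segments kernel, and the function names and toy strings are hypothetical. The feature map counts every substring of length k, and the kernel is the dot product of two such count vectors, so no alignment of the strings is needed.

```python
from collections import Counter


def kmer_counts(s, k):
    """Feature map (illustrative): count every length-k substring of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Dot product of the k-mer count vectors of s and t.

    This is a spectrum kernel, NOT the distant segments kernel.
    """
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Only k-mers present in both strings contribute to the dot product.
    return sum(cs[kmer] * ct[kmer] for kmer in cs.keys() & ct.keys())


def kernel_matrix(strings, k=2):
    """Pairwise kernel values for a list of strings (a Gram matrix)."""
    return [[spectrum_kernel(a, b, k) for b in strings] for a in strings]


if __name__ == "__main__":
    # Toy strings of different lengths, not real V3 loop sequences.
    toy = ["ABAB", "BABA", "CCCCC"]
    for row in kernel_matrix(toy):
        print(row)
```

The resulting matrix is symmetric, and strings that share no k-mer get a kernel value of zero.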

Jean-Philippe Vert

Jean-Philippe Vert is giving two talks in Québec this week. The first one will be on his contributions to machine learning in the field of bioinformatics. His second talk will be about biological networks. Talk 1: Some contributions of machine learning in bioinformatics. December 3, 2008, 10h30, room PLT-2744 (Pavillon Adrien-Pouliot) see Many problems in bioinformatics can be formulated as pattern recognition problems on non-standard objects, such as strings, graphs or high-dimensional vectors with particular structure. They have triggered many original developments in machine learning recently, in particular in the way data are represented and prior knowledge is introduced in the algorithm. In this talk I will present some of these developments through several examples in microarray data analysis, virtual screening, and inference of biological networks. Talk 2: Inferring and using biological networks. Decemb

First post

Hello to those reading this. I updated my web site earlier today. Recently I was amazed to find that my old blog is still available through some sort of creepy site whose purpose is to store blogs forever. This old blog holds posts about Linux, my internships, my courses and my thoughts. My friend Olivier once told me that he read my blog regularly. Since my old posts are still on the Internet and I enjoy writing about random things, I decided to start another blog, this time under the name dskernel. dskernel stands for Distant Segments Kernel, which is my current research topic with collaborators at Université Laval. We aim to publish this work (soon enough) in the Bioinformatics journal. Speaking of research, I'll be starting a master's degree at Université Laval (see my web site). I'll be working on bioinformatics, as it is the thing I enjoy the most in research. Of course, I will also be doing some nasty biology, such as genomics, transcriptomics and othe