2012-12-27

The source code of SOAPdenovo2 sits in the shadows

Update 2012-12-28: SOAPdenovo 2.04-r223 source code was posted online on 2012-12-28. Some minor concerns:

1. The tarball suffers from the bundled library problem (libbam.a and libbammac.a are shipped precompiled). This is incompatible with the GPLv3.

2. The Makefile is not portable with its hard-coded paths to compilers (/opt/blc/gcc-4.5.0/bin/gcc should just be gcc).


The SOAPdenovo2 article was published today under the terms of the Creative Commons Attribution License. Here is the "Availability and requirements" section for SOAPdenovo2 as reported in the article:

Availability and requirements

Project name: SOAPdenovo2
Project home page: http://soapdenovo2.sourceforge.net/
Operating system(s): e.g. Platform independent
Programming language: C, C++
Other requirements: GCC version ≥ 4.5.0
License: GNU General Public License version 3.0 (GPLv3)
Any restrictions to use by non-academics: none
Contact: bgi-soap@googlegroups.com




I fired up my browser and in a flash went to the "Project home page". I downloaded the latest SOAPdenovo2 distribution but was disappointed to find that it was binary-only. This contradicts the GNU General Public License, version 3, which according to the article above is the license under which SOAPdenovo2 is distributed.

Binary-only distribution is the path to the dark side (proprietary)

The SOAPdenovo2 distribution contains 4 precompiled binaries and 2 plain-text files. No source files were found, which is confusing because the GNU General Public License, version 3, is meant for distributing open source software with an emphasis on freedom.

Table 1: Files distributed in the tarball called SOAPdenovo2_revision217.tgz

File                           Type
MANUAL                         ASCII text, with very long lines
update.log                     ASCII text
pregraph_sparse_127mer.v1.0.3  ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped
pregraph_sparse_63mer.v1.0.3   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped
SOAPdenovo-127mer              ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped
SOAPdenovo-63mer               ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped

Concerns of a bioinformatics adventurer

Concern #1: ELF 64-bit LSB executables for GNU/Linux 2.6.9 are not platform-independent, so the claim of platform independence is false. For instance, I cannot run these executables on OpenBSD.
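To see why such a file is tied to one platform, it helps to look at its header: an ELF file records its word size, byte order and target CPU in its first 20 bytes. The Python sketch below decodes those fields. The function names are hypothetical and only a handful of machine codes are mapped. Run on one of the binaries in Table 1, it would be expected to report a 64-bit, little-endian, x86-64 file, matching what `file` printed above.

```python
import struct

# A few field values from the ELF specification (not an exhaustive mapping).
EI_CLASS = {1: "ELF32", 2: "ELF64"}
EI_DATA = {1: "LSB (little-endian)", 2: "MSB (big-endian)"}
EM_MACHINE = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> dict:
    """Decode the identification bytes and e_machine of an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class, data = header[4], header[5]
    # e_machine is a 16-bit field at offset 18, stored in the file's own byte order.
    order = "<" if data == 1 else ">"
    (machine,) = struct.unpack_from(order + "H", header, 18)
    return {
        "class": EI_CLASS.get(elf_class, f"unknown ({elf_class})"),
        "data": EI_DATA.get(data, f"unknown ({data})"),
        "osabi": header[7],  # 0 is ELFOSABI_NONE, which `file` reports as SYSV
        "machine": EM_MACHINE.get(machine, f"unknown ({machine})"),
    }


def describe_elf_file(path: str) -> dict:
    """Read the first 20 bytes of a file and decode them as an ELF header."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

For example, `describe_elf_file("SOAPdenovo-63mer")` inspects one of the distributed binaries. A CPU with a different architecture or an operating system with a different executable loader has no reason to accept such a file, which is why source code matters for portability.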

Concern #2: GCC version ≥ 4.5.0 is not actually required, because this is a binary-only (proprietary) distribution with nothing to compile. Therefore this requirement is untrue.

Concern #3: proprietary software distributions are not eligible for licensing under the GNU General Public License version 3. Therefore, the authors should either select their own proprietary license or release the source code of SOAPdenovo2. The last previously released version was SOAPdenovo v1.05.


What did the reviewers have to say?


Reviewer #AJN raised concerns about the lack of source code on two occasions.

Freedoms provided by free software

According to the Free Software Foundation, Inc.:

A program is free software if the program's users have the four essential freedoms:
  • The freedom to run the program, for any purpose (freedom 0).
  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  • The freedom to redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this. 


6 comments:

Peter Cock said...

According to the authors' response to the first round of reviews: "We will release the source code of SOAPdenovo2 as soon as the paper is accepted."

That is overdue. I would have hoped that the Editor had been stricter about this, as the reviewers should have been able to access the source code.

Sébastien Boisvert said...

From Author's comments for Resubmission - Version 3:

"RE: Thanks for reviewer’s understanding. SOAPdenovo2 has over 40,000 rows of codes and was developed by over 10 bioinformatics for over two years. In fact, many of our collaborators with non-disclosure agreement signed have a copy of the code and we have already got hundreds of feedbacks on improving the code quality from them. We have made the source code ready on SourceForge and once the paper is accepted, we will click the release button."

Methinks that copies of the source code are being distributed to collaborators under a non-disclosure agreement because BGI-Shenzhen (“BGI”) has competing financial interests through its bioinformatics services.

scott@giga said...

Sorry about that. On acceptance I gave the SourceForge page a cursory look to see if it was active, but obviously I should have checked all of the links. My mistake. Everything was available to the reviewers from the FTP server, and you can currently pull all the modules from the DOI we have integrated into the paper (see: http://dx.doi.org/10.5524/100044), but I'm chasing the authors again to switch the code live on SourceForge as well. Thanks for the feedback.

Sébastien Boisvert said...

Dear scott@giga,

I truly thank you for looking into this matter, as I believe that open source is as important as open access to fulfil the destiny of open science. Furthermore, the free software movement is not just about open source but about much more, such as the four freedoms above.

I have been using SourceForge.net for more than 8 years and I am not aware of any mechanism to selectively enable or disable files for a project.


scott@giga said...

Dear Sébastien,

the authors say they have updated the page now, but if anything is still missing let me know. They forgot to fill out the platform section, but it should work on Unix/Linux/Mac, and they say OpenBSD won't be a problem since the program code should comply with the POSIX standard.

Open source is obviously a very important part of open science, and something GigaScience is very keen on promoting. For many reasons (including the fact that many of these repositories are actually blocked there), in China it's a much newer concept, but my impression is that the BGI Algorithm team is now keen on making and showcasing their work in this way as well. This community guidance and feedback is very useful for them for the future. I think this has also been a good test and demonstration of our transparent peer review system, as in any other journal this probably wouldn't have been picked up.

Thanks,

Scott

Peter Cock said...

I applaud the SOAPdenovo2 authors for picking an open source licence. I see the SOAPdenovo2 team have now released a source code bundle on SourceForge under http://sourceforge.net/projects/soapdenovo2/files/SOAPdenovo2/src/r223/ (soon after the paper was made publicly available, as they said they would).

That addresses the main concern (rightly flagged by your reviewers) that the source be made public. It would be nice if the source code repository itself were public (showing the history etc), rather than just a snapshot, but that is not essential. Thanks Scott!
