Fetching data from Illumina BaseSpace

We are working to deploy Ray in Illumina BaseSpace.

For our tests, we needed the data on our infrastructure in Québec City.


First, I made a list of the objects in "2x150bp Human Genome in Record Time with the HiSeq 2500":

$ cat RawFiles.txt
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R1_001.fastq.gz?id=25033024&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R1_002.fastq.gz?id=25054488&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R2_001.fastq.gz?id=25081698&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R2_002.fastq.gz?id=25123588&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R1_001.fastq.gz?id=25155266&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R1_002.fastq.gz?id=25175449&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R2_001.fastq.gz?id=25196878&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R2_002.fastq.gz?id=25237826&appResultId=



Then I wrote a script that downloads each object with curl.


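$ cat Get.sh
# Read one URL per line; unlike $(cat ...), this never glob-expands the '?' in the URLs.
while read -r object
do
        # skip blank lines
        [ -n "$object" ] || continue

        # just take the part with fastq: drop the query string, then the directories
        fileName=${object%%\?*}
        fileName=${fileName##*/}

        # resume partial downloads, follow redirects, one background transfer per file
        curl --output "$fileName" --continue-at - --location \
                --cookie "IComLogin=$(cat ~/123)" "$object" &> "$fileName.log" &
done < RawFiles.txt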


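~/123 contains the value of my IComLogin cookie at Illumina BaseSpace.

If you are using a shared machine, use a cookie file instead. The script passes the cookie value on the curl command line, where other users may be able to read it from the process list and take your cookie.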
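A cookie file could be prepared as in the sketch below. It assumes curl's Netscape-style cookie file format (seven tab-separated fields). The file name ~/basespace-cookies.txt is hypothetical, and the domain, path and secure flag are my guesses, not values confirmed by BaseSpace.

```shell
# Hypothetical location for the cookie file; any private path works.
cookieFile="$HOME/basespace-cookies.txt"
# Placeholder; replace it with the real cookie value (kept in ~/123 above).
cookieValue="PASTE_COOKIE_VALUE_HERE"

# Write the file in a subshell with a strict umask so that it is
# created readable by its owner only.
(
    umask 077
    # Fields: domain, subdomain flag, path, secure flag,
    # expiry (0 = session cookie), name, value.
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        basespace.illumina.com FALSE / TRUE 0 IComLogin "$cookieValue" > "$cookieFile"
)
# Tighten permissions in case the file already existed.
chmod 600 "$cookieFile"

# When the --cookie argument contains no '=', curl reads it as a file name:
#   curl --cookie "$cookieFile" --location --output "$fileName" "$object"
```

With this, the cookie value never appears in the arguments of the curl processes.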

Then I started the parallel downloads.

$ bash Get.sh

Finally, I computed the number of DNA sequences, the size in bytes, and the SHA-1 checksum of each file.
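The exact commands are not shown here. One way to get these three numbers is sketched below. It assumes that sha1sum from GNU coreutils is available and that every FASTQ entry spans exactly four lines, so the number of sequences is the line count divided by 4. The function name fastq_summary is made up for this example.

```shell
# Print file name, number of sequences, size in bytes and SHA-1,
# separated by tabs, for one gzip-compressed FASTQ file.
fastq_summary() {
    local file=$1
    local lines bytes sha1
    # Count the lines of the decompressed stream without writing it to disk.
    lines=$(gzip -dc "$file" | wc -l)
    # Size of the compressed file, as stored on disk.
    bytes=$(wc -c < "$file")
    # sha1sum prints "<hash>  <name>"; keep the hash only.
    sha1=$(sha1sum "$file" | cut -d ' ' -f 1)
    # Assumption: 4 lines per FASTQ entry.
    printf '%s\t%s\t%s\t%s\n' "$(basename "$file")" "$((lines / 4))" "$((bytes))" "$sha1"
}

# Summarize every downloaded file in the current directory.
for file in sorted_S1_L00*_R*_00*.fastq.gz
do
    if [ -e "$file" ]
    then
        fastq_summary "$file"
    fi
done
```

Decompressing files of this size takes a while, so it may be worth running the function on several files at the same time.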


Table 1: 2x150bp Human Genome in Record Time with the HiSeq 2500.

File                            DNA sequences  Bytes        SHA-1
sorted_S1_L001_R1_001.fastq.gz  143818693      18629367106  9f964962cc1253d0ab034db8b3ac4b5e74d0859a
sorted_S1_L001_R2_001.fastq.gz  143818693      19363819987  3b795c8e7f844756cc41e5470aa3aa15d3eb047e
sorted_S1_L001_R1_002.fastq.gz  149973420      19495696779  05e3d6ec3dd21466be020e808f6ec923dc392dc2
sorted_S1_L001_R2_002.fastq.gz  149973420      20247581108  d1ced61fee4cbfef61ff9b082169435a67162d11
sorted_S1_L002_R1_001.fastq.gz  144295306      18683353035  74cb3168e853578d5261eb659c71a8c79fd1b1fe
sorted_S1_L002_R2_001.fastq.gz  144295306      19422921920  24ba63efb2b7255914361db925e9dc9ea31f302e
sorted_S1_L002_R1_002.fastq.gz  147591231      19122587922  6cf2b8a24d849f5b59d37e182c79182f4732337d
sorted_S1_L002_R2_002.fastq.gz  147591231      19906116448  13e015736e6a4e6006c6d5089b8ddd2053e7e653
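If you fetch the same objects, the checksums in Table 1 can be used to verify your copies. The sketch below writes them in the "hash, two spaces, file name" layout that sha1sum from GNU coreutils reads with --check. The file name sha1sums.txt is arbitrary, and the check assumes the objects on BaseSpace have not changed since this table was made.

```shell
# Checksums copied from Table 1, in the format read by "sha1sum --check".
cat > sha1sums.txt <<'EOF'
9f964962cc1253d0ab034db8b3ac4b5e74d0859a  sorted_S1_L001_R1_001.fastq.gz
3b795c8e7f844756cc41e5470aa3aa15d3eb047e  sorted_S1_L001_R2_001.fastq.gz
05e3d6ec3dd21466be020e808f6ec923dc392dc2  sorted_S1_L001_R1_002.fastq.gz
d1ced61fee4cbfef61ff9b082169435a67162d11  sorted_S1_L001_R2_002.fastq.gz
74cb3168e853578d5261eb659c71a8c79fd1b1fe  sorted_S1_L002_R1_001.fastq.gz
24ba63efb2b7255914361db925e9dc9ea31f302e  sorted_S1_L002_R2_001.fastq.gz
6cf2b8a24d849f5b59d37e182c79182f4732337d  sorted_S1_L002_R1_002.fastq.gz
13e015736e6a4e6006c6d5089b8ddd2053e7e653  sorted_S1_L002_R2_002.fastq.gz
EOF

# Then, in the directory that holds the downloaded files:
#   sha1sum --check sha1sums.txt
```

A file reported as FAILED is probably incomplete. Running Get.sh again resumes it, because curl is called with --continue-at -.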


Comments

Unknown said…
Do you know how persistent the cookie is? Have you tried to write an app to do the transfer?
Cheers
Mike
sebhtml said…
I don't know the lifetime of the cookie.

Manduca said…
This was really helpful. Thanks.
