Fetching data from Illumina BaseSpace

We are working to deploy Ray in Illumina BaseSpace.

For our tests, we needed the data on our infrastructure in Québec City.


First, I made a list of the objects in the project "2x150bp Human Genome in Record Time with the HiSeq 2500".

$ cat RawFiles.txt
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R1_001.fastq.gz?id=25033024&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R1_002.fastq.gz?id=25054488&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R2_001.fastq.gz?id=25081698&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R2_002.fastq.gz?id=25123588&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R1_001.fastq.gz?id=25155266&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R1_002.fastq.gz?id=25175449&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R2_001.fastq.gz?id=25196878&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R2_002.fastq.gz?id=25237826&appResultId=



Then I wrote a script that uses curl to download every object in the list.


$ cat Get.sh
while read -r object
do
        # keep only the file name: drop the query string, then the directory part
        fileName=$(basename "${object%%\?*}")

        # resume partial files, follow redirects, log to a per-file log,
        # and run each download in the background
        curl --output "$fileName" --continue-at - --location \
                --cookie "IComLogin=$(cat ~/123)" "$object" &> "$fileName.log" &
done < RawFiles.txt


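The file ~/123 contains the value of my IComLogin cookie for Illumina BaseSpace.

If you are using a shared machine, use a cookie file instead. Otherwise the cookie value appears on the curl command line, where any other user can read it from the process list.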
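
A minimal sketch of the cookie-file approach, relying on curl's documented behaviour that a --cookie argument without an "=" sign is treated as a file to read cookies from. The function name write_cookie_file and the path ~/basespace-cookies.txt are hypothetical, the domain is taken from the URLs above, and I have not checked this against BaseSpace itself.

```shell
#!/bin/bash
# Sketch only: write_cookie_file and the file locations are hypothetical names.

# Write a Netscape-format cookie file that only the owner can read.
# $1: destination file, $2: value of the IComLogin cookie
write_cookie_file() {
        local cookieFile="$1"
        local cookieValue="$2"

        # create the file with owner-only permissions before writing the secret
        (umask 077 && : > "$cookieFile")
        chmod 600 "$cookieFile"

        # fields: domain, include subdomains, path, HTTPS only,
        # expiry (0 = session cookie), name, value
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
                basespace.illumina.com FALSE / TRUE 0 IComLogin "$cookieValue" > "$cookieFile"
}

# Usage (not run here):
#   write_cookie_file ~/basespace-cookies.txt "$(cat ~/123)"
#   curl --output "$fileName" --continue-at - --location \
#           --cookie ~/basespace-cookies.txt "$object"
```

Since printf is a shell builtin, the cookie value is not passed as an argument to any external program while the file is written.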

Then I started the parallel downloads.

$ bash Get.sh
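
Because Get.sh puts every curl in the background, this command returns right away while the transfers keep running. Here is a minimal sketch of a variant that waits for all the downloads and counts the failures. The names download_all and fetch_one are hypothetical, and fetch_one is only a placeholder for the curl call from Get.sh.

```shell
#!/bin/bash
# Sketch only: download_all and fetch_one are hypothetical names.

# Placeholder for the real transfer; replace the body with the curl call from Get.sh.
fetch_one() {
        echo "would download $1"
}

# Start one background job per line of the list file, then wait for all of them.
# $1: file with one URL per line. The return status is the number of failed jobs.
download_all() {
        local listFile="$1"
        local pids=""
        local failed=0
        local object pid

        while read -r object
        do
                fetch_one "$object" &
                pids="$pids $!"
        done < "$listFile"

        # wait on each PID separately so that every exit status is seen
        for pid in $pids
        do
                wait "$pid" || failed=$((failed + 1))
        done

        return "$failed"
}

# Usage (not run here):
#   download_all RawFiles.txt || echo "some downloads failed, check the .log files"
```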

Finally, I computed the number of DNA sequences, the size in bytes, and the SHA-1 checksum of each file.


Table 1: 2x150bp Human Genome in Record Time with the HiSeq 2500.

File                            DNA sequences  Bytes        SHA-1
sorted_S1_L001_R1_001.fastq.gz  143818693      18629367106  9f964962cc1253d0ab034db8b3ac4b5e74d0859a
sorted_S1_L001_R2_001.fastq.gz  143818693      19363819987  3b795c8e7f844756cc41e5470aa3aa15d3eb047e
sorted_S1_L001_R1_002.fastq.gz  149973420      19495696779  05e3d6ec3dd21466be020e808f6ec923dc392dc2
sorted_S1_L001_R2_002.fastq.gz  149973420      20247581108  d1ced61fee4cbfef61ff9b082169435a67162d11
sorted_S1_L002_R1_001.fastq.gz  144295306      18683353035  74cb3168e853578d5261eb659c71a8c79fd1b1fe
sorted_S1_L002_R2_001.fastq.gz  144295306      19422921920  24ba63efb2b7255914361db925e9dc9ea31f302e
sorted_S1_L002_R1_002.fastq.gz  147591231      19122587922  6cf2b8a24d849f5b59d37e182c79182f4732337d
sorted_S1_L002_R2_002.fastq.gz  147591231      19906116448  13e015736e6a4e6006c6d5089b8ddd2053e7e653
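
The commands used to fill Table 1 are not shown here. One way to compute the same three columns is sketched below, assuming that sha1sum, wc and gzip are installed, that the byte count and checksum refer to the compressed file, and that every FASTQ record spans exactly four lines. The name fastq_stats is hypothetical.

```shell
#!/bin/bash
# Sketch only: fastq_stats is a hypothetical helper, not the command used for Table 1.

# Print: file name, number of sequences, size in bytes, SHA-1 (tab-separated).
# Assumes a gzip-compressed FASTQ file with exactly four lines per record.
fastq_stats() {
        local file="$1"
        local lines sequences bytes sha1

        # decompress to a pipe and count lines; four lines make one sequence
        lines=$(gzip -dc "$file" | wc -l)
        sequences=$((lines / 4))

        # size of the compressed file on disk
        bytes=$(( $(wc -c < "$file") ))

        # checksum of the compressed file; keep only the hash field
        sha1=$(sha1sum "$file" | cut -d ' ' -f 1)

        printf '%s\t%d\t%d\t%s\n' "$(basename "$file")" "$sequences" "$bytes" "$sha1"
}

# Usage (not run here):
#   for f in sorted_*.fastq.gz; do fastq_stats "$f"; done
```

Counting lines means every file is decompressed once in full, so on files of this size each call can take a while.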


Comments

Unknown said…
Do you know how persistent the cookie is? Have you tried to write an app to do the transfer?
Cheers
Mike
sebhtml said…
I don't know the lifetime of the cookie.

Manduca said…
This was really helpful. Thanks.
