Fetching data from Illumina BaseSpace
We are working to deploy Ray in Illumina BaseSpace.
For our tests, we needed the data on our infrastructure in Québec City.
First, I listed the objects for "2x150bp Human Genome in Record Time with the HiSeq 2500":
$ cat RawFiles.txt
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R1_001.fastq.gz?id=25033024&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R1_002.fastq.gz?id=25054488&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R2_001.fastq.gz?id=25081698&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L001_R2_002.fastq.gz?id=25123588&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R1_001.fastq.gz?id=25155266&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R1_002.fastq.gz?id=25175449&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R2_001.fastq.gz?id=25196878&appResultId=
https://basespace.illumina.com/sample/262264/files/raw/sorted_S1_L002_R2_002.fastq.gz?id=25237826&appResultId=
Then I wrote a script that downloads each object with curl.
$ cat Get.sh
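# Start one background download per URL listed in RawFiles.txt.
while read -r object
do
    # just take the part with fastq: drop the query string, then the leading path
    fileName=${object%%\?*}
    fileName=${fileName##*/}
    # resume partial downloads, follow redirects, and log each transfer to its own file
    curl --output "$fileName" --continue-at - --location \
        --cookie "IComLogin=$(cat ~/123)" "$object" &> "$fileName.log" &
done < RawFiles.txt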
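The file ~/123 contains the value of my IComLogin cookie for Illumina BaseSpace.
If you are using a shared machine, use a cookie file instead. Otherwise the cookie value appears on the curl command line, where other users of the machine can see it and take it.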
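Here is a minimal sketch of the cookie-file variant. It is not the script I ran. The helper names write_cookie_file and fetch_object are hypothetical, and the cookie domain (.illumina.com) is an assumption that you should check against your own browser session. The file uses the Netscape cookie format, which curl reads when --cookie is given a file name instead of name=value.

```bash
# Hypothetical helper: store one cookie in the Netscape cookie-file format
# (7 tab-separated fields) so that curl can read it with --cookie <file>.
# The domain '.illumina.com' is an assumption, not taken from BaseSpace docs.
write_cookie_file() {
    local cookieFile=$1 cookieValue=$2
    (
        umask 077   # a newly created file is readable by its owner only
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
            '.illumina.com' 'TRUE' '/' 'TRUE' '0' 'IComLogin' "$cookieValue" \
            > "$cookieFile"
    )
    chmod 600 "$cookieFile"   # also tighten a file that already existed
}

# Hypothetical helper: same curl options as in Get.sh, but the cookie is read
# from the file, so its value never appears in the process list.
# Note: curl treats the --cookie argument as a file name only if it has no '='.
fetch_object() {
    local object=$1 fileName=$2 cookieFile=$3
    curl --output "$fileName" --continue-at - --location \
        --cookie "$cookieFile" "$object"
}
```

For example, write_cookie_file ~/basespace-cookies.txt "$(cat ~/123)" creates the file once, and Get.sh can then call fetch_object "$object" "$fileName" ~/basespace-cookies.txt in place of the curl line.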
Then I started the parallel downloads.
$ bash Get.sh
Finally, I computed the size in bytes, the SHA-1 checksum, and the number of DNA sequences for each file (Table 1).
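The exact commands are not shown here, so the following is only a sketch of one way to get these three values. It assumes that every FASTQ record spans exactly four lines (no wrapped sequence lines) and that sha1sum from GNU coreutils is installed. The function name summarize_fastq_gz is hypothetical.

```bash
# Hypothetical helper: print "file | sequences | bytes | sha1" for one
# gzip-compressed FASTQ file.
summarize_fastq_gz() {
    local file=$1
    local lines bytes sha1
    # Assumption: 4 lines per FASTQ record, so sequences = lines / 4.
    lines=$(gzip -dc "$file" | wc -l)
    bytes=$(wc -c < "$file")
    sha1=$(sha1sum "$file" | cut -d ' ' -f 1)
    # The arithmetic expansions also drop any padding that wc adds.
    echo "$file | $((lines / 4)) | $((bytes)) | $sha1"
}

# Summarize every downloaded file in the current directory.
for file in sorted_S1_L00*_R*_00*.fastq.gz
do
    [ -e "$file" ] || continue   # the pattern matched nothing
    summarize_fastq_gz "$file"
done
```

Decompressing about 19 GB per file takes a while, so the loop can be run in the background like the downloads. The two reads of a pair (R1 and R2) should report the same number of sequences, which is a quick sanity check on the transfer.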
Table 1: 2x150bp Human Genome in Record Time with the HiSeq 2500.
File | DNA sequences | Bytes | Sha1
sorted_S1_L001_R1_001.fastq.gz | 143818693 | 18629367106 | 9f964962cc1253d0ab034db8b3ac4b5e74d0859a
sorted_S1_L001_R2_001.fastq.gz | 143818693 | 19363819987 | 3b795c8e7f844756cc41e5470aa3aa15d3eb047e
sorted_S1_L001_R1_002.fastq.gz | 149973420 | 19495696779 | 05e3d6ec3dd21466be020e808f6ec923dc392dc2
sorted_S1_L001_R2_002.fastq.gz | 149973420 | 20247581108 | d1ced61fee4cbfef61ff9b082169435a67162d11
sorted_S1_L002_R1_001.fastq.gz | 144295306 | 18683353035 | 74cb3168e853578d5261eb659c71a8c79fd1b1fe
sorted_S1_L002_R2_001.fastq.gz | 144295306 | 19422921920 | 24ba63efb2b7255914361db925e9dc9ea31f302e
sorted_S1_L002_R1_002.fastq.gz | 147591231 | 19122587922 | 6cf2b8a24d849f5b59d37e182c79182f4732337d
sorted_S1_L002_R2_002.fastq.gz | 147591231 | 19906116448 | 13e015736e6a4e6006c6d5089b8ddd2053e7e653