
Automate Rapidshare free downloads - No more captchas

Monday, July 14th, 2008

Earlier this month the popular file sharing service Rapidshare decided to drop the use of captchas. This came as a huge surprise to me because Rapidshare previously used some of the most convoluted captcha puzzles ever! At points I would be lucky to get 25% of Rapidshare's captchas right.

Dropping the captchas has paved the way for automated downloads, but Rapidshare is aware of this and has lowered the download speed for free users to 500 kilobits per second.

Nevertheless, I would rather have the lower speed along with the ability to automate the damn thing. So here it is, my simple bash script to download a batch of files off Rapidshare. Hell, if you only want to download one file it's only a one-liner.

URL=$(curl -s <URL TO RAPIDSHARE FILE> | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar'); ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1); sleep 90; wget $ourfile;

Full version to download a batch of files specified in a file called input.txt, one URL per line.
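For example (the file IDs and names here are made up, it is just to show the layout), input.txt might look like:

http://rapidshare.com/files/50141870/centos-5.0.tar.part01.rar
http://rapidshare.com/files/50141871/centos-5.0.tar.part02.rar
http://rapidshare.com/files/50141872/centos-5.0.tar.part03.rar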

New Version (1/11/2008): Rapidshare has added a wait time in between file downloads, on top of the wait for your download to start. This has been fixed.
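If you are patching the old script by hand rather than grabbing the updated .sh below, the crudest fix is an extra sleep at the bottom of the loop, after the wget. The 120 seconds here is only my guess at a safe value, not something Rapidshare publishes:

wget $ourfile;
sleep 120; # rough extra wait between files before moving on to the next one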

Download the full .sh with some enhancements made by Tune. Also see the comments for a Python version by Conman.

Download Itay’s version with some nice features. See his comment for the features.

while read line
do

URL=$(curl -s $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar');
ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1);
sleep 90; # 70 secs for 200mb, 50 secs for 90mb
wget $ourfile;

done < input.txt

Analysis

while read line
do
Code……………..
done < input.txt

Loop through all lines in the file input.txt, storing each line in a variable called 'line' before moving on to the next.

URL=$(curl -s $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar');

Use curl to 'download' the page at the URL we are processing. Pipe the downloaded page's source code through some greps to extract the action URL of the HTML form which we need to 'submit' to trigger the download. Store this URL in a variable called 'URL'.
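To see what that first grep is matching, the relevant part of the page source looks roughly like this (the exact markup may differ, the point is the id="ff" form whose action attribute holds the .rar URL):

<form id="ff" action="http://rs202l3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar" method="post">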

ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1);

Using curl again, 'submit' an HTML form to our extracted URL, passing it the POST data (the -d switch in curl) saying we are a free user. Rapidshare then replies with a new page containing a list of Rapidshare servers to download from; they are stored in lines of Javascript starting with 'document.dlf.action='. After the equals sign is the direct URL to our file, so let's grab this with another grep!
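In the raw page source those Javascript lines look roughly like this (the exact quoting may differ, this is just to show what the grep is keying on):

document.dlf.action='http://rs202l3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar';
document.dlf.action='http://rs202l33.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar';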

We now have a list of all the Rapidshare servers holding our file, like this:

http://rs202l3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202l33.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202tl3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202cg.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202l32.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
etc…..

We will try our luck with the first one, so we filter this output through 'head', telling it to keep just one line (head -n 1).

sleep 90;

Rapidshare still makes us wait for our file! So we will just wait here for however long it takes, which is roughly 70 secs for a 200mb file (the new file size limit) and 50 secs for 90mb (the old file size limit).

wget $ourfile;

Finally the wait is over; luckily it's the script doing the waiting, not us. Let's get our file. wget our file, that is…

Closing comments: if you know how to improve the script, please let me know. At the moment it will only work with .rar files, though you could easily change this. A generic version would be nice, but .rar is what most files on RS come in.
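For a rough generic version, the rar-specific pattern can be relaxed so it grabs whatever the form action and the document.dlf.action lines point at. Something like the lines below, which assume the URLs are delimited by single or double quotes in the page source; I have not tested this against anything other than .rar links:

URL=$(curl -s $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*' | head -n 1);
ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o "http://[^'\"]*" | head -n 1);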

A version which uses proxies to download files in parallel would be good.
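For what it's worth, both wget and curl respect the http_proxy environment variable, so the core of the idea is just launching each download in the background behind a different proxy. The proxy addresses and file variables below are purely placeholders:

http_proxy="http://proxy1.example.com:3128" wget "$file1" &
http_proxy="http://proxy2.example.com:3128" wget "$file2" &
wait # block until both background downloads finish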

I don't like piping a grep command into another grep command, so I'm sure this could be done with just one grep.
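I have not managed it with a single grep, but sed will do the match and the extraction in one pass. Something along these lines should be equivalent for the second command, though I have only given it a quick test:

ourfile=$(curl -s -d "dl.start=Free" "$URL" | sed -n 's/.*document\.dlf\.action=.\(http:\/\/[^"]*rar\).*/\1/p' | head -n 1);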

I use both wget and curl, which here essentially do the same job; if you only have one or the other you can fix the script up to use just that one.
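For example, an all-wget version of the two page fetches would look something like this (wget's -qO- dumps the page to stdout and --post-data does the job of curl's -d; lightly tested at best, so treat it as a sketch):

URL=$(wget -qO- $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar');
ourfile=$(wget -qO- --post-data="dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1);

Going the other way, curl -O "$ourfile" would replace the final wget, saving the file under its remote name.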

Tip: Set your batch download up in a screen session and detach it so you can ssh in and monitor your downloads wherever you go. This will also let your downloads continue when you close your terminal window or PuTTY session.
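If you have not used screen before, the dance goes roughly like this ("rsdl" and "./rapidshare.sh" are just placeholder names for the session and for whatever you called the script):

screen -S rsdl # start a named screen session
./rapidshare.sh # run the batch script inside it
# press Ctrl-a then d to detach, leaving it running
screen -r rsdl # reattach later from any ssh login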

Programs needed: Bash (duh!), head, grep, sleep and curl or wget

Note: I have had to increase the sleep time to 90 seconds. I am unsure how to calculate this time because it seems to have changed since I first wrote this. It may well be that RS has increased the wait, or the value varies depending on how much you have previously downloaded. So 90 seconds seems like a safe value for now.