Automate Rapidshare free downloads - No more captchas
Earlier this month the popular file sharing service Rapidshare decided to drop the use of captchas. This came as a huge surprise to me, because Rapidshare had previously used some of the most convoluted captcha puzzles ever! At times I would be lucky to get 25% of Rapidshare's captchas right.
Dropping captchas has paved the way for automated downloads, but Rapidshare is aware of this and has lowered the download speed of free users to 500 kilobits per second.
Nevertheless, I would rather have the lower speed along with the ability to automate the damn thing. So here it is: my simple bash script to download a batch of files from Rapidshare. Hell, if you only want to download one file, it's only a one-liner.
Here is the full version, which downloads a batch of files listed in a file called input.txt, one link per line.
New Version (1/11/2008): Rapidshare has added a wait time in between file downloads, on top of the wait for your download to start. This has been fixed.
Download the full .sh with some enhancements made by Tune. Also see the comments for a Python version by Conman.
Download Itay’s version with some nice features. See his comment for the features.
#!/bin/bash
while read line
do
    URL=$(curl -s "$line" | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar');
    ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1);
    sleep 90; # 90 secs seems safe; it used to be 70 secs for 200mb and 50 secs for 90mb files
    wget "$ourfile";
done < input.txt
Analysis
while read line
do
    # ...code...
done < input.txt
Loop through every line in input.txt, storing each one in a variable called ‘line’ before moving on to the next.
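A slightly more defensive version of that loop, as a minimal sketch: a temporary file with two hypothetical example.com links stands in for input.txt. `IFS=` and `-r` keep each line exactly as written, and the `|| [ -n "$line" ]` test also processes a last line that has no trailing newline, which a plain `while read line` silently drops (one of the scripts in the comments appends an empty line to its list for the same reason).

```shell
#!/bin/bash
# Stand-in for input.txt: two hypothetical links, no newline after the last one.
list=$(mktemp)
printf 'http://example.com/a.rar\nhttp://example.com/b.rar' > "$list"

count=0
# IFS= keeps surrounding spaces, -r keeps backslashes literal, and the
# "|| [ -n ... ]" test still processes a final line that lacks a newline.
while IFS= read -r line || [ -n "$line" ]; do
    echo "processing: $line"
    count=$((count + 1))
done < "$list"

echo "$count links read"
rm -f "$list"
```

Because the loop reads from a redirect rather than a pipe, it runs in the current shell and `count` is still set after `done`.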
Use curl to ‘download’ the page at the URL we are processing. Pipe the downloaded page's source code through a couple of greps to extract the action URL of the HTML form, which we need to ‘submit’ to trigger the download. Store this URL in a variable called ‘URL’.
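As a sketch of doing this extraction with a single command, as wished for in the closing comments: it assumes the page contains the same `<form id="ff" action="...">` tag the grep looks for. `extract_form_action` is a hypothetical helper, and the sample HTML is made up.

```shell
#!/bin/bash
# extract_form_action HTML
# Hypothetical helper: prints the action URL of the <form id="ff" ...> tag.
# One sed call does the work of the grep | grep pipeline, and because it keeps
# everything up to the closing quote it is not tied to .rar files.
extract_form_action() {
    printf '%s\n' "$1" | sed -n 's/.*<form id="ff" action="\([^"]*\)".*/\1/p'
}

# Assumed page fragment, modelled on the tag the original grep looks for.
page='<div><form id="ff" action="http://rapidshare.com/files/50141870/centos-5.0.tar.part01.rar" method="post"></div>'
URL=$(extract_form_action "$page")
echo "$URL"
```

With `sed -n`, nothing is printed unless the substitution matched, so a page without the form yields an empty `URL`.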
Using curl again, ‘submit’ the HTML form to our extracted URL, passing it the POST data (curl's -d switch) that marks us as a free user. Rapidshare then replies with a new page listing the Rapidshare servers we can download from; they are stored in lines of JavaScript starting with ‘document.dlf.action=’. After the equals sign is the direct URL to our file, so let's grab it with another grep!
We now have a list of all the Rapidshare servers holding our file, like this:
http://rs202l33.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202tl3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202cg.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202l32.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
etc.
We will try our luck with the first one, so we filter this output through ‘head’, telling it to extract one line (head -n 1).
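The two grep steps and `head -n 1`, plus the mirror preference that later comments add, can be sketched as one helper. This assumes each server URL sits in double quotes on its `document.dlf.action=` line; the real page's JavaScript quoting may differ, so the sample page is a guess and `pick_mirror_url` is a hypothetical name.

```shell
#!/bin/bash
# pick_mirror_url PAGE [MIRROR]
# Hypothetical helper: collects every URL found on a "document.dlf.action="
# line, then prints the one hosted on MIRROR (e.g. cg.rapidshare.com) if it
# is present, otherwise the first one, which is what "head -n 1" does.
pick_mirror_url() {
    local urls choice=""
    urls=$(printf '%s\n' "$1" | grep 'document.dlf.action=' | grep -oE 'http://[^"\\]+')
    if [ -n "$2" ]; then
        # Hosts look like rs202l33.rapidshare.com, so allow "rs" plus digits
        # before the mirror name ("." in $2 matches any character here).
        choice=$(printf '%s\n' "$urls" | grep -E "^http://rs[0-9]+$2/" | head -n 1)
    fi
    [ -n "$choice" ] || choice=$(printf '%s\n' "$urls" | head -n 1)
    printf '%s\n' "$choice"
}

# Assumed page fragment: the exact JavaScript quoting is a guess.
page='document.dlf.action="http://rs202l33.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar";
document.dlf.action="http://rs202tl3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar";
document.dlf.action="http://rs202cg.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar";'

pick_mirror_url "$page"                      # first server in the list
pick_mirror_url "$page" cg.rapidshare.com    # preferred mirror
```

Anchoring the mirror name right after `rs` and the digits keeps `l3.rapidshare.com` from matching a host such as `rs202tl3.rapidshare.com`.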
Rapidshare still makes us wait for our file! So we just wait for however long it takes: 70 seconds for a 200 MB file (the new file size limit) or 50 seconds for a 90 MB file (the old limit).
Finally the wait is over; luckily it's the script doing the waiting, not us. Let's get our file. wget our file, that is…
Closing comments: if you know how to improve the script, please let me know. At the moment it will only work with .rar files, though you could easily change this. A generic version would be nice, but .rar is what most files on RS are in.
A version which uses proxies to download files in parallel would be good.
I don't like piping one grep command into another, and I'm sure this could be done with just one grep.
I use both wget and curl, which here essentially do the same job; if you only have one or the other, you can fix the script up to use just that one.
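One way to support either tool, as a sketch: detect which one is installed and hide the difference behind two small wrappers. `http_get`, `http_post` and `FETCHER` are hypothetical names; the flags are the ones the scripts here already use (`curl -s`, `curl -s -d`, `wget -q -O -`, `wget --post-data`).

```shell
#!/bin/bash
# Pick whichever downloader is installed.
if command -v curl >/dev/null 2>&1; then
    FETCHER=curl
elif command -v wget >/dev/null 2>&1; then
    FETCHER=wget
else
    FETCHER=none
fi

# http_get URL: print the page body on stdout
http_get() {
    case $FETCHER in
        curl) curl -s "$1" ;;
        wget) wget -q -O - "$1" ;;
        *) echo "need curl or wget" >&2; return 1 ;;
    esac
}

# http_post URL DATA: POST DATA (e.g. dl.start=Free) and print the reply
http_post() {
    case $FETCHER in
        curl) curl -s -d "$2" "$1" ;;
        wget) wget -q -O - --post-data="$2" "$1" ;;
        *) echo "need curl or wget" >&2; return 1 ;;
    esac
}
```

In the loop, `URL=$(http_get "$line" | ...)` and `http_post "$URL" "dl.start=Free"` would then replace the direct curl calls.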
Tip: Set your batch download up in a screen session and detach it, so you can ssh in and monitor your downloads wherever you go. This also lets your downloads continue when you close your terminal window or PuTTY session.
Programs needed: Bash (duh!), head, grep, sleep and curl or wget
Note: I have had to increase the sleep time to 90 seconds. I am unsure how to calculate this time, because it seems to have changed since I first wrote this. It may well be that RS has increased the time, or that the value varies depending on how much you have previously downloaded. So 90 seconds seems like a safe value for now.
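Rather than hard-coding the sleep, the wait can be read from the page, as some of the later versions in the comments do via the `var c=` JavaScript variable. A minimal sketch, assuming that variable is still present, with 90 seconds as the fallback; `get_wait_time` is a hypothetical name.

```shell
#!/bin/bash
# get_wait_time PAGE [FALLBACK]
# Hypothetical helper: prints the number from the "var c=NN" JavaScript
# variable, or FALLBACK seconds (default 90) when the page has no such variable.
get_wait_time() {
    local page=$1 fallback=${2:-90} match secs
    # -E with [0-9]+ never produces an empty match.
    match=$(printf '%s\n' "$page" | grep -oE 'var c=[0-9]+' | head -n 1)
    secs=${match#var c=}
    echo "${secs:-$fallback}"
}

# Usage inside the download loop (needs the submitted page in $output):
#   sleep "$(get_wait_time "$output")"
```

Falling back to a fixed value keeps `sleep` from being called with an empty argument, which is the "missing operand" error reported further down in the comments.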
July 29th, 2008 at 8:58 pm
Using Python and the Mechanize package, I was able to write a simple Python script similar to what you have described here. It is fully automatic and waits the appropriate number of seconds before downloading the file; it does not guess the number of seconds needed for waiting. The benefit of my script is that it's portable to any OS that supports Python. So far, I've tested it on Ubuntu 8.04 and WinXP.
Another advantage of Python is its exception handling. What does your script do if the file has been deleted from the RS server? What happens if only a partial download was completed?
Below is a working copy of my script, if you're interested:
——-
import re
import urllib
import sys
import time
from mechanize import Browser

# global variables
lastUpdate = 0.0
progressSize = 0

def reportHook(count, blocksize, totalsize):
    global lastUpdate
    global progressSize
    # a count of zero value implies a new download has started
    if count == 0:
        # initialize the time download has started
        lastUpdate = time.time()
        # progress of downloaded file is initially zero
        progressSize = 0
        return
    progressSize += blocksize
    # calculate the time elapsed since last progress update
    # only display progress every X seconds
    deltaTime = time.time() - lastUpdate
    if deltaTime > 30:
        # calculate the file completion progress (totalsize is -1 when unknown)
        if totalsize > 0:
            percent = int(float(count * blocksize * 100) / totalsize)
        else:
            percent = 0
        # calculate the download speed in kilobytes/second
        speed = int(progressSize / (1024 * deltaTime))
        # display progress to console
        print time.strftime("\t[%d %b %Y %H:%M:%S]"), str(percent) + "% complete @", str(speed), "kB/s"
        # reset the timestamp
        lastUpdate = time.time()
        # reset byte counter
        progressSize = 0

def downloadFile(link):
    # strip the trailing newline characters (if any)
    url = link.strip('\n\r ')
    # parse url for filename
    filename = url.split('/')[-1]
    # cleanup from previous call to urlretrieve()
    urllib.urlcleanup()
    print "Starting download of: " + filename
    # open connection to file and download it
    urllib.urlretrieve(url, filename, reportHook)
    # display success message
    print "Download completed:", filename

def htc(m):
    return chr(int(m.group(1), 16))

def urldecode(url):
    # %XX escapes use the hexadecimal digits 0-9 and a-f
    rex = re.compile('%([0-9a-fA-F][0-9a-fA-F])', re.M)
    return rex.sub(htc, url)

def main(localfile):
    f = open(localfile, 'r')
    for line in f.readlines():
        # strip the trailing newline before using the line as a URL
        line = line.strip()
        if not line:
            continue
        # create instance of browser
        br = Browser()
        try:
            # open first page
            br.open(line)
            # select the first form (free download link)
            br.select_form(nr=0)
            # submit the form (click the button)
            response = br.submit()
            # get the entire string of HTML
            html = response.read()
        except:
            # the file does not exist on server
            # continue to the next file
            continue
        htmlDec = urldecode(html)
        # determine wait-time (seconds) for free file by searching JavaScript code for variable 'c'
        waitTime = re.search('var c=[0-9]*', htmlDec).group(0)[6:]
        # wait the required number of seconds
        print 'Waiting for ' + waitTime + ' seconds'
        time.sleep(int(waitTime) + 1)
        # find the link to the download file
        url = re.search('action="http://[^"]*', htmlDec).group(0)[8:]
        try:
            downloadFile(url)
        except:
            continue
        # close browser session
        br.close()
        # delete browser session reference
        del br
        # sleep for 15 seconds before attempting to download the next file
        time.sleep(15)
    # close the argument file
    f.close()

if __name__ == "__main__":
    main(sys.argv[1])
August 7th, 2008 at 6:52 am
Improved version
Now you can select the mirror to download from, files without a .rar extension work, and the wait time is extracted from the download page.
Furthermore, the number of required programs has been reduced by two: wget is used instead of curl, and instead of head the grep regular expression has been modified.
TODO: skip empty lines in input.txt
## possible mirrors
# cg.rapidshare.com
# l34.rapidshare.com
# tg.rapidshare.com
# gc2.rapidshare.com
# dt.rapidshare.com
# tl2.rapidshare.com
# l32.rapidshare.com
# l3.rapidshare.com
# gc.rapidshare.com
# l33.rapidshare.com
# tl.rapidshare.com
# cg2.rapidshare.com
mirror=dt.rapidshare.com;
while read line
do
URL=$(wget -q -O - $line | grep “
August 7th, 2008 at 6:54 am
Input was too long…
#!/bin/bash
mirror=dt.rapidshare.com;
while read line
do
URL=$(wget -q -O - $line | grep “
August 7th, 2008 at 6:57 am
Replace µ with the less-than sign (<) that opens tags.
#!/bin/bash
mirror=dt.rapidshare.com;
while read line
do
    URL=$(wget -q -O - "$line" | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*');
    output=$(wget -q -O - --post-data "dl.start=Free" "$URL");
    time=$(echo "$output" | grep "var c=[0-9]*;" | grep -o "[0-9]*");
    ourfile=$(echo "$output" | grep "document.dlf.action=" | grep -o "http://[^\"]*$mirror[^\\]*");
    echo "waiting for download of $ourfile";
    echo "wait $time secs";
    sleep $time;
    wget "$ourfile";
done < input.txt
August 7th, 2008 at 7:49 am
After done, µ input.txt is needed, where µ again stands for the less-than sign.
Sorry for all these posts, but I didn't know how to enter these inequality signs.
Perhaps you could publish a final version of the script with all the needed signs.
Btw, I had to replace your curly quotes (’) with straight ones (') to make it work.
August 16th, 2008 at 1:56 pm
Thanks very much both of you. I will get around to publishing both your scripts properly.
August 26th, 2008 at 7:38 pm
Brilliant. Just works. Cheers, mate!!
August 31st, 2008 at 11:20 pm
Good day!
September 17th, 2008 at 4:42 am
Great blog,
can anybody release the final version of the script?
Also, can you tell me how to use this script for Megaupload downloads?
Thanks
kind regards
navaladi
September 29th, 2008 at 6:59 am
Here is a version of the script with the proposed changes, but I had to use cut for the URL extraction.
It also uses Xdialog (http://xdialog.free.fr/), but you can swap it for zenity, dialog or whatever you want. Hope it's useful:
#!/bin/bash
# cg.rapidshare.com
# l34.rapidshare.com
# tg.rapidshare.com
# gc2.rapidshare.com
# dt.rapidshare.com
# tl2.rapidshare.com
# l32.rapidshare.com
# l3.rapidshare.com
# gc.rapidshare.com
# l33.rapidshare.com
# tl.rapidshare.com
# cg2.rapidshare.com
mirror=dt.rapidshare.com;
Xdialog --editbox /dev/null 100 100 &> /tmp/rapid
echo "" >> /tmp/rapid
while read line
do
    URL=$(wget -q -O - "$line" | grep "<form id=\"ff\" action=\"" | cut -f4 -d'"');
    output=$(wget -q -O - --post-data "dl.start=Free" "$URL");
    ourfile=$(echo "$output" | grep "document.dlf.action=" | grep -o "http://[^\"]*$mirror[^\\]*");
    echo "== $ourfile ==";
    for i in $(seq 90 -1 0); do printf "%02d\r" $i; sleep 1; done
    printf "\n\n"
    wget "$ourfile";
done < "/tmp/rapid"
October 1st, 2008 at 1:27 am
I'd add the --user-agent option to wget, with some widely used agent string, just in case, to hide the fact that this is scripted.
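A sketch of passing such an option safely, using the IE6 string from one of the later comments: because the string contains spaces, an unquoted `$UA` would be split into several arguments. Quoting `"$UA"`, as the later scripts do, works for one option; a bash array, shown here, also copes with several.

```shell
#!/bin/bash
# The User-Agent string contains spaces, so an unquoted $UA would be split
# into several arguments. Keeping the option in an array avoids that and
# leaves room for more options later.
UA=(--user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")

# Each array element prints on its own line: exactly one argument here.
printf '%s\n' "${UA[@]}"

# Usage (needs network):
#   wget "${UA[@]}" -q -O - "$line"
```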
October 5th, 2008 at 7:54 pm
mirror=( cg.rapidshare.com l34.rapidshare.com tg.rapidshare.com gc2.rapidshare.com dt.rapidshare.com tl2.rapidshare.com l32.rapidshare.com l3.rapidshare.com gc.rapidshare.com l33.rapidshare.com tl.rapidshare.com cg2.rapidshare.com )
while read line
do
x=$((x+1))
[ $x -ge ${#mirror[*]} ] && x=0
URL=$(wget -q -O - $line | grep “
October 8th, 2008 at 1:54 pm
Since Rapidshare reintroduced waiting times based on the amount of data downloaded, the script needs corrections.
I did a crude 1 min fix, but I suggest a permanent solution be found.
I also added the User-Agent bit. Good idea, Sala!
#!/bin/bash
################################################
#Purpose: Automate the downloading of files from rapidshare using the free account
#using simple unix tools.
#Date: 14-7-2008
#Authors: Slith, Tune
#Improvements, Feedback, comments: Please go to http://emkay.unpointless.com/Blog/?p=63
#Notes: To use curl instead of wget use ‘curl -s’ and ‘curl -s -d’
#Version: 1.1
################################################
#IMPORTANT! - PLEASE SHARE ALL IMPROVEMENTS MADE.
#Thanks to Tune for getting rid of the curl dependency, extracting the correct wait time and
#making it work with file downloads other than .rar files.
#TODO: Ignore empty lines in input file
#TODO: Make work concurrently with a list of proxies
#TODO: If possible resume partially downloaded files using another mirror / move to next mirror
#if the attempted mirror is down.
mirror=dt.rapidshare.com;
## possible mirrors
# cg.rapidshare.com
# l34.rapidshare.com
# tg.rapidshare.com
# gc2.rapidshare.com
# dt.rapidshare.com
# tl2.rapidshare.com
# l32.rapidshare.com
# l3.rapidshare.com
# gc.rapidshare.com
# l33.rapidshare.com
# tl.rapidshare.com
# cg2.rapidshare.com
UA="--user-agent=Mozilla"
while read line
do
URL=$(wget $UA -q -O - $line | grep “
October 8th, 2008 at 5:40 pm
Hi there, this is my version of this uber script :)
I've added simple support for the new download limits, and there is also a real IE6 User-Agent string.
I run the script a different way, like this:
screen rapsuck dllist.txt
#!/bin/bash
################################################
#Purpose: Automate the downloading of files from rapidshare using the free account
#using simple unix tools.
#Date: 14-7-2008
#Authors: Slith, Tune
#Improvements, Feedback, comments: Please go to http://emkay.unpointless.com/Blog/?p=63
#Notes: To use curl instead of wget use ‘curl -s’ and ‘curl -s -d’
#Version: 1.1
################################################
#IMPORTANT! - PLEASE SHARE ALL IMPROVEMENTS MADE.
#Thanks to Tune for getting rid of the curl dependency, extracting the correct wait time and
#making it work with file downloads other than .rar files.
#TODO: Ignore empty lines in input file
#TODO: Make work concurrently with a list of proxies
#TODO: If possible resume partially downloaded files using another mirror / move to next mirror
#if the attempted mirror is down.
mirror=gc.rapidshare.com;
## possible mirrors
# cg.rapidshare.com
# l34.rapidshare.com
# tg.rapidshare.com
# gc2.rapidshare.com
# dt.rapidshare.com
# tl2.rapidshare.com
# l32.rapidshare.com
# l3.rapidshare.com
# gc.rapidshare.com
# l33.rapidshare.com
# tl.rapidshare.com
# cg2.rapidshare.com
UA="--user-agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
while read line
do
URL=$(wget “$UA” -q -O - $line | grep “
October 8th, 2008 at 5:42 pm
pasted again:
mirror=gc.rapidshare.com;
UA="--user-agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
while read line
do
URL=$(wget “$UA” -q -O - $line | grep “
October 10th, 2008 at 1:47 pm
@Magare
Please repost your script. WP ate a big part of it.
October 15th, 2008 at 2:53 pm
Hi again, the source is here: http://przemq.of19.net/rapsuck
:)
October 19th, 2008 at 11:51 am
You have my respect and gratitude.
October 21st, 2008 at 3:54 pm
mirror=( cg.rapidshare.com l34.rapidshare.com tg.rapidshare.com gc2.rapidshare.com dt.rapidshare.com tl2.rapidshare.com l32.rapidshare.com l3.rapidshare.com gc.rapidshare.com l33.rapidshare.com tl.rapidshare.com cg2.rapidshare.com )
while read line
do
x=$((x+1))
[ $x -ge ${#mirror[*]} ] && x=0
…
and change $mirror to ${mirror[$x]}
…
to rotate through multiple mirrors.
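The same rotation as a small self-contained sketch, using only three of the mirrors for brevity. `next_mirror` and `current` are hypothetical names. Unlike the snippet above, which increments `x` before its first use and therefore starts at the second mirror, this version starts at index 0 and wraps with a modulo.

```shell
#!/bin/bash
# A few of the mirrors listed in the comments (the full list works the same).
mirror=( cg.rapidshare.com l34.rapidshare.com tg.rapidshare.com )
x=0

# next_mirror: sets $current to the mirror at index x, then advances x and
# wraps to 0 at the end of the array. It sets a variable instead of printing,
# because $(next_mirror) would run in a subshell and never advance x.
next_mirror() {
    current=${mirror[$x]}
    x=$(( (x + 1) % ${#mirror[@]} ))
}

for link in file1 file2 file3 file4; do
    next_mirror
    echo "$link -> $current"
done
```

The fourth link wraps back to `cg.rapidshare.com`.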
November 2nd, 2008 at 6:51 pm
That Python script doesn't work; I'm not sure why, or whether the dependencies changed.
The damn thing won't even open a simple webpage; it fails here:
# open first page
br.open(line)
I stepped into Mechanize a bit, but it gets hairy pretty quickly. Of course, there is no exception message to report.
November 12th, 2008 at 4:07 pm
The new script for "New Version (1/11/2008): Rapidshare has added a wait time in between file downloads. On top of your download to start. This has been fixed." is NOT working.
November 13th, 2008 at 3:30 pm
[…] The original script was written by Slith Tune and made available here, but I modified this script and added two improvements to it: […]
November 17th, 2008 at 9:10 am
I’m attaching my own version of the script, based on version 1.1.
Features:
- Support for the different waits (with a countdown timer!)
- Handles various failure conditions
- Removes file from input list if download was successful (leaves only failed downloads on the list)
- You can update the input list while the script is running
- Unpacks rar files (including split rar archives: .partN.rar). Requires the ‘unrar’ utility to be in your $PATH.
- If you put a file named pwdict in your target directory (where input.txt is), it will be used as a whitespace-separated list of rar passwords to try when unpacking.
- Uses the default download mirror (it worked best for me).
The unpacking part is actually a separate script, started as a background process whenever a file finishes downloading.
Download
Enjoy!
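Itay's actual code is behind the download link and not reproduced here. As a sketch of one way to implement the "remove the file from the input list once it is downloaded" feature: `remove_from_list` is a hypothetical helper, demonstrated on a temporary list with example.com links.

```shell
#!/bin/bash
# remove_from_list URL FILE
# Hypothetical helper: rewrites FILE without any line exactly equal to URL.
remove_from_list() {
    local url=$1 list=$2 tmp
    [ -f "$list" ] || return 1
    tmp=$(mktemp) || return 1
    # -v keeps non-matching lines, -x compares whole lines, -F takes URL
    # literally (no regex), and -- protects against URLs starting with "-".
    # grep exits 1 when no lines remain, so mv is not chained on its status.
    grep -vxF -- "$url" "$list" > "$tmp"
    mv "$tmp" "$list"
}

# Demo on a throwaway input list.
list=$(mktemp)
printf 'http://example.com/a.rar\nhttp://example.com/b.rar\nhttp://example.com/c.rar\n' > "$list"
remove_from_list http://example.com/b.rar "$list"
cat "$list"
```

Calling it right after a successful `wget` leaves only the failed links in the list for the next run.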
November 18th, 2008 at 3:58 pm
@Itay:
kalvin@bombadil:~/temp/rapid$ ./downloadFromRS.sh
./downloadFromRS.sh: line 32: unexpected EOF while looking for matching `"'
./downloadFromRS.sh: line 57: syntax error: unexpected end of file
eh?
January 9th, 2009 at 11:00 pm
@Itay: Your script caused problems on my PC (egrep -o "[0-9]*"), but changing * to + solved them.
Very useful script, thx.
January 25th, 2009 at 12:18 pm
No wait time, likely to be the first file you have downloaded in a while
Waiting secs for download of
missing operand
Try `sleep --help' for more information.
wget: missing URL
Usage: wget [OPTION]... [URL]...
Try `wget --help' for more options.
No wait time, likely to be the first file you have downloaded in a while
Waiting secs for download of
missing operand
Try `sleep --help' for more information.
wget: missing URL
Usage: wget [OPTION]... [URL]...
Try `wget --help' for more options.
January 25th, 2009 at 12:25 pm
It's because of the "Currently a lot of users are downloading files. Please try again in 2 minutes or become a " message; if this happens, your list just runs through to the end and the script exits.
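A minimal sketch of detecting that notice so the script can wait and retry instead of skipping the link. It assumes the message wording quoted above, followed by a number of minutes; `busy_wait_minutes` is a hypothetical name.

```shell
#!/bin/bash
# busy_wait_minutes PAGE
# Hypothetical helper: if PAGE contains the "Currently a lot of users are
# downloading files. Please try again in N minutes" notice, print N;
# otherwise print nothing. [[:space:]]+ tolerates a doubled space
# between the two sentences.
busy_wait_minutes() {
    printf '%s\n' "$1" |
        grep -oE 'Currently a lot of users are downloading files\.[[:space:]]+Please try again in [0-9]+ minutes' |
        grep -oE '[0-9]+' | head -n 1
}

# Usage inside the loop, before extracting the download URL:
#   mins=$(busy_wait_minutes "$output")
#   if [ -n "$mins" ]; then sleep $((mins * 60)); fi   # then request the same link again
```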
January 25th, 2009 at 12:45 pm
Here is the full $output:
http://paste2.org/p/135815
March 2nd, 2009 at 1:07 pm
The script is a good starting point, but it has a flaw:
lines like
time=$(echo "$output" | grep "var…
should be rewritten like
time=$(echo -e "$output" | tr -s \\r \\n | grep "var…
as otherwise grep has to work with one huge line instead of many (and thus will match worse).
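A small demonstration of the difference, assuming, as the comment implies, that the page separates its lines with bare carriage returns. The page content is made up.

```shell
#!/bin/bash
# Hypothetical page whose lines end in bare carriage returns: to grep the
# whole page is a single line.
page=$'<html>\rvar c=45;\r<b>part01 of 12</b>\r</html>'

# Without tr, grep returns the entire page, so every number in it comes out.
before=$(echo "$page" | grep "var c=" | grep -oE '[0-9]+')
# tr -s '\r' '\n' turns each CR into a newline (squeezing runs of them),
# so only the line holding the wait time reaches the second grep.
after=$(echo "$page" | tr -s '\r' '\n' | grep "var c=" | grep -oE '[0-9]+')

echo "before: $(echo $before)"
echo "after:  $after"
```

Here `before` picks up 45, 01 and 12, while `after` is just the wait time, 45.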
April 29th, 2009 at 6:06 am
Thanks !!! great tool
May 7th, 2009 at 8:29 am
Some people experienced problems with longtime detection when downloading many links in sequence.
Replacing longtime=$(echo "$output" | egrep "Or try again in about.*minutes" | egrep -o "[0-9]*") with longtime=$(echo "$output" | egrep -o "Or try again in about .* minutes" | grep -o "[0-9]\{1,3\}") works perfectly.
Thank you very much for this useful script.
June 15th, 2009 at 3:10 am
Great script! I added some minor things like a fake HTTP user agent (wget pretending to be Microsoft Internet Explorer 8.0), a separate downloads directory, etc.
Please add premium account support (user and password). I thought it would be easy, but:
wget --http-user=linux --http-passwd=rocks (etc.) is not working for me…
And… what about a quick rapid upload3r script? I think it could be done with curl.
July 7th, 2009 at 5:10 am
Thanks for sharing. You wouldn't believe how helpful some websites can be as a source of information, like this one. Thanks
John
July 22nd, 2009 at 5:05 pm
Really, very useful script - thanks a lot!
My enhancements:
1. Allow optional filename as argument
2. Ignore empty and comment lines in input.txt
3. Pause downloading without canceling
4. Reconnect Router to get new dynamic IP
# Allow optional filename, take input.txt if omitted
INPUT=${1:-input.txt}
… while read line; do
    case $line in
    ""|\#*) ;; # ignore empty and comment lines
    *)
        # directory "pause" in working dir pauses the download until it's deleted
        while [ -e "pause" ] ; do
            printf "."
            sleep 10
        done
        # Router-reconnect (This is Router-specific)
        # see database http://www.paehl.de/reconnect for examples
        sh ~/bin/router-reconnect.sh
        getOutputFromFreeUserSubmit …. (Rest of known script)
    esac
done < "$INPUT"
July 22nd, 2009 at 5:10 pm
The last line of my previous post was cut off. It should have been:
done
July 22nd, 2009 at 5:11 pm
Once again:
done < $INPUT
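The skipping and pause logic from this comment, as a self-contained sketch that can be tried without touching Rapidshare: `read_links` is a hypothetical name, and it only prints each link where the real script would download it. The router reconnect step is left out because it is router-specific.

```shell
#!/bin/bash
# read_links FILE
# Hypothetical helper: prints each usable link from FILE, skipping empty
# lines and lines starting with "#", and waiting while a file or directory
# called "pause" exists in the current directory.
read_links() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ""|\#*) continue ;;   # ignore empty and comment lines
        esac
        while [ -e pause ]; do
            printf '.'
            sleep 10
        done
        printf '%s\n' "$line"   # the download steps would go here
    done < "$1"
}

list=$(mktemp)
printf '# movies\n\nhttp://example.com/a.rar\n\nhttp://example.com/b.rar\n' > "$list"
read_links "$list"
```

Creating a `pause` directory while it runs holds it before the next link; deleting it lets it continue.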
August 20th, 2009 at 1:28 am
The script is very interesting, but it needs one more feature to suit my needs:
changing the IP for connections which have a dynamic IP (to skip the 90-minute wait between downloads).
I know it's different for every router, so everybody would have to write it for their own router, but an option like "-reset_connections" would be nice, with which the script runs an external script (the script for your router) and then checks whether the IP has changed…
If I have time, I'll try to write something for my router (with OpenWrt).
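A sketch of the check this comment asks for, under stated assumptions: you supply both the router-specific reconnect command and a command that prints your current public IP, and the reconnect command only returns once the new connection is up. `reconnect_and_check` is a hypothetical name.

```shell
#!/bin/bash
# reconnect_and_check RECONNECT_CMD IP_CMD
# Hypothetical helper. RECONNECT_CMD is your router-specific reconnect
# command; IP_CMD is any command you supply that prints your public IP.
# Both are left unquoted on purpose so "sh $HOME/bin/router-reconnect.sh"
# splits into a command and its argument (a literal ~ would not expand).
# Returns 0 if the IP changed, 1 if it did not, 2 if a command failed.
reconnect_and_check() {
    local before after
    before=$($2) || return 2
    $1 || return 2
    after=$($2) || return 2
    if [ -n "$after" ] && [ "$before" != "$after" ]; then
        echo "IP changed: $before -> $after"
        return 0
    fi
    echo "IP unchanged: $before"
    return 1
}

# Usage (the IP lookup command is yours to choose):
#   reconnect_and_check "sh $HOME/bin/router-reconnect.sh" my_ip_lookup || echo "still the same IP"
```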
January 4th, 2010 at 2:13 pm
I have downloaded the source of downloadFromRS.sh and I had to modify the egrep parameter from this:
serverbusy=$(echo "$output" | egrep "Currently a lot of users are downloading files. Please try again in.*minutes" | grep -o "[0-9]*")
to this:
serverbusy=$(echo "$output" | egrep "Currently a lot of users are downloading files\.[[:space:]]+Please try again in.*minutes" | grep -o "[0-9]*")
The space before "Please try…" is doubled, so it was not detected.
Anyway, thanks for this great script!