Automate Rapidshare free downloads - No more captchas

Earlier this month the popular file-sharing service Rapidshare decided to drop the use of captchas. This came as a huge surprise to me, because Rapidshare previously used some of the most convoluted captcha puzzles ever! At times I was lucky to get 25% of Rapidshare's captchas right.

Dropping captchas has paved the way for automated downloads, but Rapidshare is aware of this and has lowered the download speed for free users to 500 kilobits per second.

Nevertheless, I would rather have the lower speed along with the ability to automate the damn thing. So here it is: my simple bash script to download a batch of files from Rapidshare. Hell, if you only want to download one file, it's just a one-liner.

URL=$(curl -s <URL TO RAPIDSHARE FILE> | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar'); ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1); sleep 90; wget $ourfile;

Full version, to download a batch of files specified in a file called input.txt, one URL per line.

New Version (1/11/2008): Rapidshare has added a wait time between file downloads, on top of the wait for your download to start. This has been fixed.

Download the full .sh with some enhancements made by Tune. Also see the comments for a Python version by Conman.

Download Itay’s version with some nice features; see his comment for details.

while read line
do

URL=$(curl -s $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar');
ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1);
sleep 90; # 70 secs for 200 MB, 50 secs for 90 MB
wget $ourfile;

done < input.txt

Analysis

while read line
do
(loop body goes here)
done < input.txt

Loop through all the lines in the file input.txt, storing each line in a variable called ‘line’ before moving on to the next.

URL=$(curl -s $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar');

Use curl to ‘download’ the page at the URL we are processing. Pipe the downloaded page’s source code through some greps to extract the action URL of the HTML form we need to ’submit’ to trigger the download. Store this URL in a variable called ‘URL’.

ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1);

Using curl again, ’submit’ an HTML form to our extracted URL, passing it the POST data (the -d switch in curl) that marks us as a free user. Rapidshare then replies with a new page listing the Rapidshare servers we can download from; they are stored in lines of JavaScript starting with ‘document.dlf.action=’. After the equals sign is the direct URL to our file, so let’s grab it with another grep!

We now have a list of all the Rapidshare servers holding our file, like this:

http://rs202l3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202l33.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202tl3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202cg.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
http://rs202l32.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar
etc…..

We will try our luck with the first one, so we filter this output through ‘head’, telling it to extract one line (head -n 1).

sleep 90;

Rapidshare still makes us wait for our file! So we will just wait here for however long it takes: 70 secs for a 200 MB file (the new file size limit), 50 secs for 90 MB (the old file size limit).
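
Incidentally, the wait appears to be embedded in the reply page as a JavaScript variable (var c=NN;), so instead of hardcoding the sleep the script could wait exactly that long. A minimal sketch, assuming the reply to the dl.start=Free POST has been captured in a variable called output (this is the approach Tune's version in the comments takes):

time=$(echo "$output" | grep -o "var c=[0-9]*" | grep -o "[0-9]*"); # pull the real wait time out of the page
sleep ${time:-90}; # fall back to 90 secs if the pattern is missing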

wget $ourfile;

Finally the wait is over; luckily it’s the script doing the waiting, not us. Let’s get our file. wget our file, that is…

Closing comments: if you know how to improve the script, please let me know. At the moment it will only work with .rar files, though you could easily change this. A generic version would be nice, but .rar is what most files on RS come in.
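
For instance, the extension-specific patterns could be loosened to capture whatever URL the form points at; a sketch (untested against every page variant, and essentially what Tune's version in the comments does):

# Match up to the closing quote instead of requiring the URL to end in "rar".
URL=$(curl -s $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*');
ourfile=$(curl -s -d "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*' | head -n 1);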

A version which uses proxies to download files in parallel would be good.
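
One rough way to sketch that: give each download its own proxy through the http_proxy environment variable, which wget honours. Hypothetical and untested; proxies.txt is an assumed file with one host:port per line, and for brevity this skips the scrape-and-wait steps each worker would really need to run:

# Pair each proxy with a URL and start the downloads in the background.
paste proxies.txt input.txt | while read proxy url; do
  http_proxy="http://$proxy" wget "$url" &
done
wait # block until all background downloads have finished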

I don’t like piping one grep command into another grep, so I’m sure this could be done with just one grep (or a single sed).
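
For example, a single sed can both select the form line and cut the URL out of it (a sketch; the pattern may need tweaking against the real page source):

# -n plus the trailing p prints only lines where the substitution matched.
URL=$(curl -s $line | sed -n 's/.*<form id="ff" action="\(http:\/\/[^"]*\)".*/\1/p');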

I use both wget and curl, which essentially do the same job here; if you only have one or the other, you can fix the script up to use just that one.
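
For example, a wget-only version of the two fetches looks like this (essentially what Tune's version in the comments does): "wget -q -O -" fetches a page to stdout, and --post-data replaces curl's -d switch.

URL=$(wget -q -O - $line | grep "<form id=\"ff\" action=\"" | grep -o 'http://[^"]*rar');
ourfile=$(wget -q -O - --post-data "dl.start=Free" "$URL" | grep "document.dlf.action=" | grep -o 'http://[^"]*rar' | head -n 1);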

Tip: Set your batch download up in a screen session and detach it, so you can ssh in and monitor your downloads wherever you go. This will also let your downloads keep going after you close your terminal window or PuTTY session.
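
For example (assuming you saved the script as rapidshare.sh):

screen -S rsbatch ./rapidshare.sh # start the batch in a named screen session
# press Ctrl-a then d to detach; the downloads keep running
screen -r rsbatch # re-attach later, even from a fresh ssh login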

Programs needed: bash (duh!), head, grep, sleep, and curl or wget.

Note: I have had to increase the sleep time to 90 seconds. I am unsure how to calculate this time, because it seems to have changed since I first wrote this. It may well be that RS has increased the time, or that the value varies depending on how much you have previously downloaded. So 90 seconds seems like a safe value for now.

44 Responses to “Automate Rapidshare free downloads - No more captchas”

  1. Conman Says:

    Using Python and the Mechanize package, I was able to write a simple Python script similar to what you have described here. It is fully automatic and waits the appropriate number of seconds before downloading the file; it does not guess the number of seconds to wait. The benefit of my script is that it’s portable to any OS that supports Python. So far, I’ve tested it on Ubuntu 8.04 and WinXP.

    Another advantage of Python is its exception handling. What does your script do if the file has been deleted from the RS server? What happens if only a partial download was completed?

    Below is a working copy of my script if you’re interested….

    -------
    import re
    import urllib
    import sys
    import time

    from mechanize import Browser

    # global variables
    lastUpdate = 0.0
    progressSize = 0

    def reportHook(count, blocksize, totalsize):
        global lastUpdate
        global progressSize

        # a count of zero implies a new download has started
        if count == 0:
            # record the time the download started
            lastUpdate = time.time()
            # progress of the downloaded file is initially zero
            progressSize = 0
            return

        progressSize += blocksize

        # calculate the time elapsed since the last progress update;
        # only display progress every 30 seconds
        deltaTime = time.time() - lastUpdate
        if deltaTime > 30:
            # calculate the file completion progress
            percent = int(float(count * blocksize * 100) / totalsize)
            # calculate the download speed in kilobytes/second
            speed = int(progressSize / (1024 * deltaTime))
            # display progress on the console
            print time.strftime("\t[%d %b %Y %H:%M:%S]"), str(percent) + "% complete @", str(speed), "kB/s"
            # reset the timestamp
            lastUpdate = time.time()
            # reset the byte counter
            progressSize = 0

    def downloadFile(link):
        # strip trailing newline characters (if any)
        url = link.strip('\n\r ')
        # parse the url for the filename
        filename = url.split('/')[-1]
        # clean up from any previous call to urlretrieve()
        urllib.urlcleanup()
        print "Starting download of: " + filename
        # open a connection to the file and download it
        urllib.urlretrieve(url, filename, reportHook)
        # display a success message
        print "Download completed:", filename

    def htc(m):
        return chr(int(m.group(1), 16))

    def urldecode(url):
        # hex digits only: 0-9, a-f, A-F
        rex = re.compile('%([0-9a-fA-F][0-9a-fA-F])', re.M)
        return rex.sub(htc, url)

    def main(localfile):
        f = open(localfile, 'r')
        for line in f.readlines():
            # create an instance of the browser
            br = Browser()
            try:
                # open the first page
                br.open(line)
                # select the first form (free download link)
                br.select_form(nr=0)
                # submit the form (click the button)
                response = br.submit()
                # get the entire string of HTML
                html = response.read()
            except:
                # the file does not exist on the server;
                # continue to the next file
                continue

            htmlDec = urldecode(html)

            # determine the wait time (seconds) for a free file by
            # searching the JavaScript code for the variable 'c'
            waitTime = re.search('var c=[0-9]*', htmlDec).group(0)[6:]

            # wait the required number of seconds
            print 'Waiting for ' + waitTime + ' seconds'
            time.sleep(int(waitTime) + 1)

            # find the link to the download file
            url = re.search('action="http://[^"]*', htmlDec).group(0)[8:]

            try:
                downloadFile(url)
            except:
                continue

            # close the browser session
            br.close()
            # delete the browser session reference
            del br

            # sleep for 15 seconds before attempting to download the next file
            time.sleep(15)

        # close the argument file
        f.close()

    if __name__ == "__main__":
        main(sys.argv[1])

  2. Tune Says:

    Improved version.
    Now one can select the mirror to download from, files without a .rar extension work, and the time to wait is extracted from the download page.
    Furthermore, the number of needed programs has been reduced by two: instead of curl, wget is used, and instead of head, the regex expression for grep is modified.
    TODO: skip empty lines in input.txt

    ## possible mirrors
    # cg.rapidshare.com
    # l34.rapidshare.com
    # tg.rapidshare.com
    # gc2.rapidshare.com
    # dt.rapidshare.com
    # tl2.rapidshare.com
    # l32.rapidshare.com
    # l3.rapidshare.com
    # gc.rapidshare.com
    # l33.rapidshare.com
    # tl.rapidshare.com
    # cg2.rapidshare.com
    mirror=dt.rapidshare.com;

    while read line
    do
    URL=$(wget -q -O - $line | grep “

  3. Tune Says:

    Input was too long…

    #!/bin/bash
    mirror=dt.rapidshare.com;
    while read line
    do
    URL=$(wget -q -O - $line | grep “

  4. Tune Says:

    replace µ with the less-than sign used to open tags

    #!/bin/bash
    mirror=dt.rapidshare.com;

    while read line
    do
    URL=$(wget -q -O - $line | grep "µform id=\"ff\" action=\"" | grep -o 'http://[^"]*');
    output=$(wget -q -O - --post-data "dl.start=Free" "$URL");
    time=$(echo "$output" | grep "var c=[0-9]*;" | grep -o "[0-9]*");
    ourfile=$(echo "$output" | grep "document.dlf.action=" | grep -o "http://[^\"]*$mirror[^\\]*");
    echo "waiting for download of $ourfile";
    echo "wait $time secs";
    sleep $time;
    wget $ourfile;

    done

  5. Tune Says:

    After "done", "µ input.txt" is needed; µ should be replaced.
    Sorry for all these posts, but I didn't know how to enter the less-than sign here.
    Perhaps you can publish a final version of the script with all the needed characters.
    Btw, I had to replace your curly quotes with straight ones to make it work.

  6. jwhatson Says:

    Thanks very much both of you. I will get around to publishing both your scripts properly.

  7. magare Says:

    Brilliant. Just works. Cheers, mate!!

  8. name Says:

    Good day!,

  9. nava Says:

    great blog,

    can anybody release the final version of the script?
    Also, can you tell us how to use this script for Megaupload downloads?

    thanks
    kind regards
    navaladi

  10. skyliner Says:

    Here is a version of the script with the proposed changes, but I had to use cut for the URL extraction.
    It also uses Xdialog (http://xdialog.free.fr/), but you can swap it for zenity, dialog, or whatever you want. Hope it's useful:

    #!/bin/bash

    # cg.rapidshare.com
    # l34.rapidshare.com
    # tg.rapidshare.com
    # gc2.rapidshare.com
    # dt.rapidshare.com
    # tl2.rapidshare.com
    # l32.rapidshare.com
    # l3.rapidshare.com
    # gc.rapidshare.com
    # l33.rapidshare.com
    # tl.rapidshare.com
    # cg2.rapidshare.com
    mirror=dt.rapidshare.com;

    Xdialog --editbox /dev/null 100 100 &> /tmp/rapid
    echo "" >> /tmp/rapid

    while read line
    do

    URL=$(wget -q -O - $line | grep "<form id=\"ff\" action=\"" | cut -f4 -d"\"");
    output=$(wget -q -O - --post-data "dl.start=Free" "$URL");
    ourfile=$(echo "$output" | grep "document.dlf.action=" | grep -o "http://[^\"]*$mirror[^\\]*");

    echo "== $ourfile ==";
    for i in `seq 90 -1 0`; do printf "%02d\r" $i; sleep 1; done
    printf "\n\n"

    wget $ourfile;

    done < "/tmp/rapid"

  11. sala Says:

    I’d add a --user-agent value to wget, with some widely used agent string, just in case, to hide this scripted action.

  12. zde Says:

    mirror=( cg.rapidshare.com l34.rapidshare.com tg.rapidshare.com gc2.rapidshare.com dt.rapidshare.com tl2.rapidshare.com l32.rapidshare.com l3.rapidshare.com gc.rapidshare.com l33.rapidshare.com tl.rapidshare.com cg2.rapidshare.com )

    while read line
    do
    x=$[x+1]
    [ $x -ge ${#mirror[*]} ] && x=0
    URL=$(wget -q -O - $line | grep “

  13. Magare Says:

    Since rapidshare re-introduced waiting times based on the amount of data downloaded, the script needs corrections.

    I did a brute-force 1-minute fix, but I suggest a more permanent solution be found.

    I also added the User-Agent bit. Good idea, Sala!

    #!/bin/bash

    ################################################
    #Purpose: Automate the downloading of files from rapidshare using the free account
    #using simple unix tools.
    #Date: 14-7-2008
    #Authors: Slith, Tune
    #Improvements, Feedback, comments: Please go to http://emkay.unpointless.com/Blog/?p=63
    #Notes: To use curl instead of wget use 'curl -s' and 'curl -s -d'
    #Version: 1.1
    ################################################

    #IMPORTANT! - PLEASE SHARE ALL IMPROVEMENTS MADE.

    #Thanks to Tune for getting rid of the curl dependency, extracting the correct wait time and
    #making it work with file downloads other than .rar files.

    #TODO: Ignore new lines in input file
    #TODO: Make work concurrently with a list of proxies
    #TODO: If possible resume partially downloaded files using another mirror / move to next mirror
    #if the attempted mirror is down.

    mirror=dt.rapidshare.com;

    ## possible mirrors
    # cg.rapidshare.com
    # l34.rapidshare.com
    # tg.rapidshare.com
    # gc2.rapidshare.com
    # dt.rapidshare.com
    # tl2.rapidshare.com
    # l32.rapidshare.com
    # l3.rapidshare.com
    # gc.rapidshare.com
    # l33.rapidshare.com
    # tl.rapidshare.com
    # cg2.rapidshare.com

    UA="--user-agent=Mozilla"

    while read line
    do
    URL=$(wget $UA -q -O - $line | grep “

  14. prz Says:

    Hi there, this is my version of this uber script:)

    I've added simple support for the new download limits, and there is also a real IE6 User-Agent string.

    I run script in other way, like this:

    screen rapsuck dllist.txt

    #!/bin/bash

    ################################################
    #Purpose: Automate the downloading of files from rapidshare using the free account
    #using simple unix tools.
    #Date: 14-7-2008
    #Authors: Slith, Tune
    #Improvements, Feedback, comments: Please go to http://emkay.unpointless.com/Blog/?p=63
    #Notes: To use curl instead of wget use 'curl -s' and 'curl -s -d'
    #Version: 1.1
    ################################################

    #IMPORTANT! - PLEASE SHARE ALL IMPROVEMENTS MADE.

    #Thanks to Tune for getting rid of the curl dependency, extracting the correct wait time and
    #making it work with file downloads other than .rar files.

    #TODO: Ignore new lines in input file
    #TODO: Make work concurrently with a list of proxies
    #TODO: If possible resume partially downloaded files using another mirror / move to next mirror
    #if the attempted mirror is down.

    mirror=gc.rapidshare.com;

    ## possible mirrors
    # cg.rapidshare.com
    # l34.rapidshare.com
    # tg.rapidshare.com
    # gc2.rapidshare.com
    # dt.rapidshare.com
    # tl2.rapidshare.com
    # l32.rapidshare.com
    # l3.rapidshare.com
    # gc.rapidshare.com
    # l33.rapidshare.com
    # tl.rapidshare.com
    # cg2.rapidshare.com

    UA="--user-agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

    while read line
    do
    URL=$(wget “$UA” -q -O - $line | grep “

  15. prz Says:

    pasted again:

    mirror=gc.rapidshare.com;

    UA="--user-agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

    while read line
    do
    URL=$(wget “$UA” -q -O - $line | grep “

  16. Tom Says:

    @Magare
    Please repost your script. WP ate a big part of it :(

  17. prz Says:

    hi again, the source is here: http://przemq.of19.net/rapsuck
    :)

  18. Ken Says:

    You have my respect and gratitude.

  19. zde Says:

    mirror=( cg.rapidshare.com l34.rapidshare.com tg.rapidshare.com gc2.rapidshare.com dt.rapidshare.com tl2.rapidshare.com l32.rapidshare.com l3.rapidshare.com gc.rapidshare.com l33.rapidshare.com tl.rapidshare.com cg2.rapidshare.com )

    while read line
    do

    x=$[x+1]
    [ $x -ge ${#mirror[*]} ] && x=0

    and change $mirror to this ${mirror[$x]}


    for multiple mirrors

  20. GaylordFucker Says:

    that python script doesn’t work. not sure why or if dependencies changed.

    damn thing won’t even open a simple webpage; fails here:

    # open first page
    br.open(line)

    stepped into Mechanize a bit, but it gets hairy pretty quickly. Of course there is no exception message to report.

  21. luis Says:

    The new script for:
    “New Version (1/11/2008): Rapidshare has added a wait time between file downloads, on top of the wait for your download to start. This has been fixed.”

    is NOT working.

  22. Bash Script to Download from R@pidSh@re (free account) « Tuga Linux Says:

    […] The original script was written by Slith and Tune and made available here, but I altered this script and added two improvements: […]

  23. Itay Says:

    I’m attaching my own version of the script, based on version 1.1.

    Features:
    - Support for the different waits (with a countdown timer!)
    - Handles various failure conditions
    - Removes file from input list if download was successful (leaves only failed downloads on the list)
    - You can update the input list while the script is running
    - Unpacks rar files (including split rar archives: .partN.rar). Requires that the ‘unrar’ utility be in your $PATH.
    - If you put a file named pwdict in your target directory (where input.txt is), it will be used as a white-space separated list of rar passwords to try when unpacking.
    - Uses the default download mirror (it worked best for me).

    Actually, the unpacking part is a separate script which is started as a background process whenever a file finishes downloading.
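
    A hypothetical sketch of what such an unpack helper might look like (this is not Itay's actual script; it assumes 'unrar' is in your $PATH and that pwdict, if present, holds whitespace-separated candidate passwords, as described above):

    #!/bin/bash
    # Hypothetical unpack helper: try each password from pwdict, then no password.
    archive="$1"
    if [ -f pwdict ]; then
        for pw in $(cat pwdict); do
            # -o+ overwrites existing files; stop at the first password that works
            unrar x -p"$pw" -o+ "$archive" && exit 0
        done
    fi
    unrar x -p- -o+ "$archive" # -p- means "no password"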

    Download

    Enjoy! :-)

  24. kalvin Says:

    @Itay:

    kalvin@bombadil:~/temp/rapid$ ./downloadFromRS.sh
    ./downloadFromRS.sh: line 32: unexpected EOF while looking for matching `"'
    ./downloadFromRS.sh: line 57: syntax error: unexpected end of file

    eh?

  25. knox Says:

    @Itay: Your script caused problems on my PC (egrep -o "[0-9]*"), but changing * to + solved them :)
    very useful script, thx :D

  26. no working Says:

    No wait time, likely to be the first file you have downloaded in a while
    Waiting secs for download of
    missing operand
    Try `sleep --help' for more information.
    wget: missing URL
    Usage: wget [OPTION]... [URL]...

    Try `wget --help' for more options.
    No wait time, likely to be the first file you have downloaded in a while
    Waiting secs for download of
    missing operand
    Try `sleep --help' for more information.
    wget: missing URL
    Usage: wget [OPTION]... [URL]...

    Try `wget --help' for more options.

  27. no working Says:

    It's because of the "Currently a lot of users are downloading files. Please try again in 2 minutes or become a " message; if this happens, your list just runs to the end and exits.

  28. no working Says:

    here is full $output
    http://paste2.org/p/135815

  29. AlienMind Says:

    Script is a good starting point, but it has a flaw:
    lines like
    time=$(echo "$output" | grep "var…
    should be rewritten like
    time=$(echo -e "$output" | tr -s \\r \\n | grep "var…
    as otherwise grep has to work with one huge line instead of many (and will thus match worse).

  30. Jan Says:

    Thanks !!! great tool

  31. Yuri Says:

    Some people experienced problems with long-wait-time detection during sequential downloading of many links.

    Replacing longtime=$(echo "$output" | egrep "Or try again in about.*minutes" | egrep -o "[0-9]*") with longtime=$(echo "$output" | egrep -o "Or try again in about .* minutes" | grep -o "[0-9]\{1,3\}") works perfectly.

    Thank you very much for this useful script.

  32. Z Says:

    Great script. I added some minor things like a fake HTTP user agent (wget pretending to be Microsoft Internet Explorer 8.0), a separate downloads directory, etc.

    Please add premium acct support (user & pass). I thought it would be easy, but:
    wget --http-user=linux --http-passwd=rocks (etc) is not working for me…

    And… what about a quick rapid upload3r script? I think it could be done with curl.

  33. jean jameson Says:

    Thanks for sharing. You wouldn't believe what a helpful source of information websites like this one are. Thanks,
    John

  34. Karl-Peter Huestegge Says:

    Really, very useful script - thanks a lot!

    My enhancements:
    1. Allow optional filename as argument
    2. Ignore empty and comment lines in input.txt
    3. Pause downloading without canceling
    4. Reconnect Router to get new dynamic IP

    # Allow optional filename, take input.txt if omitted
    INPUT=${1:-input.txt}
    … while read line; do
    case $line in
    ""|\#*) ;; # ignore empty and comment lines
    *)

    # a directory "pause" in the working dir pauses the download until it is deleted
    while [ -e "pause" ] ; do
    echo ".\c"
    sleep 10
    done

    # Router-reconnect (This is Router-specific)
    # see database http://www.paehl.de/reconnect for examples
    sh ~/bin/router-reconnect.sh

    getOutputFromFreeUserSubmit …. (Rest of known script)

    esac
    done

  35. Karl-Peter Huestegge Says:

    The last line of my previous post has been cut,
    should have been:

    done

  36. Karl-Peter Huestegge Says:

    once again:

    done (less than) $INPUT

  37. UTL Says:

    The script is very interesting, but it needs one more feature to suit my needs:
    changing IP on connections which have a dynamic IP (to skip the 90-minute wait between downloads).
    I know it's a different procedure for every router, so everybody would have to write it for their own router, but an option like "-reset_connections" would be nice, with which the script runs an external script (the one for your router) and then checks whether the IP changed…

    if I have time I'll try to write something for my router (with openwrt)

  38. Libor Says:

    I have downloaded the source of downloadFromRS.sh and I had to modify the egrep parameter from this:
    serverbusy=$(echo "$output" | egrep "Currently a lot of users are downloading files. Please try again in.*minutes" | grep -o "[0-9]*")

    to this:

    serverbusy=$(echo "$output" | egrep "Currently a lot of users are downloading files\.[[:space:]]+Please try again in.*minutes" | grep -o "[0-9]*")

    The space before "Please try…" is doubled, so it was not detected.

    Anyway, thanks for this great script!

  39. dog Says:

    how do you continue an interrupted download?

    a 200M download is not going to complete on the first try if you have a slow connection or one that breaks every so often.

    the 'Content-Disposition:' header causes "save as" behaviour in the client. your task is to overcome this.

    you want the server to honour a 'Range:' header sent by the client, but it appears the servers will not honour the 'Range:' header.

    you will always be forced to start from byte offset 0 every time your download gets interrupted.
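
    a quick way to check that claim, using one of the example server URLs from the post: request only the headers for a ranged GET and look at the status line. "206 Partial Content" plus a "Content-Range:" header would mean ranges are honoured; a plain "200 OK" means you start over from byte 0.

    # send a ranged request and inspect only the reply headers
    curl -s -I -H "Range: bytes=0-99" "http://rs202l3.rapidshare.com/files/50141870/1554876/centos-5.0.tar.part01.rar"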

  40. Jae Badget Says:

    Hey, i came across your url on digg and i think it’s great!

  41. EdLost Says:

    Hi Guys!
    This script has been just great!!
    But r.share has changed, and this script does not work.
    Any idea about a new script?

  42. Vlad Says:

    Please update the script. It seems Rapidshare changed something; I can't download any file. Thanks.

  43. bruno Says:

    i think the new rapidshare broke it.

  44. Tim Says:

    Hi!

    The new rapidshare page seems to have broken this script. Any ideas how to fix it?

    Thanks!
