Best way to download a website?

Just trying to figure out how to install Wget is holding me up a little. But I am sure I could figure it out if given enough time.
Well, that's a bash script, so it won't work natively on Windows. I think WSL uses bash by default, so you could use that. Any Linux VM would work just as well. If you're new to VMs, just use a USB drive to transfer the files to your actual computer; otherwise set up file sharing your preferred way.
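
A quick way to see what you're working with before trying the script. This is only a rough sketch: the `/proc/version` check is a common heuristic for spotting WSL rather than an official test, and both function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the download script.

# WSL kernels usually mention "microsoft" in /proc/version (heuristic only).
is_wsl() {
    grep -qi microsoft /proc/version 2>/dev/null
}

# Major version of the running bash (declare -A needs 4 or newer).
bash_major() {
    echo "${BASH_VERSINFO[0]}"
}

if is_wsl; then
    echo "Looks like WSL, bash $(bash_major)"
else
    echo "Not WSL (or not detectable), bash $(bash_major)"
fi
```

If the reported major version is below 4, the `declare -A` line in the script will fail, so a newer bash would be needed.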

I'll also recommend redoing it in your language of choice.

edit: I realize my script screws up on one of the comics because it has an issue #9.5. I will have to go back and account for that. Just an FYI, I'll update the script in a new post and in the old one when I do.
 
Oh no, virtual machines? This is starting to encompass all of the things I have wanted to get into for the past few years but have never got around to doing. lol
 
It can if you want it to. That script just won't work on Windows natively is all.
 
As complicated as it seems I do like the results better. HTTrack is easier, sure, but I hate copying an entire website just to get some jpegs.
 
It's not really that complicated, the script is the most complicated part.
>get some generic linux distro iso
>download vmware or virtualbox
>imagine not downloading virtualbox to support open source software
>create virtual machine
>install linux on virtual machine
>copypaste script into text file
>rename text file to script.sh
>be leet hacker, type bash script.sh in terminal
>wait for download
>run again for good measure
>plug usb into computer
>virtual machine ask if you want it to go to host or vm
>click vm
>copy files to usb
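
The "run again for good measure" step can be wrapped in a small loop. A sketch only: `run_passes` is a made-up helper name, and `script.sh` is assumed to be the file created in the steps above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a script several times in a row.
# The downloader's own comments suggest that a few runs may be needed.
run_passes() {
    local script="$1" passes="${2:-2}" n status
    for ((n = 1; n <= passes; n++)); do
        echo "pass $n of $passes"
        bash "$script"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "pass $n exited with status $status"
        fi
    done
}

# Usage (uncomment to run two passes of the downloader):
# run_passes script.sh 2
```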
 
Thanks for the step by step! Going to give it a try.
When I go to download VirtualBox it says 6.0 and below support software virtualization but the newer versions don't. Does that mean I want 6.0 instead of 6.1.3?
 
Just get 6.1.3, you're not using software virtualization.
 
Any particular distro I should get or should I just go with Ubuntu?
Doesn't matter for this but if you plan to tinker aside from this then any distro you want to explore is fine. Ubuntu and its derivatives will work well.

@Medulseur pinging you again in case you don't refresh. I found a typo in my script, so let me know before you download it if I haven't posted the revision yet.
 
Yeah I haven't copied it yet because I saw you mention a typo. I'll wait until you are able to fix it.
 
Haven't verified it works 100% but I think it's fine now.

Fixed typo where double-digit issues were still preceded by a 0.
Added a shitty fix to handle the 9.5 issue.
 
Thanks friend! I'll get this VM set up and try out your solution and report back to you with my results. Once again, thanks for taking the time to actually program something like this.
 
Sure thing. I always encourage people to store copies of the online things they care about, and I enjoy helping with that when I can. People with local copies often end up as curators of the things they care about, and this is a small segment of the internet that is extraordinarily important and underrated until it affects you.

My fix isn't actually a great fix but I'm working on that right now and will once again update the script when it's done in another post as well as the original post with the script. Practically speaking, this means with the current version of the script it will download the HIDDEN YEARS comic issue 9.5 into the issue_9 directory. If this has already happened then you can just hop into that directory in the terminal and do the following.
Bash:
#Execute each command one at a time, this isn't meant to be a script

#Make sure you're in the issue_9 directory with some hy09.5 files
ls hy09.5*

#That should show all the relevant files in the directory. If so, you can continue
#Make a directory for those files
mkdir ../issue_9.5

#move the files into that newly made directory
mv -v hy09.5* ../issue_9.5/

#make sure they're in there
ls ../issue_9.5/

#Make sure none remain in the issue_9 directory as a sanity check
ls hy09.5*

edit - I realized you're probably not familiar with the terminal yet, so here's how to get to that directory.
Open a terminal and go to where you ran the script from. If you don't know where that is, it's probably the home directory, so after opening the terminal type ls
to see if the directories are there. If they are, then do
Bash:
cd HIDDEN_YEARS/issue_9
 
Updated script

Added a check for HY issue 9.5 and changed boolean array to be associative
Added pdf collation for each issue stored in a directory called "pdfs"

Bash:
#!/usr/bin/env bash

#root url
parent_url="https://elfquest.com/read/"

#Arrays for each comic on the webpage. Each index in each array relates to each other array
#lazy associative arrays
quests=("ELFQUEST" "SIEGE_AT_BLUE_MOUNTAIN" "KINGS_OF_THE_BROKEN_WHEEL" "WOLFRIDER" "DREAMTIME" "HIDDEN_YEARS" "SHARDS" "SEARCHER_AND_THE_SWORD" "THE_DISCOVERY")
#quests=( "WOLFRIDER" "DREAMTIME" "HIDDEN_YEARS" "SHARDS" "SEARCHER_AND_THE_SWORD" "THE_DISCOVERY")

#Number of issues for each comic
num_issues=( 21 8 9 1 1 29 16 1 1 )
#num_issues=(  1 1 29 16 1 1 )
directories=( "OQ" "SABM" "KOBW" "WR" "DTC" "HY" "SH" "SAS" "DISC" )
#directories=(  "WR" "DTC" "HY" "SH" "SAS" "DISC" )
filenames=( "oq" "sabm" "kobw" "awr" "dtc" "hy" "sh" "sas" "disc" )
#filenames=(  "awr" "dtc" "hy" "sh" "sas" "disc" )

#Lazy bools for oddball quests/comics
#Initially added to facilitate HIDDEN_YEARS which has 30 issues due to an issue #9.5
#Using an array in case this comes up again
#Using C-style 1 = true and 0 = false
#booleans=( 0 )
#bool_index=0

declare -A booleans
booleans["HY"]=0


#used for troubleshooting to help indicate arrays worked as intended
for i in ${!quests[@]}
do
    echo Quest: ${quests[$i]} Issues: ${num_issues[$i]}
done



#Go through each quest/comic and then go through each issue of the comic
for i in ${!quests[@]}
do

    echo ${quests[$i]}
    curr_issue=1

    #Go through each issue of the current comic/quest
    while [ "$curr_issue" -le "${num_issues[$i]}" ]
    do
        #Start on page 0 which appears to always be cover page
        curr_page=0
        #echo "$curr_issue "

        #If there's only one issue then there is no subdirectory
        #Subdirectories are for each issue only
        if [ ${num_issues[$i]} -eq 1 ]
        then
#            url="$parent_url${directories[$i]}-"
            url="$parent_url${directories[$i]}/${filenames[$i]}-"
        else
            #Check if the current issue is < 10 and prepend with a 0 if so
            if [ $curr_issue -lt 10 ]
            then
                url="$parent_url${directories[$i]}/${directories[$i]}0${curr_issue}/${filenames[$i]}0${curr_issue}-"
            else
                url="$parent_url${directories[$i]}/${directories[$i]}${curr_issue}/${filenames[$i]}${curr_issue}-"
            fi
        fi

        #Specific check for the HIDDEN_YEARS comic edition 9.5
        #NOTE - Bash does not handle floating point values (eg 9.5) and I'm not using awk or anything to work around it

        #If the loop has reached issue 10, the current comic is HIDDEN YEARS and the HY bool has NOT been set (so remains 0)
        #then issue 9.5 still has to be fetched first: set the URL accordingly and curr_issue to 9.5 (a string, not a float)
        if [ $curr_issue -eq 10 -a ${directories[$i]} = "HY" -a ${booleans["HY"]} -eq 0 ]
        then
            url="$parent_url${directories[$i]}/${directories[$i]}09.5/${filenames[$i]}09.5-"
            #Mark issue 9.5 as handled so this branch only runs once
            booleans["HY"]=1

            curr_issue="9.5"
        fi


        #Wget each image until failure
        #Susceptible to premature failure and so may not finish an issue. A few runs should be fine
       
        #Wget options
        #--no-clobber           Don't overwrite existing files
        #--continue             Continue partial downloads
        #--verbose              Verbose output
        #--timeout=5            Wait 5 seconds before moving on from timeout
        #--directory-prefix     Directory to store files in. Example - ELFQUEST/issue_1/

        while wget --no-clobber --continue --verbose --timeout=5 --directory-prefix="${quests[$i]}/issue_$curr_issue/" "$url${curr_page}.jpg"
        do
            #increment to next page
            ((curr_page+=1))
            #sleep for 1-3 seconds (excessive but safe in my experience)
            sleep $((1 + $RANDOM % 3))
        done
       


        #NOTE - This will run on every run for better or worse. Comment it out if undesired
        #Convert each issue into a pdf using ImageMagick's convert tool
        #Check if convert is installed before attempting to make a pdf
        if command -v convert > /dev/null 2>&1
        then
            #If appropriate pdfs directory does not exist, create it
            if [ ! -d ${quests[$i]}/pdfs ]
            then
                mkdir ${quests[$i]}/pdfs
            fi

            if [ ${num_issues[$i]} -gt 1 ]
            then
                convert $(ls ${quests[$i]}/issue_$curr_issue/*jpg | sort -n -t - -k 2) ${quests[$i]}/pdfs/${quests[$i]}_$curr_issue.pdf
            else
                convert $(ls ${quests[$i]}/issue_$curr_issue/*jpg | sort -n -t - -k 2) ${quests[$i]}/pdfs/${quests[$i]}.pdf
            fi
        fi


        #If the current issue is 9.5, the current comic is HIDDEN YEARS and the HY bool has been set to 1
        #Then set curr_issue=10
        #Else increment by 1
        if [ $curr_issue = "9.5" -a ${directories[$i]} = "HY" -a ${booleans["HY"]} -eq 1 ]
        then
            curr_issue=10
        else
            #increment to next issue of current comic
            ((curr_issue+=1))
        fi

    done
done
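
The zero-padding branch in the script above can also be handled with printf instead of an if/else. A minimal sketch, assuming the same URL layout the script uses; `issue_url` is a made-up helper name and is not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the per-issue URL prefix the same way the
# script does, but let printf add the leading zero.
parent_url="https://elfquest.com/read/"

issue_url() {
    local dir="$1" file="$2" issue="$3" whole frac="" padded
    whole="${issue%%.*}"                       # integer part, e.g. 9 from 9.5
    # Keep a fractional part such as ".5" as plain text (bash has no floats).
    [[ "$issue" == *.* ]] && frac=".${issue#*.}"
    # 10# forces base 10 so an input like "09" is not read as octal.
    padded="$(printf '%02d' "$((10#$whole))")$frac"
    printf '%s%s/%s%s/%s%s-\n' "$parent_url" "$dir" "$dir" "$padded" "$file" "$padded"
}

issue_url HY hy 9
issue_url HY hy 9.5
issue_url OQ oq 12
```

The page number and `.jpg` still get appended to this prefix, same as in the wget loop.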
 
I'm having a bit of an issue getting virtualbox to work. I am getting an error called
AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED).
and from what I have read I need to go into my PC bios to fix this
 
I haven't had to do it before, but it sounds like you're right: you've got to go into your BIOS and turn on the option that enables virtualization.
 
This may have to wait until tomorrow. Getting pretty late here and I don't feel like messing with my bios tonight.
Might just skip the VM altogether and install Ubuntu on my old laptop that was bogged down by Windows 10.
 
I went ahead and added a snippet to make pdfs as it iterates through the comics as well since I assume it's preferred due to convenience. It won't run unless you have convert installed. The Bash on the 1st page and this page are up to date.
To install it on Ubuntu run sudo apt install imagemagick
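
To see up front whether the tools are there, something like this should do. It's a sketch, and `have_cmd` is just an illustrative name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a command is on the PATH.
have_cmd() {
    command -v "$1" > /dev/null 2>&1
}

# Report on the two tools the script relies on.
for tool in wget convert; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```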

Naming convention is not good but you can rename them or ask for help scripting that later. Let me know if you run into any issues.
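
One possible rename, as a sketch: zero-pad the issue number so the PDFs sort in reading order. It assumes the `QUEST_issue.pdf` names the script produces; `pad_pdf_names` is a made-up helper, so try it on a copy of the pdfs directory first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn HIDDEN_YEARS_9.pdf into HIDDEN_YEARS_09.pdf
# (and HIDDEN_YEARS_9.5.pdf into HIDDEN_YEARS_09.5.pdf).
pad_pdf_names() {
    local dir="$1" path stem name issue whole frac new
    for path in "$dir"/*_[0-9]*.pdf; do
        [ -e "$path" ] || continue            # glob matched nothing
        stem="${path##*/}"                    # HIDDEN_YEARS_9.5.pdf
        stem="${stem%.pdf}"                   # HIDDEN_YEARS_9.5
        name="${stem%_*}"                     # HIDDEN_YEARS
        issue="${stem##*_}"                   # 9.5
        # Only touch names that end in a plain issue number.
        [[ "$issue" =~ ^[0-9]+(\.[0-9]+)?$ ]] || continue
        whole="${issue%%.*}"
        frac=""
        [[ "$issue" == *.* ]] && frac=".${issue#*.}"
        # 10# forces base 10 so an already padded "09" is not read as octal.
        new="$dir/${name}_$(printf '%02d' "$((10#$whole))")$frac.pdf"
        # mv -n refuses to overwrite an existing file.
        [ "$new" = "$path" ] || mv -n -v -- "$path" "$new"
    done
}

# Usage (uncomment and point at one comic's pdfs directory):
# pad_pdf_names HIDDEN_YEARS/pdfs
```

Single-issue comics such as WOLFRIDER.pdf have no issue number in the name, so they are left alone.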
 
Yeah I am a bit worried about that because HTTrack is still going at it but it's limited to about 30kb/s
I was under the impression that wget is linux only. Would it work on windows?
This page and all the comics on it are what I am trying to download.
Their forum seems to imply that, if the download speed limit is set to blank, it defaults to the slowest setting of 25 KB/s to prevent abuse of the servers. Try setting it manually to a sane value if you haven't. Just don't go overboard or you will 100% get rate limited.
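
If you end up going the wget route instead, it has its own throttling flags (`--limit-rate`, `--wait`, `--random-wait`). A sketch only: the 200k and 2 second values are arbitrary assumptions rather than known-safe numbers for any site, and `polite_wget` is a made-up wrapper name.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: wget with a bandwidth cap and pauses between files.
# Set DRY_RUN=1 to print the command instead of running it.
polite_wget() {
    local cmd=(wget --limit-rate=200k --wait=2 --random-wait --no-clobber "$@")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Usage (uncomment; the URLs are placeholders):
# polite_wget https://example.com/a.jpg https://example.com/b.jpg
```

Note that `--wait` only pauses between retrievals inside one wget call, so it does nothing when each call fetches a single URL.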
 