Best way to download a website?

I went ahead and added a snippet that makes pdfs as the script iterates through the comics, since I assume that's preferred for convenience. It won't run unless you have convert installed. The Bash on the 1st page and on this page is up to date.
To install it on Ubuntu, run sudo apt install imagemagick
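For reference, the pdf step boils down to pointing convert at one issue's pages in order, something like this (just a sketch using the ELFQUEST/issue_1 layout the script creates; adjust the paths to wherever your images actually land):

Bash:
#sketch: stitch one issue's downloaded pages into a single pdf with ImageMagick
#assumes the pages are named like oq01-0.jpg, oq01-1.jpg, ... the way the script saves them
mkdir -p ELFQUEST/pdfs
convert $(ls ELFQUEST/issue_1/*.jpg | sort -n -t - -k 2) ELFQUEST/pdfs/ELFQUEST_1.pdf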

The naming convention isn't great, but you can rename the files yourself (see the sketch below for one way) or ask for help scripting that later. Let me know if you run into any issues.
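Purely as a hypothetical example of what that renaming could look like, this zero-pads the page numbers so the files sort naturally; the filename pattern (e.g. oq01-3.jpg) is assumed from how the script saves pages:

Bash:
#hypothetical example: zero-pad page numbers so files sort naturally (oq01-3.jpg -> oq01-003.jpg)
#run inside a single issue directory
for f in *-*.jpg
do
    [ -e "$f" ] || continue     #skip if the glob matched nothing
    page="${f##*-}"             #e.g. "3.jpg"
    page="${page%.jpg}"         #e.g. "3"
    mv -n "$f" "${f%-*}-$(printf '%03d' "$((10#$page))").jpg"
done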
Okay, so I ran into another snag. I think I fixed the first issue I had with VirtualBox, but now I am getting
Not in a hypervisor partition (HVP=0) (VERR_NEM_NOT_AVAILABLE).
From what I have read, people get this error on Dell systems, but my PC is custom built.
I think I am probably just going to dig out that old laptop at this point. I'd rather not waste the effort you put into making that script for me.
 
No idea, though maybe this can help:
You need to address the message; you need to go into your BIOS and enable hardware virtualization. Make sure that you power-down the computer after that and pull the power plug for at least 10 seconds.
I'm not sure what difference this would make, as I've never had a BIOS change not take, but maybe the guy is onto something; it apparently solved the thread OP's problem.

Don't feel compelled to use the script, though I do think making sure you can run VMs is kind of important. I don't know if it'd make a difference, but maybe VMware wouldn't have these issues.
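For what it's worth, once you've flipped the BIOS setting you can sanity-check that the OS actually sees hardware virtualization. This is a rough sketch assuming a Linux host or live USB; on a Windows host the same info shows up under Task Manager > Performance > CPU as "Virtualization".

Bash:
#check whether the CPU exposes hardware virtualization to the OS
#vmx = Intel VT-x, svm = AMD-V; a count of 0 means it's disabled or hidden by the firmware
grep -E -c '(vmx|svm)' /proc/cpuinfo
#lscpu also reports it directly on most distros
lscpu | grep -i virtualization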
 
@Medulseur Since you're probably tired of messing around with stuff just to get your comics how you want them, I've set up an OnionShare link you can use to download my run of the script. Don't expect it to be very fast, but you can let it run in the background until it's done and let me know when you have it. The computer hosting it should stay up as long as necessary.

I didn't get any of the comics aside from what appeared to be the main ones, but if you want others and they're handled similarly, I can grab those and share them separately; just let me know.

Copy/paste this onion link into the Tor Browser and you can download from there by clicking the button on the top right.

[Attached screenshot: the onion link]
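If you'd rather not leave Tor Browser open for a multi-hour download, a command-line fetch over Tor should work too. This is only a sketch: it assumes the tor daemon is running locally, torsocks is installed, and the share stays public (no private key); the URL below is a placeholder for whatever link the download button actually points to (right-click it in Tor Browser and copy the link).

Bash:
#sketch: fetch the shared archive over Tor from the command line
#assumes tor is running locally, torsocks is installed, and the share has no client authentication
#the URL is a placeholder; use the link copied from the download button
torsocks wget --continue --tries=0 "http://<onion-download-link-placeholder>"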


If that seems too iffy, let me know; there are non-Tor alternatives.

Happy New Year!
 
Thanks! I'll get the whole VM thing figured out eventually, but I appreciate your help and for setting up the files on OnionShare. Downloading them now (it's going to take about five hours according to the downloader), and I will let you know when it finishes. Once again, I really appreciate all the help you have given me with this. Some people will be like, "Why didn't you just read them on the website itself since they're free?" and I don't know, I just feel better about "having" them, so to speak. The internet is constantly changing, and I worry that many of the more obscure sites I like will up and vanish one day with no archives and no one who really cares. Happy New Year to you also!

@Aidan Edit: Okay, it finished downloading. Thanks for including both the separate images and the PDFs, by the way; very thoughtful. I'll eventually get the offshoot comics, but I'm happy for now with the main series. I'll try fixing VirtualBox again to download them later on.
 
No prob. I just realized the pdf solution didn't work as well as intended, particularly on WOLFQUEST, so I switched from convert to img2pdf to fix it, which seems to have worked much better (img2pdf embeds the JPEGs as-is instead of re-encoding them). If you want the updated pdfs shared via OnionShare, let me know; otherwise, here's the updated script using img2pdf instead of convert (laziest way to implement it):

Bash:
#!/usr/bin/env bash

#root url
parent_url="https://elfquest.com/read/"

#Arrays describing each comic on the site; the same index refers to the same comic across all of the arrays
#lazy parallel arrays (standing in for associative arrays)
quests=("ELFQUEST" "SIEGE_AT_BLUE_MOUNTAIN" "KINGS_OF_THE_BROKEN_WHEEL" "WOLFRIDER" "DREAMTIME" "HIDDEN_YEARS" "SHARDS" "SEARCHER_AND_THE_SWORD" "THE_DISCOVERY")
#quests=( "WOLFRIDER" "DREAMTIME" "HIDDEN_YEARS" "SHARDS" "SEARCHER_AND_THE_SWORD" "THE_DISCOVERY")

#Number of issues for each comic
num_issues=( 21 8 9 1 1 29 16 1 1 )
#num_issues=(  1 1 29 16 1 1 )
directories=( "OQ" "SABM" "KOBW" "WR" "DTC" "HY" "SH" "SAS" "DISC" )
#directories=(  "WR" "DTC" "HY" "SH" "SAS" "DISC" )
filenames=( "oq" "sabm" "kobw" "awr" "dtc" "hy" "sh" "sas" "disc" )
#filenames=(  "awr" "dtc" "hy" "sh" "sas" "disc" )

#Lazy bools for oddball quests/comics
#Initially added to facilitate HIDDEN_YEARS which has 30 issues due to an issue #9.5 
#Using an array in case this comes up again
#Using C-style 1 = true and 0 = false
#booleans=( 0 )
#bool_index=0

declare -A booleans
booleans["HY"]=0


#used for troubleshooting to help indicate arrays worked as intended
for i in ${!quests[@]}
do
    echo Quest: ${quests[$i]} Issues: ${num_issues[$i]}
done



#Go through each quest/comic and then go through each issue of the comic
for i in ${!quests[@]}
do

    echo ${quests[$i]}
    curr_issue=1

    #Go through each issue of the current comic/quest
    while [ "$curr_issue" -le "${num_issues[$i]}" ]
    do
        #Start on page 0, which appears to always be the cover page
        curr_page=0
        #echo "$curr_issue "

        #If there's only one issue then there is no subdirectory
        #Subdirectories are for each issue only
        if [ ${num_issues[$i]} -eq 1 ]
        then
#            url="$parent_url${directories[$i]}-"
            url="$parent_url${directories[$i]}/${filenames[$i]}-"
        else
            #Check if the current issue is < 10 and prepend with a 0 if so
            if [ $curr_issue -lt 10 ] 
            then
                url="$parent_url${directories[$i]}/${directories[$i]}0${curr_issue}/${filenames[$i]}0${curr_issue}-"
            else
                url="$parent_url${directories[$i]}/${directories[$i]}${curr_issue}/${filenames[$i]}${curr_issue}-"
            fi
        fi

        #Specific check for the HIDDEN_YEARS comic edition 9.5
        #NOTE - Bash does not handle floating point values (e.g. 9.5) and I'm not using awk or anything to work around it

        #When the current issue first reaches 10 in HIDDEN_YEARS (HY bool still 0), sneak in issue 9.5
        #Set the URL for 09.5 and set curr_issue to 9.5 (a string, not a float)
        if [ $curr_issue -eq 10 -a ${directories[$i]} = "HY" -a ${booleans["HY"]} -eq 0 ]
        then
            url="$parent_url${directories[$i]}/${directories[$i]}09.5/${filenames[$i]}09.5-"
            #Set first boolean index to 1
            booleans["HY"]=1

            curr_issue="9.5"
        fi


        #Wget each image until failure
        #Susceptible to premature failure and so may not finish an issue. A few runs should be fine
        
        #Wget options
        #--no-clobber           Don't overwrite existing files
        #--continue             Continue partial downloads
        #--verbose              Verbose output
        #--timeout=5            Wait 5 seconds before moving on from timeout
        #--directory-prefix     Directory to store files in. Example - ELFQUEST/issue_1/

        while wget --no-clobber --continue --verbose --timeout=5 --directory-prefix="${quests[$i]}/issue_$curr_issue/" "$url${curr_page}.jpg"
        do
            #increment to next page
            ((curr_page+=1))
            #sleep for 1-3 seconds between pages (light rate limiting; has been enough in my experience)
            sleep $((1 + $RANDOM % 3))
        done
        


        #NOTE - The pdf step runs on every run, for better or worse. Comment it out if undesired
        #Old approach: convert each issue into a pdf using ImageMagick's convert tool
        #Check if convert is installed before attempting to make a pdf
        #which convert > /dev/null

        #Check if img2pdf is installed before attempting to make a pdf
        which img2pdf > /dev/null
        if [ $? -eq 0 ]
        then
            #If appropriate pdfs directory does not exist, create it
            if [ ! -d ${quests[$i]}/pdfs ]
            then
                mkdir ${quests[$i]}/pdfs
            fi

            #If there are multiple issues then apply the issue number to the filename
            #Else just use the comic name
            if [ ${num_issues[$i]} -gt 1 ]
            then
                #convert $(ls ${quests[$i]}/issue_$curr_issue/*jpg | sort -n -t - -k 2) ${quests[$i]}/pdfs/${quests[$i]}_$curr_issue.pdf
                img2pdf $(ls ${quests[$i]}/issue_$curr_issue/*jpg | sort -n -t - -k 2) -o ${quests[$i]}/pdfs/${quests[$i]}_$curr_issue.pdf
            else
                #convert $(ls ${quests[$i]}/issue_$curr_issue/*jpg | sort -n -t - -k 2) ${quests[$i]}/pdfs/${quests[$i]}.pdf
                img2pdf $(ls ${quests[$i]}/issue_$curr_issue/*jpg | sort -n -t - -k 2) -o ${quests[$i]}/pdfs/${quests[$i]}.pdf
            fi
        fi


        #If the current issue is 9.5, the current comic is HIDDEN YEARS and the HY bool has been set to 1
        #Then set curr_issue=10
        #Else increment by 1
        if [ $curr_issue = "9.5" -a ${directories[$i]} = "HY" -a ${booleans["HY"]} -eq 1 ]
        then
            curr_issue=10
        else
            #increment to next issue of current comic
            ((curr_issue+=1))
        fi

    done
done
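To run it, save the script under any name you like (get_elfquest.sh below is just a placeholder), make it executable, and launch it from the directory where the comic folders should be created. Existing files are skipped, so if a run dies partway you can simply start it again and it will pick up whatever was missed.

Bash:
#placeholder filename; run from the directory where the comic folders should be created
chmod +x get_elfquest.sh
./get_elfquest.sh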
 
If you don't need to get the whole website and just need specific pages:
SingleFileZ (Chrome / Firefox)

It saves an identical, compressed version of the page with all of the images and such pre-downloaded. Essentially a PDF of a webpage, but without breaking anything: instead of a directory full of HTML and assets, the page is saved as a single file.
 
Wanted to add the WebScrapBook addon to this thread... It has many options and can save single HTML files too.
 