How to mirror, spider, or archive a website

Programs

wget

wget is a command-line program for downloading files from the internet. It also has powerful mirroring capabilities.

  • Advanced
    • wget -m -R '*.jpg,*.exe,*.doc,*.gif,*.zip,*search*,*index.cgi*' -l 2 http://www.website.com/snork/doodles/wiggles
      • Mirrors the site recursively (-m), skips files matching the listed patterns (-R), and limits recursion to 2 levels of links from the start URL (-l 2). The patterns are quoted so that the shell does not expand them before wget sees them. Note that -l limits link depth, not directories: to keep wget from climbing above the starting directory, add -np (--no-parent).
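    • wget -m -c -D www.tuto.com,pdf.tuto.com http://www.tutomax.com/
      • Mirrors http://www.tutomax.com/ (-m), resumes partially downloaded files (-c), and restricts followed links to the domains www.tuto.com and pdf.tuto.com (-D). Note that -D does not by itself let wget leave the starting host; add -H (--span-hosts) if links to other hosts should be followed.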
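    • wget -F -i maximDatasheetsR.html
      • Reads the local file maximDatasheetsR.html (-i), treats it as HTML (-F), and downloads every file it links to. Relative links in the file need a base URL, which can be supplied with -B (--base).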
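
The depth-limited mirror from the first example can be wrapped in a small shell function. This is only a sketch: the function name mirror_site, the DRY_RUN variable, and the shortened default reject list are made up for illustration and are not part of wget. It adds -np on the assumption that the goal is to stay below the starting directory.

```shell
#!/bin/sh
# Sketch of a depth-limited mirror helper. mirror_site and DRY_RUN are
# hypothetical names chosen for this example, not wget features.

mirror_site() {
    url=$1                              # start URL (required)
    depth=${2:-2}                       # recursion depth, defaults to 2
    reject=${3:-'*.jpg,*.exe,*.zip'}    # comma-separated reject patterns

    # -m   mirror (recursive, with timestamping)
    # -np  never ascend to the parent directory
    # -l   depth limit; given after -m, which by itself implies unlimited depth
    # -R   reject files matching these patterns
    set -- wget -m -np -l "$depth" -R "$reject" "$url"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only show the command that would be run.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 before calling mirror_site http://www.website.com/snork/ prints the wget command instead of running it, which is a cheap way to check the options before starting a long download. The printed line shows the patterns without quotes; when typing the command by hand, quote them as in the examples above.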
From HowTo Wiki, a Wikia wiki.