Wikia

How To Wiki

How to mirror, spider, or archive a website

Programs

wget

wget is a command-line program for downloading files from the internet, but it also has powerful mirroring capabilities.
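A common starting point for a local copy that can be browsed offline combines -m with a few other standard wget options. The sketch below wraps them in a shell function. The name mirror_site is hypothetical, and this option set is one reasonable choice, not the only one.

```shell
#!/bin/sh
# mirror_site URL: hypothetical helper sketching a browsable local mirror
# built only from standard wget options.
mirror_site() {
  if [ $# -ne 1 ]; then
    echo "usage: mirror_site URL" >&2
    return 2
  fi
  # -m  (--mirror)           recursive download with timestamping, unlimited depth
  # -k  (--convert-links)    rewrite links in saved pages so the copy works offline
  # -E  (--adjust-extension) add .html/.css suffixes to pages that lack them
  # -p  (--page-requisites)  also fetch the images, CSS, etc. each page needs
  # -np (--no-parent)        never ascend above the starting directory
  wget -m -k -E -p -np "$1"
}
```

Usage: `mirror_site http://www.website.com/snork/` writes the copy under a directory named after the host, here www.website.com/.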

  • Advanced
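    • wget -m -R '*.jpg,*.exe,*.doc,*.gif,*.zip,*search*,*index.cgi*' -l 2 http://www.website.com/snork/doodles/wiggles
      • mirror (-m) the site recursively, skip (-R) any file whose name matches one of the patterns *.jpg, *.exe, *.doc, *.gif, *.zip, *search*, *index.cgi*, and follow links no more than 2 levels deep (-l). The pattern list is quoted so the shell does not try to expand the wildcards itself. Note that -l counts link hops, not directory levels, so it does not keep wget inside the snork directory. Add -np (--no-parent) to stop wget from ascending above the starting URL's directory.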
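    • wget -m -c -H -D www.tutomax.com,www.tuto.com,pdf.tuto.com http://www.tutomax.com/
      • mirror (-m) http://www.tutomax.com/, resume any partially downloaded files (-c), and also follow links to the hosts listed with -D. -D takes effect only together with -H (--span-hosts). Without -H, wget never leaves the starting host. Include the starting host in the -D list as well, so that its own pages continue to be followed.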
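    • wget -F -i maximDatasheetsR.html
      • read the local file maximDatasheetsR.html (-i), treat it as HTML (-F), and download every file it links to. If the page uses relative links, add -B (--base=URL) so wget can resolve them.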
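The -R list in the first example is a set of shell-style wildcard patterns. The sketch below is a rough model of how that list sorts URLs, not wget's own code. The helper name rejected is hypothetical. The sketch assumes each pattern is compared, case-sensitively, against the text after the last "/" in the URL, which approximates wget's default behaviour.

```shell
#!/bin/sh
# rejected URL: hypothetical helper; returns 0 (true) when the URL's file
# name matches the -R list from the first example. Only a model of wget's
# matching, written with the shell's own glob rules.
rejected() {
  name=${1##*/}   # text after the last '/', assumed to be what -R is tested against
  case $name in
    *.jpg|*.exe|*.doc|*.gif|*.zip|*search*|*index.cgi*) return 0 ;;
    *) return 1 ;;
  esac
}

for url in \
  http://www.website.com/snork/doodles/wiggles/photo.jpg \
  http://www.website.com/snork/doodles/wiggles/notes.html \
  'http://www.website.com/snork/index.cgi?page=2'
do
  if rejected "$url"; then
    echo "skip  $url"
  else
    echo "fetch $url"
  fi
done
```

For the three sample URLs, this prints skip, fetch, and skip. In real runs, wget still downloads HTML pages even when they match the list, scans them for links, and then deletes them.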
From HowTo Wiki, a Wikia wiki.