mirror, spider, or archive a website

programs


wget is a command-line program for downloading files from the internet; it can also mirror entire websites recursively.

  • Advanced
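    • wget -m -R '*.jpg,*.exe,*.doc,*.gif,*.zip,*search*,*index.cgi*' -l 2
      • This mirrors (-m) the site recursively, rejects (-R) any file whose name matches one of the listed patterns (*.jpg, *.exe, *.doc, *.gif, *.zip, *search*, *index.cgi*), and limits recursion (-l) to 2 levels of links from the start URL. The pattern list is quoted so that the shell does not expand the wildcards before wget sees them. Note that -l limits link depth, not the directory: to keep wget from downloading anything outside the starting directory (the snork directory in this example), also pass -np (--no-parent).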
    • wget -m -c -D,
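      • Mirror (-m), resume partially downloaded files (-c), and restrict which domains are followed (-D). -D takes a comma-separated list of domains; it does not by itself turn on host spanning, so add -H if the mirror should follow links to other hosts in that list.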
    • wget -F -i maximDatasheetsR.html
      • Read the URLs to download from the specified input file (-i) and treat that file as HTML (-F), so that wget downloads the targets of the links it contains. Relative links in the file need a base URL, which can be supplied with --base.
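
The three commands above can be collected into a small script. This is only a sketch: the URLs and domain names are placeholders (the original examples did not keep theirs), the helper names run, mirror_filtered, mirror_domains and fetch_from_html are made up for illustration, and the -np, -H and -B options are additions suggested by the notes above rather than part of the original commands. By default the script only prints each command; set DRY_RUN=0 to actually run wget.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with the real site and domains.
SITE_URL="https://example.com/snork/"
DOMAINS="example.com,static.example.com"
BASE_URL="https://example.com/"
LINKS_FILE="maximDatasheetsR.html"

# Quoted so the shell passes the patterns to wget unexpanded.
REJECT='*.jpg,*.exe,*.doc,*.gif,*.zip,*search*,*index.cgi*'

# DRY_RUN=1 (default) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mirror at most 2 link levels deep, skip rejected names,
# and never ascend above the starting directory (-np).
mirror_filtered() {
    run wget -m -np -l 2 -R "$REJECT" "$SITE_URL"
}

# Mirror, resume partial files (-c), and allow other hosts (-H)
# only if they belong to the listed domains (-D).
mirror_domains() {
    run wget -m -c -H -D "$DOMAINS" "$SITE_URL"
}

# Download every link found in a local HTML file (-F -i),
# resolving relative links against BASE_URL (-B).
fetch_from_html() {
    run wget -F -B "$BASE_URL" -i "$LINKS_FILE"
}

mirror_filtered
mirror_domains
fetch_from_html
```

Running the script unchanged prints the three wget command lines, which makes it easy to check the options before any download starts.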

The documentation for wget is available here:

From HowTo Wiki, a Wikia wiki.