How To Wiki

How to download all image files in a Wikimedia Commons page or directory

Wikimedia Commons is a great resource for free/open images, and sometimes you may want to download all the images on one of its pages or in one of its categories. Wikimedia Commons doesn't offer a simple way to do this, so this howto shows one method.


Command line method

This method uses commands common to Unix-based operating systems. If you are using Windows, try installing Cygwin, which lets you run these commands on Windows.


Requirements

wget, grep, and sed. These are standard Unix commands and are likely already installed if you are using Linux or any other Unix-like OS.

Steps

In this example we will get all the images on this page: http://commons.wikimedia.org/wiki/Crystal_Clear. The method grabs the original, full-quality images, not the lower-quality thumbnails shown on the page.


Get the webpage for each image file
  • Command: wget -r -l 1 -e robots=off -w 1 -nc http://commons.wikimedia.org/wiki/Crystal_Clear
  • Description: This downloads the main page and every page it links to, one level deep, with a one-second pause between requests. The images we are interested in are linked from the commons.wikimedia.org/wiki/File:* pages this fetches, so the next step extracts the image links from those HTML files.


Extract the Image links
  • Command: WIKI_LINKS=`grep fullImageLink commons.wikimedia.org/wiki/File\:* | sed 's/^.*><a href="//'| sed 's/".*$//'`
  • Description: This creates a list of image links, in variable $WIKI_LINKS
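To see what the grep/sed pipeline is doing, here is the same extraction run on a single sample line. The HTML snippet below is illustrative, not taken from a real Commons page; on a real File: page the fullImageLink div wraps the link to the full-size image in the same way.

```shell
# Extract the full-size image URL from a sample "fullImageLink" line.
# The snippet is illustrative; a real File: page has the same structure.
sample='<div class="fullImageLink" id="file"><a href="https://upload.wikimedia.org/wikipedia/commons/1/11/Example.png"><img src="thumb.png"></a></div>'

url=$(echo "$sample" | grep fullImageLink | sed 's/^.*><a href="//' | sed 's/".*$//')
echo "$url"
```

The first sed deletes everything up to and including the `><a href="` that opens the link, and the second deletes everything from the closing quote onward, leaving only the URL.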


Download the Images
  • Command: wget -nc -w 1 -e robots=off -P downloaded_wiki_images $WIKI_LINKS
  • Description: This downloads all the images into a folder called 'downloaded_wiki_images'. The -nc flag skips files that have already been downloaded, and -w 1 waits one second between requests.


Delete all temp files
  • Command: rm -rf commons.wikimedia.org
  • Description: deletes all the HTML pages used to get links


Note 1: If you are trying to get all the images in a category that has more than 200 images, you will have to run the commands on each category page, i.e. for images 1-200, 201-400, 401-600, etc.
Note 2: This method works as of January 2010, but as time passes Wikimedia Commons' page format may change, and this method may stop working. If so, view the source of one of the image pages, find the image's URL, and see what has changed. The commands should be easy to modify to match.
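The page-by-page repetition in Note 1 can also be scripted: the MediaWiki API's list=categorymembers query returns a category's files in batches and hands back a cmcontinue token when more remain. A minimal sketch of the query-URL construction, assuming the standard Commons API endpoint (the category name and token below are illustrative):

```shell
# Build a categorymembers API query URL for a given category,
# optionally resuming from a continuation token returned by the
# previous response.
build_category_query() {
  local category="$1"        # e.g. "Category:Crystal_Clear" (illustrative)
  local continue_token="$2"  # empty on the first request
  local url="http://commons.wikimedia.org/w/api.php?action=query&list=categorymembers"
  url="${url}&cmtitle=${category}&cmtype=file&cmlimit=500&format=json"
  if [ -n "$continue_token" ]; then
    url="${url}&cmcontinue=${continue_token}"
  fi
  echo "$url"
}

# First batch, then a follow-up using a (made-up) token from the response:
build_category_query "Category:Crystal_Clear"
build_category_query "Category:Crystal_Clear" "file|EXAMPLE|12345"
```

Fetching each URL (e.g. with wget or curl), pulling the cmcontinue token out of the JSON response, and looping until no token is returned would enumerate every file in the category without manual paging.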


Script

If you want to do all the steps described above in one command, save the following script, make it executable with chmod +x, and run it with the page URL as its first argument.

#!/bin/bash


WIKI_URL=$1

if [ "$WIKI_URL" == '' ]; then
	echo "Usage: $0 <wikimedia-commons-page-url>"
	echo
	exit 1
fi

# Download Image pages
echo "Downloading Image Pages"
wget -r -l 1 -e robots=off -w 1 -nc $WIKI_URL

# Extract Image Links
echo "Extracting Image Links"
WIKI_LINKS=`grep fullImageLink commons.wikimedia.org/wiki/File\:* | sed 's/^.*a href="//'| sed 's/".*$//'`

echo "Downloading Images"
wget -nc -w 1 -e robots=off -P downloaded_wiki_images $WIKI_LINKS


echo "Cleaning up temp files"
rm -rf commons.wikimedia.org/
echo "Done"

exit


Alternative (works only for categories)

This approach gets the filenames via WikiSense and the MediaWiki API. It works only for categories, but should be faster. Just run it with the desired category as its argument.

#!/bin/bash
# Get all Images in Category (and 5 subcategories)
wget "http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php?wikifam=commons.wikimedia.org&basecat=${1}&basedeep=5&mode=iul&go=Scannen&format=csv" -O list
# Read the list file after file
while read line; do
 name=$(echo $line | tr ' ' "\t" | cut -f2) # Extract filename
 api="http://commons.wikimedia.org/w/api.php?action=query&titles=File:${name}&prop=imageinfo&iiprop=url"
 url=$(curl "${api}&format=txt" 2>/dev/null | grep "\[url\]" | tr -d \  |cut -d\> -f2) # Get the URL of the File via API
 echo $name
 echo $api
 echo $url
 wget "$url" # Download file (quoted in case the URL contains special characters)
done < list
rm list # Clean up
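Note that newer MediaWiki versions have dropped the txt output format used in the script above; the same url field can be pulled from the API's JSON output instead. A sketch of that extraction, run here on an inlined sample response rather than a live API call (the file name and URL are illustrative):

```shell
# Extract the "url" value from an imageinfo API response in JSON format.
# A sample response is inlined for illustration; in the script above it
# would come from curl "${api}&format=json".
response='{"query":{"pages":{"123":{"title":"File:Example.png","imageinfo":[{"url":"https://upload.wikimedia.org/wikipedia/commons/1/11/Example.png","descriptionurl":"https://commons.wikimedia.org/wiki/File:Example.png"}]}}}}'

url=$(echo "$response" | sed 's/.*"url":"//; s/".*//')
echo "$url"
```

The first sed expression deletes everything up to the "url" key, and the second deletes everything from the next quote onward. A proper JSON parser (such as jq, if installed) would be more robust than sed here.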
