Noel O'Boyle

Export MediaWiki markup

Let's say you want to back up your MediaWiki wiki. You could (and should) export the underlying SQL database where everything is stored, but for various reasons I would prefer to export the wiki markup itself. This can be done by hand with the Special:Export page, fed with the list of pages shown on Special:Allpages. The following script automates this process.

Please test the script below before relying on it for backups. I'm not sure whether edit history is maintained (see the sketch below for one way this might be controlled) - this isn't crucial for my wiki. I don't use multiple namespaces in my wiki, so I'm not sure how the script copes with them. Also, MediaWiki versions differ, so this parser may not work for yours.
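
On the edit history question, the Special:Export form has a curonly field ("include only the current revision"), and whether full history can be exported at all is governed by the wiki's $wgExportAllowHistory setting. Here's a minimal sketch of how the export request might be adjusted, assuming your MediaWiki version accepts curonly in the POST data (the hrefs list here is a stand-in for the page titles built by the script below):

import urllib

hrefs = ["Main_Page", "Another_Page"]  # stand-in for the titles scraped below

# curonly=1 asks for current revisions only; omitting it *may* include
# full history, if the wiki's $wgExportAllowHistory setting permits it
params = urllib.urlencode({"action": "submit",
                           "pages": "\n".join(hrefs),
                           "curonly": "1"})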

Dependencies: BeautifulSoup (I used 3.0.3) - this is a fantastic parser for anyone who ever needs to do HTML scraping.
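
For anyone who hasn't used it, here is the flavour of the BeautifulSoup calls the script relies on - a toy snippet, not taken from a real wiki page:

from BeautifulSoup import BeautifulSoup

html = "<table><tr><td><a href='/wiki/index.php/Main_Page'>Main Page</a></td></tr></table>"
soup = BeautifulSoup(html)
anchors = soup('a')           # shorthand for soup.findAll('a')
print anchors[0]['href']      # attribute access: /wiki/index.php/Main_Page
print anchors[0].string       # the link text: Main Page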

import urllib
from BeautifulSoup import BeautifulSoup

wiki = "http://MYPROJECT.sourceforge.net/wiki"

# Fetch the complete list of pages from Special:Allpages
allpages = urllib.urlopen(wiki + "/index.php/Special:Allpages")
soup = BeautifulSoup(allpages)

# The page titles are the links in the third table on the page
# (the table index may differ for other MediaWiki versions/skins)
hrefs = []
for anchor in soup('table')[2]('a'):
    hrefs.append(anchor['href'].split("/")[-1])
print hrefs

# POST the newline-separated list of titles to Special:Export
params = urllib.urlencode({"action": "submit",
                           "pages": "\n".join(hrefs)})
query = urllib.urlopen(wiki + "/index.php/Special:Export", params)

# Write the exported XML to disk
outputfile = open("backup.xml", "w")
print >> outputfile, query.read()
outputfile.close()
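
To sanity-check the result, you can count the pages in backup.xml. A minimal sketch, assuming the usual export format where each <page> element contains a single <title> (and no other <title> elements appear):

from xml.dom import minidom

doc = minidom.parse("backup.xml")
titles = doc.getElementsByTagName("title")
print "Exported %d pages" % len(titles)
for title in titles:
    print title.firstChild.data   # the page title text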