A client of ours would like to have all the content from a wiki site they ran for a while. They provided us with the complete database of the MediaWiki software. We are trying to extract the articles from the ‘text’ table with PHP, without using the MediaWiki engine.
MediaWiki seems to compress the content before storing it as a BLOB in the database. We can’t find a way to extract it without the engine. I looked at the source code, but can’t work out how they decode the BLOBs.
Any suggestions on how to solve this?
Answer
From the MediaWiki documentation for the text table:
old_flags
Comma-separated list of flags. Contains the following possible values:
┌──────────┬──────────────────────────────────────────────────────────────────┐
│ gzip     │ Text is compressed with PHP's gzdeflate() function.              │
│          │ Note: If the $wgCompressRevisions option is on, new rows         │
│          │ (=current revisions) will be gzipped transparently at save time. │
│          │ Previous revisions can also be compressed by using the script    │
│          │ compressOld.php                                                  │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ utf-8    │ Text was stored as UTF-8.                                        │
│          │ Note: If the $wgLegacyEncoding option is on, rows *without* this │
│          │ flag will be converted to UTF-8 transparently at load time.      │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ object   │ Text field contained a serialized PHP object.                    │
│          │ Note: The object either contains multiple versions compressed    │
│          │ together to achieve a better compression ratio, or it refers to  │
│          │ another row where the text can be found.                         │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ external │ Text was stored in an external location specified by old_text    │
└──────────┴──────────────────────────────────────────────────────────────────┘
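
Based on those flags, here is a minimal PHP sketch of how the rows could be decoded without the engine. The connection details (database name, credentials) are placeholders, and the ISO-8859-1 legacy encoding is only an assumption; rows flagged object or external can't be decoded this way, since they need MediaWiki's HistoryBlob classes or the external storage cluster, respectively. The key point is that gzdeflate() produces raw DEFLATE data, so gzinflate() is the matching decompressor.

<?php
// Sketch: decode rows from MediaWiki's text table without the engine.
// Host, database name and credentials below are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=wikidb', 'user', 'pass');

function decodeWikiText(string $blob, string $flagList): ?string {
    $flags = array_map('trim', explode(',', $flagList));

    if (in_array('external', $flags, true)) {
        return null; // old_text is a pointer to external storage, not the text itself
    }
    if (in_array('object', $flags, true)) {
        return null; // serialized HistoryBlob object; needs MediaWiki's classes to unpack
    }
    if (in_array('gzip', $flags, true)) {
        // gzdeflate() emits raw DEFLATE data, so gzinflate() reverses it
        $blob = gzinflate($blob);
        if ($blob === false) {
            return null;
        }
    }
    if (!in_array('utf-8', $flags, true)) {
        // Row predates UTF-8 storage; the legacy encoding is wiki-specific
        // ($wgLegacyEncoding) -- ISO-8859-1 is only an assumption here.
        $blob = mb_convert_encoding($blob, 'UTF-8', 'ISO-8859-1');
    }
    return $blob;
}

$rows = $pdo->query('SELECT old_id, old_text, old_flags FROM text');
foreach ($rows as $row) {
    $text = decodeWikiText($row['old_text'], (string) $row['old_flags']);
    if ($text !== null) {
        echo "== revision {$row['old_id']} ==\n{$text}\n";
    }
}

To extract only the current version of each article rather than every revision, join through the revision table (in the pre-1.31 schema: page.page_latest = revision.rev_id and revision.rev_text_id = text.old_id).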