
Extract compressed text from MediaWiki database with PHP

A client of ours would like to have all the content from a wiki site they ran for a while. They provided us with the complete database from their MediaWiki installation. We are trying to extract the articles from the ‘text’ table with PHP, without using the MediaWiki engine.

MediaWiki seems to compress the content before storing it as a BLOB in the database. We can’t find a way to extract it without the engine. I looked at the source code, but couldn’t work out how it decodes the BLOBs.

Any suggestions on how to solve this?


Answer

From the documentation of the text table:

old_flags 

Comma-separated list of flags. Contains the following possible values:

┌──────────┬──────────────────────────────────────────────────────────────────┐
│ gzip     │ Text is compressed with PHP's gzdeflate() function.              │
│          │ Note: If the $wgCompressRevisions option is on, new rows         │
│          │ (=current revisions) will be gzipped transparently at save time. │
│          │ Previous revisions can also be compressed by using the script    │
│          │ compressOld.php                                                  │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ utf-8    │ Text was stored as UTF-8.                                        │
│          │ Note: If the $wgLegacyEncoding option is on, rows *without* this │
│          │ flag will be converted to UTF-8 transparently at load time.      │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ object   │ Text field contained a serialized PHP object.                    │
│          │ Note: The object either contains multiple versions compressed    │
│          │ together to achieve a better compression ratio, or it refers to  │
│          │ another row where the text can be found.                         │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ external │ Text was stored in an external location specified by old_text    │
└──────────┴──────────────────────────────────────────────────────────────────┘
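
Based on the flags above, decoding a row outside the MediaWiki engine could look roughly like the sketch below. This is only a minimal sketch under a few assumptions: `decodeOldText()` is a hypothetical helper (not a MediaWiki function), the row's `old_text` and `old_flags` values have already been fetched from the database, and the optional `$legacyEncoding` argument stands in for whatever `$wgLegacyEncoding` was set to on the original wiki. Rows flagged `object` or `external` are not handled here and are returned as `null`, since they need the serialized object or the external store to be resolved first.

```php
<?php
/**
 * Hypothetical helper: decode one row of the MediaWiki `text` table.
 *
 * @param string      $oldText        Raw value of the old_text column.
 * @param string      $oldFlags       Raw value of the old_flags column.
 * @param string|null $legacyEncoding Assumed value of $wgLegacyEncoding, if any.
 * @return string|null Decoded text, or null if the row cannot be decoded here.
 */
function decodeOldText(string $oldText, string $oldFlags, ?string $legacyEncoding = null): ?string
{
    // old_flags is a comma-separated list such as "utf-8,gzip".
    $flags = array_map('trim', explode(',', $oldFlags));

    // These rows point elsewhere or hold a serialized PHP object;
    // this sketch does not try to resolve them.
    if (in_array('external', $flags, true) || in_array('object', $flags, true)) {
        return null;
    }

    // "gzip" means the text went through gzdeflate(), so gzinflate() reverses it.
    if (in_array('gzip', $flags, true)) {
        $inflated = @gzinflate($oldText);
        if ($inflated === false) {
            return null; // corrupt or not actually deflated
        }
        $oldText = $inflated;
    }

    // Rows without the "utf-8" flag may be in a legacy encoding.
    if ($legacyEncoding !== null && !in_array('utf-8', $flags, true)) {
        $converted = iconv($legacyEncoding, 'UTF-8', $oldText);
        if ($converted === false) {
            return null;
        }
        $oldText = $converted;
    }

    return $oldText;
}

// Round-trip check with a value compressed the same way the table describes.
$stored = gzdeflate('Hello, wiki!');
echo decodeOldText($stored, 'utf-8,gzip'), PHP_EOL;
```

In practice you would loop over the rows of the `text` table and pass each row's `old_text` and `old_flags` to the helper, collecting the rows that come back as `null` for separate handling.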
User contributions licensed under: CC BY-SA