I am trying to write a program to find the similarity between two documents. Since I'm only working with English, I decided to use WordNet, but I cannot find a way to link WordNet with PHP, and I cannot find any WordNet API for PHP.
I saw in the forum that someone (Spudley) said he called WordNet from PHP using the shell_exec() function (see "Thesaurus class or API for PHP [edited]").
I would really like to know the method used, or to see some example code or perhaps a tutorial, so I can start using WordNet with PHP.
Many thanks.
Answer
The PHP extension which is linked to from the WordNet site is very old and out of date — it claims to work with PHP4, so I don’t think it’s been looked at in years.
There aren’t any other APIs available for WordNet->PHP, so I rolled my own solution.
WordNet can be run from the command-line, so PHP’s shell_exec() function can read the output.
If you run WordNet from the command-line (cd to WordNet’s directory, then just wn) without any parameters, it will show you a list of possible functions that WordNet supports.
Still in the command-line, if you then try one or more of those functions, you’ll see how WordNet outputs its results. For example, if you want synonyms for the word ‘star’, you could try the -synsn function:
wn star -synsn
This will produce output that looks a bit like this:
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun star
8 senses of star
Sense 1 star => celestial body, heavenly body
Sense 2 ace, adept, champion, sensation, maven, mavin, virtuoso, genius, hotshot, star, superstar, whiz, whizz, wizard, wiz => expert
Sense 3 star => celestial body, heavenly body
Sense 4 star => plane figure, two-dimensional figure
Sense 5 star, principal, lead => actor, histrion, player, thespian, role player
Sense 6 headliner, star => performer, performing artist
Sense 7 asterisk, star => character, grapheme, graphic symbol
Sense 8 star topology, star => topology, network topology
In PHP, you can read this same output using the shell_exec() function.
$result = shell_exec('/path/to/wn '.escapeshellarg($word).' -synsn'); // escape $word so user input cannot inject shell commands
Now $result should contain the block of text quoted above.
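If it helps, here is a minimal sketch of how that call could be wrapped up. The function name runWordnet() and its parameters are my own invention for illustration, not part of WordNet or PHP, and you would need to pass the real path to your wn binary.

```php
<?php
// Hypothetical wrapper around the wn command-line tool.
// $wnPath, $word and $option are illustrative names, not a WordNet API.
function runWordnet(string $wnPath, string $word, string $option = '-synsn'): ?string
{
    // Escape every piece so a word like "star; rm -rf ~" cannot inject shell commands.
    $command = escapeshellarg($wnPath) . ' '
             . escapeshellarg($word) . ' '
             . escapeshellarg($option);

    $output = shell_exec($command);

    // shell_exec() gives no string back when the command fails or prints nothing,
    // so treat anything other than non-empty text as "no result".
    return (is_string($output) && trim($output) !== '') ? $output : null;
}
```

You would then call it with something like runWordnet('/path/to/wn', 'star') and check for null before trying to parse anything.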
At this point, you have to do some proper coding. You’ll need to take that block of text and parse it for the data you want.
This is where it gets tricky: because the data is presented in a format designed to be read by a human rather than by a program, it is hard to parse accurately.
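As a rough starting point, a parser for the -synsn output might look something like the sketch below. It assumes each sense starts with "Sense N" at the beginning of a line, followed by a comma-separated synonym list and then one or more "=>" hypernym lists, as in the sample above. The function name and the array layout are hypothetical, and other search options will need different handling.

```php
<?php
// Hypothetical parser for the text printed by `wn <word> -synsn`.
// Returns an array keyed by sense number, each entry holding
// 'synonyms' and 'hypernyms' lists. The layout is an assumption
// based on the sample output, not a documented format.
function parseWordnetSynonyms(string $output): array
{
    // Turn "a, b , c" into ['a', 'b', 'c'], dropping empty pieces.
    $toList = function (string $text): array {
        return array_values(array_filter(array_map('trim', explode(',', $text)), 'strlen'));
    };

    // Split on "Sense N" markers, keeping the captured sense numbers.
    // Result: [header, number, body, number, body, ...]
    $parts = preg_split('/^Sense\s+(\d+)\s*/m', $output, -1, PREG_SPLIT_DELIM_CAPTURE);

    $senses = [];
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        // The text before the first "=>" is the synonym list;
        // every piece after a "=>" is a list of hypernyms.
        $chunks = explode('=>', $parts[$i + 1]);
        $synonyms = $toList(array_shift($chunks));

        $hypernyms = [];
        foreach ($chunks as $chunk) {
            $hypernyms = array_merge($hypernyms, $toList($chunk));
        }

        $senses[(int) $parts[$i]] = ['synonyms' => $synonyms, 'hypernyms' => $hypernyms];
    }
    return $senses;
}
```

Splitting on the "Sense N" markers rather than on fixed line positions means the same code copes whether each sense is printed on one line or spread over several.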
It is important to note that different search options present their output slightly differently, and some of the results that are returned can be somewhat esoteric. I ended up writing a weighting system to score the results, but it was fairly specific to my needs, so you’ll need to experiment to come up with your own system.
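To give a flavour of what a weighting system could look like, here is a deliberately simple sketch. It is not the system I used. It assumes the output has already been parsed into an array keyed by sense number, with each entry holding a 'synonyms' list (a hypothetical shape), and it relies only on the fact that the senses are ordered by estimated frequency: a synonym earns 1/N points for appearing in sense N.

```php
<?php
// Hypothetical scoring scheme: synonyms from earlier (more frequent)
// senses score higher. The 1/N weight is an arbitrary illustrative choice.
function scoreSynonyms(array $senses, string $word): array
{
    $scores = [];
    foreach ($senses as $number => $sense) {
        foreach ($sense['synonyms'] as $synonym) {
            if (strcasecmp($synonym, $word) === 0) {
                continue; // skip the query word itself
            }
            // Sense numbers start at 1, so the division is safe.
            $scores[$synonym] = ($scores[$synonym] ?? 0.0) + 1.0 / $number;
        }
    }
    arsort($scores); // highest score first, keys preserved
    return $scores;
}
```

You will almost certainly want to tune the weights, or bring the hypernyms into it, once you see what kind of results your own documents produce.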
I hope that’s enough help for you. 🙂