What should the list of ignored words be for a Naive Bayesian classifier?

I am working with a Naive Bayesian classifier in PHP (http://www.xhtml.net/php/PHPNaiveBayesianFilter).

There is a list of words that can be ignored while training the system. Those words are not saved into the database and are therefore not used for classification. I would like to improve the system as much as I can, so I was wondering whether there is any rule or list of typical words to ignore for this kind of system.

I am currently ignoring words such as “to”, “and”, “the”, “for”, “since”, “which”, “what”, “who”, and some typical verbs such as “be”, “was”, “were”, “been”, etc.
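
For illustration, the filtering step described above might look roughly like the following. This is a minimal Python sketch, not the code of the PHP filter itself; `IGNORED_WORDS`, `tokenize`, and `remove_ignored` are hypothetical names, and the word list is only the one given in this question.

```python
import re

# Hypothetical ignore list, taken from the words mentioned in the question.
IGNORED_WORDS = {
    "to", "and", "the", "for", "since", "which", "what", "who",
    "be", "was", "were", "been",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_ignored(tokens, ignored=IGNORED_WORDS):
    """Keep only the tokens that are not in the ignore list."""
    return [t for t in tokens if t not in ignored]

# Only the remaining tokens would be stored and counted during training.
tokens = remove_ignored(tokenize("Who was the winner for the prize?"))
print(tokens)
```

The same filter would presumably have to be applied both when training and when classifying, so that ignored words never contribute to either step.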

Answer

You will be dealing with a lot of words: mostly adjectives and conjunctions, and maybe verbs.

It's a very long list, which you need to save as a text file or import into your database. I suggest you just search for existing lists and download them directly.

Here are some links:
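
If the list is kept as a plain text file, loading it could be as simple as the sketch below. It is written in Python for brevity and assumes one word per line, with `#` marking comment lines; the function name `load_ignored_words` and the file name `stopwords.txt` are hypothetical, not part of the PHP filter.

```python
from pathlib import Path

def load_ignored_words(path):
    """Read one word per line, skipping blank lines and '#' comment lines."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words

# Example usage (assumes a file named stopwords.txt exists):
# ignored = load_ignored_words("stopwords.txt")
```

Storing the words in a set keeps each lookup cheap, even when the downloaded list is long.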

http://www.momswhothink.com/reading/list-of-verbs.html

http://grammar.yourdictionary.com/parts-of-speech/conjunctions/conjunctions.html

http://www.smart-words.org/transition-words.html

http://www.momswhothink.com/reading/list-of-adjectives.html

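In general, the more complete your list of ignored words is, the better your system should work, as long as the list does not include words that actually help tell your categories apart.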

Thanks 🙂

User contributions licensed under: CC BY-SA