Multinomial naive Bayes for text data. Operates directly (and only) on String attributes. Other types of input attributes are accepted but ignored during training and classification.
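The sketch below illustrates the general multinomial naive Bayes decision rule on already-tokenized text: per-class word counts, log priors, and add-one (Laplace) smoothing. It is a minimal, hypothetical example (the class `MultinomialNBSketch` and its methods are invented for illustration), not Weka's implementation; the smoothing scheme and the choice to skip unseen words are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, minimal multinomial naive Bayes over pre-tokenized documents. */
class MultinomialNBSketch {
    private final Map<String, Map<String, Double>> wordCounts = new HashMap<>(); // class -> word -> count
    private final Map<String, Double> totalWords = new HashMap<>();              // class -> total word count
    private final Map<String, Integer> docCounts = new HashMap<>();              // class -> number of documents
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    void train(List<String> tokens, String label) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Double> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1.0, Double::sum);
            totalWords.merge(label, 1.0, Double::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the class with the highest log posterior (assumed add-one smoothing). */
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = vocabulary.size();
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs); // log prior
            Map<String, Double> counts = wordCounts.get(label);
            double total = totalWords.getOrDefault(label, 0.0);
            for (String t : tokens) {
                if (!vocabulary.contains(t)) continue; // words never seen in training are skipped
                score += Math.log((counts.getOrDefault(t, 0.0) + 1.0) / (total + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.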
Valid options are:
-W
Use word frequencies instead of a binary bag of words.
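To show the difference the -W flag makes, here is a small hypothetical helper (`BagOfWords.vectorize` is invented for illustration) that turns a token list into either a binary presence vector or a frequency vector. It is a sketch of the concept, not Weka's internal code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper contrasting binary and frequency bag-of-words vectors. */
class BagOfWords {
    /** Builds a token -> value map; with useFrequencies=false every present token maps to 1.0. */
    static Map<String, Double> vectorize(List<String> tokens, boolean useFrequencies) {
        Map<String, Double> vec = new HashMap<>();
        for (String t : tokens) {
            if (useFrequencies) {
                vec.merge(t, 1.0, Double::sum); // count every occurrence
            } else {
                vec.put(t, 1.0);                // record presence only
            }
        }
        return vec;
    }
}
```

For the tokens `a b a`, the binary vector is `{a=1.0, b=1.0}` and the frequency vector is `{a=2.0, b=1.0}`.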
-P <# instances>
How often (in number of training instances) to prune the dictionary of low-frequency words (default = 0, i.e. don't prune)
-M <double>
Minimum word frequency. Words with a frequency below this value are ignored.
If periodic pruning is turned on then this is also used to determine which
words to remove from the dictionary (default = 3).
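The interaction between -P and -M can be sketched as follows: word counts accumulate as instances arrive, and every `pruneInterval` instances any word whose count is below `minFrequency` is dropped from the dictionary. The class `PrunedDictionary` is hypothetical, and the exact pruning trigger is an assumption made for illustration rather than a description of Weka's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of periodic dictionary pruning, loosely mirroring -P and -M. */
class PrunedDictionary {
    private final Map<String, Double> counts = new HashMap<>();
    private final int pruneInterval;   // like -P: prune after this many instances (0 = never)
    private final double minFrequency; // like -M: words below this count are dropped or ignored
    private int seen = 0;

    PrunedDictionary(int pruneInterval, double minFrequency) {
        this.pruneInterval = pruneInterval;
        this.minFrequency = minFrequency;
    }

    void add(List<String> tokens) {
        for (String t : tokens) counts.merge(t, 1.0, Double::sum);
        seen++;
        if (pruneInterval > 0 && seen % pruneInterval == 0) {
            // removing through the values() view deletes the whole map entry
            counts.values().removeIf(c -> c < minFrequency);
        }
    }

    boolean contains(String word) {
        return counts.containsKey(word);
    }

    /** A word only contributes if its count reaches the minimum frequency. */
    boolean isUsable(String word) {
        return counts.getOrDefault(word, 0.0) >= minFrequency;
    }
}
```

Pruning bounds memory on large streams, at the cost of discarding words that are rare early on but might become frequent later.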
-normalize
Normalize document length (use in conjunction with -norm and -lnorm)
-norm <num>
Specify the norm that each instance must have (default 1.0)
-lnorm <num>
Specify L-norm to use (default 2.0)
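Length normalization as described by -normalize, -norm and -lnorm amounts to rescaling a document's word vector so that its L-p norm equals the requested value. The helper below (`LengthNormalizer.normalize`, a hypothetical name) is a minimal sketch of that arithmetic, assuming a zero vector is simply left unchanged.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: rescale a word vector to a target L-p norm. */
class LengthNormalizer {
    /** Returns a copy of vec scaled so that (sum |x_i|^p)^(1/p) == targetNorm. */
    static Map<String, Double> normalize(Map<String, Double> vec, double targetNorm, double p) {
        double sum = 0.0;
        for (double v : vec.values()) sum += Math.pow(Math.abs(v), p);
        double current = Math.pow(sum, 1.0 / p); // current L-p norm of the document
        Map<String, Double> out = new HashMap<>();
        if (current == 0.0) { // empty or all-zero document: nothing to rescale
            out.putAll(vec);
            return out;
        }
        for (Map.Entry<String, Double> e : vec.entrySet()) {
            out.put(e.getKey(), e.getValue() * targetNorm / current);
        }
        return out;
    }
}
```

With the defaults (norm 1.0, L-norm 2.0), a document with counts `{a=3, b=4}` has Euclidean length 5 and becomes `{a=0.6, b=0.8}`.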
-lowercase
Convert all tokens to lowercase before adding to the dictionary.
-stopwords-handler
The stopwords handler to use (default: Null, i.e. no words are treated as stopwords).
-tokenizer <spec>
The tokenizing algorithm (classname plus parameters) to use.
(default: weka.core.tokenizers.WordTokenizer)
-stemmer <spec>
The stemming algorithm (classname plus parameters) to use.
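The -lowercase, -stopwords-handler, -tokenizer and -stemmer options together describe a text preprocessing pipeline. The sketch below chains those steps in plain Java; `TextPreprocessor` is a hypothetical class, the delimiter set is an assumption only loosely modeled on a word tokenizer, and the order of the steps (lowercase, then stopword check, then stemming) is assumed rather than taken from Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.function.UnaryOperator;

/** Hypothetical preprocessing pipeline: tokenize, optionally lowercase, drop stopwords, stem. */
class TextPreprocessor {
    // Assumed delimiter set for illustration only.
    private static final String DELIMITERS = " \r\n\t.,;:'\"()?!";

    static List<String> process(String text, boolean lowercase, Set<String> stopwords,
                                UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (lowercase) token = token.toLowerCase(Locale.ROOT); // locale-independent lowercasing
            if (stopwords.contains(token)) continue;               // stopword check on the unstemmed token
            out.add(stemmer.apply(token));                         // stemming applied last
        }
        return out;
    }
}
```

Passing `UnaryOperator.identity()` as the stemmer and an empty set as the stopword list reproduces "no stemming, no stopword removal".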
-output-debug-info
If set, classifier is run in debug mode and
may output additional info to the console
-do-not-check-capabilities
If set, classifier capabilities are not checked before classifier is built
(use with caution).