This is a follow-up to the previous post, where we calculated a Naive Bayes prediction on a given data set. This time I want to demonstrate how the same thing can be done in the WEKA application.
For those who don’t know what WEKA is, I highly recommend visiting its website and getting the latest release. It is a compelling machine learning package written in Java. You can find plenty of tutorials on YouTube on how to get started with WEKA, so I won’t go into details. I’m sure you’ll be able to follow anyway.
Preparing data for classification
We are going to use the same data set as in the previous example with weather features temperature and humidity and class yes/no for playing golf.
The data is stored in ARFF, a file format specific to WEKA, and looks like this:
Here we can see the attribute declarations: temperature, humidity, and play, followed by the data table. Using this data set, we are going to train a Naive Bayes model and then apply it to a new instance with temperature cool and humidity high to see which class it is assigned to.
First of all, in the Preprocess tab of the WEKA Explorer, we need to open our ARFF data file:
Here we can see basic statistics for the attributes. If you click the Edit button, a new Viewer window with the data table will open.
In the viewer, you can edit the data as you like, and you can always save the new data set with the Save button in the Explorer. We will do so to create a test set with the values cool and high. For this, we delete all rows except the first one and edit its values to look like this:
Leave the play attribute empty because we don’t know its value yet.
Click OK and then Save data as a separate file. The file should look like this:
The question mark “?” is the standard way of representing a missing value in WEKA.
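If you prefer to create the test file without the Viewer, the same one-row ARFF file can be produced with a few lines of Python. This is only a sketch: the relation name and the value lists for temperature and humidity are assumptions (only cool, high, yes, and no appear in this post), so adjust them to match the header of your own training file.

```python
# Minimal sketch: build a one-row ARFF test set with an unknown class value.
# The relation name and the nominal value lists below are assumptions;
# copy the real header from your training file so both files match.

def build_test_arff(temperature: str, humidity: str) -> str:
    """Return ARFF text for a single unlabeled instance."""
    header = [
        "@relation weather.test",                  # hypothetical relation name
        "@attribute temperature {hot,mild,cool}",  # assumed value list
        "@attribute humidity {high,normal}",       # assumed value list
        "@attribute play {yes,no}",
        "",
        "@data",
    ]
    # "?" marks the missing class value that the model should predict.
    row = ",".join([temperature, humidity, "?"])
    return "\n".join(header + [row]) + "\n"


arff_text = build_test_arff("cool", "high")
print(arff_text)
```

The last line of the generated text is the data row `cool,high,?`, which is the instance we want WEKA to classify.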
Building a Naive Bayes model
Now that we have the data prepared, we can proceed to building the model. Load the full weather data set in the Explorer again and go to the Classify tab.
Here you need to press the Choose button under Classifier and select NaiveBayes from the tree menu. Make sure that the play attribute is selected as the class, and then press the Start button to build the model.
The model outputs some information on how accurately it classifies, along with other parameters.
Correctly Classified Instances 9 64.2857 %
Incorrectly Classified Instances 5 35.7143 %
You can see that on the given data set the accuracy of the classifier is about 64%. So keep in mind that you shouldn’t always take the results for granted. To get better results, you might want to try different classifiers or preprocess the data further. We won’t get into this right now; what we need to demonstrate is how to use the model on new, incoming data.
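The percentages in the summary are simply the instance counts divided by the total number of evaluated instances. A quick check in Python, using only the two counts reported above:

```python
# Recompute the summary percentages from the counts WEKA reported.
correct, incorrect = 9, 5
total = correct + incorrect  # number of evaluated instances

accuracy = 100 * correct / total
error_rate = 100 * incorrect / total

print(f"{total} instances: {accuracy:.4f} % correct, {error_rate:.4f} % incorrect")
```

With only 14 instances, each single misclassification moves the accuracy by about 7 percentage points, which is one more reason to treat the figure with caution.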
Evaluating classifier with the test set
Now that we have a model, we need to load the test data we created earlier. For this, select Supplied test set and click the Set button.
Click More options, and in the new window choose PlainText for Output predictions as follows:
Then right-click the newly created model in the result list and select Re-evaluate model on current test set.
You should see the prediction for the given data, cool and high, like this:
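As you can see, the instance has been predicted as yes. The 53.1% figure shown with it is the value of the prediction column, which is the probability WEKA assigns to the predicted class, so it is best read as the model’s confidence in yes rather than as an error rate. In the previous analytical example, we got 50% for this prediction.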
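One plausible explanation for the gap between WEKA’s 53.1% and a hand calculation is smoothing: as far as I understand it, WEKA’s NaiveBayes adds one to every nominal count (Laplace smoothing) before turning counts into probabilities. The sketch below applies that idea. The counts are an assumption taken from the widely distributed 14-row weather data (9 yes, 5 no, three temperature values, two humidity values) and may differ from the file used in this post.

```python
from fractions import Fraction

# Assumed counts from the classic 14-row weather data; not taken from this post.
class_counts = {"yes": 9, "no": 5}
cool_counts = {"yes": 3, "no": 1}   # rows with temperature = cool, per class
high_counts = {"yes": 3, "no": 4}   # rows with humidity = high, per class
n_temperature_values = 3            # assumed: hot, mild, cool
n_humidity_values = 2               # assumed: high, normal


def smoothed(count: int, total: int, n_values: int) -> Fraction:
    """Add-one (Laplace) estimate of a probability from counts."""
    return Fraction(count + 1, total + n_values)


def score(label: str) -> Fraction:
    """Unnormalized Naive Bayes score for one class given cool and high."""
    n = class_counts[label]
    prior = smoothed(n, sum(class_counts.values()), len(class_counts))
    p_cool = smoothed(cool_counts[label], n, n_temperature_values)
    p_high = smoothed(high_counts[label], n, n_humidity_values)
    return prior * p_cool * p_high


scores = {label: score(label) for label in class_counts}
p_yes = scores["yes"] / sum(scores.values())  # normalize over both classes
print(f"P(play=yes | cool, high) = {float(p_yes):.3f}")
```

Under these assumptions the script prints 0.531, which matches the figure in the WEKA output and suggests the difference from the hand calculation comes from the smoothing rather than from the data.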