Previously, we tried to run a weka server to utilize all cores of the processor in classification tasks. But it appears that the weka server works only in explorer for classification routines. For more advanced machine learning, there is a more flexible tool – experimenter. Weka server doesn’t support this area. So what to do if you want more performance or utilize the multi-core processor of the local machine. There is a way out, but it is quite tricky. Weka has the ability to perform remote experiments that allow spreading the load across multiple host machines that have Weka set up. You can read the documentation of remote experiments here, but it may be somewhat confusing. It took time for me to figure out some parts by trial and error. The trickiest part is to set everything up and prepare the necessary command to be run before performing a remote experiment. So let’s get to it.
Currently, WEKA is one of the most favorites machine learning tools. Without programming skills, you can do severe classification, clustering, and extensive data analysis. For some time, I’ve been using its standard GUI features without thinking much about performance bottlenecks. But since research are becoming more complex by using ensemble, voting, and other meta-algorithms that generally are based on multiple classifiers running simultaneously, the performance issues start becoming annoying. You need to wait for hours until the task is completed. The problem is that when running classification algorithms from the WEKA GUI, they utilize a single core of your processor. Such algorithms as Multi-layer Perceptron running 10-fold cross-validation is calculating one cross-fold at the time on one core, taking a long time to accomplish: So I started looking for options to make it use all cores of the processor as separate threads for each operation fold. There are a couple of options available to do so. One is to use WekaServer package, and another is remote host processing. This time we will focus on WekaServer solution. The idea is to start a WEKA server as a distributed execution environment. When starting the server, you can indicate how many cores you…
Logistic regression is the next step from linear regression. The most real-life data have a non-linear relationship; thus, applying linear models might be ineffective. Logistic regression is capable of handling non-linear effects in prediction tasks. You can think of many different scenarios where logistic regression could be applied. There can be financial, demographic, health, weather, and other data where the model could be implemented and used to predict subsequent events on future data. For instance, you can classify emails into spam and non-spam, transactions being a fraud or not, and tumors being malignant or benign. In order to understand logistic regression, let’s cover some basics, do a simple classification on data set with two features, and then test it on real-life data with multiple features.
This is a follow-up post from previous where we were calculating Naive Bayes prediction on the given data set. This time I want to demonstrate how all this can be implemented using the WEKA application. I highly recommend visiting their website and getting the latest release. WEKA is a compelling machine learning software written in Java. It is a widely-used and highly regarded machine learning software that offers a range of powerful data mining and modeling tools. It provides a user-friendly interface, making it accessible to both experienced and novice users. Weka offers a wide range of algorithms and data pre-processing techniques, making it a flexible and robust tool for various machine learning applications, such as classification, clustering, and association rule mining. You can find plenty of tutorials on youtube on how to get started with WEKA. So I won’t get into details. I’m sure you’ll be able to follow anyway.