Previously, we tried to run a Weka server to use all the cores of the processor in classification tasks. However, it appears that the Weka server works only in the Explorer, for classification routines. For more advanced machine learning there is a more flexible tool, the Experimenter, and the Weka server doesn't support it. So what can you do if you want more performance, or want to use the multi-core processor of your local machine? There is a way out, but it is quite tricky. Weka can perform remote experiments, which spread the load across multiple host machines that have Weka set up. You can read the documentation of remote experiments here, but it may be somewhat confusing. It took me some time to figure out some parts by trial and error.
The trickiest part is setting everything up and preparing the commands that have to be run before performing a remote experiment. So let's get to it.
Setting up a database server
For the remote experiment, the computer where you are working needs a database, where the results from the different hosts are stored and combined into the final result. There are two options to work with: HSQLDB and MySQL. It is up to you which one to use. I find HSQLDB simpler, since it is Java-based and faster to set up, so I'll be using it in this example. I am running Win10, so the following example applies only to this operating system. First of all, download the latest HSQLDB package and extract it to a temporary directory. Now let us create a working directory where we will put all the necessary files. I have created the WK directory on the D: disk, and inside it a jars directory, so the result should look like D:\WK\jars.
From the downloaded and extracted HSQLDB package, copy the hsqldb.jar file (located in \hsqldb-2.4.0\hsqldb\lib) to the jars directory.
If you have Java installed (it comes with Weka), you can start the HSQLDB server and create the database by executing this command in a CMD window:
java -classpath D:\WK\jars\hsqldb.jar org.hsqldb.Server -database.0 experiment -dbname.0 experiment
If successful, you should see the following result:
We leave the command window like this and proceed to the next step.
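If you want to confirm that the database server is really listening before moving on, a small check like the one below can help. This is only a sketch: it assumes the server uses the default HSQLDB port 9001 (no port was changed in the command above), and the function name port_is_open is my own, not part of Weka or HSQLDB.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all of these as "not open".
        return False


if __name__ == "__main__":
    # Assumption: 9001 is the HSQLDB server default and was not overridden.
    print("HSQLDB reachable:", port_is_open("localhost", 9001))
```

The same function can later be pointed at the remote engine's port to see whether the engine is up.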
Setting up the Weka remote engine
First of all, in the D:\WK directory, we create the remote_engine directory like this: D:\WK\remote_engine.
Then, from the Weka install directory (usually C:\Program Files\Weka-3-9), take the remoteExperimentServer.jar file and copy it somewhere temporary. Extract its contents there with an archive program; I use 7-zip. After extraction, you will see three files: remote.policy, remote.policy.example, and remoteEngine.jar. Copy remote.policy and remoteEngine.jar into the D:\WK\remote_engine directory.
This is where things got a bit tricky for me. I got stuck with the command and the provided files because Java couldn't parse the remote.policy file correctly. For simplicity, copy the weka.jar file to the D:\WK\jars directory, because this makes the remote engine command simpler. The commands from the documentation refused to run because each entry has to be given with the full path to the file.
So again, you can try running a slightly modified command in a newly opened CMD window (remember not to touch the command window that is running the database):
You most likely will run into an error like this:
I couldn't find what was wrong with the file. It appears to be UTF-8 encoded and so on, and searching the internet didn't turn up an answer. So I decided to recreate the policy file using the Java tool policytool.exe, which can be found in the Java installation directory. There you can enter each policy entry manually and save the file in a proper format. I'm not a Java programmer, so I do things intuitively and not always correctly. I added all the permissions one by one to a new file called mm.policy. I also added an extra entry that grants all permissions, because the remote experiment engine had a problem creating files. If you want, you can download my policy file to use in your experiment here: mm_policy.zip
Copy it to the D:\WK\remote_engine directory.
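If you would rather generate a policy file than use policytool.exe, the sketch below writes a minimal one. It is not the content of my mm.policy. It contains only the "grant all permissions" entry mentioned above, written in the standard Java policy syntax as plain ASCII to sidestep encoding problems. The function name write_policy is hypothetical.

```python
from pathlib import Path

# A deliberately permissive policy: it grants every permission to all code.
PERMISSIVE_POLICY = (
    "grant {\n"
    "    permission java.security.AllPermission;\n"
    "};\n"
)


def write_policy(path: str) -> Path:
    """Write the permissive policy as plain ASCII text and return its path."""
    target = Path(path)
    # newline="\n" keeps the line endings exactly as written above.
    with open(target, "w", encoding="ascii", newline="\n") as handle:
        handle.write(PERMISSIVE_POLICY)
    return target
```

For the layout used in this post you would call write_policy(r"D:\WK\remote_engine\mm.policy"). Granting everything is the bluntest possible setting, so it only makes sense on machines and networks you trust.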
Now enter the following command in the command line. Note that all jars go into a single -cp option, separated by semicolons on Windows:
java -Xmx256m -cp D:\WK\jars\hsqldb.jar;D:\WK\remote_engine\remoteEngine.jar;D:\WK\jars\weka.jar -Djava.security.policy=D:\WK\remote_engine\mm.policy weka.experiment.RemoteEngine
If successful, you should see output like this:
As you can see, you are given a hostname "MMM-PC" and port 1099.
Now you can try to perform a remote experiment in Weka.
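Since the classpath was the part that caused me the most trouble, here is a small sketch that assembles the remote engine command from a list of jars. It assumes the same paths as in this post; the function name build_engine_command and its parameters are my own. The separator is a parameter because Java expects ";" between classpath entries on Windows and ":" on Linux.

```python
def build_engine_command(jars, policy, separator=";", heap="256m"):
    """Return the argument list for starting weka.experiment.RemoteEngine."""
    # All jars go into ONE classpath string, joined by the platform separator.
    classpath = separator.join(jars)
    return [
        "java",
        f"-Xmx{heap}",
        "-cp", classpath,
        f"-Djava.security.policy={policy}",
        "weka.experiment.RemoteEngine",
    ]


if __name__ == "__main__":
    command = build_engine_command(
        [
            r"D:\WK\jars\hsqldb.jar",
            r"D:\WK\remote_engine\remoteEngine.jar",
            r"D:\WK\jars\weka.jar",
        ],
        r"D:\WK\remote_engine\mm.policy",
    )
    print(" ".join(command))
```

On a Linux host you would pass separator=":" and Linux-style paths; the rest of the command stays the same.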
Running a remote experiment in Weka locally
For this, open Weka Experimenter and go to Advanced mode. Select your database and algorithm as you like.
Then, in the Distribute experiment area, select for instance By data set and click Hosts. In the popup window that opens, enter the host that was created by running the remote engine.
Of course, your local IP address may also be used instead of the hostname.
After the host is selected, you can go to the Run tab and click Start, and the remote experiment is performed:
This is practically it for a simple solution.
There is an option to use multiple cores of the processor by adding -p <port> to the command, but in my test it behaved the same as before, because the command already assigns a port number for you. I couldn't get all cores up to 100% load.
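As far as I understand the Weka documentation, the idea behind -p is to start several remote engines on the same machine, each on its own port, and then add every one of them in the Hosts dialog as hostname:port. I have not confirmed that this loads all cores, so treat the sketch below as an assumption-laden helper: it only prints the -p values and the matching host entries. The consecutive ports starting at 1099 and the function name engine_endpoints are my own choices.

```python
import os


def engine_endpoints(hostname, count=None, first_port=1099):
    """Return 'hostname:port' entries, one per engine, on consecutive ports."""
    if count is None:
        # Assumption: one engine per logical core reported by the OS.
        count = os.cpu_count() or 1
    return [f"{hostname}:{first_port + i}" for i in range(count)]


if __name__ == "__main__":
    for entry in engine_endpoints("MMM-PC", count=4):
        port = entry.rsplit(":", 1)[1]
        # Each engine would be started in its own CMD window with this flag.
        print(f"start engine with: -p {port}   add host: {entry}")
```

Each printed -p value would be appended to the remote engine command in a separate CMD window, and each host entry typed into the Hosts dialog.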
Running a remote experiment on the remote host
The whole power of remote experiments is that you can distribute the load across hosts in the network. All you have to do is start a remote engine on each remote machine that you want to use.
I have tested this with another machine on my local network, and it worked. Still, the real power comes from using Linux hosts that can be controlled through ssh. The next logical step would be to configure multiple host machines to spread the load, but since that requires more time and knowledge, I leave this topic open for discussion.