Previously we tried to run Weka server to utilize all cores of the processor in classification tasks. But it appears that Weka server works only in the Explorer for classification routines. For more advanced machine learning there is a more flexible tool, the Experimenter, and Weka server doesn't support it. So what can you do if you want more performance, or want to utilize the multi-core processor of the local machine? There is a way out, but it is quite tricky. Weka can perform remote experiments, which spread the load across multiple host machines that have Weka set up. You can read the documentation on remote experiments on the Weka wikispaces, but in some cases it may be somewhat confusing. It took me some time to figure out some parts by trial and error.
The trickiest part is to set everything up and prepare the commands that have to be run before performing a remote experiment. So let's get to it.
Setting up a database server
For the remote experiment, the computer where you are working needs a database where results from the different hosts are stored and combined into the final result. There are two options to work with: HSQLDB and MySQL. It is up to you which one to use. I find HSQLDB, being simple and Java-based, faster to set up, so in this example I'll be using it. I am running Win10, so the following example applies only to this operating system. First of all, download the latest HSQLDB package and extract it into a temporary directory. Now let us create a working directory where we will put all the necessary files. I have created a WK directory on the D: drive, and inside it a jars directory. So it should look like D:\WK\jars.
Now copy the hsqldb.jar file from the extracted HSQLDB package (located in \hsqldb-2.4.0\hsqldb\lib) to the jars directory.
If you have Java installed (it comes with WEKA), you can start the HSQLDB server and create the database by executing this command in a CMD window:
java -classpath D:\WK\jars\hsqldb.jar org.hsqldb.server.Server -database.0 experiment -dbname.0 experiment
If successful, you should see the following result:
We leave the command window like this and proceed to the next step.
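Before moving on, it can be handy to confirm that the database server really is accepting connections. The following is a minimal sketch, not part of Weka or HSQLDB: the class and method names are made up for illustration, and it assumes the server listens on HSQLDB's default port 9001 and that the alias given with -dbname.0 (experiment) ends up as the last part of the JDBC URL.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks that something is listening where the HSQLDB server should be.
class ServerCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Builds the JDBC URL for an HSQLDB server database with the given alias.
    static String jdbcUrl(String host, String alias) {
        return "jdbc:hsqldb:hsql://" + host + "/" + alias;
    }

    public static void main(String[] args) {
        // 9001 is the HSQLDB default server port (assumed unchanged here).
        System.out.println("HSQLDB listening: " + isListening("localhost", 9001, 1000));
        System.out.println("JDBC URL: " + jdbcUrl("localhost", "experiment"));
    }
}
```

If the first line prints false, the server window was probably closed or the server was started on a different port.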
Setting up Weka remote engine
First of all, in the D:\WK directory we create a remote_engine directory, so the path is D:\WK\remote_engine.
Then from the Weka install directory (usually C:\Program Files\Weka-3-9) we take the remoteExperimentServer.jar file and copy it somewhere temporary. There we simply extract its contents with an archive program. I use 7-zip. After extraction, you will see three files: remote.policy, remote.policy.example, and remoteEngine.jar. Take the remote.policy and remoteEngine.jar files and copy them into the D:\WK\remote_engine directory.
This is where things got a bit tricky for me. I got stuck with the command and the provided files because Java couldn't parse the remote.policy file correctly. For the sake of simplicity, also copy the weka.jar file to the D:\WK\jars directory, because this makes the remote engine command simpler. The commands in the documentation refused to run because each file has to be given with its full path.
So again, you can try running a slightly modified command in a newly opened CMD window (remember not to touch the window that is running the database):
You most likely will run into an error like this:
I couldn't find what was wrong with the file. It appears to be UTF-8 encoded and otherwise looks fine. Browsing the internet didn't turn up an answer. So I decided to recreate the policy file with the Java tool policytool.exe, which can be found in the Java installation directory. There you can enter each policy entry manually and save the file in the proper format. I'm not a Java programmer, so I do things intuitively, and not always in the correct way. I added all the permissions one by one to a new file called mm.policy. I also added an extra entry that grants all permissions, because the remote experiment engine had a problem creating files. If you want, you can download my policy file to use in your experiment here: mm_policy.zip
Copy it to the D:\WK\remote_engine directory.
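If you would rather generate a policy file than download one, here is a minimal sketch. It is not the mm.policy file from this post: the class name is made up, and the file contains a single entry that grants every permission, which is only reasonable on a trusted machine in a trusted network. The file is written as plain ASCII on the assumption that this keeps the policy parser happy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes a very permissive Java policy file.
class PolicyWriter {

    // One grant block without a codeBase applies to all code; AllPermission allows everything.
    static String permissivePolicy() {
        return "grant {\n"
             + "    permission java.security.AllPermission;\n"
             + "};\n";
    }

    // Writes the policy text to the target path as plain ASCII bytes.
    static void write(Path target) throws IOException {
        Files.write(target, permissivePolicy().getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        // Default file name is an assumption; pass a full path such as D:\WK\remote_engine\mm.policy.
        Path target = Paths.get(args.length > 0 ? args[0] : "mm.policy");
        write(target);
        System.out.println("Wrote " + target.toAbsolutePath());
    }
}
```

A narrower policy that lists only the permissions the remote engine needs would be safer, but working out that list is exactly the fiddly part described above.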
Now enter the following command in the command line:
java -Xmx256m -cp D:\WK\jars\hsqldb.jar;D:\WK\remote_engine\remoteEngine.jar;D:\WK\jars\weka.jar -Djava.security.policy=D:\WK\remote_engine\mm.policy weka.experiment.RemoteEngine
You should see the view like this if successful:
As you can see, you are given the hostname “MMM-PC” and port 1099.
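A note on the classpath in the engine command: Java expects one classpath whose entries are joined with the platform's path separator, which is ; on Windows and : on Linux. The sketch below assembles the command from a list of jars so the separator is always right for the machine it runs on. The class and method names are made up for illustration, and the paths are the ones used in this post.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the remote engine command line.
class RemoteEngineCommand {

    // Builds the argument list; separator is ";" on Windows and ":" on Linux.
    static List<String> build(List<String> jars, String policyFile, String separator) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx256m");
        cmd.add("-cp");
        cmd.add(String.join(separator, jars));
        cmd.add("-Djava.security.policy=" + policyFile);
        cmd.add("weka.experiment.RemoteEngine");
        return cmd;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
            "D:\\WK\\jars\\hsqldb.jar",
            "D:\\WK\\remote_engine\\remoteEngine.jar",
            "D:\\WK\\jars\\weka.jar");
        // File.pathSeparator picks the right separator for the current OS.
        List<String> cmd = build(jars, "D:\\WK\\remote_engine\\mm.policy", File.pathSeparator);
        System.out.println(String.join(" ", cmd));
        // To launch the engine instead of printing: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Printing the command first makes it easy to paste into a CMD window and compare with what you typed by hand.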
Now you can try to perform a remote experiment in Weka.
Running a remote experiment in Weka locally
For this, open the Weka Experimenter and switch to Advanced mode. Select your database and algorithm as you like.
Then in the Distribute Experiment area select, for instance, By data set and click Hosts. In the new popup window, enter the host that was created by running the remote engine.
Of course, your local IP address may also be used instead of the hostname.
After the host is selected, you can go to the Run tab and click Start, and the remote experiment is performed:
This is practically it for a simple solution.
There is an option to use multiple cores of the processor by adding -p <port>, but it seems to behave the same as before, because the command already assigns a port number for you. I couldn't get all cores loaded to 100%.
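For what it's worth, the Weka documentation describes -p as a way to start several engines on one machine, each listening on its own port, with each engine then entered in the Hosts dialog as hostname:port. The sketch below only generates such a list, one entry per core. It is an untested assumption that this is what finally loads every core, and the class name, the starting port 1099, and the consecutive port numbering are choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists one hostname:port entry per remote engine.
class EngineHosts {

    // Returns entries like "MMM-PC:1099", "MMM-PC:1100", ... for the given number of engines.
    static List<String> hostEntries(String host, int firstPort, int engines) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < engines; i++) {
            entries.add(host + ":" + (firstPort + i));
        }
        return entries;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (String entry : hostEntries("MMM-PC", 1099, cores)) {
            // Each entry would need its own engine started with -p <port>.
            System.out.println(entry);
        }
    }
}
```

Each printed entry corresponds to one remote engine window started with the matching -p value.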
Running a remote experiment on a remote host
The whole power of the remote experiment is that you can distribute the load across hosts in the network. All you have to do is start a remote engine on each remote machine that you want to use.
I have tested this with another machine on my local network, and it worked. Still, the real power comes from using Linux machines that can be controlled through ssh. The next logical step would be to configure multiple host machines to spread the load, but since that requires more time and knowledge, I leave this topic open for discussion.