How Is Python Different from Other Programming Languages?

Among the many programming languages available today, Python is one of the most in demand. The demand keeps growing because Python differs from many other languages on the market in important ways. One such difference, or rather benefit, is that Python is easy to understand and simple to learn. Do you want to learn Python from scratch? Then the Intellipaat ‘Python Course’ is for you. Typically, the choice of a programming language depends on many factors, such as training, availability, cost, emotional attachment, and prior investments. These factors vary, however, and there are other considerations on which the choice of the right language also depends. Some of the languages available on the market today are Java, PHP, C++, Perl, Ruby, JavaScript, Tcl, Smalltalk, and many others. Python differs from these languages in many ways. Companies that use these programming languages: today, many companies across the world use languages such as Ruby, Python, and PHP. Some of the companies making use of PHP are Udemy, Facebook, NASA, Wikipedia, and Google. These companies…

Continue reading

Scale up Big Data mining with Hadoop machine learning tools

Every person and organization, through their everyday activities, constantly generates massive amounts of data. In real life, when you visit a supermarket, a doctor, or an institution, log into a bank account, browse webpages, spend time on social networks, or buy online, you leave a footprint of data. This historical data carries lots of interesting information about your habits, interests, and behavior patterns. If we scale up to organizations, where every process and decision plays a significant role in business success, data becomes a valuable asset. Collected and properly mined historical data can help make critical decisions for the future, optimize the organization's structure, and even reveal the trends in which the business is heading. Hadoop machine learning tools: Big Data is everywhere, so storing and analyzing it becomes a challenge. No human can handle and effectively analyze such vast amounts of data. This is where machine learning and distributed storage come in handy. Hadoop machine learning is an excellent approach to dealing with large amounts of data. The Apache Hadoop platform is built on open-source tools and utilities that use a network of many computers to store and process large amounts of data more efficiently. Hadoop machine learning is joined…
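To make the distributed processing idea a bit more concrete, below is a minimal sketch of the classic MapReduce word count written as two Hadoop Streaming scripts in Python. The file names (mapper.py, reducer.py) are illustrative assumptions; the point is only to show how the work splits into independent map and reduce steps that Hadoop can spread across a cluster of machines.

#!/usr/bin/env python
# mapper.py -- reads raw text from stdin and emits "word<TAB>1" pairs
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word.lower(), 1))

#!/usr/bin/env python
# reducer.py -- receives the mapper output sorted by key and sums the counts
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

The two scripts would then be handed to the Hadoop Streaming jar roughly as: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <input dir> -output <output dir> (the exact jar location depends on the installation).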

Continue reading

Feature extraction from retina vascular images for classification

Classifying medical images is a tedious and complex task, and using machine learning algorithms to assist the process could be a huge help. There are many challenges in making machine learning algorithms work reliably on image data. First of all, you need a rather large image database with ground truth information (expert-labeled data with diagnosis information). The second problem is preprocessing the images, including merging modalities, unifying color maps, normalizing, and filtering. This part is important and may impact the last part, feature extraction. This step is crucial, because how well the machine learning algorithms perform depends on how well you can extract informative features. Dataset: to demonstrate the classification procedure for medical images, the ophthalmology STARE (STructured Analysis of the Retina) image database was pulled from https://cecas.clemson.edu/~ahoover/stare/. The database consists of 400 images covering 13 diagnostic cases, along with preprocessed images. For the classification problem we chose only the vessel images and only two classes: Normal and Choroidal Neovascularization (CNV). This reduced the number of images to 99, of which 25 were used as test data and 74 for training.
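As a rough illustration of that pipeline (not the exact features used in the full post), the sketch below loads binary vessel images, computes a simple illustrative feature set, the fraction of vessel pixels in each cell of a coarse grid, and trains a scikit-learn classifier. The directory layout and file extension are assumptions.

# A minimal sketch: grid-density features from vessel maps + an SVM classifier.
import numpy as np
from PIL import Image
from glob import glob
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def grid_density_features(path, grid=(4, 4)):
    """Fraction of vessel (white) pixels in each cell of a coarse grid."""
    img = np.array(Image.open(path).convert("L")) > 127   # binarize vessel map
    rows = np.array_split(img, grid[0], axis=0)
    cells = [c for r in rows for c in np.array_split(r, grid[1], axis=1)]
    return np.array([c.mean() for c in cells])

def load_set(normal_dir, cnv_dir):
    X, y = [], []
    for label, folder in [(0, normal_dir), (1, cnv_dir)]:
        for path in glob(folder + "/*.ppm"):
            X.append(grid_density_features(path))
            y.append(label)
    return np.array(X), np.array(y)

# Hypothetical folder layout holding the 74 training / 25 test vessel images
X_train, y_train = load_set("train/normal", "train/cnv")
X_test, y_test = load_set("test/normal", "test/cnv")

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))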

Continue reading

Running remote host Weka experiments

Previously we tried running the Weka server to utilize all processor cores in classification tasks. But it appears that the Weka server works only in the Explorer for classification routines. For more advanced machine learning there is a more flexible tool, the Experimenter, and the Weka server doesn't support this area. So what do you do if you want more performance or want to utilize the multi-core processor of the local machine? There is a way out, but it is quite tricky. Weka has the ability to perform remote experiments, which allow spreading the load across multiple host machines that have Weka set up. You can read the documentation on remote experiments on the Weka wikispaces, but in some places it may be somewhat confusing, and it took me time to figure out some parts by trial and error. The trickiest part is setting everything up and preparing the necessary command to run before performing a remote experiment. So let's get to it.

Continue reading

Utilizing multi-core processor for classification in WEKA

Currently, WEKA is one of the most popular machine learning tools. Without programming skills, you can do serious classification, clustering, and big data analysis. For some time I had been using its standard GUI features without thinking much about performance bottlenecks. But since research is becoming more complex, relying on ensembles, voting, and other meta-algorithms that generally run multiple classifiers simultaneously, the performance issues start becoming annoying. You need to wait for hours until a task is completed. The problem is that when you run classification algorithms from the WEKA GUI, they utilize a single core of your processor. An algorithm such as a multilayer perceptron running 10-fold cross-validation calculates one fold at a time on one core, taking a long time to complete. So I started looking for options to make it use all cores of the processor, with a separate thread for each fold of the operation. There are a couple of options available to do so. One is to use the WekaServer package, and the other is remote host processing. This time we will focus on the WekaServer solution. The idea is to start a WEKA server as a distributed execution environment. When starting the server, you can indicate how…

Continue reading

Regularized Logistic regression

Previously we tried logistic regression without regularization and with a simple training data set. But as we all know, things in real life aren't as simple as we would like them to be. There are many types of data that need to be classified. The number of features can grow to hundreds or thousands, while the number of instances may be limited. Also, in many cases we may need to classify into more than two classes. The first problem that may arise from a large number of features is over-fitting. This is when the learned hypothesis hΘ(x) fits the training data too well (cost J(Θ) ≈ 0) but fails when classifying new data samples. In other words, the model tries to classify each training example correctly by drawing a very complicated decision boundary between the training data points. As you can see in the image above, over-fitting would correspond to the green decision boundary. So how do we deal with the over-fitting problem? There are several possible approaches: reducing the number of features; selecting a different classification model; using regularization. We leave the first two out of the question, because selecting the optimal number of features is a separate optimization topic. Also, we are sticking with the logistic regression model for now, so changing the classifier is…
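As a minimal sketch of where this is heading, here is what an L2-regularized logistic regression cost J(Θ) and its gradient look like in Python with NumPy. The variable names and the regularization parameter lam are illustrative; as usual, the bias term theta[0] is not regularized.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradient(theta, X, y, lam):
    """L2-regularized logistic regression cost J(theta) and its gradient.
    X: (m, n) feature matrix with a leading column of ones, y: (m,) labels 0/1."""
    m = len(y)
    h = sigmoid(X.dot(theta))                           # hypothesis h_theta(x)
    reg = (lam / (2.0 * m)) * np.sum(theta[1:] ** 2)    # do not regularize the bias
    J = (-1.0 / m) * (y.dot(np.log(h)) + (1 - y).dot(np.log(1 - h))) + reg
    grad = (1.0 / m) * X.T.dot(h - y)
    grad[1:] += (lam / m) * theta[1:]
    return J, grad

This cost/gradient pair can then be fed into a gradient descent loop or an off-the-shelf optimizer such as scipy.optimize.minimize.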

Continue reading

Implementing logistic regression learner with python

Logistic regression is the next step from linear regression. Most real-life data have non-linear relationships, so applying linear models may be ineffective. Logistic regression is capable of handling non-linear effects in prediction tasks. You can think of lots of different scenarios where logistic regression could be applied. There are financial, demographic, health, weather, and other data sets where the model could be implemented and used to predict future events on new data. For instance, you can classify emails as spam or non-spam, transactions as fraudulent or not, and tumors as malignant or benign. In order to understand logistic regression, let's cover some basics, do a simple classification on a data set with two features, and then test it on real-life data with multiple features.
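As a teaser of the basics covered in the post, here is a minimal sketch of a plain (unregularized) logistic regression learner trained with batch gradient descent on a toy two-feature data set. The toy data and hyperparameters are illustrative assumptions, not the real-life data used later.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on the logistic regression cost."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])        # add bias column
    theta = np.zeros(n + 1)
    for _ in range(iterations):
        h = sigmoid(Xb.dot(theta))
        theta -= alpha * Xb.T.dot(h - y) / m    # gradient step
    return theta

def predict(theta, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb.dot(theta)) >= 0.5).astype(int)

# Toy two-feature example: label is 1 roughly when x1 + x2 > 2
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(200, 2))
y = (X.sum(axis=1) > 2).astype(int)
theta = train(X, y)
print("Training accuracy:", (predict(theta, X) == y).mean())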

Continue reading