.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_gallery/ml_supervized_nonlinear.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_gallery_ml_supervized_nonlinear.py:


Non-linear models
=================

Here we focus on non-linear models for classification. Nevertheless, each
classification model has its regression counterpart.

.. GENERATED FROM PYTHON SOURCE LINES 8-27

.. code-block:: default


    # get_ipython().run_line_magic('matplotlib', 'inline')

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn import datasets
    from sklearn import metrics
    from sklearn.model_selection import train_test_split

    np.set_printoptions(precision=2)
    pd.set_option('display.precision', 2)


.. GENERATED FROM PYTHON SOURCE LINES 28-78

Support Vector Machines (SVM)
-----------------------------

SVMs are kernel-based methods that require only a user-specified kernel
function :math:`K(x_i, x_j)`, i.e., a **similarity function** over pairs of
data points :math:`(x_i, x_j)`, mapping them into a kernel (dual) space in
which learning algorithms operate linearly, i.e., every operation on points
is a linear combination of :math:`K(x_i, x_j)`.

Outline of the SVM algorithm:

1. Map points :math:`x` into kernel space using a kernel function:
   :math:`x \rightarrow K(x, .)`.

2. Learning algorithms operate linearly using dot products in the
   high-dimensional kernel space :math:`K(., x_i) \cdot K(., x_j)`.

   - The kernel trick (Mercer's theorem) replaces the dot product in the
     high-dimensional space by a simpler operation such that
     :math:`K(., x_i) \cdot K(., x_j) = K(x_i, x_j)`. Thus we only need to
     compute a similarity measure for each pair of points and store it in an
     :math:`N \times N` Gram matrix.

   - Finally, the learning process consists of estimating the
     :math:`\alpha_i` of the decision function that minimises the hinge loss
     (of :math:`f(x)`) plus some penalty, when applied to all training
     points:

     .. math::

        f(x) = \text{sign} \left(\sum_i^N \alpha_i~y_i~K(x_i, x)\right).

3. Predict a new point :math:`x` using the decision function.

.. figure:: ../images/svm_rbf_kernel_mapping_and_decision_function.png
   :alt: Support Vector Machines.

**Gaussian kernel (RBF, Radial Basis Function):**

One of the most commonly used kernels is the Radial Basis Function (RBF)
kernel. For a pair of points :math:`x_i, x_j` the RBF kernel is defined as:

.. math::

   K(x_i, x_j) &= \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)\\
               &= \exp\left(-\gamma~\|x_i - x_j\|^2\right)

where :math:`\sigma` (or :math:`\gamma`) defines the kernel width parameter.
Basically, we consider a Gaussian function centered on each training sample
:math:`x_i`. It has a ready interpretation as a similarity measure, since it
decreases with the squared Euclidean distance between the two feature
vectors.

Non-linear SVMs also exist for regression problems.

.. GENERATED FROM PYTHON SOURCE LINES 81-82

Dataset

.. GENERATED FROM PYTHON SOURCE LINES 82-87

.. code-block:: default


    X, y = datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        train_test_split(X, y, test_size=0.5, stratify=y, random_state=42)
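To make the Gram matrix concrete, here is a small illustrative sketch (not
part of the original example): it computes the RBF kernel on the first five
training samples, once by hand from the formula above and once with
scikit-learn's ``rbf_kernel`` helper. The choice of the first five samples
and of ``gamma = 1 / n_features`` is arbitrary, for illustration only.

.. code-block:: default


    from sklearn.metrics.pairwise import rbf_kernel
    from scipy.spatial.distance import cdist

    Xs = X_train[:5]             # a few points, illustration only
    gamma = 1. / Xs.shape[1]     # arbitrary kernel width for this sketch

    # Manual computation: K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    K_manual = np.exp(-gamma * cdist(Xs, Xs, metric="sqeuclidean"))

    # Same Gram matrix computed by scikit-learn
    K_sklearn = rbf_kernel(Xs, gamma=gamma)

    print(K_sklearn.shape)                   # (5, 5) Gram matrix
    print(np.allclose(K_manual, K_sklearn))  # both computations agree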
.. GENERATED FROM PYTHON SOURCE LINES 88-89

Preprocessing: the input features have unequal variances, which requires
scaling for SVM.

.. GENERATED FROM PYTHON SOURCE LINES 89-98

.. code-block:: default


    ax = sns.displot(x=X_train.std(axis=0), kind="kde", bw_adjust=.2, cut=0,
                     fill=True, height=3, aspect=1.5,)
    _ = ax.set_xlabels("Std-dev").tight_layout()

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    # Reuse the scaler fitted on the training set (do not refit on the test set)
    X_test = scaler.transform(X_test)


.. image:: /auto_gallery/images/sphx_glr_ml_supervized_nonlinear_001.png
    :alt: ml supervized nonlinear
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 99-101

Fit-predict. The predicted probability is a logistic (sigmoid) function of
the decision_function (Platt scaling).

.. GENERATED FROM PYTHON SOURCE LINES 101-110

.. code-block:: default


    svm = SVC(kernel='rbf', probability=True).fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    y_score = svm.decision_function(X_test)
    y_prob = svm.predict_proba(X_test)[:, 1]

    ax = sns.relplot(x=y_score, y=y_prob, hue=y_pred, height=2, aspect=1.5)
    _ = ax.set_axis_labels("decision function", "Probability").tight_layout()


.. image:: /auto_gallery/images/sphx_glr_ml_supervized_nonlinear_002.png
    :alt: ml supervized nonlinear
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 111-121

.. code-block:: default


    print("bAcc: %.2f, AUC: %.2f (AUC with proba: %.2f)" % (
        metrics.balanced_accuracy_score(y_true=y_test, y_pred=y_pred),
        metrics.roc_auc_score(y_true=y_test, y_score=y_score),
        metrics.roc_auc_score(y_true=y_test, y_score=y_prob)))

    # Useful internals: indices of support vectors within the original X_train
    np.all(X_train[svm.support_, :] == svm.support_vectors_)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    bAcc: 0.97, AUC: 0.99 (AUC with proba: 0.99)

    True


.. GENERATED FROM PYTHON SOURCE LINES 122-154

Random forest
-------------

Decision tree
~~~~~~~~~~~~~

A tree can be "learned" by splitting the training dataset into subsets based
on a test on a feature value. Each internal node represents a "test" on a
feature, resulting in a split of the current samples. At each step the
algorithm selects the feature and a cutoff value that maximises a given
metric. Different metrics exist for regression trees (continuous target) and
classification trees (qualitative target).

This process is repeated on each derived subset in a recursive manner called
recursive partitioning. The recursion is completed when the subset at a node
all has the same value of the target variable, or when splitting no longer
adds value to the predictions. This general principle is implemented by many
recursive partitioning tree algorithms.

.. figure:: ../images/classification_tree.png
   :width: 400
   :alt: Classification tree.

Decision trees are simple to understand and interpret; however, they tend to
overfit the training set. A single decision tree usually overfits the data it
is learning from, because it learns only one pathway of decisions, and its
predictions on new data are usually not accurate. Leo Breiman proposed random
forests to deal with this issue, as illustrated in the sketch below.
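To illustrate this overfitting tendency, here is a small sketch (not part of
the original example) fitting a single, fully grown ``DecisionTreeClassifier``
on the same split. An unpruned tree typically fits the training set (almost)
perfectly while generalising noticeably worse; the exact scores depend on the
split.

.. code-block:: default


    from sklearn.tree import DecisionTreeClassifier

    # A single unconstrained tree, grown until the leaves are pure
    tree = DecisionTreeClassifier(random_state=42)
    tree.fit(X_train, y_train)

    # Compare balanced accuracy on the training set and on the test set
    print("Train bAcc: %.2f" % metrics.balanced_accuracy_score(
        y_train, tree.predict(X_train)))
    print("Test  bAcc: %.2f" % metrics.balanced_accuracy_score(
        y_test, tree.predict(X_test)))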
Forest
~~~~~~

A random forest is a meta-estimator that fits a number of **decision tree
learners** on various sub-samples of the dataset and uses averaging to
improve the predictive accuracy and control over-fitting.

.. figure:: ../images/random_forest.png
   :width: 300
   :alt: Random forest.

Random forest models reduce the risk of overfitting by introducing
randomness:

- building multiple trees (``n_estimators``),
- drawing observations with replacement (i.e., a bootstrapped sample),
- splitting nodes on the best split among a random subset of the features
  selected at every node.

.. GENERATED FROM PYTHON SOURCE LINES 154-164

.. code-block:: default


    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(n_estimators=100)
    forest.fit(X_train, y_train)
    y_pred = forest.predict(X_test)
    y_prob = forest.predict_proba(X_test)[:, 1]


.. GENERATED FROM PYTHON SOURCE LINES 165-170

.. code-block:: default


    print("bAcc: %.2f, AUC: %.2f " % (
        metrics.balanced_accuracy_score(y_true=y_test, y_pred=y_pred),
        metrics.roc_auc_score(y_true=y_test, y_score=y_prob)))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    bAcc: 0.93, AUC: 0.98


.. GENERATED FROM PYTHON SOURCE LINES 171-178

Extra Trees (Low Variance)
~~~~~~~~~~~~~~~~~~~~~~~~~~

Extra Trees (Extremely Randomized Trees) is like Random Forest in that it
builds multiple trees and splits nodes using random subsets of features, but
with two key differences:

- it does not bootstrap observations (``bootstrap=False`` by default, i.e.,
  it samples without replacement),
- nodes are split on random splits, not on the best split, among a random
  subset of the features selected at every node.

In Extra Trees, randomness therefore does not come from bootstrapping the
data, but from the random splits of all observations.

.. GENERATED FROM PYTHON SOURCE LINES 181-200

Gradient boosting
-----------------

Gradient boosting is a meta-estimator that fits a sequence of **weak
learners**. Each learner aims to reduce the residuals (errors) produced by
the previous learner. The two main hyper-parameters are:

- The **learning rate** (*lr*) controls over-fitting: decreasing the *lr*
  limits the capacity of a learner to overfit the residuals, i.e., it slows
  down the learning speed and thus increases the **regularisation**.

- The **sub-sampling fraction** controls the fraction of samples to be used
  for fitting the learners. Values smaller than 1 lead to **Stochastic
  Gradient Boosting**. It thus controls over-fitting, reducing variance and
  increasing bias.

.. figure:: ../images/gradient_boosting.png
   :width: 500
   :alt: Gradient boosting.

.. GENERATED FROM PYTHON SOURCE LINES 200-214

.. code-block:: default


    from sklearn.ensemble import GradientBoostingClassifier

    gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                    subsample=0.5, random_state=0)
    gb.fit(X_train, y_train)
    y_pred = gb.predict(X_test)
    y_prob = gb.predict_proba(X_test)[:, 1]

    print("bAcc: %.2f, AUC: %.2f " % (
        metrics.balanced_accuracy_score(y_true=y_test, y_pred=y_pred),
        metrics.roc_auc_score(y_true=y_test, y_score=y_prob)))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    bAcc: 0.94, AUC: 0.98


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.769 seconds)


.. _sphx_glr_download_auto_gallery_ml_supervized_nonlinear.py:


.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: ml_supervized_nonlinear.py `

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: ml_supervized_nonlinear.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_