The correction is available on this website : https://curiousml.github.io/
This part is an exploration of two popular Python packages: NumPy and Matplotlib
import numpy as np
Q1.
The chessboard can be generated as follows :
A = np.zeros((8,8))
A[1::2, ::2] = 1
A[::2, 1::2] = 1
print(A)
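As a quick sanity check (an alternative construction, not asked in the TP), the same board can be obtained from the parity of the row and column indices:
# A[i, j] = 1 exactly when i + j is odd
A_alt = np.indices((8, 8)).sum(axis=0) % 2
print(np.array_equal(A, A_alt))  # expected: True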
Q2.
The generation of the matrix can be coded as :
# we choose the Fortran-like order and not the C-like order
M = np.arange(1, 21).reshape((4, 5), order='F')
print(M)
The code for the extraction is the following :
M[:,[1, -1, 2]][[1, 2, 0],:]
No question here, but you can run each line of code from the TP to become familiar with the package.
import matplotlib.pyplot as plt
Q3. Let us generate 15 real numbers drawn uniformly on $[-1, 1]$. These numbers define our array x.
import numpy as np
a = -1
b = 1
m = 15
x = a + (b - a) * np.random.random_sample(m)
print(x)
Q4. Let us define the array y as specified in the TP.
def g(x):
    return (3/2)*x**3 - x**2 - (3/4)*x + 1
y = g(x) + (1/20)*np.random.randn(m)
Q5. We are going to visualize the generated data in the plane.
plt.plot(x,y,'bo')
plt.show()
Small remark: even from this graph alone we can get an intuition of which family of predictors would be appropriate. But (unfortunately?) in reality, we rarely have a single descriptive feature.
Q6. With the same process, generate a test sample of size $m' = 30$, named $x_{test}$ and $y_{test}$.
m_test = 30
x_test = 2*np.random.random_sample(m_test)-1
y_test = g(x_test) + .05*np.random.randn(m_test)
We define a least-squares linear regression predictor $f$.
from sklearn.linear_model import LinearRegression
f = LinearRegression()
The predictor needs a two-dimensional array of descriptive data $X$ as input:
X = x[:, np.newaxis]
X_test = x_test[:, np.newaxis]
Train our model :
f.fit(X, y)
Q7. Let's visualise the performance :
xplot = np.linspace(-1,1,500).reshape(-1,1)
plt.plot(x,y,'bo')
plt.plot(xplot,f.predict(xplot))
plt.show()
Small remark 1: we can certainly do better...
Small remark 2: we can evaluate the performance better than just visually. Let's quantify it!
Q8. We compute the mean training error and the mean test error:
print("for train set :", sum((y-f.predict(X))**2)/m)
print("for test set :", sum((y_test-f.predict(X_test))**2)/m_test)
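Equivalently (just as a cross-check), the same quantities can be obtained with sklearn.metrics.mean_squared_error:
from sklearn.metrics import mean_squared_error
print("for train set :", mean_squared_error(y, f.predict(X)))
print("for test set :", mean_squared_error(y_test, f.predict(X_test)))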
We see that the predictor is much more accurate on the training set than on the test set: the predictor overfits the training set.
Q9. We are going to use a new predictor and look at its performance. This time we use a polynomial regression of degree 2.
## a
from sklearn.preprocessing import PolynomialFeatures
psi = PolynomialFeatures(2,include_bias=False).fit_transform
f = LinearRegression().fit(psi(X),y)
## b
plt.plot(x,y,'bo')
plt.plot(xplot,f.predict(psi(xplot)))
plt.show()
## c
print("for train set :", sum((y-f.predict(psi(X)))**2)/m)
print("for test set :", sum((y_test-f.predict(psi(X_test)))**2)/m_test)
We observe a boost in performance compared to linear regression, and this model seems to overfit the data less.
Q10. Let us generalize the computation to any maximum degree of polynomials $n\geq 1$
## a
def reg_poly(n):
    # polynomial feature map of degree n (no constant column, the intercept is handled by LinearRegression)
    psi = PolynomialFeatures(n, include_bias=False).fit_transform
    return LinearRegression().fit(psi(X), y), psi
## b
plt.plot(x,y,'bo')
poly = {}
psi = {}
for n in [3, 4, 13, 14]:
    poly[n], psi[n] = reg_poly(n)
    plt.plot(xplot, poly[n].predict(psi[n](xplot)))
plt.axis([-1, 1, -1.5, 1.5])
plt.legend(['data','n = 3', 'n = 4', 'n = 13', 'n = 14'])
plt.show()
## c
print("intercept for n = 3", poly[3].intercept_)
print()
print("coefficients for n = 3", poly[3].coef_)
For $n=3$, these coefficients seem to be similar to the coefficients of the function $g$.
## d
For $n=14$, there is a polynomial that passes through each of the 15 points (the Lagrange interpolation polynomial). This polynomial obviously minimizes the empirical risk, so it is the one given by the minimization of the empirical risk.
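A quick numerical check of this claim (just a sketch, reusing the degree-14 fit computed in part b): its training error should be numerically close to zero.
# mean training error of the degree-14 polynomial: expected to be very close to 0
print(sum((y - poly[14].predict(psi[14](X)))**2)/m)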
## e
mean_train_error = []
mean_test_error = []
n_max = 14
for n in range(1, n_max+1):
    f, psi = reg_poly(n)
    mean_train_error.append(sum((f.predict(psi(X))-y)**2)/m)
    mean_test_error.append(sum((f.predict(psi(X_test))-y_test)**2)/m_test)
plt.plot(range(1,n_max+1),mean_train_error)
plt.plot(range(1,n_max+1),mean_test_error)
plt.axis([0, n_max, 0, .2])
plt.show()
The higher $n$ is, the more complex the resulting polynomials are (and so the training error is lower).
The test error decreases first. Then, when n is large, it increases: this is called overfitting. The high complexity of the polynomials makes it easier to stick to the training points, but does not allow good generalization to new points.
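As a small complement (not explicitly asked), the degree reaching the smallest mean test error can be read off directly:
# +1 because the degrees tested start at 1
n_best = int(np.argmin(mean_test_error)) + 1
print("degree with the smallest mean test error:", n_best)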
Q11. We know that with $n=14$ the predictor overfits the data. One of the most popular methods for preventing overfitting is to desensitize the model to the training data with regularization, such as LASSO regularization. Let us look at this in more detail.
from sklearn.linear_model import Lasso
def reg_poly_lasso(n, alpha=1):
    # same polynomial feature map as reg_poly, but fitted with LASSO regularization
    psi = PolynomialFeatures(n, include_bias=False).fit_transform
    return Lasso(alpha).fit(psi(X), y), psi
for alpha in [10**a for a in range(-6, 1)]:
    f, psi = reg_poly_lasso(14, alpha)
    plt.plot(x, y, 'bo')
    plt.plot(xplot, f.predict(psi(xplot)))
    plt.axis([-1, 1, -1.5, 1.5])
    print('Coefficients:', f.coef_)
    print()
    plt.show()
For large alpha values, the algorithm produces polynomials with many zero or close to zero coefficients. This limits the complexity of the polynomials and thus the overfitting.
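To make this observation quantitative (a small sketch, not required by the TP), one can count the non-zero coefficients for each value of alpha (convergence warnings may appear for very small alpha):
for alpha in [10**a for a in range(-6, 1)]:
    f, psi = reg_poly_lasso(14, alpha)
    # number of coefficients kept (i.e. not set exactly to zero) by the LASSO
    print("alpha =", alpha, "-> non-zero coefficients:", int(np.sum(f.coef_ != 0)))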
Q12. (Q13.)
mean_train_error = []
mean_test_error = []
nonzero_coefficients = []
degree = []
alpha_range = [10**a for a in range(-6,1)]
for alpha in alpha_range:
    f, psi = reg_poly_lasso(14, alpha)
    mean_train_error.append(sum((f.predict(psi(X))-y)**2)/m)
    mean_test_error.append(sum((f.predict(psi(X_test))-y_test)**2)/m_test)
plt.plot(np.log10(alpha_range),mean_train_error)
plt.plot(np.log10(alpha_range),mean_test_error)
plt.axis([-6,0,0,0.5])
plt.show()
The alpha parameter plays a role similar to n in question 10 (but in the opposite direction).
When alpha decreases, the polynomials obtained are more and more complex, and the learning error decreases. But when alpha is small enough, the test error increases again : there is overfitting.
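As before, we can also extract the value of alpha reaching the smallest mean test error:
alpha_best = alpha_range[int(np.argmin(mean_test_error))]
print("alpha with the smallest mean test error:", alpha_best)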
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target
Q14. Let us split the data into a training sample and a test sample. We can simply use the function train_test_split from sklearn.model_selection.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y)
# train/test split at 75% - 25% (the default proportions)
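The split can also be made explicit and reproducible, for instance as follows (the value of random_state is an arbitrary choice, not part of the TP):
# equivalent explicit split; random_state fixed only for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)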
TEST: First, we are going to train an SVM and report its score on the test sample as follows:
from sklearn.svm import LinearSVC
svm = LinearSVC(C=1).fit(X_train,y_train)
print('Score on test sample :', svm.score(X_test,y_test))
Q15. Let us train an SVM predictor again, this time with the hyperparameter C calibrated by cross-validation.
from sklearn.model_selection import validation_curve
C_range = np.logspace(-6,-2,50)
train_scores, valid_scores = validation_curve(LinearSVC(), X_train, y_train, param_name="C", param_range=C_range, cv=10)
train_scores_mean = np.mean(train_scores,axis=1)
valid_scores_mean = np.mean(valid_scores,axis=1)
plt.plot(np.log10(C_range), train_scores_mean, label="training scores")
plt.plot(np.log10(C_range), valid_scores_mean, label="validation scores")
plt.legend()
plt.xlabel('log(C)')
plt.ylabel('score')
plt.show()
After setting the hyperparameter $C$ correctly, we observe an improvement in the score!
C_best = C_range[np.argmax(valid_scores_mean)]
svm = LinearSVC(C=C_best).fit(X_train,y_train)
print('Score:', svm.score(X_test,y_test))
y_test_predict = svm.predict(X_test)
For more information on the hyperparameter C: https://stats.stackexchange.com/questions/31066/what-is-the-influence-of-c-in-svms-with-linear-kernel
Q16. Compute the precision and recall from the coefficients of the confusion matrix :
from sklearn.metrics import confusion_matrix
confusion_matrix_test = confusion_matrix(y_test,y_test_predict)
print(confusion_matrix_test)
tp = confusion_matrix_test[0,0]  # true label 0, predicted 0 (class 0 is the positive class here)
fp = confusion_matrix_test[1,0]  # true label 1, predicted 0
fn = confusion_matrix_test[0,1]  # true label 0, predicted 1
recall = tp/(tp+fn)
precision = tp/(tp+fp)
print('Recall :', recall)
print('Precision :', precision)
from sklearn.metrics import recall_score, precision_score
print('Recall: ',recall_score(y_test,y_test_predict,pos_label=0))
print('Precision: ',precision_score(y_test,y_test_predict,pos_label=0))
Q17. We are going to define a function false_positive_rate
def false_positive_rate(y_true, y_predict, pos_label):
    # proportion of actual negatives (y_true != pos_label) that are predicted as positive
    return np.sum(y_true[y_predict == pos_label] != pos_label)/np.sum(y_true != pos_label)
print(false_positive_rate(y_test,y_test_predict,0))
Q18. Define the function modified_predictor
def modified_predictor(X, tau):
    # predict class 1 when the decision function is above the threshold tau, class 0 otherwise
    return (svm.decision_function(X) >= tau).astype('int')
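A quick sanity check (a sketch; >= and > can only differ on ties at exactly 0): with $\tau = 0$, the modified predictor should essentially coincide with svm.predict.
# fraction of test points where the two predictors agree; expected to be 1.0 (or extremely close)
print(np.mean(modified_predictor(X_test, 0) == svm.predict(X_test)))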
Q19.
Q20. Let us find the minimum and maximum values of $\tau$ and plot the corresponding ROC curve.
decision_function_train = svm.decision_function(X_train)
tau_range = np.linspace(np.min(decision_function_train),np.max(decision_function_train),100)
recalls = []
fprs = []
for tau in tau_range:
    y_train_predict = modified_predictor(X_train, tau)
    recalls.append(recall_score(y_train, y_train_predict, pos_label=0))
    fprs.append(false_positive_rate(y_train, y_train_predict, 0))
plt.plot(fprs,recalls)
plt.show()
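As a cross-check (not required by the TP), sklearn.metrics.roc_curve produces the same curve; since our positive class is 0 while the decision function is oriented toward class 1, we pass the negated scores:
from sklearn.metrics import roc_curve
# fpr_sk and tpr_sk should draw the same curve as (fprs, recalls) above
fpr_sk, tpr_sk, _ = roc_curve(y_train, -decision_function_train, pos_label=0)
plt.plot(fprs, recalls, label="manual")
plt.plot(fpr_sk, tpr_sk, '--', label="sklearn roc_curve")
plt.legend()
plt.show()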
Q21. Let us determine the $\tau$ that guarantees a recall of at least 0.95 while keeping the false positive rate as small as possible:
recalls_array = np.array(recalls)
fprs_array = np.array(fprs)
good_enough_recalls_index = (recalls_array >= .95)
tau_best = (tau_range[good_enough_recalls_index])[np.argmin(fprs_array[good_enough_recalls_index])]
y_test_predict = modified_predictor(X_test,tau_best)
from sklearn.metrics import accuracy_score
print('Accuracy score: ', accuracy_score(y_test, y_test_predict))
print('Recall: ', recall_score(y_test, y_test_predict,pos_label=0))
print('Precision: ', precision_score(y_test, y_test_predict,pos_label=0))
print('False positive rate: ', false_positive_rate(y_test, y_test_predict,pos_label=0))
The accuracy is lower than with the previous SVM, but we obtain a better recall (recall at 0.96)!
import numpy as np # to manipulate numerical values
import pandas as pd # to manipulate dataframes
import matplotlib.pyplot as plt # to display graphics
import seaborn as sns # graphics library specially made for data visualization
Let us prepare the dataset :
data = pd.read_csv("wine_dataset.csv", sep = ";")
data.head(5)
This data contains :
print(data.shape)
6497 examples (rows) and 13 features (columns).
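As a quick extra exploration, we can also look at how the two wine colors are distributed:
print(data["color"].value_counts())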
After keeping only two features (chlorides and total sulfur dioxide) and rescaling them to $[-1, 1]$, the new data is the following:
from sklearn.preprocessing import MinMaxScaler
X = data[["chlorides", "total sulfur dioxide"]]
y = data["color"]
X = MinMaxScaler(feature_range=(-1, 1)).fit(X).transform(X)
print(X[:5,])
print(y[:5])
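Note that the fit/transform pair above can also be written more compactly with fit_transform (an equivalent formulation, shown only for reference):
# same result as fit(X).transform(X)
X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(data[["chlorides", "total sulfur dioxide"]])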
Q22. Separate the dataset into a training and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y)
We will first perform a logistic regression.
from sklearn.linear_model import LogisticRegressionCV
Q23. Let's display its score on the test sample
logreg = LogisticRegressionCV(cv=5, Cs = np.logspace(-6,6,20)).fit(X_train, y_train)
logreg.score(X_test, y_test)
Since we only have two explanatory variables, we can represent the inputs in the plane, together with the decision boundary of the logreg predictor.
plt.scatter(X[:,0],X[:,1],c=y,edgecolors='k')
figure = plt.gca()
xlim = figure.get_xlim()
ylim = figure.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 100)
yy = np.linspace(ylim[0], ylim[1], 100)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
P = logreg.decision_function(xy).reshape(XX.shape)
plt.contour(XX, YY, P, levels=[0], alpha=1)
plt.show()
Q24. Let us define the function plot_decision_boundary
def plot_decision_boundary(predictor):
    # scatter plot of the data together with the decision boundary
    # (level set decision_function = 0) of the given predictor
    fig, ax = plt.subplots()
    ax.scatter(X[:,0], X[:,1], c=y, edgecolors='k')
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx = np.linspace(xlim[0], xlim[1], 100)
    yy = np.linspace(ylim[0], ylim[1], 100)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T
    P = predictor.decision_function(xy).reshape(XX.shape)
    ax.contour(XX, YY, P, levels=[0], alpha=1)
    return None
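For instance, the plot of Q23 can now be reproduced in a single call:
plot_decision_boundary(logreg)
plt.show()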
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
grid = {'C':np.logspace(-6,6,20)}
svc = GridSearchCV(LinearSVC(),cv=5,param_grid=grid,verbose=2)
print('Score :',svc.fit(X_train,y_train).score(X_test,y_test))
plot_decision_boundary(svc)
print(svc.best_params_)
As the plot suggests, we can hope for an improvement by using SVMs with kernels.
Q25. SVM with polynomial kernels
from sklearn.svm import SVC
svcpoly = dict()
for d in range(2, 9):
    svcpoly[d] = SVC(kernel="poly", C=1, coef0=1, degree=d, gamma="auto").fit(X_train, y_train)
    plot_decision_boundary(svcpoly[d])
    plt.title("SVM with polynomial kernel : degree " + str(d))
    plt.show()
Visually, the SVM with a polynomial kernel of degree 3 (or maybe 4) seems to be a good predictor. Let us quantify this observation.
Q26. Train a polynomial-kernel SVM with the degree as a hyperparameter tuned by cross-validation (using GridSearchCV).
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
grid = {'degree':np.arange(2,9)}
svc = GridSearchCV(SVC(kernel="poly",C=1,coef0=1,gamma="auto"),cv=5,param_grid=grid,verbose=2)
print('Score :',svc.fit(X_train,y_train).score(X_test,y_test))
plot_decision_boundary(svc)
print(svc.best_params_)
As a conclusion, we see that the SVM with a polynomial kernel of degree 3 is indeed the best predictor (with the other hyperparameters fixed). It can also be noticed that the score is higher than before.