python 3.x - Plot SVM with Matplotlib?
I have some interesting user data. It gives information on the timeliness of certain tasks the users were asked to perform. I am trying to find out whether late, which tells me if users are on time (0), a little late (1), or quite late (2), is predictable/explainable. I generate late from a column giving traffic light information (green = not late, red = super late).
Here is what I do:
    # imports
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import preprocessing
    from sklearn import svm
    import sklearn.metrics as sm

    # load user data
    # (on pandas >= 1.3, use on_bad_lines='skip' instead of error_bad_lines=False)
    df = pd.read_csv('april.csv', error_bad_lines=False, encoding='iso8859_15', delimiter=';')

    # convert objects to datetime data types
    cols = ['planned start', 'actual start', 'planned end', 'actual end']
    df = df[cols].apply(
        pd.to_datetime, dayfirst=True, errors='ignore'
    ).join(df.drop(cols, axis=1))

    # convert datetime to numeric data types
    cols = ['planned start', 'actual start', 'planned end', 'actual end']
    df = df[cols].apply(
        pd.to_numeric, errors='ignore'
    ).join(df.drop(cols, axis=1))

    # add likert scale for green, yellow and red traffic lights
    # (.loc replaces the removed .ix indexer)
    df['late'] = 0
    df.loc[df['end time traffic light'].isin(['yellow']), 'late'] = 1
    df.loc[df['end time traffic light'].isin(['red']), 'late'] = 2

    # supervised learning
    # X and y arrays
    # X = np.array(df.drop(['late'], axis=1))
    # (.values replaces the removed .as_matrix())
    X = df[['planned start', 'actual start', 'planned end', 'actual end',
            'measure package', 'measure', 'responsible user']].values
    y = np.array(df['late'])

    # preprocessing the data
    X = preprocessing.scale(X)

    # support vector machine
    clf = svm.SVC(decision_function_shape='ovo')
    clf.fit(X, y)
    print(clf.score(X, y))

I am trying to understand how to plot the decision boundaries. My goal is to plot a 2-way scatter of actual end and planned end. Naturally, I checked the documentation (see e.g. here), but I can't wrap my head around it. How does this work?
As a heads-up for the future, you'll generally get faster (and better) responses if you provide a publicly available dataset along with your attempted plotting code, since we don't have 'april.csv'. You can also leave out the data-wrangling code for 'april.csv'. With that said...
Sebastian Raschka created the mlxtend package, which has a pretty awesome plotting function for doing this. It uses matplotlib under the hood.
    import numpy as np
    import pandas as pd
    from sklearn import svm
    from mlxtend.plotting import plot_decision_regions
    import matplotlib.pyplot as plt

    # create an arbitrary dataset for the example
    # (np.random.randint excludes the upper bound, so high=3 yields 0, 1 or 2;
    # it replaces the deprecated np.random.random_integers)
    df = pd.DataFrame({'planned_end': np.random.uniform(low=-5, high=5, size=50),
                       'actual_end': np.random.uniform(low=-1, high=1, size=50),
                       'late': np.random.randint(low=0, high=3, size=50)})

    # fit a support vector machine classifier on the two features
    X = df[['planned_end', 'actual_end']]
    y = df['late']

    clf = svm.SVC(decision_function_shape='ovo')
    clf.fit(X.values, y.values)

    # plot the decision regions using mlxtend's awesome plotting function
    plot_decision_regions(X=X.values, y=y.values, clf=clf, legend=2)

    # update the plot object with x/y axis labels and a figure title
    plt.xlabel(X.columns[0], size=14)
    plt.ylabel(X.columns[1], size=14)
    plt.title('SVM decision region boundary', size=16)
    plt.show()
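If it helps to see how such a plot comes about, the same idea can be sketched in plain matplotlib: predict a class for every point on a fine grid covering the two features, then colour the grid by predicted class. This is only a rough sketch and not mlxtend's actual implementation. The helper name `decision_grid`, its `step` and `margin` parameters, the random stand-in data, and the output file name are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm


def decision_grid(clf, X, step=0.05, margin=1.0):
    """Predict a class label for every point on a grid covering X (hypothetical helper)."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    # Flatten the grid to (n_points, 2), predict, then restore the grid shape
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


# Arbitrary stand-in data (assumption: swap in the two real feature columns)
rng = np.random.RandomState(0)
X = np.column_stack([rng.uniform(-5, 5, size=50),   # stand-in for planned end
                     rng.uniform(-1, 1, size=50)])  # stand-in for actual end
y = rng.randint(0, 3, size=50)                      # stand-in for late (0, 1, 2)

clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, y)

xx, yy, zz = decision_grid(clf, X)

fig, ax = plt.subplots()
# One filled band per class label: 0, 1 and 2
ax.contourf(xx, yy, zz, levels=[-0.5, 0.5, 1.5, 2.5], alpha=0.3)
ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
ax.set_xlabel('planned_end')
ax.set_ylabel('actual_end')
ax.set_title('SVM decision regions (plain matplotlib)')
fig.savefig('svm_decision_regions.png')  # or plt.show() in an interactive session
```

Note that this only works directly for a classifier fitted on exactly two features, because every grid point has to be a complete input row. A model trained on all seven columns would need the other five held at fixed values.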