Project author: arschilke

Project description: Final Project for CS484: Artificial Intelligence at Loyola University Maryland

Primary language: Python
Project URL: git://github.com/arschilke/Mushrooms-AIFinalProject.git
Created: 2019-06-07T17:51:18Z
Project community: https://github.com/arschilke/Mushrooms-AIFinalProject


Mushrooms

Are they edible or are they poisonous?

Project by Alyssa Schilke, Mollie Morrow, and Jennifer Moutenot

DATASET

Mushroom Classification

  • Appearance and specifications of 8,000+ mushroom samples: dimensions, population, color, spacing, size, number of rings, odor, spores, edibility, etc.
  • Originally intended to be used to determine the edibility of various mushrooms
  • https://www.kaggle.com/uciml/mushroom-classification

Problem Type

Supervised Learning on a Classification Problem

Machine Learning Technique

We split the testing and training sets from the dataset randomly:

  • Initial split: test_size=0.75 and train_size=0.25
    • Resulted in 100% accuracy

In order to apply cross-validation (to continue with the assignment), we re-split the dataset with:

  • train_test_split(X, y, random_state=42, test_size=0.99), i.e. train_size=0.01
    We know these are not reasonable values (99% testing and 1% training), but because this dataset is so easy to classify, they were needed to lower accuracy across the classifiers (see the sketch after this list)
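A minimal sketch of this split, assuming X and y are the label-encoded features and edibility labels produced in the Data Preprocessing step below:

    from sklearn.model_selection import train_test_split

    # test_size=0.99 leaves only about 1% of the rows for training; this is
    # deliberately tiny so the classifiers no longer reach 100% accuracy.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=42, test_size=0.99)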

Data Preprocessing

In order for scikit-learn to process our data, we used LabelEncoder to convert each categorical feature into numerical values.
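A minimal sketch of this step, assuming the Kaggle CSV is loaded with pandas and that its label column is named 'class' (file name and column name are assumptions based on the Kaggle dataset, not stated above):

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    # Load the Kaggle mushroom data (file name assumed here).
    data = pd.read_csv('mushrooms.csv')

    # Every column holds single-letter categorical codes, so encode each
    # column separately into integers that scikit-learn can work with.
    encoded = data.apply(lambda col: LabelEncoder().fit_transform(col))

    # 'class' is the edibility label (edible vs. poisonous); the rest are features.
    X = encoded.drop(columns=['class'])
    y = encoded['class']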

RESULTS FROM NAÏVE BAYES

Original

  • Training and testing scores:
    • The mean accuracy of Naive Bayes on the training and testing data:
      • Train score: 0.8641975308641975
      • Test score: 0.7763272410791993
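As a rough sketch of how these scores are obtained (the exact Naive Bayes variant is not stated above; MultinomialNB is assumed here because the alpha and fit_prior hyperparameters tuned below belong to it):

    from sklearn.naive_bayes import MultinomialNB

    # Fit on the small training split, then report mean accuracy on both splits.
    nb = MultinomialNB()
    nb.fit(X_train, y_train)
    print('Train score:', nb.score(X_train, y_train))
    print('Test score:', nb.score(X_test, y_test))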

After Cross-Validation

  • Hyperparameters learned through cross-validation:
    • nb_parameters = {'fit_prior': (True, False), 'alpha': (0.8, 0.05, 0.1, 0.5)}
  • Best score for Naive Bayes 0.8395061728395061
  • Best params for Naive Bayes {'alpha': 0.8, 'fit_prior': True}
  • After using cross-validation, our test score became more accurate
  • This was not our best performing classifier (a sketch of the search follows below)
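A sketch of the cross-validation step, assuming GridSearchCV (the search utility is not named above) over the nb_parameters grid:

    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import MultinomialNB

    nb_parameters = {'fit_prior': (True, False), 'alpha': (0.8, 0.05, 0.1, 0.5)}

    # Exhaustive search over the small grid using the library's default CV folds.
    nb_grid = GridSearchCV(MultinomialNB(), nb_parameters)
    nb_grid.fit(X_train, y_train)
    print('Best score for Naive Bayes', nb_grid.best_score_)
    print('Best params for Naive Bayes', nb_grid.best_params_)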

RESULTS FROM MLP

Original

  • Training and testing scores:
    • The mean accuracy of MLP on the training and testing data:
      • Train score: 1.0
      • Test score: 0.9241576526171826

After Cross-Validation

  • Hyperparameters learned through cross-validation
    • mlp_parameters = {'max_iter':(1000, 1200, 5000, 10000)}
  • Best score for MLP 0.9135802469135802
  • Best params for MLP {'max_iter': 1200}
  • MLP has many hyperparameters; the defaults worked the most accurately (only max_iter was tuned, as sketched below)
  • This was our best performing classifier
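The corresponding sketch for MLP, again assuming GridSearchCV; only max_iter is searched, so every other MLPClassifier hyperparameter stays at its default:

    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    mlp_parameters = {'max_iter': (1000, 1200, 5000, 10000)}

    mlp_grid = GridSearchCV(MLPClassifier(), mlp_parameters)
    mlp_grid.fit(X_train, y_train)
    print('Best score for MLP', mlp_grid.best_score_)
    print('Best params for MLP', mlp_grid.best_params_)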

RESULTS FROM SVM

Original

  • Training and testing scores:
    • The mean accuracy of SVM on the training and testing data:
      • Train score: 0.9876543209876543
      • Test score: 0.8977993286087281

After Cross-Validation

  • Hyperparameters learned through cross-validation:
    • svm_parameters = {'kernel': ('rbf', 'linear', 'poly', 'sigmoid'), 'C': (np.arange(0.1, 4)), 'degree': (np.arange(1, 2)), 'coef0': np.arange(0, 2), 'shrinking': (True, False), 'probability': (False, True), 'decision_function_shape': ('ovo', 'ovr')}
  • Best score for SVM 0.9135802469135802
  • Best params for SVM {'C': 1.1, 'coef0': 0, 'decision_function_shape': 'ovo', 'degree': 1, 'kernel': 'linear', 'probability': False, 'shrinking': True}
  • We were able to increase the accuracy using cross validation
  • This was our second-best performing classifier
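The SVM search, again assuming GridSearchCV over the svm_parameters grid shown above (note that np.arange(0.1, 4) yields C values 0.1, 1.1, 2.1, 3.1, which is why the best C is 1.1):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    svm_parameters = {'kernel': ('rbf', 'linear', 'poly', 'sigmoid'),
                      'C': np.arange(0.1, 4),        # 0.1, 1.1, 2.1, 3.1
                      'degree': np.arange(1, 2),     # only 1
                      'coef0': np.arange(0, 2),      # 0, 1
                      'shrinking': (True, False),
                      'probability': (False, True),
                      'decision_function_shape': ('ovo', 'ovr')}

    svm_grid = GridSearchCV(SVC(), svm_parameters)
    svm_grid.fit(X_train, y_train)
    print('Best score for SVM', svm_grid.best_score_)
    print('Best params for SVM', svm_grid.best_params_)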

Conclusion

  • We found a very unusual dataset that allowed 100% mean accuracy from MLP and SVM with the normal 25% training split.
  • Through our analysis of hyperparameters, we found that MLP took a very long time to cross-validate due to the number of settings to evaluate.
  • This dataset only includes 21 species of mushroom; it would be interesting to see how adding other kinds of mushrooms affects the results.