Cookbook

Personal reference scripts for commonly used code

  • Machine Learning: A folder containing scripts for commonly used machine learning code
    • Preprocessing.py: Preparing data for machine learning tasks, primarily using pandas and sklearn
    • scikit-learn: Also includes LightGBM and XGBoost
      • ModelTraining.py: Cross validation, hyperparameter tuning, feature selection, etc.
      • Evaluation.py: Evaluation plots, collecting eval metrics, learning curves, feature importance, etc.
      • LighTGBM.py: Early stopping and other code that’s convenient to copy/paste
    • TensorFlow
      • Keras.py: Commonly used code for Keras
      • KerasMNIST.py: Training a convolutional net on the MNIST data with Keras
      • TensorFlowMNIST.py: Training a convolutional net on the MNIST data with TensorFlow
    • PyTorch
    • SparkML
      • SparkML.py: Commonly used code for SparkML. Includes preprocessing, hyperparameter tuning, cross validation, and so on.
  • Plotting: Code snippets for common plots
  • Misc: For scripts that don’t fit within any other folders
    • EDA.py: EDA reports, missing values, and outliers
    • NLP.py: NLTK natural language processing tasks
    • PySpark.py: Missing values, datatype conversions, encoding categorical columns, and prepping data for models
  • DevOps: A folder containing scripts for operationalizing machine learning models