{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Week 6 Notebook: Evalulating Model Performance and Robustness\n", "===============================================================\n", "\n", "Let's take a look at the model performance and dependence on other variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensorflow.keras as keras\n", "import numpy as np\n", "from sklearn.metrics import roc_curve, auc\n", "import matplotlib.pyplot as plt\n", "import uproot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import yaml\n", "\n", "with open('definitions.yml') as file:\n", " # The FullLoader parameter handles the conversion from YAML\n", " # scalar values to Python the dictionary format\n", " definitions = yaml.load(file, Loader=yaml.FullLoader)\n", " \n", "features = definitions['features']\n", "spectators = definitions['spectators']\n", "labels = definitions['labels']\n", "\n", "nfeatures = definitions['nfeatures']\n", "nspectators = definitions['nspectators']\n", "nlabels = definitions['nlabels']\n", "ntracks = definitions['ntracks']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from DataGenerator import DataGenerator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the Previous Model\n", "Here, we will load the last model trained in Notebook 5." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define callbacks\n", "from tensorflow.keras.models import load_model\n", "\n", "keras_model_deepset = load_model('keras_model_deepset_best.h5')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configure Data Generator for Testing\n", "\n", "We will configure the data generator to load testing data, including \"spectator variables\" that we were not used in the training, but may be correlated with the output of the algorithm. Specifically, the jet mass and pT." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load testing file\n", "test_files = ['root://eospublic.cern.ch//eos/opendata/cms/datascience/HiggsToBBNtupleProducerTool/HiggsToBBNTuple_HiggsToBB_QCD_RunII_13TeV_MC/test/ntuple_merged_0.root'\n", " ]\n", "test_generator = DataGenerator(test_files, features, labels, spectators, batch_size=8192, n_dim=ntracks, \n", " remove_mass_pt_window=True, \n", " remove_unlabeled=True,\n", " return_spectators=True,\n", " max_entry=200000) # basically, no maximum" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tqdm.notebook import tqdm\n", "# run model inference on test data set\n", "predict_array_cnn = []\n", "label_array_test = []\n", "spec_array_test = []\n", "\n", "for t in tqdm(test_generator,total=len(test_generator)):\n", " label_array_test.append(t[1][0])\n", " spec_array_test.append(t[1][1])\n", " predict_array_cnn.append(keras_model_deepset.predict(t[0]))\n", " \n", "predict_array_cnn = np.concatenate(predict_array_cnn, axis=0)\n", "label_array_test = np.concatenate(label_array_test, axis=0)\n", "spec_array_test = np.concatenate(spec_array_test, axis=0)\n", "\n", "# create ROC curves\n", "fpr_cnn, tpr_cnn, threshold_cnn = roc_curve(label_array_test[:,1], predict_array_cnn[:,1])\n", " \n", "# plot ROC curves\n", "plt.figure()\n", "plt.plot(tpr_cnn, fpr_cnn, lw=2.5, label=\"Deep Sets, AUC = {:.1f}%\".format(auc(fpr_cnn,tpr_cnn)*100))\n", "plt.xlabel(r'True positive rate')\n", "plt.ylabel(r'False positive rate')\n", "plt.semilogy()\n", "plt.ylim(0.001, 1)\n", "plt.xlim(0, 1)\n", "plt.grid(True)\n", "plt.legend(loc='upper left')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def find_nearest(array,value):\n", " idx = (np.abs(array-value)).argmin()\n", " return idx, array[idx]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlation with jet mass\n", "This algorithm has a slight dependence on jet mass. Qualitatively, we can observe this \"mass sculpting\" by making the distributino of the jet mass with tighter and tighter requirements on the algorithm output. We can see that the original distribution of the jet mass for the background has no \"resonance\" or bump, and is basically smoothly falling. \n", "\n", "However, once a requirement is made on the algorithm output, jets with low mass or high mass are excluded at higher rates. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "for wp in [1.0, 0.8, 0.5, 0.1]:\n", " idx, val = find_nearest(fpr_cnn, wp)\n", " plt.hist(spec_array_test[:,0], bins = np.linspace(40, 200, 21), \n", " weights = label_array_test[:,0]*(predict_array_cnn[:,1] > threshold_cnn[idx]),\n", " alpha=0.4,density=True, label='QCD, {}% FPR cut'.format(int(wp*100)),linestyle='-')\n", "plt.legend()\n", "plt.xlabel(r'$m_{SD}$')\n", "plt.ylabel(r'Normalized probability')\n", "plt.xlim(40, 200)\n", "\n", "plt.figure()\n", "for wp in [1.0, 0.8, 0.5, 0.1]:\n", " idx, val = find_nearest(fpr_cnn, wp)\n", " plt.hist(spec_array_test[:,0], bins=np.linspace(40, 200, 21), \n", " weights = label_array_test[:,1]*(predict_array_cnn[:,1] > threshold_cnn[idx]),\n", " alpha=0.4,density=True, label='H(bb), {}% FPR cut'.format(int(wp*100)),linestyle='-')\n", "plt.legend()\n", "plt.xlabel(r'$m_{SD}$')\n", "plt.ylabel(r'Normalized probability')\n", "plt.xlim(40, 200)\n", "\n", "plt.figure()\n", "plt.hist2d(spec_array_test[:,0][(label_array_test[:,0]==1) & (predict_array_cnn[:,1] > 0.1)], \n", " predict_array_cnn[:,1][(label_array_test[:,0]==1) & (predict_array_cnn[:,1] > 0.1)], \n", " bins=(30, 30), \n", " cmap=plt.cm.jet)\n", "plt.colorbar()\n", "plt.title('QCD')\n", "plt.ylabel(r'Deep Set Output')\n", "plt.xlabel(r'$m_{SD}$')\n", "\n", "plt.figure()\n", "plt.hist2d(spec_array_test[:,0][(label_array_test[:,1]==1) & (predict_array_cnn[:,1] > 0.8)], \n", " predict_array_cnn[:,1][(label_array_test[:,1]==1) & (predict_array_cnn[:,1] > 0.8)], \n", " bins=(30, 30), \n", " cmap=plt.cm.jet)\n", "plt.colorbar()\n", "plt.title('H(bb)')\n", "plt.ylabel(r'Deep Set Output')\n", "plt.xlabel(r'$m_{SD}$')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dependence on jet pT\n", "\n", "We can repeat the exercise this time looking at jet pT. We see that in general, higher pT jets are promoted to be more \"signal-like\" by the algorithm. This is likely due to the fact that high pT jets are over-represented in the signal sample compared to the background sample." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "for wp in [1.0, 0.8, 0.5, 0.1]:\n", " idx, val = find_nearest(fpr_cnn, wp)\n", " plt.hist(spec_array_test[:,1], bins=np.linspace(300, 2000, 51), \n", " weights = label_array_test[:,0]*(predict_array_cnn[:,1] > threshold_cnn[idx]),\n", " alpha=0.4,density=True, label='QCD, {}% FPR cut'.format(int(wp*100)),linestyle='-')\n", "\n", "plt.legend()\n", "plt.xlabel(r'$p_{T}$')\n", "plt.ylabel(r'Normalized probability')\n", "plt.xlim(300, 2000)\n", "\n", "plt.figure()\n", "for wp in [1.0, 0.8, 0.5, 0.1]:\n", " idx, val = find_nearest(fpr_cnn, wp)\n", " plt.hist(spec_array_test[:,1], bins=np.linspace(300, 2000, 51), \n", " weights = label_array_test[:,1]*(predict_array_cnn[:,1] > threshold_cnn[idx]),\n", " alpha=0.4,density=True, label='H(bb), {}% FPR cut'.format(int(wp*100)),linestyle='-')\n", "\n", "plt.legend()\n", "plt.xlabel(r'$p_{T}$')\n", "plt.ylabel(r'Normalized probability')\n", "plt.xlim(300, 2000)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" } }, "nbformat": 4, "nbformat_minor": 2 }