Regular measurements are made on the 363 nodes of 8 Grid'5000 clusters to track their evolution over time. Three main metrics are collected: the average CPU performance (in Gflop/s), the average CPU frequency (in GHz), and the average CPU temperature (in °C).
cluster = 'yeti'
factor = 'mean_gflops'
confidence = 0.9999
# Parameters
cluster = "pyxis"
factor = "intercept_residual"
%load_ext autoreload
%autoreload 2
import requests
import pandas
import io
import plotnine
plotnine.options.figure_size = 10, 7.5
plotnine.options.dpi = 100
from cashew import non_regression_tests as nrt
import cashew
print(cashew.__git_version__)
%%time
csv_url = nrt.DEFAULT_CSV_URL_PREFIX + nrt.DATA_FILES[factor]
df = nrt.format(nrt.get(csv_url))
changelog = nrt.format_changelog(nrt.get(nrt.DEFAULT_CHANGELOG_URL))
outlierlog = nrt.format_changelog(nrt.get(nrt.DEFAULT_OUTLIERLOG_URL))
df = nrt.filter(df, cluster=cluster)
df = nrt.filter_na(df, factor)
%%time
nrt.plot_latest_distribution(df, factor)
%%time
marked = nrt.mark_weird(df, changelog, outlierlog, nmin=10, keep=5, window=5,
                        naive=False, confidence=confidence, cols=[factor])
nb_weird = len(marked[marked.weird.isin({'positive', 'negative'})])
nb_total = len(marked[marked.weird != 'NA'])
print(f'{nb_weird/nb_total*100:.2f}% of measures are abnormal ({nb_weird}/{nb_total})')
%%time
import plotnine
nb_unique = len(marked[['node', 'cpu']].drop_duplicates())
height = max(6, nb_unique/8)
old_sizes = tuple(plotnine.options.figure_size)
plotnine.options.figure_size = (10, height)
print(nrt.plot_overview_raw_data(marked, changelog))
plotnine.options.figure_size = old_sizes
The goal of the following cells is to detect possible anomalies in the considered metric (performance, frequency, or temperature).
Suppose that we have run 20 different experiments with a given CPU on a given node and measured its average temperature each time. We therefore have a list of 20 values, from which we can compute their mean $\mu$ and their standard deviation $\sigma$.
For instance, we may have $\mu \approx 64.7°C$ and $\sigma \approx 3.2°C$.
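A minimal sketch of how $\mu$ and $\sigma$ could be computed from the past measurements, using Python's standard `statistics` module. It assumes the sample standard deviation (denominator $n-1$). The notebook does not say whether the sample or the population estimator is used, and `mean_and_std` is an illustrative name, not part of `cashew`.

```python
import statistics
from typing import Sequence, Tuple


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and the sample standard deviation of past measurements.

    Assumption: the sample estimator (n - 1 denominator) is used;
    statistics.pstdev would give the population estimator instead.
    """
    if len(values) < 2:
        raise ValueError("at least two measurements are needed to estimate sigma")
    return statistics.mean(values), statistics.stdev(values)
```

With the 20 average temperatures stored in a list (for example a hypothetical `temperatures`), `mu, sigma = mean_and_std(temperatures)` gives the two quantities used below.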
Now, suppose that we perform a new experiment, in which this CPU has an average temperature of $70°C$. This new measurement is higher than the mean of the 20 previous ones, but is it significantly higher? What was the probability of observing a temperature at least this high if nothing had changed on the CPU?
In the evolution plots, we show the observed values together with a prediction region $\mu \pm \alpha\times\sigma$, where the factor $\alpha$ depends on the chosen confidence. With a confidence of 99.99%, if nothing has changed on the CPU, we expect 99.99% of the measurements to fall within the prediction region. In other words, if a measurement falls outside this region, something unusual probably happened on this CPU at that time. The factor $\alpha$ is computed using the quantile function of either the normal distribution or the F distribution.
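For the normal case, the region must contain a fraction $c$ of the measurements (here $c = 0.9999$) and leave the rest split evenly between both tails. This fixes $\alpha$ through the standard normal quantile function $\Phi^{-1}$. The F-distribution variant is not spelled out in this notebook and is not reconstructed here.

$$
\alpha = \Phi^{-1}\!\left(1 - \frac{1 - c}{2}\right)
\quad\Longrightarrow\quad
P\left(|X - \mu| \le \alpha\,\sigma\right) = c \ \text{ for } X \sim \mathcal{N}(\mu, \sigma^2),
\qquad
c = 0.9999 \;\Rightarrow\; \alpha = \Phi^{-1}(0.99995) \approx 3.89
$$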
Back to our example: with the normal distribution and a 99.99% confidence, $\alpha \approx 3.89$ and the associated prediction region is $[52.3°C, 77.1°C]$. Our latest observation of $70°C$ falls within this region, so we consider that nothing unusual happened.
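The numbers of this example can be reproduced with the following sketch. It covers the normal-distribution case only and uses `statistics.NormalDist` from the standard library (Python 3.8+); `scipy.stats.norm.ppf` would give the same quantile. The function names are illustrative and are not part of `cashew`.

```python
from statistics import NormalDist
from typing import Tuple


def alpha_for_confidence(confidence: float) -> float:
    """Two-sided factor alpha such that P(|Z| <= alpha) = confidence for Z ~ N(0, 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # The excluded mass (1 - confidence) is split evenly between the two tails.
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def prediction_region(mu: float, sigma: float, confidence: float) -> Tuple[float, float]:
    """Interval mu +/- alpha * sigma, assuming normally distributed measurements."""
    alpha = alpha_for_confidence(confidence)
    return mu - alpha * sigma, mu + alpha * sigma


alpha = alpha_for_confidence(0.9999)
low, high = prediction_region(64.7, 3.2, 0.9999)
print(f"alpha = {alpha:.2f}, region = [{low:.1f}, {high:.1f}]")
# prints: alpha = 3.89, region = [52.3, 77.1]
```

Since $70°C$ lies between `low` and `high`, the observation is not flagged.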
In the overview plots, the question is reversed: given our prior knowledge ($\mu$ and $\sigma$), we estimate the probability of observing a value at least as high (or as low) as the new one. First, we compute this probability (which we call the likelihood) using the cumulative distribution function of either the normal distribution or the F distribution. Since this probability can be very small, we take its logarithm for easier visualization. The resulting log-likelihood is always negative or zero. We then take its absolute value and give it a sign: positive if the new observation is higher than the mean, negative otherwise. Finally, we clip it to a bounded range so that extreme values do not distort the color scale.
Back to our example: with the normal distribution, the probability of observing a value at least as high as $70°C$ was $L \approx 0.049$. The log-likelihood is thus $LL = \ln L \approx -3.02$. Since the new observation was higher than the mean, we give it a positive sign: the final value is $3.02$.
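The signed log-likelihood can be sketched the same way, again assuming a normal model. The natural logarithm matches the numbers above ($\ln 0.049 \approx -3.02$). The clipping bound of 10 is a hypothetical value, since the bound actually used for the color scale is not given. `scipy.stats.norm.logsf` would be an alternative for computing the log tail probability directly.

```python
import math


def signed_log_likelihood(x: float, mu: float, sigma: float, bound: float = 10.0) -> float:
    """Signed, clipped log-likelihood of a new observation x under N(mu, sigma^2).

    Assumptions (not taken from cashew): normal model, natural logarithm,
    and a hypothetical clipping bound of 10.
    """
    z = (x - mu) / sigma
    # One-sided tail probability: P(X >= x) if x is above the mean,
    # P(X <= x) otherwise. Both equal Phi(-|z|) = erfc(|z| / sqrt(2)) / 2;
    # erfc stays accurate far in the tail, unlike 1 - cdf.
    likelihood = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    # log(likelihood) <= 0, so its negation grows as x becomes more unusual.
    # If the probability underflows to 0, the magnitude is simply the bound.
    magnitude = bound if likelihood == 0.0 else min(-math.log(likelihood), bound)
    # Positive when x is above the mean, negative otherwise.
    return magnitude if x >= mu else -magnitude


print(f"{signed_log_likelihood(70.0, 64.7, 3.2):.2f}")
# prints: 3.02
```

An observation as far below the mean (here $59.4°C$) yields the same magnitude with a negative sign.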
%%time
plotnine.options.figure_size = (10, height)
print(nrt.plot_overview(marked, changelog, confidence=confidence, discretize=True))
plotnine.options.figure_size = old_sizes
%%time
node_limit = None if factor.startswith('mean') else 1
tmp = nrt.plot_evolution_cluster(marked, changelog=changelog, node_limit=node_limit)
%%time
plotnine.options.figure_size = (10, height)
print(nrt.plot_overview_windowed(marked, changelog, confidence=confidence, discretize=True))
plotnine.options.figure_size = old_sizes
%%time
import warnings
warnings.filterwarnings("ignore")
node_limit = None if factor.startswith('mean') else 1
tmp = nrt.plot_evolution_cluster_windowed(marked, changelog=changelog, node_limit=node_limit)