Models Reference

This page documents the models sub-package.

`medpipe.models.Calibrator`

Calibrator class.

This class creates a Calibrator to calibrate predictions.

`Calibrator`

Class that creates a Calibrator.

Attributes:

Name	Type	Description
`model`	`LogisticRegression or IsotonicRegression`	Calibrator model.
`model_type`	`{"logistic", "isotonic"}`	Model type.
`logger`	`logging.Logger or None, default: None`	Logger object to log prints. If None print to terminal.

Methods:

Name	Description
`__init__`	Initialise a Calibrator class instance.
`_set_model`	Set the model to default parameters.
`fit`	Fits the calibrator to the input data.
`predict_proba`	Predicts probabilities from input data.
`predict`	Predicts labels from input data.

Source code in src/medpipe/models/Calibrator.py

class Calibrator:
    """
    Class that creates a Calibrator.

    Attributes
    ----------
    model : LogisticRegression or IsotonicRegression
        Calibrator model.
    model_type : {"logistic", "isotonic"}
        Model type.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Methods
    -------
    __init__(model_type, hyperparameters={}, logger=None)
        Initialise a Calibrator class instance.
    _set_model()
        Set the model to default parameters.
    fit(X, y)
        Fits the calibrator to the input data.
    predict_proba(X)
        Predicts probabilities from input data.
    predict(X)
        Predicts labels from input data.
    """

    def __init__(self, model_type, hyperparameters={}, logger=None):
        """
        Initialise a Calibrator class instance.

        Parameters
        ----------
        model_type : {"logistic", "isotonic"}
            Model type.
        hyperparameters : dict[str, value]
            Model hyperparameter dictionary.
        logger : logging.Logger or None, default: None
            Logger object to log prints. If None print to terminal.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model_type = model_type
        self.hyperparameters = hyperparameters
        self.logger = logger
        self.model = []  # Placeholder, replaced by _set_model() below

        # Create model based on attributes
        self._set_model()

    def _set_model(self, quiet: bool = False):
        """
        Set the model to default parameters.

        Parameters
        ----------
        quiet : bool, default: False
            Flag to create a model without printing.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model = create_model(
            self.model_type,
            logger=self.logger,
            quiet=quiet,
            **self.hyperparameters,
        )

    def fit(self, X, y):
        """
        Fits the calibrator to the training data.

        Parameters
        ----------
        X : array-like of shape (n_samples,)
            Training data.
        y : array-like of shape (n_samples,)
            Prediction data.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model.fit(X, y)

    def predict_proba(self, X):
        """
        Predicts probabilities from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples,)
            Input data.

        Returns
        -------
        probabilities : np.array of shape (n_samples, 2)
            Predicted probabilities.

        """
        if self.model_type == "isotonic":
            predictions = self.model.predict(X)
            return get_full_proba(expand_dims(predictions, 1))
        else:
            return self.model.predict_proba(X)

    def predict(self, X):
        """
        Predicts labels from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples,)
            Input data.

        Returns
        -------
        predictions : array-like of shape (n_samples,)
            Predicted labels.

        """
        labels = round(self.model.predict(X))
        return array(labels).T

`__init__(model_type, hyperparameters={}, logger=None)`

Initialise a Calibrator class instance.

Parameters:

Name	Type	Description	Default
`model_type`	`{"logistic", "isotonic"}`	Model type.	required
`hyperparameters`	`dict[str, value]`	Model hyperparameter dictionary.	`{}`
`logger`	`Logger or None`	Logger object to log prints. If None print to terminal.	`None`

Returns:

Type	Description
`None`	Nothing is returned.

Source code in src/medpipe/models/Calibrator.py

def __init__(self, model_type, hyperparameters={}, logger=None):
    """
    Initialise a Calibrator class instance.

    Parameters
    ----------
    model_type : {"logistic", "isotonic"}
        Model type.
    hyperparameters : dict[str, value]
        Model hyperparameter dictionary.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model_type = model_type
    self.hyperparameters = hyperparameters
    self.logger = logger
    self.model = []  # Placeholder, replaced by _set_model() below

    # Create model based on attributes
    self._set_model()

`fit(X, y)`

Fits the calibrator to the training data.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples,)`	Training data.	required
`y`	`array-like of shape (n_samples,)`	Prediction data.	required

Returns:

Type	Description
`None`	Nothing is returned.

Source code in src/medpipe/models/Calibrator.py

def fit(self, X, y):
    """
    Fits the calibrator to the training data.

    Parameters
    ----------
    X : array-like of shape (n_samples,)
        Training data.
    y : array-like of shape (n_samples,)
        Prediction data.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model.fit(X, y)
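
The docstring gives `X` as shape `(n_samples,)`. The following is a minimal sketch, assuming the calibrator wraps scikit-learn's `IsotonicRegression` and `LogisticRegression` directly (as `create_model` suggests). It shows that isotonic regression accepts 1-D scores, while logistic regression expects a 2-D feature matrix, so 1-D scores may need reshaping before they are passed in. The data are synthetic and `out_of_bounds="clip"` is a hyperparameter chosen only for this illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic uncalibrated scores in [0, 1] and binary labels that loosely follow them.
scores = rng.uniform(0.0, 1.0, size=200)
labels = (rng.uniform(0.0, 1.0, size=200) < scores).astype(int)

# Isotonic regression accepts 1-D input of shape (n_samples,).
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(scores, labels)
iso_pos = iso.predict(scores)  # shape (n_samples,)

# Logistic regression expects a 2-D feature matrix, so reshape to (n_samples, 1).
logit = LogisticRegression()
logit.fit(scores.reshape(-1, 1), labels)
logit_proba = logit.predict_proba(scores.reshape(-1, 1))  # shape (n_samples, 2)
```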

`predict(X)`

Predicts labels from input data.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples,)`	Input data.	required

Returns:

Name	Type	Description
`predictions`	`array-like of shape (n_samples,)`	Predicted labels.

Source code in src/medpipe/models/Calibrator.py

def predict(self, X):
    """
    Predicts labels from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples,)
        Input data.

    Returns
    -------
    predictions : array-like of shape (n_samples,)
        Predicted labels.

    """
    labels = round(self.model.predict(X))
    return array(labels).T
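
The import of `round` is not shown in the source. Assuming it refers to `numpy.round`, the sketch below shows how continuous calibrated scores turn into labels and how a score of exactly 0.5 is handled. The scores are hypothetical.

```python
import numpy as np

# Hypothetical calibrated scores, as an isotonic model might return them.
scores = np.array([0.10, 0.49, 0.50, 0.51, 0.90])

# numpy.round rounds halves to the nearest even value, so 0.50 becomes 0.
labels = np.round(scores).astype(int)
print(labels)  # [0 0 0 1 1]

# An explicit threshold makes the tie-breaking rule visible.
labels_ge = (scores >= 0.5).astype(int)
print(labels_ge)  # [0 0 1 1 1]
```

If a score of exactly 0.5 should count as positive, an explicit threshold like the second form may be the safer choice.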

`predict_proba(X)`

Predicts probabilities from input data.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples,)`	Input data.	required

Returns:

Name	Type	Description
`probabilities`	`np.array of shape (n_samples, 2)`	Predicted probabilities.

Source code in src/medpipe/models/Calibrator.py

def predict_proba(self, X):
    """
    Predicts probabilities from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples,)
        Input data.

    Returns
    -------
    probabilities : np.array of shape (n_samples, 2)
        Predicted probabilities.

    """
    if self.model_type == "isotonic":
        predictions = self.model.predict(X)
        return get_full_proba(expand_dims(predictions, 1))
    else:
        return self.model.predict_proba(X)
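
The isotonic branch can be traced with NumPy alone. This sketch uses hypothetical scores and inlines the single-class case of `get_full_proba` to show how a `(n_samples,)` prediction becomes a `(n_samples, 2)` probability array.

```python
import numpy as np

# Hypothetical positive-class scores from an isotonic model, shape (n_samples,).
predictions = np.array([0.2, 0.7, 1.0])

# Add a class axis: (n_samples,) -> (n_samples, 1).
pos_proba = np.expand_dims(predictions, 1)

# Stack the negative and positive probabilities column-wise: (n_samples, 2).
full = np.array([1 - pos_proba[:, 0], pos_proba[:, 0]]).T
print(full.shape)  # (3, 2)
```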

`medpipe.models.Predictor`

Predictor class.

This class creates a Predictor to train and make predictions.

`Predictor`

Class that creates a Predictor.

Attributes:

Name	Type	Description
`model`	`dict[str, model]`	Predictor model. The key is the predicted label, the model is a HistGradientBoostingClassifier.
`model_type`	`{"hgb-c"}`	Model type.
`hyperparameters`	`dict[str, value]`	Model hyperparameter dictionary.
`logger`	`logging.Logger or None, default: None`	Logger object to log prints. If None print to terminal.

Methods:

Name	Description
`__init__`	Initialise a Predictor class instance.
`_set_model`	Set the model to default parameters.
`fit`	Fits the predictor based on input data.
`predict_proba`	Predicts probabilities from input data.
`predict`	Predicts labels from input data.

Source code in src/medpipe/models/Predictor.py

class Predictor:
    """
    Class that creates a Predictor.

    Attributes
    ----------
    model : dict[str, model]
        Predictor model. The key is the predicted label, the model
        is a HistGradientBoostingClassifier.
    model_type : {"hgb-c"}
        Model type.
    hyperparameters : dict[str, value]
        Model hyperparameter dictionary.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Methods
    -------
    __init__(model_type, hyperparameters, logger=None)
        Initialise a Predictor class instance.
    _set_model()
        Set the model to default parameters.
    fit(X_train, y_train, weights=None)
        Fits the predictor based on input data.
    predict_proba(X)
        Predicts probabilities from input data.
    predict(X)
        Predicts labels from input data.
    """

    def __init__(self, model_type, hyperparameters, logger=None):
        """
        Initialise a Predictor class instance.

        Parameters
        ----------
        model_type : {"hgb-c"}
            Model type.
        hyperparameters : dict[str, value]
            Model hyperparameter dictionary.
        logger : logging.Logger or None, default: None
            Logger object to log prints. If None print to terminal.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model_type = model_type
        self.hyperparameters = hyperparameters
        self.logger = logger

        # Create model based on attributes
        self._set_model()

    def _set_model(self, quiet: bool = False):
        """
        Set the model to default parameters.

        Parameters
        ----------
        quiet : bool, default: False
            Flag to create a model without printing.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model = create_model(
            self.model_type,
            self.logger,
            quiet=quiet,
            **self.hyperparameters,
        )

    def fit(self, X_train, y_train, weights=None):
        """
        Fits the predictor to the training data.

        Parameters
        ----------
        X_train : array-like of shape (n_samples, n_features)
            Training data.
        y_train : array-like of shape (n_samples,)
            Prediction labels.
        weights : array-like of shape (n_samples,) or None, default: None
            Weights to address class imbalance.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model.fit(X_train, y_train.squeeze(), sample_weight=weights)

    def predict_proba(self, X):
        """
        Predicts probabilities from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Input data.

        Returns
        -------
        probabilities : np.array of shape (n_samples, 2)
            Predicted probabilities.

        """
        return self.model.predict_proba(X)

    def predict(self, X):
        """
        Predicts labels from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Input data.

        Returns
        -------
        labels : array-like of shape (n_samples,)
            Predicted labels.

        """
        return self.model.predict(X)

`__init__(model_type, hyperparameters, logger=None)`

Initialise a Predictor class instance.

Parameters:

Name	Type	Description	Default
`model_type`	`{"hgb-c"}`	Model type.	required
`hyperparameters`	`dict[str, value]`	Model hyperparameter dictionary.	required
`logger`	`Logger or None`	Logger object to log prints. If None print to terminal.	`None`

Returns:

Type	Description
`None`	Nothing is returned.

Source code in src/medpipe/models/Predictor.py

def __init__(self, model_type, hyperparameters, logger=None):
    """
    Initialise a Predictor class instance.

    Parameters
    ----------
    model_type : {"hgb-c"}
        Model type.
    hyperparameters : dict[str, value]
        Model hyperparameter dictionary.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model_type = model_type
    self.hyperparameters = hyperparameters
    self.logger = logger

    # Create model based on attributes
    self._set_model()

`fit(X_train, y_train, weights=None)`

Fits the predictor to the training data.

Parameters:

Name	Type	Description	Default
`X_train`	`array-like of shape (n_samples, n_features)`	Training data.	required
`y_train`	`array-like of shape (n_samples,)`	Prediction labels.	required
`weights`	`array-like of shape (n_samples,) or None`	Weights to address class imbalance.	`None`

Returns:

Type	Description
`None`	Nothing is returned.

Source code in src/medpipe/models/Predictor.py

def fit(self, X_train, y_train, weights=None):
    """
    Fits the predictor to the training data.

    Parameters
    ----------
    X_train : array-like of shape (n_samples, n_features)
        Training data.
    y_train : array-like of shape (n_samples,)
        Prediction labels.
    weights : array-like of shape (n_samples,) or None, default: None
        Weights to address class imbalance.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model.fit(X_train, y_train.squeeze(), sample_weight=weights)
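
How `weights` is computed is not shown in this module. As one possible sketch, the inverse-frequency heuristic below (the same formula scikit-learn calls "balanced") gives each sample a weight of `n_samples / (n_classes * class_count)`. The labels are hypothetical and stored as a column vector to show what `squeeze()` does.

```python
import numpy as np

# Hypothetical imbalanced labels stored as a column vector of shape (n_samples, 1).
y_train = np.array([[0], [0], [0], [0], [0], [0], [1], [1]])

# squeeze() drops the trailing axis, giving shape (n_samples,).
y_flat = y_train.squeeze()

# Inverse-frequency weights: n_samples / (n_classes * count of the sample's class).
classes, counts = np.unique(y_flat, return_counts=True)
class_weight = y_flat.size / (classes.size * counts)

# Map each sample to the weight of its class.
weights = class_weight[np.searchsorted(classes, y_flat)]
print(weights.shape)  # (8,)
```

With these weights, each class contributes the same total weight to the fit, so the minority class is not drowned out.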

`predict(X)`

Predicts labels from input data.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	Input data.	required

Returns:

Name	Type	Description
`labels`	`array-like of shape (n_samples,)`	Predicted labels.

Source code in src/medpipe/models/Predictor.py

def predict(self, X):
    """
    Predicts labels from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data.

    Returns
    -------
    labels : array-like of shape (n_samples,)
        Predicted labels.

    """
    return self.model.predict(X)

`predict_proba(X)`

Predicts probabilities from input data.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	Input data.	required

Returns:

Name	Type	Description
`probabilities`	`np.array of shape (n_samples, 2)`	Predicted probabilities.

Source code in src/medpipe/models/Predictor.py

def predict_proba(self, X):
    """
    Predicts probabilities from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data.

    Returns
    -------
    probabilities : np.array of shape (n_samples, 2)
        Predicted probabilities.

    """
    return self.model.predict_proba(X)

`medpipe.models.core`

Models functions module.

This module provides core functions for models and pipelines.

Functions:

- create_model: Creates a new model.
- test_model: Tests a model on some test data.
- save_pipeline: Pickles a pipeline.
- load_pipeline: Loads a pickled pipeline.
- get_positive_proba: Returns just the positive label probabilities of each class.
- get_full_proba: Returns probabilities for both labels.

`create_model(model_type, logger=None, quiet=False, **config_params)`

Creates an AI model.

Parameters:

Name	Type	Description	Default
`model_type`	`{"hgb-c", "logistic", "isotonic"}`	Type of model to create. hgb-c: histogram gradient boosting classifier. logistic: logistic regression. isotonic: isotonic regression.	required
`logger`	`logging.Logger or None`	Logger object to log prints. If None print to terminal.	`None`
`quiet`	`bool`	Flag to create a model without printing.	`False`
`**config_params`		Configuration parameters for the model.	`{}`

Returns:

Name	Type	Description
`model`	`HistGradientBoostingClassifier, LogisticRegression or IsotonicRegression`	Created model.

Raises:

Type	Description
`TypeError`	If model_type is not a str. If an unexpected keyword argument is present.
`ValueError`	If model_type is not "hgb-c", "logistic" or "isotonic".

Source code in src/medpipe/models/core.py

def create_model(
    model_type: str,
    logger=None,
    quiet=False,
    **config_params,
):
    """
    Creates an AI model.

    Parameters
    ----------
    model_type : {"hgb-c", "logistic", "isotonic"}
        Type of model to create.
            hgb-c: histogram gradient boosting classifier.
            logistic: logistic regression.
            isotonic: isotonic regression.
    quiet : bool, default: False
        Flag to create a model without printing.
    **config_params
        Configuration parameters for the model.

    Returns
    -------
    model : HistGradientBoostingClassifier, LogisticRegression or IsotonicRegression
        Created model.

    Raises
    ------
    TypeError
        If model_type is not a str.
        If an unexpected keyword argument is present.
    ValueError
        If model_type is not "hgb-c", "logistic" or "isotonic".

    """
    if type(model_type) is not str:
        raise TypeError(f"{model_type} should be a string")

    match model_type:
        case "hgb-c":
            if not quiet:
                print_message(
                    "Creating a Histogram Gradient Boosting Classifier",
                    logger,
                    SCRIPT_NAME,
                )
            model = HistGradientBoostingClassifier(**config_params)
        case "logistic":
            if not quiet:
                print_message(
                    "Creating a Logistic Regression calibrator", logger, SCRIPT_NAME
                )
            model = LogisticRegression(**config_params)

        case "isotonic":
            if not quiet:
                print_message(
                    "Creating an Isotonic Regression calibrator", logger, SCRIPT_NAME
                )
            model = IsotonicRegression(**config_params)

        case _:
            raise ValueError(f"{model_type} invalid model type. See function docstring")

    return model
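
The error contract above can be illustrated without scikit-learn. In this sketch, `PlaceholderModel`, `REGISTRY`, `create_placeholder` and `error_name` are hypothetical names introduced for illustration only. The real function dispatches with `match`/`case` and builds actual estimators. The point is that an unexpected keyword in `**config_params` surfaces as a `TypeError` from the constructor call.

```python
class PlaceholderModel:
    """Hypothetical stand-in for an estimator that accepts one hyperparameter."""

    def __init__(self, max_iter=100):
        self.max_iter = max_iter


# Hypothetical registry; the real function dispatches with match/case.
REGISTRY = {"hgb-c": PlaceholderModel}


def create_placeholder(model_type, **config_params):
    if not isinstance(model_type, str):
        raise TypeError(f"{model_type} should be a string")
    if model_type not in REGISTRY:
        raise ValueError(f"{model_type} invalid model type")
    # Unknown keywords raise TypeError inside the constructor call.
    return REGISTRY[model_type](**config_params)


def error_name(*args, **kwargs):
    """Return the name of the exception raised, or None on success."""
    try:
        create_placeholder(*args, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


print(error_name(3))                     # TypeError
print(error_name("svm"))                 # ValueError
print(error_name("hgb-c", depth=3))      # TypeError
print(error_name("hgb-c", max_iter=10))  # None
```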

`get_full_proba(pos_proba)`

Returns probabilities for both labels.

Parameters:

Name	Type	Description	Default
`pos_proba`	`array-like of shape (n_samples, n_classes)`	Probabilities of the positive labels for each class.	required

Returns:

Name	Type	Description
`probabilities`	`array-like of shape (n_classes, (n_samples, 2))`	Probabilities for each class.

Source code in src/medpipe/models/core.py

def get_full_proba(pos_proba):
    """
    Returns probabilities for both labels.

    Parameters
    ----------
    pos_proba : array-like of shape (n_samples, n_classes)
        Probabilities of the positive labels for each class.

    Returns
    -------
    probabilities : array-like of shape (n_classes, (n_samples, 2))
        Probabilities for each class.

    """
    probabilities = []  # Empty list for the probabilities

    for i in range(pos_proba.shape[1]):
        probabilities.append(np.array([1 - pos_proba[:, i], pos_proba[:, i]]).T)

    if len(probabilities) == 1:
        return probabilities[0]
    else:
        return probabilities
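
The return type depends on the number of classes, which is easy to miss. The sketch below copies the function so that it runs on its own and calls it on hypothetical inputs with one and two classes.

```python
import numpy as np


def get_full_proba(pos_proba):
    # Copied from the source above so the example is self-contained.
    probabilities = []
    for i in range(pos_proba.shape[1]):
        probabilities.append(np.array([1 - pos_proba[:, i], pos_proba[:, i]]).T)
    return probabilities[0] if len(probabilities) == 1 else probabilities


# One class: a single (n_samples, 2) array is returned.
single = get_full_proba(np.array([[0.2], [0.9]]))
print(type(single).__name__, single.shape)  # ndarray (2, 2)

# Two classes: a list with one (n_samples, 2) array per class is returned.
multi = get_full_proba(np.array([[0.2, 0.6], [0.9, 0.1]]))
print(type(multi).__name__, len(multi), multi[1].shape)  # list 2 (2, 2)
```

Callers that handle both cases may need to check whether they received an array or a list.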

`get_positive_proba(probabilities)`

Returns just the positive label probabilities of each class.

Parameters:

Name	Type	Description	Default
`probabilities`	`array-like of shape (n_classes, (n_samples, 2))`	Probabilities for each class.	required

Returns:

Name	Type	Description
`pos_proba`	`array-like of shape (n_samples, n_classes)`	Probabilities of the positive labels for each class.

Source code in src/medpipe/models/core.py

def get_positive_proba(probabilities):
    """
    Returns just the positive label probabilities of each class.

    Parameters
    ----------
    probabilities : array-like of shape (n_classes, (n_samples, 2))
        Probabilities for each class.

    Returns
    -------
    pos_proba : array-like of shape (n_samples, n_classes)
        Probabilities of the positive labels for each class.

    """
    if isinstance(probabilities, np.ndarray):
        return np.expand_dims(probabilities[:, 1], 1)

    pos_proba = np.zeros((probabilities[0].shape[0], len(probabilities)))
    for i, proba in enumerate(probabilities):
        pos_proba[:, i] = proba[:, 1]

    return pos_proba
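
This function is the inverse of `get_full_proba` and accepts either form of its output. The sketch below copies the function (using an `isinstance` check for the array case) and applies it to hypothetical probabilities for one and two classes.

```python
import numpy as np


def get_positive_proba(probabilities):
    # Copied from the source above so the example is self-contained.
    if isinstance(probabilities, np.ndarray):
        return np.expand_dims(probabilities[:, 1], 1)
    pos_proba = np.zeros((probabilities[0].shape[0], len(probabilities)))
    for i, proba in enumerate(probabilities):
        pos_proba[:, i] = proba[:, 1]
    return pos_proba


# A single (n_samples, 2) array yields one positive-probability column.
single = np.array([[0.8, 0.2], [0.1, 0.9]])
print(get_positive_proba(single).shape)  # (2, 1)

# A list of per-class arrays yields one column per class.
per_class = [single, np.array([[0.4, 0.6], [0.9, 0.1]])]
pos = get_positive_proba(per_class)
print(pos.shape)  # (2, 2)
```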

`load_pipeline(load_file)`

Loads a saved Pipeline from a .pkl file.

Parameters:

Name	Type	Description	Default
`load_file`	`str`	Path to the file to load the Pipeline from.	required

Returns:

Name	Type	Description
`pipeline`	`Pipeline`	Loaded pipeline.

Raises:

Type	Description
`TypeError`	If load_file is not a str.
`FileNotFoundError`	If load_file does not exist.
`IsADirectoryError`	If load_file is a directory.
`ValueError`	If the load_file extension is not .pkl.

Source code in src/medpipe/models/core.py

def load_pipeline(load_file: str):
    """
    Loads a saved Pipeline from a .pkl file.

    Parameters
    ----------
    load_file : str
        Path to the file to load the Pipeline from.

    Returns
    -------
    pipeline : Pipeline
        Loaded pipeline.

    Raises
    ------
    TypeError
        If load_file is not a str.
    FileNotFoundError
        If load_file does not exist.
    IsADirectoryError
        If load_file is a directory.
    ValueError
        If the load_file extension is not .pkl.

    """
    file_checks(load_file, ".pkl")

    with open(load_file, "rb") as f:
        pipeline = pickle.load(f)

    return pipeline

`save_pipeline(pipeline, save_file, extension='.pkl')`

Saves a Pipeline to file.

Parameters:

Name	Type	Description	Default
`pipeline`	`Pipeline`	Pipeline to save.	required
`save_file`	`str`	Path to the file to save the model.	required
`extension`	`str`	Extension of the save file.	`".pkl"`

Returns:

Type	Description
`None`	Nothing is returned.

Raises:

Type	Description
`TypeError`	If save_file is not a str.
`FileNotFoundError`	If save_file does not exist.
`IsADirectoryError`	If save_file is a directory.
`ValueError`	If the save_file extension does not match extension.

Source code in src/medpipe/models/core.py

def save_pipeline(pipeline, save_file, extension=".pkl") -> None:
    """
    Saves a Pipeline to file.

    Parameters
    ----------
    pipeline : Pipeline
        Pipeline to save.
    save_file : str
        Path to the file to save the model.
    extension : str, default: ".pkl"
        Extension of the save file.

    Returns
    -------
    None
        Nothing is returned.

    Raises
    ------
    TypeError
        If save_file is not a str.
    FileNotFoundError
        If save_file does not exist.
    IsADirectoryError
        If save_file is a directory.
    ValueError
        If the save_file extension does not match extension.

    """
    file_checks(save_file, extension, exists=False)
    with open(save_file, "wb") as f:
        pickle.dump(pipeline, f)
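
Leaving aside the `file_checks` validation, saving and loading reduce to a `pickle` round trip. The sketch below reproduces it with a hypothetical dictionary standing in for a fitted pipeline and a temporary directory for the file.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted pipeline; any picklable object works.
pipeline = {"model_type": "hgb-c", "hyperparameters": {"max_iter": 50}}

with tempfile.TemporaryDirectory() as tmp_dir:
    save_file = os.path.join(tmp_dir, "pipeline.pkl")

    # Same calls as save_pipeline: binary write mode plus pickle.dump.
    with open(save_file, "wb") as f:
        pickle.dump(pipeline, f)

    # Same calls as load_pipeline: binary read mode plus pickle.load.
    with open(save_file, "rb") as f:
        restored = pickle.load(f)

print(restored == pipeline)  # True
```

Because `pickle.load` can execute arbitrary code while unpickling, only files from a trusted source should be loaded this way.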

`test_model(y_test, y_pred, y_pred_proba)`

Computes different metrics to test the model.

Parameters:

Name	Type	Description	Default
`y_test`	`array-like of shape (n_samples, n_classes)`	Ground truth test labels.	required
`y_pred`	`array-like of shape (n_samples, n_classes)`	Predicted labels.	required
`y_pred_proba`	`np.array (n_classes,) of arrays (n_samples, 2)`	Predicted probabilities.	required

Returns:

Name	Type	Description
`metric_dict`	`dict[str, dict[str, list[float or tuple(array-like)]]]`	Dictionary of the model performance for one fold. Keys are the metric names and values are the metric values. The test metrics used are: accuracy, f1, precision, recall, log_loss, roc (Receiver Operating Characteristic), auroc (Area Under the Receiver Operating Characteristic), prc (Precision-Recall Curve) and ap (Average Precision).

Raises:

Type	Description
`TypeError`	If y_pred or y_pred_proba is not array-like.
`ValueError`	If the inputs do not have matching dimensions.

Source code in src/medpipe/models/core.py

def test_model(y_test, y_pred, y_pred_proba):
    """
    Computes different metrics to test the model.

    Parameters
    ----------
    y_test : array-like of shape (n_samples, n_classes)
        Ground truth test labels.
    y_pred : array-like of shape (n_samples, n_classes)
        Predicted labels.
    y_pred_proba : np.array (n_classes,) of arrays (n_samples, 2)
        Predicted probabilities.

    Returns
    -------
    metric_dict : dict[str, dict[str, list[float or tuple(array-like)]]]
        Dictionary of the model performance for one fold.
        Keys are the metric name and values are the metric value.
        The test metrics used are:
         - accuracy
         - f1
         - precision
         - recall
         - log_loss
         - roc (Receiver Operator Characteristic)
         - auroc (Area Under Receiver Operator Characteristic)
         - prc (Precision-Recall Curve)
         - ap (Average Precision)

    Raises
    ------
    TypeError
        If y_pred or y_pred_proba is not array-like.
    ValueError
        If the inputs do not have matching dimensions.

    """
    # Check that inputs are correct
    array_check(y_pred)
    array_check(y_pred_proba)

    metric_dict = compute_pred_metrics(
        ["accuracy", "f1", "recall", "precision"], y_test, y_pred
    )
    metric_dict.update(
        compute_score_metrics(
            ["roc", "auroc", "prc", "ap", "log_loss"], y_test, y_pred_proba
        )
    )
    return metric_dict
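
`test_model` delegates to `compute_pred_metrics` and `compute_score_metrics`, whose implementations are not shown here. As a rough guide to what the label-based metrics measure, the sketch below computes them from confusion counts using the standard definitions, on hypothetical labels for a single class.

```python
import numpy as np

# Hypothetical ground truth and predicted labels for a single class.
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = int(np.sum((y_pred == 1) & (y_test == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_test == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_test == 1)))  # false negatives
tn = int(np.sum((y_pred == 0) & (y_test == 0)))  # true negatives

accuracy = (tp + tn) / y_test.size
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```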