
Models Reference

This page documents the models sub-package.


medpipe.models.Calibrator

Calibrator class.

This class creates a Calibrator to calibrate predictions.

Calibrator

Class that creates a Calibrator.

Attributes:

Name Type Description
model LogisticRegression or IsotonicRegression

Calibrator model.

model_type {logistic, isotonic}

Model type.

logger logging.Logger or None, default: None

Logger object to log prints. If None print to terminal.

Methods:

Name Description
__init__

Initialise a Calibrator class instance.

_set_model

Set the model to default parameters.

fit

Fits the calibrator based on input data.

predict_proba

Predicts probabilities from input data.

predict

Predicts labels from input data.

Source code in src/medpipe/models/Calibrator.py
class Calibrator:
    """
    Class that creates a Calibrator.

    Attributes
    ----------
    model : LogisticRegression or IsotonicRegression
        Calibrator model.
    model_type : {"logistic", "isotonic"}
        Model type.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Methods
    -------
    __init__(model_type, hyperparameters={}, logger=None)
        Initialise a Calibrator class instance.
    _set_model()
        Set the model to default parameters.
    fit(X, y)
        Fits the calibrator based on input data.
    predict_proba(X)
        Predicts probabilities from input data.
    predict(X)
        Predicts labels from input data.
    """

    def __init__(self, model_type, hyperparameters={}, logger=None):
        """
        Initialise a Calibrator class instance.

        Parameters
        ----------
        model_type : {"logistic", "isotonic"}
            Model type.
        hyperparameters : dict[str, value]
            Model hyperparameter dictionary.
        logger : logging.Logger or None, default: None
            Logger object to log prints. If None print to terminal.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model_type = model_type
        self.hyperparameters = hyperparameters
        self.logger = logger
        self.model = []  # Placeholder, overwritten by _set_model() below

        # Create model based on attributes
        self._set_model()

    def _set_model(self, quiet: bool = False):
        """
        Set the model to default parameters.

        Parameters
        ----------
        quiet : bool, default: False
            Flag to create a model without printing.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model = create_model(
            self.model_type,
            logger=self.logger,
            quiet=quiet,
            **self.hyperparameters,
        )

    def fit(self, X, y):
        """
        Fits the calibrator to the training data.

        Parameters
        ----------
        X : array-like of shape (n_samples,)
            Training data.
        y : array-like of shape (n_samples,)
            Prediction data.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model.fit(X, y)

    def predict_proba(self, X):
        """
        Predicts probabilities from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples,)
            Input data.

        Returns
        -------
        probabilities : np.array of shape (n_samples, 2)
            Predicted probabilities.

        """
        if self.model_type == "isotonic":
            predictions = self.model.predict(X)
            return get_full_proba(expand_dims(predictions, 1))
        else:
            return self.model.predict_proba(X)

    def predict(self, X):
        """
        Predicts labels from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples,)
            Training data.

        Returns
        -------
        predictions : array-like of shape (n_samples,)
            Predicted labels.

        """
        labels = round(self.model.predict(X))
        return array(labels).T

__init__(model_type, hyperparameters={}, logger=None)

Initialise a Calibrator class instance.

Parameters:

Name Type Description Default
model_type {logistic, isotonic}

Model type.

required
hyperparameters dict[str, value]

Model hyperparameter dictionary.

{}
logger Logger or None

Logger object to log prints. If None print to terminal.

None

Returns:

Type Description
None

Nothing is returned.

Source code in src/medpipe/models/Calibrator.py
def __init__(self, model_type, hyperparameters={}, logger=None):
    """
    Initialise a Calibrator class instance.

    Parameters
    ----------
    model_type : {"logistic", "isotonic"}
        Model type.
    hyperparameters : dict[str, value]
        Model hyperparameter dictionary.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model_type = model_type
    self.hyperparameters = hyperparameters
    self.logger = logger
    self.model = []  # Placeholder, overwritten by _set_model() below

    # Create model based on attributes
    self._set_model()
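
The signature above uses a mutable default (`hyperparameters={}`). Python evaluates a default value once, when the function is defined, so every call that omits the argument receives the same dictionary. That is harmless while the dictionary is only read, but the sketch below shows the pitfall and the common `None`-default alternative. `SharedDefault` and `FreshDefault` are hypothetical classes written for this illustration; they are not part of medpipe.

```python
class SharedDefault:
    """Hypothetical class mirroring the `hyperparameters={}` signature."""

    def __init__(self, hyperparameters={}):
        # The same dict object is reused by every call that omits the argument.
        self.hyperparameters = hyperparameters


class FreshDefault:
    """Hypothetical variant using the None-default idiom."""

    def __init__(self, hyperparameters=None):
        # Copy into a new dict so instances never share state.
        self.hyperparameters = dict(hyperparameters or {})


a, b = SharedDefault(), SharedDefault()
a.hyperparameters["C"] = 0.5             # mutating one instance ...
shared_leak = "C" in b.hyperparameters   # ... is visible in the other

c, d = FreshDefault(), FreshDefault()
c.hyperparameters["C"] = 0.5
fresh_leak = "C" in d.hyperparameters    # each instance has its own dict
```

Passing an explicit dictionary on every call, as in `Calibrator("isotonic", {"out_of_bounds": "clip"})`, also avoids the shared default.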

fit(X, y)

Fits the calibrator to the training data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples,)

Training data.

required
y array-like of shape (n_samples,)

Prediction data.

required

Returns:

Type Description
None

Nothing is returned.

Source code in src/medpipe/models/Calibrator.py
def fit(self, X, y):
    """
    Fits the calibrator to the training data.

    Parameters
    ----------
    X : array-like of shape (n_samples,)
        Training data.
    y : array-like of shape (n_samples,)
        Prediction data.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model.fit(X, y)

predict(X)

Predicts labels from input data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples,)

Input data.

required

Returns:

Name Type Description
predictions array-like of shape (n_samples,)

Predicted labels.

Source code in src/medpipe/models/Calibrator.py
def predict(self, X):
    """
    Predicts labels from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples,)
        Input data.

    Returns
    -------
    predictions : array-like of shape (n_samples,)
        Predicted labels.

    """
    labels = round(self.model.predict(X))
    return array(labels).T
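
`predict` derives labels by rounding the model output, which acts as a 0.5 threshold when the model returns probabilities. A minimal sketch of that step, assuming the unqualified `round` and `array` in the source are NumPy's (the imports are not shown on this page). The scores are made up, and the cast to `int` is added here for readability only.

```python
import numpy as np

# Hypothetical calibrated scores, as an isotonic model's predict() might return.
scores = np.array([0.10, 0.49, 0.50, 0.51, 0.90])

# NumPy rounds halves to the nearest even value, so a score of
# exactly 0.5 becomes label 0 rather than 1.
labels = np.round(scores).astype(int)
```

If a score of exactly 0.5 should count as positive, an explicit comparison such as `scores >= 0.5` states the threshold more directly.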

predict_proba(X)

Predicts probabilities from input data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples,)

Input data.

required

Returns:

Name Type Description
probabilities np.array of shape (n_samples, 2)

Predicted probabilities.

Source code in src/medpipe/models/Calibrator.py
def predict_proba(self, X):
    """
    Predicts probabilities from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples,)
        Input data.

    Returns
    -------
    probabilities : np.array of shape (n_samples, 2)
        Predicted probabilities.

    """
    if self.model_type == "isotonic":
        predictions = self.model.predict(X)
        return get_full_proba(expand_dims(predictions, 1))
    else:
        return self.model.predict_proba(X)
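
In the isotonic branch the model returns one positive-class value per sample, and the method expands it to the two-column layout documented above. A NumPy-only sketch of that reshaping follows; `pos` is made-up data, and the columns are built inline instead of calling `get_full_proba`.

```python
import numpy as np

pos = np.array([0.2, 0.7, 0.9])             # shape (3,): positive-class values
pos_2d = np.expand_dims(pos, 1)             # shape (3, 1): one column
proba = np.hstack([1.0 - pos_2d, pos_2d])   # shape (3, 2): [negative, positive]
```

Column 0 holds the negative-class probability and column 1 the positive-class probability, so each row sums to 1.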

medpipe.models.Predictor

Predictor class.

This class creates a Predictor to train and make predictions.

Predictor

Class that creates a Predictor.

Attributes:

Name Type Description
model dict[str, model]

Predictor model. The key is the predicted label, the model is a HistGradientBoostingClassifier.

model_type {hgb-c}

Model type.

hyperparameters dict[str, value]

Model hyperparameter dictionary.

logger logging.Logger or None, default: None

Logger object to log prints. If None print to terminal.

Methods:

Name Description
__init__

Initialise a Predictor class instance.

_set_model

Set the model to default parameters.

fit

Fits the predictor based on input data.

predict_proba

Predicts probabilities from input data.

predict

Predicts labels from input data.

Source code in src/medpipe/models/Predictor.py
class Predictor:
    """
    Class that creates a Predictor.

    Attributes
    ----------
    model : dict[str, model]
        Predictor model. The key is the predicted label, the model
        is a HistGradientBoostingClassifier.
    model_type : {"hgb-c"}
        Model type.
    hyperparameters : dict[str, value]
        Model hyperparameter dictionary.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Methods
    -------
    __init__(model_type, hyperparameters, logger=None)
        Initialise a Predictor class instance.
    _set_model()
        Set the model to default parameters.
    fit(X_train, y_train, weights=None)
        Fits the predictor based on input data.
    predict_proba(X)
        Predicts probabilities from input data.
    predict(X)
        Predicts labels from input data.
    """

    def __init__(self, model_type, hyperparameters, logger=None):
        """
        Initialise a Predictor class instance.

        Parameters
        ----------
        model_type : {"hgb-c"}
            Model type.
        hyperparameters : dict[str, value]
            Model hyperparameter dictionary.
        logger : logging.Logger or None, default: None
            Logger object to log prints. If None print to terminal.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model_type = model_type
        self.hyperparameters = hyperparameters
        self.logger = logger

        # Create model based on attributes
        self._set_model()

    def _set_model(self, quiet: bool = False):
        """
        Set the model to default parameters.

        Parameters
        ----------
        quiet : bool, default: False
            Flag to create a model without printing.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model = create_model(
            self.model_type,
            self.logger,
            quiet=quiet,
            **self.hyperparameters,
        )

    def fit(self, X_train, y_train, weights=None):
        """
        Fits the predictor to the training data.

        Parameters
        ----------
        X_train : array-like of shape (n_samples, n_features)
            Training data.
        y_train : array-like of shape (n_samples,)
            Prediction labels.
        weights : array-like of shape (n_samples,) or None, default: None
            Weights to address class imbalance.

        Returns
        -------
        None
            Nothing is returned.

        """
        self.model.fit(X_train, y_train.squeeze(), sample_weight=weights)

    def predict_proba(self, X):
        """
        Predicts probabilities from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Input data.

        Returns
        -------
        probabilities : np.array of shape (n_samples, 2)
            Predicted probabilities.

        """
        return self.model.predict_proba(X)

    def predict(self, X):
        """
        Predicts labels from input data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Input data.

        Returns
        -------
        labels : array-like of shape (n_samples,)
            Predicted labels.

        """
        return self.model.predict(X)

__init__(model_type, hyperparameters, logger=None)

Initialise a Predictor class instance.

Parameters:

Name Type Description Default
model_type {hgb-c}

Model type.

required
hyperparameters dict[str, value]

Model hyperparameter dictionary.

required
logger Logger or None

Logger object to log prints. If None print to terminal.

None

Returns:

Type Description
None

Nothing is returned.

Source code in src/medpipe/models/Predictor.py
def __init__(self, model_type, hyperparameters, logger=None):
    """
    Initialise a Predictor class instance.

    Parameters
    ----------
    model_type : {"hgb-c"}
        Model type.
    hyperparameters : dict[str, value]
        Model hyperparameter dictionary.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model_type = model_type
    self.hyperparameters = hyperparameters
    self.logger = logger

    # Create model based on attributes
    self._set_model()

fit(X_train, y_train, weights=None)

Fits the predictor to the training data.

Parameters:

Name Type Description Default
X_train array-like of shape (n_samples, n_features)

Training data.

required
y_train array-like of shape (n_samples,)

Prediction labels.

required
weights array-like of shape (n_samples,) or None

Weights to address class imbalance.

None

Returns:

Type Description
None

Nothing is returned.

Source code in src/medpipe/models/Predictor.py
def fit(self, X_train, y_train, weights=None):
    """
    Fits the predictor to the training data.

    Parameters
    ----------
    X_train : array-like of shape (n_samples, n_features)
        Training data.
    y_train : array-like of shape (n_samples,)
        Prediction labels.
    weights : array-like of shape (n_samples,) or None, default: None
        Weights to address class imbalance.

    Returns
    -------
    None
        Nothing is returned.

    """
    self.model.fit(X_train, y_train.squeeze(), sample_weight=weights)
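
`fit` squeezes `y_train` before training and forwards `weights` as `sample_weight`. The sketch below shows both steps on toy data. The inverse-frequency weighting is one common heuristic for class imbalance and is an assumption here; this page does not say how medpipe computes its weights.

```python
import numpy as np

# Labels stored as a column vector, shape (n_samples, 1).
y_train = np.array([[0], [0], [0], [1]])

# squeeze() drops the length-1 axis, giving shape (n_samples,).
y_flat = y_train.squeeze()

# Inverse-frequency weights: n_samples / (n_classes * class_count).
counts = np.bincount(y_flat)
class_weights = y_flat.size / (counts.size * counts)

# Look up one weight per sample from its class label.
weights = class_weights[y_flat]
```

With these weights each class contributes the same total weight to the loss. Note that `squeeze()` on a single-sample array of shape `(1, 1)` returns a 0-d array, so very small batches may need an explicit `reshape(-1)` instead.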

predict(X)

Predicts labels from input data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Input data.

required

Returns:

Name Type Description
labels array-like of shape (n_samples,)

Predicted labels.

Source code in src/medpipe/models/Predictor.py
def predict(self, X):
    """
    Predicts labels from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data.

    Returns
    -------
    labels : array-like of shape (n_samples,)
        Predicted labels.

    """
    return self.model.predict(X)

predict_proba(X)

Predicts probabilities from input data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Input data.

required

Returns:

Name Type Description
probabilities np.array of shape (n_samples, 2)

Predicted probabilities.

Source code in src/medpipe/models/Predictor.py
def predict_proba(self, X):
    """
    Predicts probabilities from input data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data.

    Returns
    -------
    probabilities : np.array of shape (n_samples, 2)
        Predicted probabilities.

    """
    return self.model.predict_proba(X)

medpipe.models.core

Models functions module.

This module provides core functions for models and pipelines.

Functions:
- create_model: Creates a new model.
- test_model: Tests a model on some test data.
- save_pipeline: Pickles a pipeline.
- load_pipeline: Loads a pickled pipeline.
- get_positive_proba: Returns just the positive label probabilities of each class.
- get_full_proba: Returns probabilities for both labels.

create_model(model_type, logger=None, quiet=False, **config_params)

Creates an AI model.

Parameters:

Name Type Description Default
model_type {hgb-c, logistic, isotonic}

Type of model to create.
- hgb-c: histogram gradient boosting classifier.
- logistic: logistic regression.
- isotonic: isotonic regression.

required
logger Logger or None

Logger object to log prints. If None print to terminal.

None
quiet bool

Flag to create a model without printing.

False
**config_params

Configuration parameters for the model.

{}

Returns:

Name Type Description
model HistGradientBoostingClassifier, LogisticRegression or IsotonicRegression

Created model.

Raises:

Type Description
TypeError

If model_type is not a str. If an unexpected keyword argument is present.

ValueError

If model_type is not "hgb-c", "logistic" or "isotonic".

Source code in src/medpipe/models/core.py
def create_model(
    model_type: str,
    logger=None,
    quiet=False,
    **config_params,
):
    """
    Creates an AI model.

    Parameters
    ----------
    model_type : {"hgb-c", "logistic", "isotonic"}
        Type of model to create.
            hgb-c: histogram gradient boosting classifier.
            logistic: logistic regression.
            isotonic: isotonic regression.
    logger : logging.Logger or None, default: None
        Logger object to log prints. If None print to terminal.
    quiet : bool, default: False
        Flag to create a model without printing.
    **config_params
        Configuration parameters for the model.

    Returns
    -------
    model : HistGradientBoostingClassifier, LogisticRegression or IsotonicRegression
        Created model.

    Raises
    ------
    TypeError
        If model_type is not a str.
        If an unexpected keyword argument is present.
    ValueError
        If model_type is not "hgb-c", "logistic" or "isotonic".

    """
    if not isinstance(model_type, str):
        raise TypeError(f"{model_type} should be a string")

    match model_type:
        case "hgb-c":
            if not quiet:
                print_message(
                    "Creating a Histogram Gradient Boosting Classifier",
                    logger,
                    SCRIPT_NAME,
                )
            model = HistGradientBoostingClassifier(**config_params)
        case "logistic":
            if not quiet:
                print_message(
                    "Creating a Logistic Regression calibrator", logger, SCRIPT_NAME
                )
            model = LogisticRegression(**config_params)

        case "isotonic":
            if not quiet:
                print_message(
                    "Creating an Isotonic Regression calibrator", logger, SCRIPT_NAME
                )
            model = IsotonicRegression(**config_params)

        case _:
            raise ValueError(f"{model_type} invalid model type. See function docstring")

    return model
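
The three documented failure modes can be exercised without scikit-learn. The sketch below is an illustration only: `StubEstimator`, `REGISTRY`, `create_model_sketch` and `error_name` are hypothetical names, and a dictionary lookup stands in for the `match` statement used in the source.

```python
class StubEstimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, *, max_iter=100):
        self.max_iter = max_iter


# Hypothetical registry keyed by the documented model types.
REGISTRY = {
    "hgb-c": StubEstimator,
    "logistic": StubEstimator,
    "isotonic": StubEstimator,
}


def create_model_sketch(model_type, **config_params):
    if not isinstance(model_type, str):
        raise TypeError(f"{model_type!r} should be a string")
    if model_type not in REGISTRY:
        raise ValueError(f"{model_type!r} is not a valid model type")
    # An unknown keyword argument raises TypeError inside the constructor.
    return REGISTRY[model_type](**config_params)


def error_name(*args, **kwargs):
    """Return the name of the exception raised, or None on success."""
    try:
        create_model_sketch(*args, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


results = [
    error_name("logistic", max_iter=200),  # valid call
    error_name(42),                        # model_type is not a string
    error_name("svm"),                     # unknown model type
    error_name("hgb-c", bogus=1),          # unexpected keyword argument
]
```

The last case is the reason the Raises section lists `TypeError` twice: the keyword check is not done by `create_model` itself but by the constructor that receives `**config_params`.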

get_full_proba(pos_proba)

Returns probabilities for both labels.

Parameters:

Name Type Description Default
pos_proba array-like of shape (n_samples, n_classes)

Probabilities of the positive labels for each class.

required

Returns:

Name Type Description
probabilities array-like of shape (n_classes, (n_samples, 2))

Probabilities for each class.

Source code in src/medpipe/models/core.py
def get_full_proba(pos_proba):
    """
    Returns probabilities for both labels.

    Parameters
    ----------
    pos_proba : array-like of shape (n_samples, n_classes)
        Probabilities of the positive labels for each class.

    Returns
    -------
    probabilities : array-like of shape (n_classes, (n_samples, 2))
        Probabilities for each class.

    """
    probabilities = []  # Empty list for the probabilities

    for i in range(pos_proba.shape[1]):
        probabilities.append(np.array([1 - pos_proba[:, i], pos_proba[:, i]]).T)

    if len(probabilities) == 1:
        return probabilities[0]
    else:
        return probabilities
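
The return type depends on the number of input columns: a single column yields one `(n_samples, 2)` array, while several columns yield a list with one such array per class. The sketch below copies the logic of the function above so that it runs standalone; the input values are made up.

```python
import numpy as np


def get_full_proba(pos_proba):
    # Same logic as the source above, copied so the example is self-contained.
    probabilities = [
        np.array([1 - pos_proba[:, i], pos_proba[:, i]]).T
        for i in range(pos_proba.shape[1])
    ]
    return probabilities[0] if len(probabilities) == 1 else probabilities


single = get_full_proba(np.array([[0.2], [0.9]]))           # one class
multi = get_full_proba(np.array([[0.2, 0.6], [0.9, 0.1]]))  # two classes
```

Callers that handle both cases need to check whether they received an array or a list before indexing.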

get_positive_proba(probabilities)

Returns just the positive label probabilities of each class.

Parameters:

Name Type Description Default
probabilities array-like of shape (n_classes, (n_samples, 2))

Probabilities for each class.

required

Returns:

Name Type Description
pos_proba array-like of shape (n_samples, n_classes)

Probabilities of the positive labels for each class.

Source code in src/medpipe/models/core.py
def get_positive_proba(probabilities):
    """
    Returns just the positive label probabilities of each class.

    Parameters
    ----------
    probabilities : array-like of shape (n_classes, (n_samples, 2))
        Probabilities for each class.

    Returns
    -------
    pos_proba : array-like of shape (n_samples, n_classes)
        Probabilities of the positive labels for each class.

    """
    if isinstance(probabilities, np.ndarray):
        return np.expand_dims(probabilities[:, 1], 1)

    pos_proba = np.zeros((probabilities[0].shape[0], len(probabilities)))
    for i, proba in enumerate(probabilities):
        pos_proba[:, i] = proba[:, 1]

    return pos_proba
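
The function accepts either a single `(n_samples, 2)` array or a list of such arrays, one per class. The sketch below copies the logic above (with `isinstance` for the type check) so that it runs standalone; the probabilities are made up.

```python
import numpy as np


def get_positive_proba(probabilities):
    # A single array is treated as one class with two probability columns.
    if isinstance(probabilities, np.ndarray):
        return np.expand_dims(probabilities[:, 1], 1)
    # Otherwise collect the positive column of each per-class array.
    pos_proba = np.zeros((probabilities[0].shape[0], len(probabilities)))
    for i, proba in enumerate(probabilities):
        pos_proba[:, i] = proba[:, 1]
    return pos_proba


# One (n_samples, 2) array per class, as get_full_proba returns for two classes.
per_class = [
    np.array([[0.8, 0.2], [0.1, 0.9]]),
    np.array([[0.4, 0.6], [0.9, 0.1]]),
]
stacked = get_positive_proba(per_class)     # shape (n_samples, n_classes)
single = get_positive_proba(per_class[0])   # shape (n_samples, 1)
```

The multi-class input should stay a list. If it is converted to a 3-D array first, it takes the array branch and `probabilities[:, 1]` selects the second sample of each class instead of the positive column.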

load_pipeline(load_file)

Loads a saved Pipeline from a .pkl file.

Parameters:

Name Type Description Default
load_file str

Path to the file to load the Pipeline from.

required

Returns:

Name Type Description
pipeline Pipeline

Loaded pipeline.

Raises:

Type Description
TypeError

If load_file is not a str.

FileNotFoundError

If load_file does not exist.

IsADirectoryError

If load_file is a directory.

ValueError

If the load_file extension is not .pkl.

Source code in src/medpipe/models/core.py
def load_pipeline(load_file: str):
    """
    Loads a saved Pipeline from a .pkl file.

    Parameters
    ----------
    load_file : str
        Path to the file to load the Pipeline from.

    Returns
    -------
    pipeline : Pipeline
        Loaded pipeline.

    Raises
    ------
    TypeError
        If load_file is not a str.
    FileNotFoundError
        If load_file does not exist.
    IsADirectoryError
        If load_file is a directory.
    ValueError
        If the load_file extension is not .pkl.

    """
    file_checks(load_file, ".pkl")

    with open(load_file, "rb") as f:
        pipeline = pickle.load(f)

    return pipeline

save_pipeline(pipeline, save_file, extension='.pkl')

Saves a Pipeline to file.

Parameters:

Name Type Description Default
pipeline Pipeline

Pipeline to save.

required
save_file str

Path to the file to save the model.

required
extension str

Extension of the save file.

".pkl"

Returns:

Type Description
None

Nothing is returned.

Raises:

Type Description
TypeError

If save_file is not a str.

FileNotFoundError

If save_file does not exist.

IsADirectoryError

If save_file is a directory.

ValueError

If the save_file extension does not match extension.

Source code in src/medpipe/models/core.py
def save_pipeline(pipeline, save_file, extension=".pkl") -> None:
    """
    Saves a Pipeline to file.

    Parameters
    ----------
    pipeline : Pipeline
        Pipeline to save.
    save_file : str
        Path to the file to save the model.
    extension : str, default: ".pkl"
        Extension of the save file.

    Returns
    -------
    None
        Nothing is returned.

    Raises
    ------
    TypeError
        If save_file is not a str.
    FileNotFoundError
        If save_file does not exist.
    IsADirectoryError
        If save_file is a directory.
    ValueError
        If the save_file extension does not match extension.

    """
    file_checks(save_file, extension, exists=False)
    with open(save_file, "wb") as f:
        pickle.dump(pipeline, f)
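
A save and load round trip can be sketched with the standard library alone. The dictionary below stands in for a fitted pipeline, and the inline extension check is a hypothetical substitute for `file_checks`, whose implementation is not shown on this page.

```python
import os
import pickle
import tempfile

# Any picklable object works; a dict stands in for a fitted pipeline here.
pipeline = {"model_type": "hgb-c", "hyperparameters": {"max_iter": 100}}

with tempfile.TemporaryDirectory() as tmp_dir:
    save_file = os.path.join(tmp_dir, "pipeline.pkl")

    # Hypothetical stand-in for the extension check done by file_checks.
    if os.path.splitext(save_file)[1] != ".pkl":
        raise ValueError("save_file must end in .pkl")

    # Pickle reads and writes bytes, so both files are opened in binary mode.
    with open(save_file, "wb") as f:
        pickle.dump(pipeline, f)

    with open(save_file, "rb") as f:
        restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so `load_pipeline` should only be pointed at files from a trusted source.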

test_model(y_test, y_pred, y_pred_proba)

Computes different metrics to test the model.

Parameters:

Name Type Description Default
y_test array-like of shape (n_samples, n_classes)

Ground truth test labels.

required
y_pred array-like of shape (n_samples, n_classes)

Predicted labels.

required
y_pred_proba np.array (n_classes,) of arrays (n_samples, 2)

Predicted probabilities.

required

Returns:

Name Type Description
metric_dict dict[str, dict[str, list[float or tuple(array-like)]]]

Dictionary of the model performance for one fold. Keys are the metric name and values are the metric value. The test metrics used are:
- accuracy
- f1
- precision
- recall
- log_loss
- roc (Receiver Operating Characteristic)
- auroc (Area Under the Receiver Operating Characteristic curve)
- prc (Precision-Recall Curve)
- ap (Average Precision)

Raises:

Type Description
TypeError

If y_pred or y_pred_proba is not array-like.

ValueError

If X_test and y_test do not have the same dimensions.

Source code in src/medpipe/models/core.py
def test_model(y_test, y_pred, y_pred_proba):
    """
    Computes different metrics to test the model.

    Parameters
    ----------
    y_test : array-like of shape (n_samples, n_classes)
        Ground truth test labels.
    y_pred : array-like of shape (n_samples, n_classes)
        Predicted labels.
    y_pred_proba : np.array (n_classes,) of arrays (n_samples, 2)
        Predicted probabilities.

    Returns
    -------
    metric_dict : dict[str, dict[str, list[float or tuple(array-like)]]]
        Dictionary of the model performance for one fold.
        Keys are the metric name and values are the metric value.
        The test metrics used are:
         - accuracy
         - f1
         - precision
         - recall
         - log_loss
         - roc (Receiver Operating Characteristic)
         - auroc (Area Under the Receiver Operating Characteristic curve)
         - prc (Precision-Recall Curve)
         - ap (Average Precision)

    Raises
    ------
    TypeError
        If y_pred or y_pred_proba is not array-like.
    ValueError
        If X_test and y_test do not have the same dimensions.

    """
    # Check that inputs are correct
    array_check(y_pred)
    array_check(y_pred_proba)

    metric_dict = compute_pred_metrics(
        ["accuracy", "f1", "recall", "precision"], y_test, y_pred
    )
    metric_dict.update(
        compute_score_metrics(
            ["roc", "auroc", "prc", "ap", "log_loss"], y_test, y_pred_proba
        )
    )
    return metric_dict
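
The four label-based metrics passed to `compute_pred_metrics` have standard binary definitions in terms of the confusion counts. The sketch below works them out for a single class on made-up labels. It is not medpipe's implementation, which is not shown on this page and may differ, for example in how it handles several classes.

```python
import numpy as np

y_test = np.array([1, 0, 1, 1, 0, 0])   # ground truth labels
y_pred = np.array([1, 0, 0, 1, 1, 0])   # predicted labels

# Confusion counts for the positive class.
tp = int(np.sum((y_pred == 1) & (y_test == 1)))
fp = int(np.sum((y_pred == 1) & (y_test == 0)))
fn = int(np.sum((y_pred == 0) & (y_test == 1)))
tn = int(np.sum((y_pred == 0) & (y_test == 0)))

accuracy = (tp + tn) / y_test.size
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

The score-based metrics (roc, auroc, prc, ap, log_loss) need the predicted probabilities instead of the labels, which is why `test_model` takes `y_pred_proba` as a separate argument.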