Automated Runs

Xcessiv includes support for various algorithms that aim to provide automation for things such as hyperparameter optimization and base learner/pipeline construction.

Once you begin an automated run, Xcessiv will take care of updating your base learner setups/base learners for you while you go do something else.

Xcessiv currently supports three types of automated runs: Bayesian Hyperparameter Search, TPOT base learner construction, and Greedy Forward Model Selection.

TPOT base learner construction

Xcessiv is great for tuning different pipelines/base learners and stacking them together, but with the sheer number of possible pipeline combinations, it helps to have something that can build a good pipeline for you automatically.

This is exactly what TPOT promises to do for you.

As of v0.4, Xcessiv has built-in support for directly exporting the pipeline code generated by TPOT as a base learner setup in Xcessiv.

Right next to the Add new base learner origin button, click on the Automated base learner generation with TPOT button. In the modal that pops up, enter the following code:

from tpot import TPOTClassifier

tpot_learner = TPOTClassifier(generations=5, population_size=50, verbosity=2)

To use TPOT, simply define a TPOTClassifier or TPOTRegressor and assign it to the variable tpot_learner. The arguments for TPOTClassifier and TPOTRegressor can be found in the TPOT API documentation.
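
For a regression task, the setup is analogous; a minimal sketch (the argument values here are just illustrative):

from tpot import TPOTRegressor

tpot_learner = TPOTRegressor(generations=5, population_size=50, verbosity=2)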

When you click Go, a new automated run will be created that runs tpot_learner on your training data, then creates a new base learner setup containing the code for the best pipeline found by TPOT.

Once TPOT is finished, you’ll likely end up with something like this in your newly generated base learner:

import numpy as np

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = np.recfromcsv('PATH/TO/DATA/FILE', delimiter='COLUMN_SEPARATOR', dtype=np.float64)
features = np.delete(tpot_data.view(np.float64).reshape(tpot_data.size, -1), tpot_data.dtype.names.index('class'), axis=1)
training_features, testing_features, training_classes, testing_classes = \
    train_test_split(features, tpot_data['class'], random_state=42)

exported_pipeline = make_pipeline(
    Normalizer(norm="max"),
    ExtraTreesClassifier(bootstrap=False, criterion="entropy", max_features=0.15, min_samples_leaf=7, min_samples_split=13, n_estimators=100)
)

exported_pipeline.fit(training_features, training_classes)
results = exported_pipeline.predict(testing_features)

To convert it to an Xcessiv-compatible base learner, remove all the unneeded parts and modify the code to this:

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

base_learner = make_pipeline(
    Normalizer(norm="max"),
    ExtraTreesClassifier(bootstrap=False, criterion="entropy", max_features=0.15, min_samples_leaf=7, min_samples_split=13, n_estimators=100, random_state=8)
)

Notice two changes: we renamed exported_pipeline to base_learner to follow the Xcessiv format, and set the random_state parameter in the sklearn.ensemble.ExtraTreesClassifier object to 8 for determinism.

Set the name, meta-feature generator, and metrics for your base learner setup as usual, then verify and confirm. You will now be able to use your curated pipeline as any other base learner in your Xcessiv workflow.

Greedy Forward Model Selection

Stacking is usually reserved as the last step of the Xcessiv process, after you’ve squeezed out all you can from pipeline and hyperparameter optimization. When creating stacked ensembles, you can usually expect the ensemble’s performance to be better than that of any single base learner in it.

The problem here lies in figuring out which base learners to include in your ensemble. Stacking together the top N base learners is a good first strategy, but not always optimal. Even if a base learner doesn’t perform that well on its own, it could still provide brand new information to the secondary learner, thereby boosting the entire ensemble’s performance even further. One way to look at it is that the extra base learner gives the secondary learner a new angle from which to view the problem and make better judgments.

Figuring out which base learners to add to a stacked ensemble is much like hyperparameter optimization. You can’t really be sure if something will work until you try it. Unfortunately, trying out every possible combination of base learners is infeasible when you have hundreds of base learners to choose from.

Xcessiv provides an automated ensemble construction method based on a heuristic process called greedy forward model selection. This method is adapted from Ensemble Selection from Libraries of Models by Caruana et al.

In a nutshell, the algorithm is as follows:

  1. Start with the empty ensemble
  2. Add to the ensemble the model in the library that maximizes the ensemble’s performance on the error metric.
  3. Repeat step 2 for a fixed number of iterations or until all models have been used.

That’s it!
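
To make the procedure concrete, here is a minimal sketch of greedy forward selection (not Xcessiv’s internal implementation). It assumes you already have each base learner’s cross-validated meta-features as a list of NumPy arrays called meta_features (e.g. one predict_proba output array per base learner) and a target vector y, and it uses Logistic Regression as the secondary learner:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def ensemble_score(meta_features, indices, y):
    # Stack the chosen base learners' meta-features side by side and
    # score the resulting stacked ensemble with the secondary learner
    X = np.hstack([meta_features[i] for i in indices])
    return cross_val_score(LogisticRegression(), X, y, scoring='accuracy').mean()

def greedy_forward_selection(meta_features, y, max_num_base_learners=6):
    selected, remaining = [], list(range(len(meta_features)))
    best_indices, best_score = None, float('-inf')
    for _ in range(max_num_base_learners):
        if not remaining:
            break  # all models have been used (step 3)
        # Step 2: add the model that maximizes the ensemble's performance
        scores = {idx: ensemble_score(meta_features, selected + [idx], y)
                  for idx in remaining}
        chosen = max(scores, key=scores.get)
        selected.append(chosen)
        remaining.remove(chosen)
        # Remember the best ensemble seen across all iterations
        if scores[chosen] > best_score:
            best_indices, best_score = list(selected), scores[chosen]
    return best_indices, best_score

Each iteration evaluates only one candidate ensemble per remaining base learner, so the search stays tractable even with hundreds of base learners, at the cost of possibly missing combinations that only shine together.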

To perform greedy forward model selection in Xcessiv, simply click on the Automated ensemble search button in the Stacked Ensemble section.

Select your secondary learner in the configuration modal (Logistic Regression is a good first choice for classification tasks), copy the following code into the code box, and click Go to start your automated run:

secondary_learner_hyperparameters = {}  # hyperparameters of secondary learner

metric_to_optimize = 'Accuracy'  # metric to optimize

invert_metric = False  # Whether or not to invert metric e.g. optimizing a loss

max_num_base_learners = 6  # Maximum size of ensemble to consider (the higher this is, the longer the run will take)

secondary_learner_hyperparameters is a dictionary containing the hyperparameters for your chosen secondary learner. Again, an empty dictionary signifies default parameters.

metric_to_optimize and invert_metric mean the same things they do in Bayesian Hyperparameter Search.

max_num_base_learners refers to the total number of iterations of the algorithm. As such, this also signifies the maximum number of base learners that a stacked ensemble found through this automated run can contain. Please note that the higher this number is, the longer the search will run.
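
Putting these together, a filled-in configuration for a run that optimizes a loss metric might look like the sketch below. The metric name and hyperparameter values are purely illustrative; metric_to_optimize must match a metric you actually defined in your base learner setups:

secondary_learner_hyperparameters = {'C': 0.1}  # e.g. regularization strength for Logistic Regression

metric_to_optimize = 'Brier Score Loss'  # illustrative; use a metric defined in your setups

invert_metric = True  # a loss is better when lower, so invert it

max_num_base_learners = 10  # allow ensembles of up to 10 base learners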

Unlike TPOT pipeline construction and Bayesian optimization, which both have an element of randomness, greedy forward model selection will always explore the same ensembles if the pool of base learners remains unchanged.