How can I train an XGBoost model on a GPU but run predictions on CPU without allocating any GPU RAM?
My situation: I create an XGBoost model (tree_method='gpu_hist') in Python with predictor='cpu_predictor', train it on the GPU, save (pickle) it to disk, read the model back from disk, and then use it for predictions.
My problem: once the model starts doing predictions, even though it runs on the CPU, it still allocates a small amount of GPU RAM (about 289 MB). This is a problem for the following reasons:
I run multiple copies of the model to parallelize predictions, and if I run too many, the prediction processes crash.
I cannot use the GPU to train other models if I am running predictions on the same machine at the same time.
So, how can one tell XGBoost not to allocate any GPU RAM and to use only the CPU and regular RAM for predictions?
Thank you very much for your help!
To run predictions on the CPU only, set the model's predictor parameter to 'cpu_predictor' before calling predict. Note that in the scikit-learn API, predict does not accept a predictor argument; instead, update the parameter on the loaded model with set_params. This tells XGBoost to use the CPU predictor and not allocate GPU RAM for prediction. (In XGBoost 2.0 and later, the predictor parameter has been replaced by device='cpu'.)
For example:
import pickle

# Load the model from disk
model = pickle.load(open(model_file, "rb"))

# Switch the loaded model to CPU prediction
model.set_params(predictor='cpu_predictor')

predictions = model.predict(data)
Alternatively, you can also set the n_jobs parameter (again on the model via set_params, not as an argument to predict) to use multiple CPU threads for predictions. This can help speed up the prediction process if you have multiple CPU cores available.
For example:
import pickle

# Load the model from disk
model = pickle.load(open(model_file, "rb"))

# Use the CPU predictor and multiple CPU cores for predictions
model.set_params(predictor='cpu_predictor', n_jobs=-1)

predictions = model.predict(data)
Setting n_jobs to -1 uses all available CPU cores for predictions. (The default depends on your XGBoost version: older releases use a single thread unless told otherwise, while newer ones use all available threads by default.)