How can I train an XGBoost model on a GPU but run predictions on CPU without allocating any GPU RAM?
My situation: I create an XGBoot model (tree_method='gpu_hist') in Python with predictor='cpu_predictor', then I train it on GPU, then I save (pickle) it to disk, then I read the model from disk, then I use it for predictions.
My problem: once the model starts doing predictions, even though I run it on CPU, it still allocates some small amount of GPU RAM (around ~289MB). This is a problem for the following reasons:
I run multiple copies of the model to parallelize predictions and if I run too many, the prediction processes crash.
I can not use GPU for training other models, if I run predictions on the same machine at the same time.
So, how can one tell XGBoost to not allocate any GPU RAM and use CPU and regular RAM only for predictions?