Plans for daal4py to support missing values in tree models #682
              
  
Closed
tpboudreau started this conversation in General
Replies: 2 comments
-
Hi @tpboudreau
            -
This was implemented some time ago: #1276
-
When training data contains missing values, XGBoost constructs trees that direct observations with missing values down whichever path produces the greater gain. Whenever this results in missing values joining the "yes" (that is, "less than the breakpoint for the feature") branch, daal4py refuses to load the resulting model for prediction.
For example, using the csv training dataset in the zip file below and running this script:
results in the following error:
(The resulting tree model is also included in the zip file, showing that missing values occasionally join the "yes" observations)
d4p.zip
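To make the routing concrete, here is a minimal sketch (not the script or model from the zip file) of how a tree in XGBoost's JSON dump layout sends a missing value down the branch named by each node's `missing` field, and how one might list the nodes where that branch is the "yes" branch. The tree, feature names, and leaf values below are made up for illustration, and `predict_one` and `missing_goes_yes` are hypothetical helper names. The node keys (`nodeid`, `split`, `split_condition`, `yes`, `no`, `missing`, `children`, `leaf`) are assumed to match what `Booster.get_dump(dump_format="json")` produces.

```python
import json
import math

# Hypothetical single-tree dump; the structure mimics XGBoost's JSON dump,
# but every value here is invented for illustration.
TREE_JSON = """
{"nodeid": 0, "depth": 0, "split": "f0", "split_condition": 0.5,
 "yes": 1, "no": 2, "missing": 1,
 "children": [
   {"nodeid": 1, "leaf": -0.4},
   {"nodeid": 2, "depth": 1, "split": "f1", "split_condition": 2.0,
    "yes": 3, "no": 4, "missing": 4,
    "children": [{"nodeid": 3, "leaf": 0.1}, {"nodeid": 4, "leaf": 0.7}]}
 ]}
"""


def predict_one(node, features):
    """Walk one observation down the tree and return the leaf value."""
    while "leaf" not in node:
        value = features.get(node["split"], math.nan)
        if math.isnan(value):
            target = node["missing"]   # default direction learned in training
        elif value < node["split_condition"]:
            target = node["yes"]       # "less than the breakpoint" branch
        else:
            target = node["no"]
        node = next(c for c in node["children"] if c["nodeid"] == target)
    return node["leaf"]


def missing_goes_yes(node):
    """Return ids of split nodes whose missing values join the 'yes' branch."""
    if "leaf" in node:
        return []
    found = [node["nodeid"]] if node["missing"] == node["yes"] else []
    for child in node["children"]:
        found.extend(missing_goes_yes(child))
    return found


tree = json.loads(TREE_JSON)
```

In this made-up tree, `missing_goes_yes(tree)` reports only the root node, which is the kind of node described above as causing the load failure.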
Since missing values are not uncommon in our data, this limitation reduces daal4py's usefulness in our environment.
Are there any plans for daal4py to support such models in the near future?
Thanks!
EDIT: I'm running daal4py 2021.2.3 (from conda-forge) on Linux, but I don't believe this is a platform-specific issue.