You must choose features that you think will inform the predictions. Distil will use the chosen features to automatically generate models that predict the target.
Select the features to model
To choose the features that your model should use in its predictions:
Click Add under the horsepower and cylinders features.
In this example, Distil adds the selected features to the list of Features to Model. You can filter or change the type of any of the features to model. To remove a feature, click Remove under the feature name in the Features to Model pane, or click **Remove All** above the list of features.
Note the distribution of categories for the cylinders feature. Four-, six-, and eight-cylinder engines appear often, while three- and five-cylinder engines appear much less frequently.
- Click the three-cylinder category to view samples that match the value in the Samples to Model From table.
- To exclude these samples from the model, click Exclude.
- Repeat the previous two steps for the five-cylinder category.
Review the updated Samples to Model From table. Click the column headers to sort by feature and check for extreme values that may indicate problems with the data. Note that one row has a value of 0 for horsepower, which is likely an error.
- To focus on records with this value, drag the slider at the end of the distribution of horsepower values to the left so the range ends at 5.
- To remove these records, click Exclude.
Click Excluded Samples to review the records you removed. These include the three- and five-cylinder vehicles, as well as some four-cylinder vehicles with a horsepower of 0.
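Outside the Distil interface, the exclusions above amount to a simple row filter. The following is a minimal Python sketch of that idea, not Distil's own implementation. The record layout, the sample values, and the helper name `is_excluded` are all hypothetical, and the sketch assumes the slider range includes its upper bound of 5.

```python
# Hypothetical records; the values are invented for illustration only
# and are not taken from the actual dataset.
samples = [
    {"cylinders": 4, "horsepower": 90, "acceleration": 14.5},
    {"cylinders": 3, "horsepower": 97, "acceleration": 13.5},
    {"cylinders": 8, "horsepower": 150, "acceleration": 11.0},
    {"cylinders": 5, "horsepower": 103, "acceleration": 15.9},
    {"cylinders": 4, "horsepower": 0, "acceleration": 17.3},
    {"cylinders": 6, "horsepower": 105, "acceleration": 15.5},
]

RARE_CYLINDER_COUNTS = {3, 5}  # categories excluded in the steps above
MAX_SUSPECT_HORSEPOWER = 5     # upper end of the slider range (assumed inclusive)


def is_excluded(sample):
    """Return True if the sample matches one of the exclusion rules."""
    return (
        sample["cylinders"] in RARE_CYLINDER_COUNTS
        or sample["horsepower"] <= MAX_SUSPECT_HORSEPOWER
    )


# Split the records the same way the Exclude button does conceptually.
excluded = [s for s in samples if is_excluded(s)]
kept = [s for s in samples if not is_excluded(s)]

print(len(kept), "kept;", len(excluded), "excluded")
```

Keeping the excluded rows in their own list, rather than discarding them, mirrors the Excluded Samples view: you can still inspect what was removed and why.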
Depending on your task, next you can:
- Export the problem if you want the model definition without generating models or predicting acceleration
- Generate the models to predict acceleration using horsepower and cylinders
Export the problem
If your task is to discover a problem without generating a predictive model, you can now export your model definition. Otherwise, skip ahead to Generate the models.
To export the problem:
- Click Export Problem.
Generate the models
If your task is to build a predictive model, you can now begin model generation. Based on your selections, Distil:
- Automatically determines the number and type of models to predict your target feature.
- Uses a subset of the actual data to train the models to predict the target using the features selected in this step.
- Applies the trained models to the rest of the actual data to predict the target feature value in each record.
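The train-then-predict pattern described above can be illustrated with a small Python sketch. This is not how Distil selects or trains its models. It substitutes a plain least-squares line on horsepower alone, and the rows, the 80/20 split, the seed, and the function names `split_rows` and `fit_line` are all assumptions made for illustration.

```python
import random

# Hypothetical (horsepower, cylinders, acceleration) rows; values invented for illustration.
rows = [
    (90, 4, 14.5), (150, 8, 11.0), (105, 6, 15.5), (75, 4, 16.0),
    (130, 8, 12.0), (95, 6, 15.0), (88, 4, 14.0), (165, 8, 10.5),
    (110, 6, 14.8), (70, 4, 17.0),
]


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and split it into training and held-out parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(train):
    """Ordinary least squares of acceleration on horsepower (one feature only)."""
    xs = [r[0] for r in train]
    ys = [r[2] for r in train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Train on one subset, then predict the target for the rows held back.
train, held_out = split_rows(rows)
slope, intercept = fit_line(train)
predictions = [(r[2], slope * r[0] + intercept) for r in held_out]

for actual, predicted in predictions:
    print(f"actual={actual:.1f} predicted={predicted:.1f}")
```

Comparing each prediction against the actual value for rows the model never saw is what makes the later review step meaningful: it shows how the model behaves on data it was not fitted to.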
To begin the model generation with the current feature selections:
- Click Create Models.
In this example, Distil will attempt to predict the acceleration based on the horsepower and cylinders features for a subset of the dataset.
Review the generated models to determine whether they accurately predict the target feature.