In the Select Model Features step, available features are columns or fields in the chosen dataset. You can use an available feature in the model to predict the target.
A feature whose values consist of a finite set of categories reused across records. For example, in an automotive dataset, a feature that indicates the country of origin of a vehicle would have a finite set of countries as its values.
When you choose to predict a categorical feature, Distil produces classification results. Classification results show how many target values the model predicted correctly and incorrectly across the rows in the dataset.
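As an illustration of how correct and incorrect predictions can be tallied for a categorical target, here is a minimal sketch. The function name `classification_counts` and the sample country values are hypothetical and are not part of Distil.

```python
def classification_counts(actual, predicted):
    """Count correct and incorrect predictions for a categorical target."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # A prediction is correct when it names the same category as the actual value.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return {"correct": correct, "incorrect": len(actual) - correct}


# Hypothetical country-of-origin values for four vehicles.
actual = ["Japan", "USA", "Germany", "USA"]
predicted = ["Japan", "USA", "USA", "USA"]

counts = classification_counts(actual, predicted)
print(counts)
```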
A feature whose values consist of a range of real numbers. For example, in an automotive dataset, a feature that indicates the weight of a vehicle would have a set of unique weights as its values.
The collection of features and values that you select to build models that predict features.
In regression models, distance indicates how far from the actual value the predicted value is. Distance is used as a measure of error for each prediction.
In regression results, the error threshold lets you configure how close a predicted value must be to the actual value to count as correct. Error is calculated as the distance of the predicted value from the actual value.
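The relationship between distance and the error threshold can be sketched as follows. This example assumes that distance is the absolute difference between the predicted and actual values and that a prediction within the threshold counts as correct. Both are assumptions made for illustration, and the function names and sample weights are hypothetical.

```python
def distance(actual, predicted):
    """Distance of a prediction from the actual value (assumed absolute difference)."""
    return abs(predicted - actual)


def split_by_threshold(actuals, predictions, threshold):
    """Separate predictions into those within and those outside the error threshold."""
    within, outside = [], []
    for a, p in zip(actuals, predictions):
        d = distance(a, p)
        bucket = within if d <= threshold else outside
        bucket.append((a, p, d))
    return within, outside


# Hypothetical vehicle weights and model predictions.
actual_weights = [2500.0, 3200.0, 4100.0]
predicted_weights = [2550.0, 3000.0, 4125.0]

within, outside = split_by_threshold(actual_weights, predicted_weights, threshold=100.0)
print(len(within), "within threshold;", len(outside), "outside threshold")
```

Raising the threshold moves more predictions into the "within" group, which is why the acceptable error affects how accurate a model appears.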
A column or field in a dataset that contains a set of values for each record. In an automotive dataset, for example, one feature may be horsepower.
Feature to Model
A feature that you want the model to consider when attempting to predict the target.
When you choose to predict a timeseries feature, Distil produces forecasting results. Forecasting results show the predicted future values for each value in the timeseries feature.
Available features are listed in descending order by how much influence they have on other features in the dataset. Importance is calculated using Principal Component Analysis.
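One simple way a PCA-based ranking could work is sketched below. It is not Distil's implementation. As an assumption, it standardizes each feature, approximates the first principal component by power iteration, and ranks features by the absolute size of their loading on that component. All names and sample values are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Covariance matrix of the standardized columns (the correlation matrix)."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        standardized.append([(x - mean) / std for x in col])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in standardized]
        for ci in standardized
    ]


def first_component(matrix, iterations=200):
    """Approximate the leading eigenvector of a symmetric matrix by power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def rank_features(features):
    """Order feature names by descending absolute loading on the first component."""
    names = list(features)
    loadings = first_component(correlation_matrix([features[n] for n in names]))
    return sorted(names, key=lambda n: abs(loadings[names.index(n)]), reverse=True)


# Hypothetical automotive features.
features = {
    "horsepower": [95, 110, 150, 200, 130],
    "weight": [2300, 2600, 3400, 4200, 3000],
    "doors": [4, 2, 4, 2, 4],
}
ranked = rank_features(features)
print(ranked)
```

In this sample, horsepower and weight move together, so both load heavily on the first component, while the weakly related doors feature ranks last.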
A set of automatically configured algorithms and parameter tunings that attempt to predict a target feature. Models use the features you select to attempt to predict the target feature.
A set of automatically calculated values for the target feature generated by the model.
When you choose to predict a continuous feature (e.g. an integer or real-valued feature), Distil produces regression results, which separate correct from incorrect predictions by their error (the distance of each prediction from the actual value in the dataset). The acceptable error is configurable for each model.
In the Select Model Features step, Distil shows a representative sample of values for the selected features to model and the target feature.
On the Select Model or Dataset page, the summary describes the contents of the available datasets that you can use in model building.
On the Check Models page, the feature summaries show the range of result values in the dataset.
The feature that you want the model to predict.
A compound feature that you can create to track a value over time. To create a timeseries feature, you must group together the following features from your selected dataset:
Series ID Columns: The label or name of the variables being tracked across dataset rows. You can construct a series ID from one or more features.
Time Column: The timestamp or date/time feature. Serves as the X-axis in the compound feature.
Value Column: The unique value for the group in each row. Serves as the Y-axis in the compound feature.
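The grouping described above can be sketched as follows. This is an illustration of how series ID, time, and value columns combine into one compound feature, not Distil's own code. The function `build_timeseries` and the column names in the sample rows are hypothetical.

```python
from collections import defaultdict


def build_timeseries(rows, series_id_columns, time_column, value_column):
    """Group rows into one time-ordered series per series ID."""
    series = defaultdict(list)
    for row in rows:
        # The series ID may be built from one or more features.
        key = tuple(row[c] for c in series_id_columns)
        # Each point pairs a time (X-axis) with a value (Y-axis).
        series[key].append((row[time_column], row[value_column]))
    return {key: sorted(points) for key, points in series.items()}


# Hypothetical dataset rows.
rows = [
    {"make": "A", "model": "X", "month": "2020-02", "units_sold": 12},
    {"make": "A", "model": "X", "month": "2020-01", "units_sold": 10},
    {"make": "B", "model": "Y", "month": "2020-01", "units_sold": 7},
]

timeseries = build_timeseries(rows, ["make", "model"], "month", "units_sold")
for series_id, points in timeseries.items():
    print(series_id, points)
```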
A value of a particular feature in one row of a dataset.