K-fold cross validation is primarily used in applied machine learning to estimate the performance of a model on unseen data. It is a resampling procedure for evaluating machine learning models on limited data. In summary, the k-fold step splits the data set into k subsets (folds) and iterates over them, using one fold as the test set and the remaining k-1 folds as the training set. The figure below shows an example for k = 10.
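The splitting scheme described above can be sketched with scikit-learn's `KFold`; the toy data below is hypothetical and only illustrates how each fold in turn becomes the test set:

```python
from sklearn.model_selection import KFold
import numpy as np

# Hypothetical toy data: 10 samples, so each of the 5 folds holds 2 samples.
data = np.arange(10)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(data)):
    # In each iteration one fold is the test set; the remaining k-1 folds
    # form the training set.
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Across all iterations, every sample appears in exactly one test set.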

...

Parameter

  • k: number of folds the data set is split into

  • output_path: file path in which the output of the k-fold validation is stored

  • output_field: field in which the predictions are stored

  • classifier and label fields exist only for inheritance and are not actually used.

  • classifier_params: parameters of the lib.nlp classifier to be used during the k-fold validation

Example

Code Block
{
    "step": "classifier",
    "type": "kfold",
    "k": 5,
    "output_path": "./output.json",
    "output_field": "prediction",
    "label_field": "class",
    "classifier": "none",
    "classifier_params": {
        "explanation_field": "explanation",
        "input_fields": [
            "normalized_extract"
        ],
        "label_field": "label",
        "model_kwargs": {
            "probability": true
        },
        "model_type": "SVC",
        "output_field": "prediction",
        "step": "classifier",
        "type": "sklearn"
    }
}

Output

It outputs the success rate for each of the k folds. In addition, it lists the overall metrics of the output (for multiclass classification, precision, recall, and F1-score are macro-averaged).
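Macro averaging computes each metric per class and then takes the unweighted mean over classes. A minimal sketch with scikit-learn, using hypothetical labels and predictions purely for illustration:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical multiclass ground truth and predictions.
y_true = ["a", "b", "c", "a", "b", "c"]
y_pred = ["a", "b", "b", "a", "b", "c"]

# average="macro": compute precision/recall/F1 per class, then average
# the per-class values with equal weight, as in the overall metrics above.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(precision, recall, f1)
```

Because every class contributes equally, macro averaging does not weight classes by their frequency in the data.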

...

This page can now be found at Models on the Squirro Docs site.