machine learning - Should I split my data into training/testing/validation sets with k-fold-cross validation?
When evaluating a recommender system, one could split his data into three pieces: training, validation and testing sets. In such case, the training set would be used to learn the recommendation model from data and the validation set would be used to choose the best model or parameters to use. Then, using the chosen model, the user could evaluate the performance of his algorithm using the testing set.
I have found a documentation page for the scikit-learn cross validation (http://scikit-learn.org/stable/modules/cross_validation.html) where it says that is not necessary to split the data into three pieces when using k-fold-cross validation, but only into two: training and testing.
A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles).
I am wondering if this would be a good approach. And if so, someone could show me a reference to an article/book backing this theory up?