
data mining improve model performance

How to engineer new features to improve model performance when the input contains both numerical and categorical features?

I'll go first: for categorical features, I often count how many times each value appears, and I also combine two or more categorical features to form a new categorical one. For numerical features, I scale them. Any ideas on how to combine numerical and categorical features to form new ones? Typically I use Random Forest and Logistic Regression.
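
For reference, here is a minimal sketch of the three steps described above (count encoding, combining two categorical columns, and scaling a numerical column). It assumes pandas; the toy data and the column names `city`, `device`, and `amount` are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "A", "B"],
    "device": ["web", "web", "app", "app", "web", "app"],
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Count encoding: replace each category with how often it appears.
df["city_count"] = df["city"].map(df["city"].value_counts())

# Combine two categorical columns into a new categorical column.
df["city_device"] = df["city"] + "_" + df["device"]

# Standardize the numerical column to zero mean and unit variance.
amount_mean = df["amount"].mean()
amount_std = df["amount"].std(ddof=0)
df["amount_scaled"] = (df["amount"] - amount_mean) / amount_std
```

In practice the counts, mean, and standard deviation would be computed on the training split only and then reused on the test split, so that no information leaks from the test data.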


Just to throw something out there: you could create quadratic terms from the numerical features, such as X1 squared, and then evaluate the performance.

For example, let's say you have X1, X2, X3 as numerical features and your performance isn't as good as you wanted it to be. You can try adding a new feature, X4, defined as X1^2, which gives the model an extra degree of freedom and may fit the data better (ideally without overfitting). I picked X1^2 arbitrarily; you could just as well use X1^3, sqrt(X1), etc. I'm assuming this is a classification problem?
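
A rough sketch of that idea, assuming pandas: the helper `add_power_terms` and the generated column names are hypothetical, and the numbers are toy values. It adds powers of one numerical column as new features and leaves the original frame untouched.

```python
import pandas as pd


def add_power_terms(df, column, powers=(2, 3)):
    """Return a copy of df with column**p added as a new feature for each p."""
    out = df.copy()
    for p in powers:
        out[f"{column}_pow{p}"] = out[column] ** p
    return out


# Toy numerical features X1, X2, X3.
X = pd.DataFrame({
    "X1": [1.0, 2.0, 3.0],
    "X2": [0.5, 1.5, 2.5],
    "X3": [4.0, 5.0, 6.0],
})

# Add X1^2 as a new feature (the "X4" from the example above).
X_new = add_power_terms(X, "X1", powers=(2,))

# Other transforms such as sqrt(X1) can be added the same way.
X_new["X1_sqrt"] = X["X1"] ** 0.5
```

Whether a new term helps is something to check on held-out data, for example by comparing cross-validated scores with and without it.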