Feature selection means picking out the most representative and influential features from the original feature set, in order to reduce dimensionality, improve model performance, and mitigate overfitting. Below are example implementations of several common feature selection methods using the scikit-learn library.
First, make sure scikit-learn is installed; it can be installed with the following command:

```bash
pip install scikit-learn
```

Then try the code examples below:
- Variance threshold method:

```python
from sklearn.feature_selection import VarianceThreshold
from sklearn.datasets import make_classification

# Generate a synthetic classification dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=42)

# Drop features whose variance falls below the threshold
var_threshold = VarianceThreshold(threshold=0.1)
X_selected = var_threshold.fit_transform(X)

print("Original number of features:", X.shape[1])
print("Number of selected features:", X_selected.shape[1])
```
- Mutual information method:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Keep the 5 features with the highest mutual information with the target
kbest = SelectKBest(score_func=mutual_info_classif, k=5)
X_selected = kbest.fit_transform(X, y)

print("Original number of features:", X.shape[1])
print("Number of selected features:", X_selected.shape[1])
```
- Recursive feature elimination (RFE):

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Recursively remove the weakest features until 5 remain,
# using a logistic regression model to rank feature importance
estimator = LogisticRegression(solver='liblinear')
rfe = RFE(estimator, n_features_to_select=5)
X_selected = rfe.fit_transform(X, y)

print("Original number of features:", X.shape[1])
print("Number of selected features:", X_selected.shape[1])
```
In the code above, we performed feature selection with the variance threshold method, the mutual information method, and recursive feature elimination, respectively. You can choose the method that fits your problem and tune its parameters for the best results. Feature selection is an important step in data preprocessing: it can improve model performance while reducing computational cost.
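As a practical note on using these selectors in a full workflow, a selection step can be embedded in a scikit-learn `Pipeline` so that it is fit only on the training portion of each cross-validation fold, avoiding leakage. The sketch below is illustrative (the choice of `k=5` and the dataset sizes are assumptions, not from the examples above); `get_support(indices=True)` reports which columns survived:

```python
# A minimal sketch: feature selection inside a Pipeline, evaluated with
# cross-validation. k=5 and the dataset parameters are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=42)

# Chain the selector and the classifier: inside cross_val_score, the
# selector is re-fit on each training fold, so the held-out fold never
# influences which features are kept.
pipe = Pipeline([
    ("select", SelectKBest(score_func=mutual_info_classif, k=5)),
    ("clf", LogisticRegression(solver="liblinear")),
])

scores = cross_val_score(pipe, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())

# After fitting on the full data, inspect which feature indices were selected
pipe.fit(X, y)
selected = pipe.named_steps["select"].get_support(indices=True)
print("Selected feature indices:", selected)
```

Wrapping selection in a pipeline also means the same selection is applied automatically to any new data passed to `pipe.predict`.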
The role of singular value decomposition in feature dimensionality reduction
Applications of singular value decomposition in recommender systems