Python Machine Learning and Deep Learning Cheat Sheet: A Code Guide from Basics to Advanced

Author: carzy | 2025.09.19 17:05

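Summary: This article gives Python developers a quick-reference sheet of core machine learning and deep learning code. It covers three frameworks (Scikit-learn, TensorFlow/Keras, and PyTorch) with examples for the full workflow, including data preprocessing, model building, and training and tuning, to help get AI projects running quickly.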

I. Machine Learning Basics: Code Quick Reference

1.1 Core Data Preprocessing Modules

Data standardization with Scikit-learn

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training set, then transform it
X_test_scaled = scaler.transform(X_test)        # transform the test set with the fitted parameters only

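Standardization removes the effect of differing feature scales and matters most for distance-based algorithms such as SVM and KNN. The test set must be transformed with the scaler fitted on the training set; fitting on the test set would leak information from it into the model.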
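To make the leakage point concrete, here is a minimal, library-free sketch of what "use the training set's scaler parameters" means. The helper names fit_scaler and transform and the toy values are illustrative assumptions, not scikit-learn APIs; the population standard deviation is used because that is what StandardScaler computes.

```python
import statistics

def fit_scaler(values):
    """Estimate mean and standard deviation from the training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (ddof=0), as StandardScaler uses
    return mean, std

def transform(values, mean, std):
    """Apply previously fitted statistics; never re-estimate them here."""
    return [(v - mean) / std for v in values]

train = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy training feature (hypothetical values)
test = [6.0, 7.0]                  # toy test feature (hypothetical values)

mean, std = fit_scaler(train)               # statistics come from train only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)    # reuse the train statistics

print(train_scaled)
print(test_scaled)
```

The scaled test values fall outside the range of the scaled training values, which is expected: the test set is described in the training set's units rather than being re-centered on itself.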

Categorical feature encoding

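from sklearn.preprocessing import OneHotEncoder, LabelEncoder
# One-hot encoding (for non-ordinal categories); sparse_output requires scikit-learn >= 1.2
encoder = OneHotEncoder(sparse_output=False)
X_cat_encoded = encoder.fit_transform(X_cat.reshape(-1,1))  # X_cat: 1-D NumPy array
# Label encoding (intended for the target y; for ordinal features, OrdinalEncoder with explicit categories is the usual choice)
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)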

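One-hot encoding adds one column per category, so dimensionality grows quickly and the result is mostly zeros. For high-cardinality categorical features, consider more advanced methods such as target encoding.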
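As a rough illustration of target encoding, the sketch below replaces each category with a smoothed mean of the target. The function name fit_target_encoding, the smoothing formula, and the toy data are assumptions made for this example; recent scikit-learn releases also provide a TargetEncoder class, whose exact behavior differs from this sketch.

```python
from collections import defaultdict

def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed target mean.

    smoothed = (sum_of_targets + smoothing * global_mean) / (count + smoothing)
    Rare categories are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    mapping = {
        category: (sums[category] + smoothing * global_mean) / (counts[category] + smoothing)
        for category in counts
    }
    return mapping, global_mean

# Hypothetical training data: a city feature and a binary target
cities = ["a", "a", "a", "b", "b", "c"]
clicked = [1, 1, 0, 0, 0, 1]

mapping, global_mean = fit_target_encoding(cities, clicked, smoothing=2.0)

# Unseen categories ("z") fall back to the global mean
encoded_new = [mapping.get(city, global_mean) for city in ["a", "c", "z"]]
print(mapping)
print(encoded_new)
```

Because the encoding is computed from the target, it should be fitted on training folds only, for the same leakage reason that applies to the scaler.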

1.2 Classic Model Implementations

Linear regression and regularization

from sklearn.linear_model import LinearRegression, Ridge, Lasso
# Ordinary linear regression
lr = LinearRegression().fit(X_train, y_train)
# L2 regularization (ridge regression)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
# L1 regularization (Lasso regression)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

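The regularization parameter alpha controls model complexity (a larger alpha means stronger shrinkage), and its value can be chosen by cross-validation. Lasso is naturally suited to feature selection because the L1 penalty can drive coefficients exactly to zero.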
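One way to pick alpha by cross-validation is with scikit-learn's RidgeCV and LassoCV, sketched below on synthetic data. The data, the alpha grid, and the random seed are assumptions chosen for illustration; only the first two features carry signal, so Lasso can be expected to shrink many of the others to zero.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic regression data (an assumption for this sketch):
# 10 features, only the first two influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Candidate alphas on a log scale
alphas = np.logspace(-3, 2, 20)

# RidgeCV uses efficient leave-one-out cross-validation by default
ridge = RidgeCV(alphas=alphas).fit(X, y)
# LassoCV runs 5-fold cross-validation over the same grid
lasso = LassoCV(alphas=alphas, cv=5, random_state=0).fit(X, y)

print("best ridge alpha:", ridge.alpha_)
print("best lasso alpha:", lasso.alpha_)
print("lasso coefficients set to zero:", int(np.sum(lasso.coef_ == 0)))
```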

Tree-based ensemble methods

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
# Random forest
rf = RandomForestClassifier(n_estimators=100, max_depth=5)
rf.fit(X_train, y_train)
# Gradient boosted trees
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1)
gbm.fit(X_train, y_train)

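Tree models are relatively insensitive to outliers in the input features, but parameters such as max_depth and min_samples_split still need tuning. For large datasets, XGBoost or LightGBM can improve training efficiency.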
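Tuning max_depth and min_samples_split can be done with a small grid search, as in the sketch below. The synthetic dataset, the grid values, and the fold count are assumptions for illustration and would need adapting to real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification data (an assumption for this sketch)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Small illustrative grid over the two parameters named in the text
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 10],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,                 # 3-fold cross-validation per parameter combination
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, search.best_estimator_ holds a model refitted on all of X with the best parameters, so it can be used directly for prediction.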

II. Deep Learning Frameworks: Core Code

2.1 TensorFlow/Keras Quick Reference

Building a fully connected network

import tensorflow as tf
from tensorflow.keras import layers, models
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)

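Key points:

The input shape must match the data dimensions
Dropout layers help prevent overfitting; typical rates are 0.2-0.5
For multi-class problems, the output layer uses softmax activation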
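To show what the softmax output and the sparse categorical cross-entropy loss compute for a single sample, here is a framework-free sketch. The logits are made-up values and the function names are illustrative, not Keras APIs. "Sparse" here means the label is an integer class index rather than a one-hot vector.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])

logits = [2.0, 1.0, 0.1]   # hypothetical outputs of the last Dense layer
probs = softmax(logits)
loss = sparse_categorical_crossentropy(0, probs)  # true class is index 0

print([round(p, 3) for p in probs])
print(round(loss, 3))
```

The loss shrinks toward zero as the probability of the true class approaches 1, which is what training with this loss encourages.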

CNN image classification

model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

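CNN design principles:

Shallow layers extract edge features; deeper layers extract more abstract features
3×3 kernels are common, with a default stride of 1
Pooling layers typically use 2×2 max pooling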
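The feature-map sizes in the CNN above can be traced by hand. The sketch below assumes the Keras defaults used in that model: 'valid' padding (no padding) for Conv2D and a pooling stride equal to the pool size. The helper conv_output_size is an illustrative name, not a Keras function.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                           # input is 28x28x1
size = conv_output_size(size, kernel=3)             # Conv2D(32, 3x3)    -> 26
after_conv1 = size
size = conv_output_size(size, kernel=2, stride=2)   # MaxPooling2D(2x2)  -> 13
after_pool = size
size = conv_output_size(size, kernel=3)             # Conv2D(64, 3x3)    -> 11
after_conv2 = size

flattened = after_conv2 * after_conv2 * 64          # Flatten: 11 * 11 * 64 = 7744
dense_params = flattened * 64 + 64                  # Dense(64): weights + biases

print(after_conv1, after_pool, after_conv2, flattened, dense_params)
```

Most of the parameters sit in the first Dense layer after Flatten, which is one reason a second pooling layer is often added before flattening.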

2.2 PyTorch Implementation Patterns

Dynamic computation graph example

import torch
import torch.nn as nn
import torch.optim as optim
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop (full-batch for brevity; X_train_torch and y_train_torch are tensors prepared beforehand)
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(X_train_torch)
    loss = criterion(outputs, y_train_torch)
    loss.backward()
    optimizer.step()

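Key PyTorch features:

The dynamic graph mechanism supports flexible model design
Gradients must be zeroed manually (zero_grad()) because they otherwise accumulate across backward() calls
GPU acceleration takes model.to('cuda'), with the input tensors moved to the same device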
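The two mechanical points above, gradient accumulation and device placement, can be checked with a few lines. This is a minimal sketch with a toy scalar parameter and a small Linear layer chosen for illustration; it falls back to the CPU when no GPU is available.

```python
import torch

# 1) Gradients accumulate across backward() calls unless they are zeroed.
w = torch.tensor(2.0, requires_grad=True)

(3.0 * w).backward()
first = w.grad.item()          # d(3w)/dw = 3

(3.0 * w).backward()           # no zeroing: the new gradient is added
accumulated = w.grad.item()    # 3 + 3 = 6

w.grad.zero_()                 # what optimizer.zero_grad() does for each parameter
(3.0 * w).backward()
after_reset = w.grad.item()    # back to 3

# 2) The model and its inputs must live on the same device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)
out = model(x)

print(first, accumulated, after_reset, tuple(out.shape))
```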

III. Advanced Techniques and Best Practices

3.1 Model Tuning Strategies

Learning rate scheduling

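# TensorFlow: exponential learning rate decay
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=1000,
    decay_rate=0.9)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
# PyTorch: cosine annealing (optimizer here must be a torch.optim optimizer, e.g. the Adam instance from section 2.2)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
# Call scheduler.step() once per epoch, after optimizer.step()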

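Common scheduling strategies:

Exponential decay: suited to the stable convergence phase
Cosine annealing: may help escape local optima
Learning rate warmup: prevents oscillation early in training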
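The three strategies can be written as plain functions of the step count, which makes their shapes easy to compare. The exponential and cosine formulas below follow the closed forms documented for ExponentialDecay (with staircase disabled) and CosineAnnealingLR; the linear warmup is one common variant, and its form and the default values are assumptions for this sketch.

```python
import math

def exponential_decay(step, initial_lr=1e-2, decay_steps=1000, decay_rate=0.9):
    """lr = initial_lr * decay_rate ** (step / decay_steps)"""
    return initial_lr * decay_rate ** (step / decay_steps)

def cosine_annealing(epoch, base_lr=1e-3, t_max=50, eta_min=0.0):
    """lr follows half a cosine wave from base_lr down to eta_min over t_max epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

def linear_warmup(step, target_lr=1e-3, warmup_steps=500):
    """Ramp linearly up to target_lr, then hold (hypothetical variant)."""
    return target_lr * min(1.0, (step + 1) / warmup_steps)

for step in (0, 1000, 5000):
    print("exponential", step, exponential_decay(step))
for epoch in (0, 25, 50):
    print("cosine", epoch, cosine_annealing(epoch))
for step in (0, 249, 499):
    print("warmup", step, linear_warmup(step))
```

With the settings from the snippet above, the exponential schedule multiplies the learning rate by 0.9 every 1000 steps, while the cosine schedule reaches half the base rate at epoch 25 and eta_min at epoch 50.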

3.2 Distributed Training Examples

TensorFlow multi-GPU training

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = create_model()  # create the model inside the strategy scope
    model.compile(...)
model.fit(train_dataset, epochs=10)

PyTorch data parallelism

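model = nn.DataParallel(model).cuda()  # uses all visible GPUs by default
# Prefer a batch_size that is a multiple of the GPU count so each GPU receives an equal share
# For multi-GPU training, the PyTorch documentation recommends DistributedDataParallel over DataParallel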

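Distributed training essentials:

Synchronous updates require aggregating gradients across devices
Communication overhead grows with the number of GPUs
The batch size (batch_size) needs to be adjusted accordingly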
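The batch-size point comes down to simple arithmetic, sketched below without any framework. split_batch assumes a chunk-style split in which every replica gets the ceiling of batch_size / num_replicas and the last one takes the remainder; scaled_learning_rate applies the linear scaling heuristic, which is a common rule of thumb rather than something the frameworks do automatically. Both helper names are hypothetical.

```python
def split_batch(batch_size, num_replicas):
    """Per-replica batch sizes for a chunk-style split of one global batch."""
    chunk = -(-batch_size // num_replicas)  # ceiling division
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

def scaled_learning_rate(base_lr, base_batch, per_replica_batch, num_replicas):
    """Linear scaling heuristic: grow the learning rate with the global batch size."""
    global_batch = per_replica_batch * num_replicas
    return base_lr * global_batch / base_batch

print(split_batch(64, 4))   # evenly divisible
print(split_batch(30, 4))   # uneven: the last replica gets less work
print(scaled_learning_rate(1e-3, base_batch=32, per_replica_batch=32, num_replicas=4))
```

An uneven split leaves one device underused on every step, which is the practical reason to prefer batch sizes that divide evenly by the number of GPUs.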

IV. Production Deployment Code

4.1 Model Export and Loading

TensorFlow SavedModel format

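# Export the model (with Keras 2 / TF < 2.16 this writes a SavedModel directory at the given path)
model.save('path/to/model')
# Load the model
loaded_model = tf.keras.models.load_model('path/to/model')
# Note: with Keras 3 (TF 2.16+), model.save() expects a .keras or .h5 filename; model.export() writes a SavedModel for serving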

PyTorch TorchScript

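# Export via tracing (example_input: a sample tensor with the model's expected input shape)
model.eval()  # inference mode, so layers such as dropout behave deterministically while tracing
traced_script_module = torch.jit.trace(model, example_input)
traced_script_module.save("model.pt")
# Load the model
loaded_model = torch.jit.load("model.pt")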

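Model export notes:

Custom layers must be registered when the model contains them
Check that input and output shapes match
Quantization can shrink the model but may cost accuracy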
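The size-versus-accuracy trade-off of quantization can be seen in a toy round trip. The sketch below uses a generic symmetric int8 scheme written in plain Python; it is an assumption for illustration and not the exact scheme of any framework's quantization toolkit. The weights are made-up values.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # guard against all-zero input
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [-1.0, -0.33, 0.0, 0.5, 1.0]   # hypothetical float32 weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each weight needs 1 byte instead of 4, at the cost of a rounding error
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized)
print(max_error)
```

The rounding error per weight is bounded by about half the scale, so layers whose weights span a wide range lose more precision than layers with tightly clustered weights.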

4.2 ONNX Cross-Framework Conversion

# PyTorch to ONNX (model is the PyTorch model here)
dummy_input = torch.randn(1, 784)
torch.onnx.export(model, dummy_input, "model.onnx")
# TensorFlow to ONNX (model is the Keras model here; requires the tf2onnx package)
import tf2onnx
model_proto, _ = tf2onnx.convert.from_keras(model, output_path="model.onnx")

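ONNX advantages:

Cross-framework deployment (supported by TensorRT, OpenVINO, and others)
Hardware acceleration optimizations
Version compatibility management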

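This cheat sheet covers code templates for the full workflow from data preprocessing to production deployment; adjust parameters and architectures to fit your scenario. Pair it with the official documentation for deeper study: for example, TensorFlow's tf.keras.utils.plot_model() visualization utility and the third-party torchsummary library for PyTorch can help with model debugging. In practice, pay particular attention to data leakage: when cross-validating or splitting into training and test sets, respect time order strictly for time-series data, and otherwise make sure the samples can reasonably be treated as independent and identically distributed.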
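For the time-series case, scikit-learn's TimeSeriesSplit produces folds in which every training index precedes every test index. The sketch below uses a tiny placeholder array, assumed to be already sorted by time, just to show the index pattern.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Six time-ordered samples with two features each (placeholder values)
X = np.arange(12).reshape(6, 2)

tscv = TimeSeriesSplit(n_splits=3)
splits = [
    (train_idx.tolist(), test_idx.tolist())
    for train_idx, test_idx in tscv.split(X)
]

for train_idx, test_idx in splits:
    # Training indices always come before the test indices
    print("train:", train_idx, "test:", test_idx)
```

A shuffled KFold on the same data would mix future samples into the training folds, which is the kind of leakage the time-order rule is meant to rule out.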
