Full ML Pipeline (Scaling, Training, Scoring)
Advanced Project 5: Full ML Pipeline (Scaling, Training, Scoring)
File: project5_pipeline.py
Purpose: a complete, clean pipeline that chains StandardScaler and an SVM on the Iris dataset. It shows hyperparameter search, training, evaluation on a held-out test set, and saving the fitted pipeline with joblib.
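Before the full script, here is a minimal sketch of why the scaler is placed inside the pipeline rather than applied by hand. It is not part of project5_pipeline.py; the names demo_pipeline, scaler_means, uses_train_only, and same_predictions are chosen here for illustration. It checks that the pipeline's scaler learns its statistics from the training split only, and that calling predict on the pipeline matches scaling and predicting step by step.

```python
# Illustrative sketch (not part of project5_pipeline.py):
# a Pipeline fits its scaler on the training data it is given, nothing else.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])
demo_pipeline.fit(X_train, y_train)

# The scaler's learned means should equal the column means of X_train alone.
scaler_means = demo_pipeline.named_steps["scaler"].mean_
uses_train_only = np.allclose(scaler_means, X_train.mean(axis=0))
print("Scaler fitted on training data only:", uses_train_only)

# Doing the two steps by hand gives the same predictions as the pipeline.
manual_pred = demo_pipeline.named_steps["svc"].predict(
    demo_pipeline.named_steps["scaler"].transform(X_test)
)
same_predictions = np.array_equal(manual_pred, demo_pipeline.predict(X_test))
print("Pipeline matches manual steps:", same_predictions)
```

Because the scaler travels with the model, the same object can be passed to a cross-validation search or saved to disk without any separate preprocessing code.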
# project5_pipeline.py
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
import joblib


def main():
    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Scaling is a pipeline step, so it is refit on each CV training fold
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(probability=True)),  # probability=True enables predict_proba
    ])

    # Grid search over SVC hyperparameters ("<step name>__<parameter>")
    param_grid = {
        "svc__C": [0.1, 1.0, 10],
        "svc__kernel": ["rbf", "linear"],
    }
    grid = GridSearchCV(pipeline, param_grid, cv=5)
    grid.fit(X_train, y_train)
    print("Best params:", grid.best_params_)

    # best_estimator_ is the best pipeline, refit on the whole training set
    best_model = grid.best_estimator_
    test_score = best_model.score(X_test, y_test)
    print(f"Test accuracy: {test_score:.4f}")

    # Save the fitted pipeline (scaler and SVC together)
    joblib.dump(best_model, "iris_pipeline.joblib")
    print("Saved pipeline to iris_pipeline.joblib")

    # Example of loading and predicting on a single sample
    loaded = joblib.load("iris_pipeline.joblib")
    sample = X_test[0].reshape(1, -1)
    pred = loaded.predict(sample)
    pred_proba = loaded.predict_proba(sample)
    print("Sample true:", y_test[0], "Pred:", pred[0], "Proba:", pred_proba)


if __name__ == "__main__":
    main()

Requirements
pip install scikit-learn joblib

Final Notes and Best Practices
- GPU: Training CNNs/LSTMs is faster with a GPU. If you have no GPU, reduce the number of epochs to shorten training, and reduce the batch size if you run out of memory.
- Model files:
  - Project 1 saves cnn_mnist.h5
  - Project 2 saves lstm_sentiment.h5
  - Project 4 saves stock_lstm.h5
  - Project 5 saves iris_pipeline.joblib
- Flask deployment (Project 3) expects cnn_mnist.h5. You can deploy any model by adjusting input preprocessing and output labels.
- Security & Ethics: When deploying models (especially for finance, health, or people), consider fairness, privacy, and regulatory compliance.
- Troubleshooting: If Keras/TensorFlow raises GPU or compatibility errors, check that your TensorFlow version supports the Python version in your environment. A common stable install is pip install tensorflow==2.11.0 (change the version as needed).
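
The note above about adjusting input preprocessing and output labels can be sketched for the Project 5 pipeline. This is a minimal illustration, not the Flask app from Project 3: predict_species is a hypothetical helper name, and the pipeline is trained inline as a stand-in for joblib.load("iris_pipeline.joblib") so the snippet runs by itself.

```python
# Illustrative sketch: wrap a fitted pipeline so callers pass raw measurements
# and get a readable class label back. predict_species is a hypothetical helper.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()


def predict_species(model, measurements):
    """Validate one sample's shape and map the predicted index to a name."""
    sample = np.asarray(measurements, dtype=float).reshape(1, -1)
    expected = iris.data.shape[1]  # four features for Iris
    if sample.shape[1] != expected:
        raise ValueError(f"expected {expected} features, got {sample.shape[1]}")
    class_index = int(model.predict(sample)[0])
    return str(iris.target_names[class_index])


# Stand-in for: model = joblib.load("iris_pipeline.joblib")
model = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
]).fit(iris.data, iris.target)

label = predict_species(model, [5.1, 3.5, 1.4, 0.2])
print("Predicted species:", label)
```

Since scaling is part of the saved pipeline, the only input preprocessing left for the caller is getting the features into the right order and shape; the label mapping is what changes from model to model.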