Project 5 — Full ML Pipeline (Scaling, Training, Scoring)

File: project5_pipeline.py

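Purpose: a complete, clean pipeline that chains StandardScaler and an SVM classifier on the Iris dataset. It shows training with a hyperparameter grid search, evaluation on a held-out test set, and saving the fitted pipeline with joblib.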

# project5_pipeline.py
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
import joblib

def main():
    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

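    # Chain scaling and the classifier so the scaler is fit only on training data
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        # probability=True enables predict_proba but slows fitting (extra internal cross-validation)
        ("svc", SVC(probability=True))
    ])

    # Grid search over SVC hyperparameters with 5-fold cross-validation.
    # The "svc__" prefix routes each parameter to the pipeline step named "svc".
    param_grid = {
        "svc__C": [0.1, 1.0, 10],
        "svc__kernel": ["rbf", "linear"]
    }
    grid = GridSearchCV(pipeline, param_grid, cv=5)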
    grid.fit(X_train, y_train)
    print("Best params:", grid.best_params_)

    # best_estimator_ is the pipeline refit on all of X_train with the best parameters
    best_model = grid.best_estimator_
    test_score = best_model.score(X_test, y_test)
    print(f"Test accuracy: {test_score:.4f}")

    # Save pipeline
    joblib.dump(best_model, "iris_pipeline.joblib")
    print("Saved pipeline to iris_pipeline.joblib")

    # Example of loading and predicting
    loaded = joblib.load("iris_pipeline.joblib")
    sample = X_test[0].reshape(1, -1)
    pred = loaded.predict(sample)
    pred_proba = loaded.predict_proba(sample)
    print("Sample true:", y_test[0], "Pred:", pred[0], "Proba:", pred_proba)

if __name__ == "__main__":
    main()
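
The script above reports a single accuracy number. As a complement, the sketch below shows one way to look at per-class results with a confusion matrix and a classification report. It is an illustration rather than part of project5_pipeline.py: the function name evaluate_pipeline and the fixed SVC settings (C=1.0, kernel="rbf") are assumptions chosen to keep the example short, and the split mirrors the one used above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_pipeline():
    """Fit a scaler + SVC pipeline and return per-class evaluation results.

    Hypothetical helper for illustration; the hyperparameters are fixed
    here instead of being chosen by a grid search.
    """
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42
    )

    model = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(C=1.0, kernel="rbf")),
    ])
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    labels = [0, 1, 2]  # fix the class order so the matrix is always 3x3
    # Rows are true classes, columns are predicted classes
    cm = confusion_matrix(y_test, y_pred, labels=labels)
    report = classification_report(
        y_test, y_pred, labels=labels, target_names=iris.target_names
    )
    return cm, report


if __name__ == "__main__":
    cm, report = evaluate_pipeline()
    print(cm)
    print(report)
```

Off-diagonal entries in the matrix show which species are confused with each other, which a single accuracy value hides.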

Requirements

pip install scikit-learn joblib

Final Notes and Best Practices

GPU: Training CNNs/LSTMs is faster with a GPU. If you have no GPU, reduce the number of epochs to shorten training, or reduce the batch size if you run out of memory.
Model files:
- Project 1 saves cnn_mnist.h5
- Project 2 saves lstm_sentiment.h5
- Project 4 saves stock_lstm.h5
- Project 5 saves iris_pipeline.joblib
Flask deployment (Project 3) expects cnn_mnist.h5. To deploy a different model, change the model-loading call, the input preprocessing, and the output labels to match that model.
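
To make "adjusting input preprocessing and output labels" concrete for the Project 5 model, here is a minimal, framework-free sketch. The names predict_species, build_demo_model, CLASS_NAMES, and N_FEATURES are hypothetical and not taken from the Project 3 code. The demo model is trained in memory as a stand-in; in a real service you would presumably load iris_pipeline.joblib once at startup with joblib.load and pass it to the same function from your route handler.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed label order: matches load_iris().target_names
CLASS_NAMES = ["setosa", "versicolor", "virginica"]
N_FEATURES = 4  # sepal length, sepal width, petal length, petal width


def build_demo_model():
    """Stand-in for joblib.load("iris_pipeline.joblib") so the sketch runs alone."""
    iris = load_iris()
    model = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
    return model.fit(iris.data, iris.target)


def predict_species(model, features):
    """Validate raw input, run the pipeline, and map the class index to a name."""
    x = np.asarray(features, dtype=float)  # raises ValueError on non-numeric input
    if x.shape != (N_FEATURES,):
        raise ValueError(
            f"expected {N_FEATURES} numeric features, got shape {x.shape}"
        )
    # The pipeline applies the saved scaling itself, so no manual scaling here
    class_index = int(model.predict(x.reshape(1, -1))[0])
    return {"class_index": class_index, "label": CLASS_NAMES[class_index]}


if __name__ == "__main__":
    demo_model = build_demo_model()
    print(predict_species(demo_model, [5.1, 3.5, 1.4, 0.2]))
```

Because the scaler is saved inside the pipeline, the only preprocessing left for the service is input validation and reshaping. Note that joblib files are pickle-based, so only load model files you trust.
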
Security & Ethics: When deploying models (especially for finance, health, or people), consider fairness, privacy, and regulatory compliance.
Troubleshooting: If Keras/TensorFlow raises GPU or compatibility errors, check that your TensorFlow version supports your Python version (and, for GPU use, your CUDA and cuDNN versions). A common stable install is pip install tensorflow==2.11.0 (change the version as needed).
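
When chasing a compatibility problem, it helps to first record which versions are actually installed. The following is a small sketch using only the standard library (Python 3.8+); the function name environment_report and the default package list are assumptions for illustration.

```python
import sys
from importlib import metadata


def environment_report(packages=("tensorflow", "scikit-learn", "joblib")):
    """Return the Python version and the installed version of each package.

    Packages that are not installed map to None.
    """
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output with the supported Python versions listed for your TensorFlow release is usually the quickest way to spot a mismatch.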
