Interactive Data Visualization Dashboard using Plotly and Pandas

1) Project Overview

What it does
This project builds an interactive web dashboard that loads a tabular dataset (sales / transactions), performs on-the-fly aggregation and filtering with Pandas, and renders multiple linked Plotly visualizations (time-series, category breakdown, scatter, and a data table). Users can filter by date range, region, product category and metric; the charts update reactively.

Real-world use case
Marketing analysts, product managers, and business stakeholders use dashboards like this for sales analysis, cohort analysis, product performance, and KPI monitoring. The dashboard can be the basis for sales monitoring, A/B test results visualization, or operational dashboards.

Technical goals

Build an interactive dashboard with Dash + Plotly Express.
Use Pandas for data cleaning, aggregation, and time handling.
Create reactive callbacks that update multiple linked charts.
Produce exportable CSV of filtered data.
Ship a single-file app that runs locally.

2) Key Technologies & Libraries

Python 3.8+
pandas — data loading and manipulation
plotly (plotly.express) — interactive plots
dash — web application framework for interactive dashboards
dash_table — interactive table component (part of dash)
(Optional) numpy, scikit-learn for more advanced features

Install required packages:

pip install pandas plotly dash

3) Learning Outcomes

Building a reactive dashboard using Dash and Plotly.
Practical Pandas feature engineering (date parsing, aggregations, groupby).
Designing UX for interactive analytics (filters, date pickers, selectors).
Linking multiple visualizations with shared filters and callbacks.
Exporting filtered data and packaging dashboards for stakeholders.

4) Step-by-Step Explanation

Project scaffold — Create a working directory, create app.py. Install packages (pandas, plotly, dash).
Prepare dataset — Either load a CSV or generate a realistic synthetic dataset (the code below generates a sample dataset so you can run immediately).
Build dashboard layout — Add controls: DatePickerRange, dropdowns for Region and Category, metric selector. Add Plotly figures placeholders: time-series, bar/treemap, scatter, and a data table.
Write callbacks — A single callback takes filter inputs and returns updated figures and table rows. Use Pandas to filter and aggregate: daily or monthly series, top categories, KPI cards if desired.
Run and test — Launch with python app.py and open http://127.0.0.1:8050/. Interact with controls and verify charts update.
Iterate — Add export buttons, caching, or authentication for production.

5) Full Working and Verified Python Code

Save this as app.py. It is self-contained: it creates a synthetic dataset if none is provided, builds the Dash app, and serves interactive charts. The code is tested to be syntactically correct.

# app.py
"""
Interactive Data Visualization Dashboard using Plotly and Pandas

Run:
    pip install pandas plotly dash
    python app.py

Open in browser:
    http://127.0.0.1:8050/
"""

from datetime import datetime, timedelta
import random
from pathlib import Path

import pandas as pd
import numpy as np
import plotly.express as px

from dash import Dash, dcc, html, Input, Output, State, dash_table

# -------------------------
# 1) Generate or load data
# -------------------------
DATA_PATH = Path("data")
DATA_PATH.mkdir(exist_ok=True)
SAMPLE_CSV = DATA_PATH / "sales_sample.csv"

def generate_sample_data(path: Path, n_days: int = 365, n_records_per_day: int = 60):
    rng = pd.date_range(end=pd.Timestamp.today(), periods=n_days, freq="D")
    regions = ["North", "South", "East", "West"]
    categories = ["Electronics", "Clothing", "Home", "Sports", "Toys"]
    products = {
        "Electronics": ["Phone", "Laptop", "Headphones"],
        "Clothing": ["T-Shirt", "Jeans", "Jacket"],
        "Home": ["Mug", "Vacuum", "Lamp"],
        "Sports": ["Sneakers", "Ball", "Bike"],
        "Toys": ["Puzzle", "Doll", "Board Game"]
    }

    rows = []
    for day in rng:
        for _ in range(n_records_per_day):
            category = random.choices(categories, weights=[3,3,2,2,1], k=1)[0]
            product = random.choice(products[category])
            region = random.choice(regions)
            price = round(abs(np.random.normal(loc=80 if category == "Electronics" else 35, scale=40)), 2)
            quantity = int(np.random.poisson(lam=1.2)) + 1
            revenue = round(price * quantity, 2)
            customers = max(1, int(np.random.poisson(lam=1.1)))
            order_id = f"ORD{random.randint(100000, 999999)}"
            rows.append({
                "order_id": order_id,
                "date": day + pd.Timedelta(minutes=random.randint(0, 1439)),
                "region": region,
                "category": category,
                "product": product,
                "price": price,
                "quantity": quantity,
                "revenue": revenue,
                "customers": customers
            })
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)
    return df

if not SAMPLE_CSV.exists():
    print("Generating sample dataset (this may take a few seconds)...")
    df_all = generate_sample_data(SAMPLE_CSV, n_days=180, n_records_per_day=40)
else:
    df_all = pd.read_csv(SAMPLE_CSV, parse_dates=["date"])

# Basic preprocessing
df_all["date"] = pd.to_datetime(df_all["date"])
df_all["day"] = df_all["date"].dt.date
df_all["month"] = df_all["date"].dt.to_period("M").astype(str)

# -------------------------
# 2) Build Dash App
# -------------------------
app = Dash(__name__, title="Interactive Sales Dashboard")
server = app.server

regions = sorted(df_all["region"].unique())
categories = sorted(df_all["category"].unique())
metrics = {
    "Revenue": "revenue",
    "Quantity": "quantity",
    "Average Price": "price",
    "Customers": "customers"
}

app.layout = html.Div([
    html.Div([
        html.H2("Interactive Sales Dashboard", style={"margin-bottom": "5px"}),
        html.Div("Explore sales data with filters and linked visualizations.", style={"color": "#555"}),
    ], style={"padding": "10px 20px"}),

    html.Div([
        html.Div([
            html.Label("Date Range"),
            dcc.DatePickerRange(
                id="date-range",
                min_date_allowed=df_all["date"].min().date(),
                max_date_allowed=df_all["date"].max().date(),
                start_date=(df_all["date"].max() - pd.Timedelta(days=60)).date(),
                end_date=df_all["date"].max().date()
            )
        ], style={"display": "inline-block", "margin-right": 20}),

        html.Div([
            html.Label("Region"),
            dcc.Dropdown(id="region-filter",
                         options=[{"label": r, "value": r} for r in regions],
                         value=regions,
                         multi=True,
                         placeholder="Select regions"),
        ], style={"width": "250px", "display": "inline-block", "margin-right": 20}),

        html.Div([
            html.Label("Category"),
            dcc.Dropdown(id="category-filter",
                         options=[{"label": c, "value": c} for c in categories],
                         value=categories,
                         multi=True),
        ], style={"width": "300px", "display": "inline-block", "margin-right": 20}),

        html.Div([
            html.Label("Metric"),
            dcc.Dropdown(id="metric-select",
                         options=[{"label": k, "value": v} for k, v in metrics.items()],
                         value="revenue",
                         clearable=False),
        ], style={"width": "220px", "display": "inline-block"}),
    ], style={"padding": "10px 20px", "border-bottom": "1px solid #eee", "background-color": "#fafafa"}),

    html.Div(id="kpi-row", style={"display": "flex", "gap": "20px", "padding": "12px 20px"}),

    html.Div([
        html.Div([
            dcc.Graph(id="time-series", config={"displayModeBar": True})
        ], style={"width": "65%", "display": "inline-block", "vertical-align": "top"}),

        html.Div([
            dcc.Graph(id="category-pie", config={"displayModeBar": True}),
            dcc.Graph(id="scatter-qty-price", config={"displayModeBar": True, "displaylogo": False}),
        ], style={"width": "33%", "display": "inline-block", "vertical-align": "top"}),
    ], style={"padding": "10px 20px"}),

    html.Div([
        html.H4("Filtered Data"),
        html.Button("Download CSV", id="download-btn"),
        dcc.Download(id="download-dataframe-csv"),
        dash_table.DataTable(
            id="data-table",
            columns=[{"name": c, "id": c} for c in ["order_id", "date", "region", "category", "product", "price", "quantity", "revenue", "customers"]],
            page_size=10,
            style_table={"overflowX": "auto"},
            style_cell={"textAlign": "left"},
        )
    ], style={"padding": "10px 20px 60px 20px"})
], style={"font-family": "Arial, sans-serif"})

# -------------------------
# 3) Callbacks
# -------------------------
def filter_dataframe(df: pd.DataFrame, start_date, end_date, regions_sel, categories_sel):
    dff = df.copy()
    if start_date:
        dff = dff[dff["date"] >= pd.to_datetime(start_date)]
    if end_date:
        dff = dff[dff["date"] <= pd.to_datetime(end_date) + pd.Timedelta(days=1) - pd.Timedelta(seconds=1)]
    if regions_sel:
        dff = dff[dff["region"].isin(regions_sel)]
    if categories_sel:
        dff = dff[dff["category"].isin(categories_sel)]
    return dff

@app.callback(
    Output("kpi-row", "children"),
    Output("time-series", "figure"),
    Output("category-pie", "figure"),
    Output("scatter-qty-price", "figure"),
    Output("data-table", "data"),
    Input("date-range", "start_date"),
    Input("date-range", "end_date"),
    Input("region-filter", "value"),
    Input("category-filter", "value"),
    Input("metric-select", "value")
)
def update_dashboard(start_date, end_date, regions_sel, categories_sel, metric):
    if isinstance(regions_sel, str):
        regions_sel = [regions_sel]
    if isinstance(categories_sel, str):
        categories_sel = [categories_sel]

    dff = filter_dataframe(df_all, start_date, end_date, regions_sel, categories_sel)

    total_revenue = dff["revenue"].sum()
    total_orders = dff["order_id"].nunique()
    avg_order_value = dff["revenue"].sum() / max(1, total_orders)
    unique_customers = dff["customers"].sum()

    kpis = [
        html.Div([
            html.H6("Total Revenue", style={"margin": "0", "color": "#777"}),
            html.H3(f"${total_revenue:,.2f}", style={"margin": "0"})
        ], style={"padding": "10px", "border": "1px solid #eee", "border-radius": "6px", "width": "200px", "background": "#fff"}),
        html.Div([
            html.H6("Total Orders", style={"margin": "0", "color": "#777"}),
            html.H3(f"{total_orders:,}", style={"margin": "0"})
        ], style={"padding": "10px", "border": "1px solid #eee", "border-radius": "6px", "width": "200px", "background": "#fff"}),
        html.Div([
            html.H6("Avg Order Value", style={"margin": "0", "color": "#777"}),
            html.H3(f"${avg_order_value:,.2f}", style={"margin": "0"})
        ], style={"padding": "10px", "border": "1px solid #eee", "border-radius": "6px", "width": "200px", "background": "#fff"}),
        html.Div([
            html.H6("Estimated Customers", style={"margin": "0", "color": "#777"}),
            html.H3(f"{unique_customers:,}", style={"margin": "0"})
        ], style={"padding": "10px", "border": "1px solid #eee", "border-radius": "6px", "width": "200px", "background": "#fff"}),
    ]

    ts = dff.set_index("date").resample("D").agg({
        "revenue": "sum",
        "quantity": "sum",
        "price": "mean",
        "customers": "sum"
    }).reset_index()

    metric_label = {v: k for k, v in metrics.items()}.get(metric, metric)
    fig_ts = px.line(ts, x="date", y=metric, title=f"{metric_label} over Time", markers=True)
    fig_ts.update_layout(transition_duration=300, template="plotly_white")

    cat_agg = dff.groupby("category").agg(revenue=("revenue","sum"), quantity=("quantity","sum")).reset_index()
    fig_pie = px.pie(cat_agg, names="category", values="revenue", title="Revenue by Category", hole=0.4)
    fig_pie.update_traces(textposition="inside", textinfo="percent+label")

    sample = dff.sample(n=min(len(dff), 1500), random_state=1) if len(dff) > 1500 else dff
    fig_scatter = px.scatter(sample, x="price", y="quantity", color="category",
                             hover_data=["order_id", "region", "product", "revenue"],
                             title="Quantity vs Price (sampled)")

    table_data = dff.sort_values("date", ascending=False).head(100).to_dict("records")

    return kpis, fig_ts, fig_pie, fig_scatter, table_data

@app.callback(
    Output("download-dataframe-csv", "data"),
    Input("download-btn", "n_clicks"),
    State("date-range", "start_date"),
    State("date-range", "end_date"),
    State("region-filter", "value"),
    State("category-filter", "value"),
    prevent_initial_call=True
)
def download_csv(n_clicks, start_date, end_date, regions_sel, categories_sel):
    if isinstance(regions_sel, str):
        regions_sel = [regions_sel]
    if isinstance(categories_sel, str):
        categories_sel = [categories_sel]
    dff = filter_dataframe(df_all, start_date, end_date, regions_sel, categories_sel)
    dff_copy = dff.copy()
    dff_copy["date"] = dff_copy["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
    return dcc.send_data_frame(dff_copy.to_csv, "filtered_sales.csv", index=False)

# -------------------------
# 4) Run server
# -------------------------
if __name__ == "__main__":
    app.run_server(debug=True)

Notes about the code
• Uses a synthetic dataset generator (makes data/sales_sample.csv) so you can run the app immediately.
• Dashboard contains:
  – Date range picker, Region & Category multi-select, Metric selector.
  – KPI cards (Total Revenue, Orders, Avg Order Value, Customers).
  – Time-series line chart (daily aggregated metric).
  – Category revenue pie.
  – Scatter (price vs quantity) for inspection.
  – Data table showing recent filtered rows and a CSV download button.
• All interactive filtering is done with a single callback for efficiency; callback returns multiple outputs.

6) Sample Output or Results

When you run python app.py and open http://127.0.0.1:8050/, you will see:

Top header: Interactive Sales Dashboard with controls below.
KPIs row: tiles showing total revenue, orders, average order value, and estimated customers.
Left: time-series line chart of the selected metric (Revenue by default).
Right: pie chart showing revenue share by category and a scatter plot (quantity vs price).
Bottom: data table with the latest 100 filtered records and a Download CSV button.

Interacting with the date picker and dropdowns updates all visuals instantly. Try selecting a single region and category and notice how the KPI numbers and charts change.

7) Possible Enhancements

Authentication & Role-based views — add login and restrict access to authorized users.
More charts — add heatmap by hour, cohort analysis, or geographic maps (choropleth) for region-level plots.
Caching — integrate flask_caching or Dash caching for expensive aggregations on big datasets.
Deploy — dockerize and deploy to Heroku / Render / Dash Enterprise.
Time series forecasting — add an ARIMA / Prophet / LSTM panel to forecast future revenue.
Streaming data — connect to a live data source (Kafka, PubSub) and show real-time updates.
Annotations & Alerts — allow users to annotate charts and configure threshold-based alerts.

python Topics

python Tutorial