🎉 Welcome to PyVerse! Start Learning Today

Working with JSON and CSV Files

Who this is for: Grade 8–9 students who already know Python basics (loops, functions, lists/dicts) and want real-world data skills.

What you'll learn:

  • What JSON and CSV are and when to use them
  • Reading and writing JSON files in Python
  • Reading and writing CSV files in Python
  • Converting between JSON and CSV
  • Handling common pitfalls (encodings, numbers, headers)
  • A hands-on mini project: Gradebook Analyzer

1) Quick concepts

JSON (JavaScript Object Notation)

  • Great for structured data with nested objects and lists.
  • Common in web APIs and app configs.
  • Type mapping (JSON → Python):
    • object → dict
    • array → list
    • string → str
    • number → int/float
    • true/false → True/False
    • null → None

CSV (Comma-Separated Values)

  • Great for tabular data (rows and columns), like spreadsheets.
  • Simple and human-readable, but not good for deeply nested structures.

2) JSON in Python (module: json)

Read JSON from a file:

Use json.load to read files, json.loads to parse a string.

Example: reading, updating, and saving JSON

import json from pathlib import Path data_path = Path("data") data_path.mkdir(exist_ok=True) # Create a sample JSON file sample = { "class": "9A", "teacher": "Ms. Lee", "students": [ {"name": "Alex", "age": 14, "scores": {"math": 88, "science": 92}}, {"name": "Rita", "age": 15, "scores": {"math": 95, "science": 85}} ] } with (data_path / "class.json").open("w", encoding="utf-8") as f: json.dump(sample, f, indent=2, ensure_ascii=False) # Read it back with (data_path / "class.json").open("r", encoding="utf-8") as f: data = json.load(f) # Pretty print to console print(json.dumps(data, indent=2, ensure_ascii=False)) # Update a value data["students"][0]["scores"]["math"] = 90 # Save updated JSON with (data_path / "class_updated.json").open("w", encoding="utf-8") as f: json.dump(data, f, indent=2, ensure_ascii=False, sort_keys=True)

Notes:

  • indent makes it human-readable.
  • ensure_ascii=False keeps non-English characters readable.
  • sort_keys=True writes keys in alphabetical order (useful for consistent files).

Handling bad JSON safely:

bad = "{ name: 'Alex' }" # invalid JSON (unquoted key, single quotes) try: parsed = json.loads(bad) except json.JSONDecodeError as e: print("Invalid JSON:", e)

3) CSV in Python (module: csv)

Read CSV as rows (each row is a list):

Good for simple data without headers.

Read CSV as dictionaries (DictReader):

  • Best when the first row has column names.
  • You get rows as dicts like {"name": "Alex", "math": "88"}

Example: reading and writing CSV

import csv from pathlib import Path data_path = Path("data") data_path.mkdir(exist_ok=True) # Create a sample CSV file with (data_path / "students.csv").open("w", newline="", encoding="utf-8") as f: writer = csv.writer(f) writer.writerow(["name", "math", "science"]) writer.writerow(["Alex", "88", "92"]) writer.writerow(["Rita", "95", "85"]) writer.writerow(["Sam", "", "78"]) # missing math score # Read with DictReader def to_float(s): try: return float(s) except (TypeError, ValueError): return None rows = [] with (data_path / "students.csv").open("r", newline="", encoding="utf-8") as f: reader = csv.DictReader(f) for row in reader: math = to_float(row["math"]) science = to_float(row["science"]) avg = None nums = [x for x in (math, science) if x is not None] if nums: avg = sum(nums) / len(nums) rows.append({ "name": row["name"], "math": math, "science": science, "average": avg }) print(rows) # Write a new CSV with averages with (data_path / "averages.csv").open("w", newline="", encoding="utf-8") as f: fieldnames = ["name", "math", "science", "average"] writer = csv.DictWriter(f, fieldnames=fieldnames) writer.writeheader() writer.writerows(rows)

Important CSV tips:

  • Always open CSV files with newline="" to avoid extra blank lines (especially on Windows).
  • Use encoding="utf-8" for international text.
  • DictReader requires that the first row has headers.
  • All values read from CSV are strings. Convert to int/float if needed.

4) Converting between JSON and CSV

CSV → JSON (flat rows):

Read with DictReader, convert types, dump with json.dump.

JSON → CSV:

  • Works well if JSON is a list of flat objects.
  • For nested objects, you can flatten keys like "scores.math" or split into multiple files.

Example: flat JSON to CSV

import json, csv from pathlib import Path data_path = Path("data") # Suppose we have a flat JSON list people = [ {"name": "Lina", "city": "Lagos", "age": 14}, {"name": "Jai", "city": "Mumbai", "age": 15} ] with (data_path / "people.json").open("w", encoding="utf-8") as f: json.dump(people, f, indent=2, ensure_ascii=False) # Read JSON and write CSV with (data_path / "people.json").open("r", encoding="utf-8") as f: people_data = json.load(f) fieldnames = sorted({k for p in people_data for k in p.keys()}) with (data_path / "people.csv").open("w", newline="", encoding="utf-8") as f: writer = csv.DictWriter(f, fieldnames=fieldnames) writer.writeheader() writer.writerows(people_data)

Example: flatten a simple nested JSON for CSV

def flatten(prefix, d, out): for k, v in d.items(): key = f"{prefix}.{k}" if prefix else k if isinstance(v, dict): flatten(key, v, out) else: out[key] = v return out record = {"name": "Alex", "scores": {"math": 90, "science": 92}} flat = flatten("", record, {}) # flat -> {"name": "Alex", "scores.math": 90, "scores.science": 92}

5) Practical project: Gradebook Analyzer

Goal:

  • Read a CSV of students and their test scores.
  • Compute each student's average and letter grade.
  • Save a JSON report for the class.
  • Save a CSV leaderboard of top students.

Input CSV (data/gradebook.csv):

  • Columns: name, math, science, english
  • Missing values allowed (blank cells)

Step-by-step code:

import csv, json from pathlib import Path data_path = Path("data") data_path.mkdir(exist_ok=True) # 1) Create a sample gradebook with (data_path / "gradebook.csv").open("w", newline="", encoding="utf-8") as f: writer = csv.writer(f) writer.writerow(["name", "math", "science", "english"]) writer.writerow(["Alex", "88", "92", "84"]) writer.writerow(["Rita", "95", "85", "91"]) writer.writerow(["Sam", "", "78", "80"]) # missing math writer.writerow(["Maya", "100", "98", ""]) # missing english # 2) Helpers def to_float(s): try: return float(s) except (TypeError, ValueError): return None def average(values): vals = [v for v in values if v is not None] return sum(vals) / len(vals) if vals else None def letter_grade(avg): if avg is None: return "N/A" if avg >= 90: return "A" if avg >= 80: return "B" if avg >= 70: return "C" if avg >= 60: return "D" return "F" # 3) Read CSV and compute metrics students = [] with (data_path / "gradebook.csv").open("r", newline="", encoding="utf-8") as f: reader = csv.DictReader(f) for row in reader: scores = { "math": to_float(row.get("math")), "science": to_float(row.get("science")), "english": to_float(row.get("english")) } avg = average(scores.values()) students.append({ "name": row["name"], "scores": scores, "average": avg, "letter": letter_grade(avg) }) # 4) Save JSON report report = { "class": "9A", "count": len(students), "students": students, "top_student": max(students, key=lambda s: (- (s["average"] or -1), s["name"]))["name"] } with (data_path / "grade_report.json").open("w", encoding="utf-8") as f: json.dump(report, f, indent=2, ensure_ascii=False) # 5) Save CSV leaderboard (sorted by average desc) leaderboard = sorted( (s for s in students if s["average"] is not None), key=lambda s: s["average"], reverse=True ) with (data_path / "leaderboard.csv").open("w", newline="", encoding="utf-8") as f: fieldnames = ["name", "average", "letter"] writer = csv.DictWriter(f, fieldnames=fieldnames) writer.writeheader() for s in leaderboard: writer.writerow({ "name": s["name"], "average": round(s["average"], 2), "letter": s["letter"] }) # 6) Print a quick summary print("Saved:", data_path / "grade_report.json") print("Saved:", data_path / "leaderboard.csv")

What you practiced:

  • Reading CSV with DictReader
  • Handling missing numbers
  • Writing formatted JSON with json.dump
  • Writing a CSV with DictWriter and headers
  • Sorting data and computing fields

6) Common pitfalls and pro tips

  • CSV numbers are strings: Always convert to int/float before math.
  • Newline handling: When opening CSV files, use newline="" to prevent blank lines on Windows.
  • Encodings: Use encoding="utf-8" for both JSON and CSV to support all characters.
  • Missing data: Use helper functions to safely convert or default values.
  • Nested JSON to CSV: Flatten nested dicts or choose a subset of fields.
  • Pretty JSON: Use indent=2 in dumps/dump for readability; skip for maximum speed/size.
  • Validation: Wrap json.load in try/except json.JSONDecodeError for untrusted files.
  • Big files: Process CSV line by line (streaming). For JSON, consider JSON Lines (each line is a JSON object) if you need streaming.

Writing JSON Lines:

# write each record on its own line with open("data/students.jsonl", "w", encoding="utf-8") as f: for s in students: f.write(json.dumps(s, ensure_ascii=False) + "\n")

7) Summary

  • JSON is best for nested, structured data; CSV is best for tables.
  • In Python, use json.load/dump for files and csv.DictReader/DictWriter for labeled rows.
  • Always handle encodings, newlines, and type conversions carefully.
  • You can convert between JSON and CSV when data is flat or after flattening.
  • The Gradebook Analyzer project shows how to read data, compute results, and produce both JSON and CSV outputs—skills you'll use in real data tasks.

Next steps:

  • Try adding median and standard deviation to the Gradebook Analyzer.
  • Add per-subject leaderboards.
  • Explore pandas for more powerful data analysis once you're comfortable with csv/json.

Loading quizzes...