Real-World Shell Scripts


You've learned the syntax, commands, and patterns. Now it's time to build something real. This tutorial presents complete, practical scripts that solve actual system administration and automation problems. Each demonstrates multiple concepts working together in production-quality code.

What makes these "real-world" scripts?

  • Robust error handling: They don't break on edge cases
  • User-friendly: Clear output, helpful usage messages
  • Maintainable: Well-structured, commented where needed
  • Tested patterns: Based on years of production experience
  • Immediately useful: You can deploy these today

Each script builds on the advanced techniques from previous tutorials: strict mode, logging, error handling, argument parsing, and cleanup. Study the patterns: they're your toolkit for any automation task.
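
Every script that follows opens with the same scaffold. As a quick refresher, here it is in miniature (the temp file and log format are just placeholders):

```shell
#!/usr/bin/env bash
set -euo pipefail                  # strict mode: abort on errors, unset vars, pipe failures

tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT       # cleanup runs on every exit path, success or failure

# Minimal timestamped logger
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }

log "scratch space ready: $tmpfile"
```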

1. Filename Normalizer — Taming Chaotic File Names

The Problem: Users create files like My Report (final) v2 FINAL.docx, breaking scripts that don't handle spaces and special characters. Web servers choke on filenames with spaces. Archives become a nightmare.

The Solution: Automatically normalize filenames to shell-friendly format:

  • Convert to lowercase
  • Replace spaces with underscores
  • Remove special characters
  • Eliminate consecutive underscores
  • Preserve file extensions

#!/usr/bin/env bash
set -euo pipefail
 
# normalize_filenames.sh — Rename files to be shell-friendly
#
# WHAT IT DOES:
#   - Converts spaces to underscores
#   - Removes special characters (keeps alphanumeric, dash, underscore)
#   - Converts to lowercase
#   - Eliminates consecutive underscores
#   - Preserves file extensions
#
# USAGE:
#   ./normalize_filenames.sh [DIRECTORY]
#
# EXAMPLES:
#   ./normalize_filenames.sh              # Normalize files in current directory
#   ./normalize_filenames.sh ~/Downloads  # Normalize files in ~/Downloads
#
# TRANSFORMATIONS:
#   "My Document (Final).pdf" → "my_document_final.pdf"
#   "Photo 2024-02-11.JPG"    → "photo_2024-02-11.jpg"
#   "Script___test.sh"        → "script_test.sh"
 
readonly SCRIPT_NAME="$(basename "$0")"
 
# Color output (only if terminal)
if [[ -t 1 ]]; then
    readonly GREEN='\033[0;32m'
    readonly YELLOW='\033[0;33m'
    readonly NC='\033[0m'
else
    readonly GREEN='' YELLOW='' NC=''
fi
 
log()  { echo -e "${GREEN}→${NC} $*"; }
skip() { echo -e "${YELLOW}⊗${NC} $*"; }
 
normalize_filename() {
    local file="$1"
    local dir base ext newname
    
    # Skip if not a regular file
    [[ -f "$file" ]] || return 0
    
    dir="$(dirname "$file")"
    base="$(basename "$file")"
    
    # Separate extension
    if [[ "$base" == *.* ]]; then
        ext=".${base##*.}"
        base="${base%.*}"
    else
        ext=""
    fi
    
    # Normalize the basename
    newname=$(echo "$base" | \
        tr '[:upper:]' '[:lower:]' |       # Lowercase
        tr ' ' '_' |                        # Spaces β†’ underscores
        tr -cd '[:alnum:]_-' |             # Keep only alphanumeric, _, -
        sed 's/__*/_/g' |                  # Multiple _ β†’ single _
        sed 's/^_//;s/_$//')               # Remove leading/trailing _
    
    # Add back extension (lowercased)
    ext="${ext,,}"
    newname="${newname}${ext}"
    
    # Rename if changed. Compare against the original, unmodified basename —
    # otherwise a change in extension case alone (e.g. .JPG → .jpg) goes undetected.
    local orig_name
    orig_name="$(basename "$file")"
    if [[ "$orig_name" != "$newname" ]]; then
        if [[ -e "$dir/$newname" ]]; then
            skip "Skipping '$orig_name' — '$newname' already exists"
        else
            mv "$file" "$dir/$newname"
            log "Renamed: '$orig_name' → '$newname'"
        fi
    fi
}
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [DIRECTORY]
 
Normalize filenames in DIRECTORY (default: current directory).
 
Transformations:
  - Convert to lowercase
  - Replace spaces with underscores
  - Remove special characters (keep alphanumeric, dash, underscore)
  - Eliminate consecutive underscores
  - Preserve file extensions
 
Examples:
  $SCRIPT_NAME                # Normalize current directory
  $SCRIPT_NAME ~/Downloads    # Normalize ~/Downloads
EOF
}
 
# Parse arguments
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
    usage
    exit 0
fi
 
target_dir="${1:-.}"
 
# Validate target directory
if [[ ! -d "$target_dir" ]]; then
    echo "Error: Directory not found: $target_dir" >&2
    exit 1
fi
 
# Process files (non-recursively, regular files only)
count=0
processed=0
 
while IFS= read -r -d '' file; do
    count=$((count + 1))    # not ((count++)): that returns the old value, tripping set -e at 0
    normalize_filename "$file"
    # If the original path is gone, the file was renamed
    [[ -e "$file" ]] || processed=$((processed + 1))
done < <(find "$target_dir" -maxdepth 1 -type f -print0)
 
echo ""
echo "Processed $count files, normalized $processed filenames"

Why this script is production-ready:

  1. Handles all edge cases:

    • Files without extensions
    • Files with dots in the name (file.backup.tar.gz)
    • Files that already exist (doesn't overwrite)
    • Non-file objects (skips directories, symlinks)
  2. Safe transformations:

    • Uses tr -cd to remove characters (keeps only safe ones)
    • sed 's/__*/_/g' collapses multiple underscores
    • Preserves file extensions correctly
  3. User-friendly:

    • Shows what's being renamed
    • Skips conflicts instead of failing
    • Provides summary statistics

Real-world use cases:

  • Clean up Downloads folder before archiving
  • Prepare files for web upload (web servers hate spaces)
  • Standardize filenames in a photo library
  • Prepare files for version control (Git works better with simple names)
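
If you want to preview what the normalizer will do before touching anything, the same transformation pipeline can be exercised on plain strings. This helper mirrors the script's logic but renames nothing:

```shell
#!/usr/bin/env bash
# Dry-run helper mirroring normalize_filename's string pipeline;
# it transforms names only and never touches the filesystem.
normalize() {
    local base="$1" ext=""
    if [[ "$base" == *.* ]]; then
        ext=".${base##*.}"
        base="${base%.*}"
    fi
    ext="${ext,,}"                               # lowercase extension (bash 4+)
    local name
    name=$(echo "$base" \
        | tr '[:upper:]' '[:lower:]' \
        | tr ' ' '_' \
        | tr -cd '[:alnum:]_-' \
        | sed 's/__*/_/g; s/^_//; s/_$//')
    echo "${name}${ext}"
}

normalize "My Document (Final).pdf"   # my_document_final.pdf
normalize "Photo 2024-02-11.JPG"      # photo_2024-02-11.jpg
normalize "Script___test.sh"          # script_test.sh
```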

2. System Health Monitor — Know Before It Breaks

The Problem: You find out your server has been overloaded or out of disk space after things have failed. By then, it's crisis management instead of prevention.

The Solution: Automated health monitoring that checks CPU, memory, disk, and critical services. Run it via cron every 5 minutes, and you'll know about problems before users do.

#!/usr/bin/env bash
set -euo pipefail
 
# health_monitor.sh — Comprehensive system health checker
#
# USAGE:
#   ./health_monitor.sh [-w] [-e EMAIL] [-t THRESHOLDS]
#
# OPTIONS:
#   -w           Write to log file (/var/log/health_monitor.log)
#   -e EMAIL     Send alert email if issues found
#   -s           Silent mode (only output if problems detected)
#   -h           Show help
#
# THRESHOLDS:
#   Customize with environment variables:
#     WARN_CPU=80     CPU load warning (percentage)
#     WARN_MEM=85     Memory usage warning (percentage)
#     WARN_DISK=90    Disk usage warning (percentage)
#
# CRON EXAMPLE:
#   */5 * * * * /usr/local/bin/health_monitor.sh -w -e [email protected] -s
 
readonly SCRIPT_NAME="$(basename "$0")"
readonly LOGFILE="${LOGFILE:-/var/log/health_monitor.log}"
readonly WARN_CPU="${WARN_CPU:-80}"
readonly WARN_MEM="${WARN_MEM:-85}"
readonly WARN_DISK="${WARN_DISK:-90}"
 
# Options
write_log=false
email_alert=""
silent=false
 
# Track issues
declare -a issues=()
declare -a warnings=()
 
# Color output (only if terminal and not silent)
if [[ -t 1 ]]; then
    readonly RED='\033[0;31m'
    readonly GREEN='\033[0;32m'
    readonly YELLOW='\033[0;33m'
    readonly BLUE='\033[0;34m'
    readonly BOLD='\033[1m'
    readonly NC='\033[0m'
else
    readonly RED='' GREEN='' YELLOW='' BLUE='' BOLD='' NC=''
fi
 
# Logging functions
log_msg() {
    local msg="[$(date '+%Y-%m-%d %H:%M:%S')] $*"
    $write_log && echo "$msg" >> "$LOGFILE"
    $silent || echo "$msg"
}
 
ok()   { log_msg "$(echo -e "${GREEN}[OK]${NC}   $*")"; }
warn() { log_msg "$(echo -e "${YELLOW}[WARN]${NC} $*")"; warnings+=("$*"); }
fail() { log_msg "$(echo -e "${RED}[FAIL]${NC} $*")"; issues+=("$*"); }
info() { log_msg "$(echo -e "${BLUE}[INFO]${NC} $*")"; }
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]
 
Monitor system health: CPU, memory, disk, services.
 
OPTIONS:
    -w              Write to log file ($LOGFILE)
    -e EMAIL        Send email alert if issues detected
    -s              Silent mode (only output if problems found)
    -h              Show this help
 
ENVIRONMENT:
    WARN_CPU=80     CPU load warning threshold (default: 80%)
    WARN_MEM=85     Memory usage warning (default: 85%)
    WARN_DISK=90    Disk usage warning (default: 90%)
 
EXAMPLES:
    $SCRIPT_NAME                           # Run interactively
    $SCRIPT_NAME -w                        # Log to file
    $SCRIPT_NAME -w -e [email protected] -s  # Cron mode: log and email on issues
 
CRON:
    # Check every 5 minutes, log, email on issues
    */5 * * * * $SCRIPT_NAME -w -e [email protected] -s
EOF
}
 
# Parse options
while getopts ":we:sh" opt; do
    case $opt in
        w) write_log=true ;;
        e) email_alert="$OPTARG" ;;
        s) silent=true ;;
        h) usage; exit 0 ;;
        \?) echo "Unknown option: -$OPTARG" >&2; usage; exit 1 ;;
        :) echo "-$OPTARG requires an argument" >&2; exit 1 ;;
    esac
done
 
# Header (skip if silent with no issues)
print_header() {
    $silent && return
    log_msg ""
    log_msg "=========================================="
    log_msg "  System Health Report"
    log_msg "  $(date)"
    log_msg "  Hostname: $(hostname)"
    log_msg "=========================================="
    log_msg ""
}
 
print_header
 
# ==================== Check CPU Load ====================
check_cpu() {
    local load cores load_pct
    
    # Get 1-minute load average
    if [[ -f /proc/loadavg ]]; then
        load=$(awk '{print $1}' /proc/loadavg)
    else
        load=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1 | tr -d ' ')
    fi
    
    # Get number of CPU cores
    cores=$(nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo 2>/dev/null || echo 1)
    
    # Calculate load percentage
    load_pct=$(echo "$load $cores" | awk '{printf "%.0f", ($1/$2)*100}')
    
    if (( load_pct >= WARN_CPU )); then
        fail "CPU load: ${load_pct}% (load: $load, cores: $cores)"
    else
        ok "CPU load: ${load_pct}% (load: $load, cores: $cores)"
    fi
}
 
# ==================== Check Memory ====================
check_memory() {
    if ! command -v free &>/dev/null; then
        warn "Memory check skipped: 'free' command not available"
        return
    fi
    
    # Parse free output: total, used, available
    local mem_info
    mem_info=$(free | awk 'NR==2{printf "%d %d %d", $2, $3, $7}')
    
    local total used available mem_pct
    read -r total used available <<< "$mem_info"
    
    mem_pct=$((used * 100 / total))
    
    if (( mem_pct >= WARN_MEM )); then
        fail "Memory usage: ${mem_pct}% (${used}/${total} KB)"
    else
        ok "Memory usage: ${mem_pct}% (${used}/${total} KB)"
    fi
}
 
# ==================== Check Disk Space ====================
check_disk() {
    local line usage mount usage_num
    
    if df --output=pcent,target >/dev/null 2>&1; then
        # GNU coreutils: reliable two-column output
        while IFS= read -r line; do
            usage=$(echo "$line" | awk '{print $1}')
            mount=$(echo "$line" | awk '{print $2}')
            usage_num="${usage%\%}"
            [[ "$usage_num" =~ ^[0-9]+$ ]] || continue
            
            if (( usage_num >= WARN_DISK )); then
                fail "Disk $mount: ${usage} full"
            else
                ok "Disk $mount: ${usage} used"
            fi
        done < <(df --output=pcent,target -x tmpfs -x devtmpfs | tail -n +2)
    else
        # Fallback for systems without GNU coreutils (e.g. BSD/macOS).
        # Process substitution (not a pipe) keeps fail/ok in the main shell,
        # so the issues/warnings arrays are updated correctly.
        while read -r _ _ _ _ usage mount _; do
            usage_num="${usage%\%}"
            [[ "$usage_num" =~ ^[0-9]+$ ]] || continue
            
            if (( usage_num >= WARN_DISK )); then
                fail "Disk $mount: ${usage} full"
            else
                ok "Disk $mount: ${usage} used"
            fi
        done < <(df -h | tail -n +2)
    fi
}
 
# ==================== Check Services ====================
check_service() {
    local service="$1"
    
    # Try multiple methods to check if service is running
    if pgrep -x "$service" >/dev/null 2>&1; then
        ok "Service $service: running"
    elif systemctl is-active --quiet "$service" 2>/dev/null; then
        ok "Service $service: running (systemd)"
    else
        warn "Service $service: not running"
    fi
}
 
check_services() {
    # Common critical services (adjust for your environment)
    local services=("sshd" "cron")
    
    # Add custom services from environment variable
    if [[ -n "${MONITOR_SERVICES:-}" ]]; then
        IFS=',' read -ra additional_services <<< "$MONITOR_SERVICES"
        services+=("${additional_services[@]}")
    fi
    
    for service in "${services[@]}"; do
        check_service "$service"
    done
}
 
# ==================== Check System Uptime ====================
check_uptime() {
    local uptime_str
    uptime_str=$(uptime -p 2>/dev/null || uptime | sed 's/.*up //' | sed 's/,.*//')
    info "System uptime: $uptime_str"
}
 
# ==================== Run All Checks ====================
check_cpu
check_memory
check_disk
check_services
check_uptime
 
# ==================== Summary ====================
print_summary() {
    $silent && [[ ${#issues[@]} -eq 0 && ${#warnings[@]} -eq 0 ]] && return
    
    log_msg ""
    log_msg "=========================================="
    
    if [[ ${#issues[@]} -eq 0 && ${#warnings[@]} -eq 0 ]]; then
        log_msg "$(echo -e "${GREEN}${BOLD}✓ All checks passed!${NC}")"
    else
        if [[ ${#issues[@]} -gt 0 ]]; then
            log_msg "$(echo -e "${RED}${BOLD}✗ ${#issues[@]} critical issue(s):${NC}")"
            for issue in "${issues[@]}"; do
                log_msg "  • $issue"
            done
        fi
        if [[ ${#warnings[@]} -gt 0 ]]; then
            log_msg "$(echo -e "${YELLOW}${BOLD}⚠ ${#warnings[@]} warning(s):${NC}")"
            for warning in "${warnings[@]}"; do
                log_msg "  • $warning"
            done
        fi
    fi
    
    log_msg "=========================================="
}
 
print_summary
 
# ==================== Email Alert ====================
send_alert() {
    [[ -z "$email_alert" ]] && return
    [[ ${#issues[@]} -eq 0 && ${#warnings[@]} -eq 0 ]] && return
    
    local subject="[ALERT] System Health Issues on $(hostname)"
    local body
    
    body="System: $(hostname)
Time: $(date)
 
"
    
    if [[ ${#issues[@]} -gt 0 ]]; then
        body+="CRITICAL ISSUES (${#issues[@]}):
"
        for issue in "${issues[@]}"; do
            body+="  - $issue
"
        done
        body+="
"
    fi
    
    if [[ ${#warnings[@]} -gt 0 ]]; then
        body+="WARNINGS (${#warnings[@]}):
"
        for warning in "${warnings[@]}"; do
            body+="  - $warning
"
        done
    fi
    
    if command -v mail &>/dev/null; then
        echo "$body" | mail -s "$subject" "$email_alert"
        info "Alert email sent to $email_alert"
    else
        warn "Cannot send email: 'mail' command not available"
    fi
}
 
send_alert
 
# Exit with error if critical issues found
[[ ${#issues[@]} -eq 0 ]] || exit 1

Why this is production-grade:

  1. Flexible deployment:

    • Interactive mode: See colorful output
    • Cron mode: -s (silent unless problems), logs to file, sends email
    • Customizable thresholds via environment variables
  2. Comprehensive checks:

    • CPU load as percentage of cores (8-core machine with load 4.0 is 50%)
    • Memory usage (actual used, not cached)
    • All mounted filesystems (skips tmpfs/devtmpfs)
    • Critical services (extensible via MONITOR_SERVICES)
  3. Robust detection:

    • Tries multiple methods to check service status (pgrep, systemctl)
    • Handles systems without GNU coreutils
    • Gracefully skips unavailable checks
  4. Actionable alerts:

    • Email only when problems detected
    • Clear separation: critical issues vs warnings
    • Exit code indicates problems (for automation)
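
The load-to-percentage conversion from check_cpu is easy to try in isolation; the sample values below are the 8-core example from above:

```shell
# check_cpu's conversion in isolation: 1-minute load average relative to
# core count, scaled to a percentage. Load 4.0 on 8 cores is 50% of capacity.
load=4.0
cores=8
load_pct=$(awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.0f", (l / c) * 100 }')
echo "$load_pct"   # 50
```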

Real-world deployment:

# Install
sudo cp health_monitor.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/health_monitor.sh
 
# Cron: check every 5 minutes, log, email on issues
sudo crontab -e
*/5 * * * * WARN_CPU=90 WARN_MEM=90 WARN_DISK=95 /usr/local/bin/health_monitor.sh -w -e [email protected] -s

3. File Organizer — Automatic Folder Structure

The Problem: Downloads folder is chaos. Hundreds of mixed files (PDFs, images, videos, code) all in one directory.

The Solution: Automatically sort files into category folders by extension.

#!/usr/bin/env bash
set -euo pipefail
 
# file_organizer.sh — Sort files into folders by type
#
# USAGE:
#   ./file_organizer.sh [DIRECTORY]
#
# EXAMPLES:
#   ./file_organizer.sh              # Organize current directory
#   ./file_organizer.sh ~/Downloads  # Organize Downloads
 
readonly SCRIPT_NAME="$(basename "$0")"
 
# Define file categories (extension β†’ folder)
declare -A CATEGORIES=(
    # Images
    [jpg]="Images" [jpeg]="Images" [png]="Images" [gif]="Images"
    [svg]="Images" [bmp]="Images" [webp]="Images" [ico]="Images"
    [heic]="Images" [tiff]="Images" [raw]="Images"
    
    # Documents
    [pdf]="Documents" [doc]="Documents" [docx]="Documents"
    [txt]="Documents" [md]="Documents" [odt]="Documents"
    [xls]="Documents" [xlsx]="Documents" [csv]="Documents"
    [ppt]="Documents" [pptx]="Documents" [rtf]="Documents"
    
    # Videos
    [mp4]="Video" [mkv]="Video" [avi]="Video" [mov]="Video"
    [wmv]="Video" [flv]="Video" [webm]="Video" [m4v]="Video"
    
    # Audio
    [mp3]="Audio" [wav]="Audio" [flac]="Audio" [aac]="Audio"
    [ogg]="Audio" [wma]="Audio" [m4a]="Audio" [opus]="Audio"
    
    # Archives
    [zip]="Archives" [tar]="Archives" [gz]="Archives"
    [rar]="Archives" [7z]="Archives" [bz2]="Archives"
    [xz]="Archives" [tgz]="Archives"
    
    # Code
    [py]="Code" [js]="Code" [ts]="Code" [jsx]="Code" [tsx]="Code"
    [html]="Code" [css]="Code" [scss]="Code" [sass]="Code"
    [sh]="Code" [bash]="Code" [rb]="Code" [java]="Code"
    [c]="Code" [cpp]="Code" [h]="Code" [go]="Code"
    [rs]="Code" [php]="Code" [swift]="Code" [kt]="Code"
    [sql]="Code" [json]="Code" [xml]="Code" [yaml]="Code" [yml]="Code"
    
    # Executables
    [exe]="Executables" [msi]="Executables" [dmg]="Executables"
    [app]="Executables" [deb]="Executables" [rpm]="Executables"
)
 
# Color output
# Color output (RED must be defined too: it is used in error messages below,
# and an unset variable would abort the script under set -u)
if [[ -t 1 ]]; then
    readonly RED='\033[0;31m'
    readonly GREEN='\033[0;32m'
    readonly YELLOW='\033[0;33m'
    readonly BLUE='\033[0;34m'
    readonly NC='\033[0m'
else
    readonly RED='' GREEN='' YELLOW='' BLUE='' NC=''
fi
 
moved=0
skipped=0
errors=0
 
organize_file() {
    local file="$1"
    local filename ext dest_dir
    
    filename=$(basename "$file")
    
    # Counters use var=$((var + 1)) rather than ((var++)): the post-increment
    # returns the old value, which trips 'set -e' when the counter is 0.
    
    # Get extension (lowercase)
    if [[ "$filename" == *.* ]]; then
        ext="${filename##*.}"
        ext="${ext,,}"  # Lowercase
    else
        echo -e "${YELLOW}⊗${NC} Skipped (no extension): $filename"
        skipped=$((skipped + 1))
        return
    fi
    
    # Check if we have a category for this extension
    if [[ -n "${CATEGORIES[$ext]:-}" ]]; then
        dest_dir="$(dirname "$file")/${CATEGORIES[$ext]}"
        
        # Create destination directory
        if ! mkdir -p "$dest_dir"; then
            echo -e "${RED}✗${NC} Error creating directory: $dest_dir" >&2
            errors=$((errors + 1))
            return
        fi
        
        # Check if destination file already exists
        if [[ -e "$dest_dir/$filename" ]]; then
            echo -e "${YELLOW}⊗${NC} Skipped (already exists): $filename → ${CATEGORIES[$ext]}/"
            skipped=$((skipped + 1))
            return
        fi
        
        # Move file
        if mv "$file" "$dest_dir/"; then
            echo -e "${GREEN}✓${NC} Moved: $filename → ${CATEGORIES[$ext]}/"
            moved=$((moved + 1))
        else
            echo -e "${RED}✗${NC} Error moving: $filename" >&2
            errors=$((errors + 1))
        fi
    else
        echo -e "${BLUE}⊗${NC} Skipped (unknown type .$ext): $filename"
        skipped=$((skipped + 1))
    fi
}
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [DIRECTORY]
 
Organize files into category folders by file extension.
 
Categories:
  Images      → jpg, png, gif, svg, bmp, webp, heic, etc.
  Documents   → pdf, doc, docx, txt, md, xls, xlsx, ppt, etc.
  Video       → mp4, mkv, avi, mov, wmv, webm, etc.
  Audio       → mp3, wav, flac, aac, ogg, m4a, etc.
  Archives    → zip, tar, gz, rar, 7z, bz2, etc.
  Code        → py, js, ts, html, css, sh, rb, java, go, etc.
  Executables → exe, msi, dmg, app, deb, rpm, etc.
 
Examples:
  $SCRIPT_NAME              # Organize current directory
  $SCRIPT_NAME ~/Downloads  # Organize Downloads folder
EOF
}
 
# Parse arguments
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
    usage
    exit 0
fi
 
target="${1:-.}"
 
# Validate directory
if [[ ! -d "$target" ]]; then
    echo "Error: Directory not found: $target" >&2
    exit 1
fi
 
echo "Organizing files in: $target"
echo ""
 
# Process all files (non-recursively)
while IFS= read -r -d '' file; do
    organize_file "$file"
done < <(find "$target" -maxdepth 1 -type f -print0)
 
# Summary
echo ""
echo "=========================================="
echo "Summary:"
echo "  Moved:   $moved files"
echo "  Skipped: $skipped files"
[[ $errors -gt 0 ]] && echo "  Errors:  $errors"
echo "=========================================="
 
exit 0

Production features:

  1. Comprehensive categories: 60+ file extensions mapped to 7 categories
  2. Conflict handling: Won't overwrite existing files
  3. Error handling: Reports failures without stopping
  4. Clear feedback: Shows exactly what's happening to each file
  5. Safe: Only processes the top level of the target directory (not recursive)

Extend it:

# Add custom categories
declare -A CATEGORIES=(
    # ... existing categories ...
    
    # My custom categories
    [blend]="3D_Models" [obj]="3D_Models" [fbx]="3D_Models"
    [sketch]="Design" [xd]="Design" [fig]="Design"
    [book]="Ebooks" [epub]="Ebooks" [mobi]="Ebooks"
)
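
The heart of organize_file is a single associative-array lookup on the lowercased extension. Here it is in isolation (the three mappings are a small subset of the real table, and category_for is an illustrative helper, not part of the script):

```shell
#!/usr/bin/env bash
# Extension → category lookup in isolation (subset of the real table)
declare -A CATEGORIES=( [jpg]="Images" [pdf]="Documents" [mp3]="Audio" )

category_for() {
    local filename="$1" ext
    ext="${filename##*.}"                 # strip everything up to the last dot
    ext="${ext,,}"                        # lowercase (bash 4+)
    echo "${CATEGORIES[$ext]:-Unknown}"   # default when the extension is unmapped
}

category_for "Holiday.JPG"    # Images
category_for "notes.txt"      # Unknown (txt is not in this subset)
```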

4. Backup with Intelligent Rotation

The Problem: Daily backups fill up your disk. Manual cleanup is tedious and error-prone. You need retention policies: keep daily backups for 7 days, weekly for 4 weeks, monthly for 12 months.

The Solution: Automated backup with grandfather-father-son (GFS) rotation.

#!/usr/bin/env bash
set -euo pipefail
 
# backup_rotate.sh — Backup with intelligent GFS rotation
#
# RETENTION POLICY:
#   Daily:   7 backups (1 week)
#   Weekly:  4 backups (1 month)  — Saved on Sundays
#   Monthly: 12 backups (1 year)  — Saved on 1st of month
#
# USAGE:
#   ./backup_rotate.sh -s SOURCE -n NAME [-b BASE_DIR]
#
# OPTIONS:
#   -s SOURCE    Directory to backup
#   -n NAME      Backup name (identifier)
#   -b BASE      Base backup directory (default: /backup)
#   -c           Compress with gzip
#   -h           Show help
#
# EXAMPLES:
#   ./backup_rotate.sh -s /var/www -n website -c
#   ./backup_rotate.sh -s /home/alice -n home -b /mnt/backups
#
# CRON:
#   # Daily backup at 2 AM
#   0 2 * * * /usr/local/bin/backup_rotate.sh -s /var/www -n website -c
 
readonly SCRIPT_NAME="$(basename "$0")"
BACKUP_BASE="/backup"
LOGFILE=""
COMPRESS=false
 
# Exit codes
readonly E_SUCCESS=0
readonly E_MISSING_ARGS=2
readonly E_SOURCE_NOT_FOUND=3
readonly E_BACKUP_FAILED=4
 
# Use 'if' rather than '[[ ]] &&' as the last command: when LOGFILE is empty,
# the failed test would make the function return nonzero and trip set -e.
log() { 
    local msg="[$(date '+%Y-%m-%d %H:%M:%S')] $*"
    echo "$msg"
    if [[ -n "$LOGFILE" ]]; then
        echo "$msg" >> "$LOGFILE"
    fi
}
 
error() {
    echo "[ERROR] $*" >&2
    if [[ -n "$LOGFILE" ]]; then
        echo "[ERROR] $*" >> "$LOGFILE"
    fi
}
 
die() {
    error "$*"
    exit "${2:-1}"
}
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME -s SOURCE -n NAME [OPTIONS]
 
Create backups with intelligent GFS rotation policy.
 
REQUIRED:
    -s SOURCE    Directory to backup
    -n NAME      Backup identifier (e.g., 'website', 'database')
 
OPTIONS:
    -b BASE      Base backup directory (default: /backup)
    -c           Compress backups with gzip
    -h           Show this help
 
RETENTION POLICY:
    Daily:   7 backups (last 7 days)
    Weekly:  4 backups (saved on Sundays)
    Monthly: 12 backups (saved on 1st of month)
 
EXAMPLES:
    $SCRIPT_NAME -s /var/www/html -n website -c
    $SCRIPT_NAME -s /home/user -n home_backup -b /mnt/external
 
CRON EXAMPLE:
    # Daily backup at 2 AM with compression
    0 2 * * * $SCRIPT_NAME -s /var/www -n website -c 2>&1 | logger -t backup
 
DIRECTORY STRUCTURE:
    /backup/
      └── NAME/
          ├── daily/      # Last 7 days
          ├── weekly/     # Last 4 weeks (Sundays)
          └── monthly/    # Last 12 months (1st of month)
EOF
}
 
# Parse arguments
SOURCE=""
NAME=""
 
while getopts ":s:n:b:ch" opt; do
    case $opt in
        s) SOURCE="$OPTARG" ;;
        n) NAME="$OPTARG" ;;
        b) BACKUP_BASE="$OPTARG" ;;
        c) COMPRESS=true ;;
        h) usage; exit $E_SUCCESS ;;
        :) die "-$OPTARG requires an argument" $E_MISSING_ARGS ;;
        \?) die "Unknown option: -$OPTARG (use -h for help)" $E_MISSING_ARGS ;;
    esac
done
 
# Validate required arguments
[[ -z "$SOURCE" ]] && die "Source directory is required (-s)" $E_MISSING_ARGS
[[ -z "$NAME" ]] && die "Backup name is required (-n)" $E_MISSING_ARGS
[[ -d "$SOURCE" ]] || die "Source directory not found: $SOURCE" $E_SOURCE_NOT_FOUND
 
# Setup logging
LOGFILE="$BACKUP_BASE/$NAME/backup.log"
mkdir -p "$(dirname "$LOGFILE")" 2>/dev/null || true
 
log "=========================================="
log "Backup started"
log "Source: $SOURCE"
log "Name: $NAME"
log "Base: $BACKUP_BASE"
log "Compress: $COMPRESS"
log "=========================================="
 
# Create directory structure
DAILY="$BACKUP_BASE/$NAME/daily"
WEEKLY="$BACKUP_BASE/$NAME/weekly"
MONTHLY="$BACKUP_BASE/$NAME/monthly"
 
for dir in "$DAILY" "$WEEKLY" "$MONTHLY"; do
    if ! mkdir -p "$dir"; then
        die "Cannot create directory: $dir" $E_BACKUP_FAILED
    fi
done
 
# Date information
DATE=$(date +%Y%m%d_%H%M%S)
DAY_OF_WEEK=$(date +%u)    # 1=Monday, 7=Sunday
DAY_OF_MONTH=$(date +%d)
 
# Determine file extension
if $COMPRESS; then
    EXT=".tar.gz"
else
    EXT=".tar"
fi
 
# Create daily backup
BACKUP_FILE="$DAILY/${NAME}_${DATE}${EXT}"
log "Creating daily backup: $BACKUP_FILE"
 
# Perform backup. The -C/basename combination keeps member paths relative,
# which avoids tar's "Removing leading '/'" warning for absolute paths.
# (Do not filter tar through 'grep -v' here: under pipefail, grep exits
# nonzero when tar produces no output, falsely reporting failure.)
if $COMPRESS; then
    tar czf "$BACKUP_FILE" -C "$(dirname "$SOURCE")" "$(basename "$SOURCE")" \
        || die "Backup failed!" $E_BACKUP_FAILED
else
    tar cf "$BACKUP_FILE" -C "$(dirname "$SOURCE")" "$(basename "$SOURCE")" \
        || die "Backup failed!" $E_BACKUP_FAILED
fi
 
# Verify backup was created
if [[ ! -f "$BACKUP_FILE" ]]; then
    die "Backup file was not created: $BACKUP_FILE" $E_BACKUP_FAILED
fi
 
# Report size
SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
log "Backup created successfully: $SIZE"
 
# Copy to weekly on Sundays
if [[ "$DAY_OF_WEEK" -eq 7 ]]; then
    WEEKLY_FILE="$WEEKLY/${NAME}_week_${DATE}${EXT}"
    cp "$BACKUP_FILE" "$WEEKLY_FILE"
    log "Weekly backup saved: $(basename "$WEEKLY_FILE")"
fi
 
# Copy to monthly on the 1st
if [[ "$DAY_OF_MONTH" == "01" ]]; then
    MONTHLY_FILE="$MONTHLY/${NAME}_month_${DATE}${EXT}"
    cp "$BACKUP_FILE" "$MONTHLY_FILE"
    log "Monthly backup saved: $(basename "$MONTHLY_FILE")"
fi
 
# Rotation: remove old backups
log "Cleaning up old backups..."
 
# Daily: keep 7 days
daily_removed=$(find "$DAILY" -name "*${EXT}" -mtime +7 -delete -print | wc -l)
[[ $daily_removed -gt 0 ]] && log "  Removed $daily_removed old daily backup(s)"
 
# Weekly: keep 4 weeks (28 days)
weekly_removed=$(find "$WEEKLY" -name "*${EXT}" -mtime +28 -delete -print | wc -l)
[[ $weekly_removed -gt 0 ]] && log "  Removed $weekly_removed old weekly backup(s)"
 
# Monthly: keep 12 months (365 days)
monthly_removed=$(find "$MONTHLY" -name "*${EXT}" -mtime +365 -delete -print | wc -l)
[[ $monthly_removed -gt 0 ]] && log "  Removed $monthly_removed old monthly backup(s)"
 
# Summary
log "=========================================="
log "Backup completed successfully!"
log "Current backup counts:"
log "  Daily:   $(find "$DAILY" -name "*${EXT}" | wc -l)"
log "  Weekly:  $(find "$WEEKLY" -name "*${EXT}" | wc -l)"
log "  Monthly: $(find "$MONTHLY" -name "*${EXT}" | wc -l)"
log "=========================================="
 
exit $E_SUCCESS

Why this rotation policy works:

  • Granularity: Recent changes (last 7 days) are all recoverable
  • Long-term: Can recover state from weeks/months ago
  • Space-efficient: Old backups automatically cleaned up
  • Predictable: Know exactly how many backups you have
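
The promotion logic boils down to two date checks; here is that decision in isolation (tiers_for_date is an illustrative helper, not part of the script):

```shell
#!/usr/bin/env bash
# Which retention tiers does a given day's backup land in?
# dow: day of week (1=Monday .. 7=Sunday), dom: zero-padded day of month
tiers_for_date() {
    local dow="$1" dom="$2"
    local tiers="daily"          # every run produces a daily backup
    if [[ "$dow" -eq 7 ]]; then
        tiers+=" weekly"         # Sundays are promoted to weekly
    fi
    if [[ "$dom" == "01" ]]; then
        tiers+=" monthly"        # the 1st is promoted to monthly
    fi
    echo "$tiers"
}

tiers_for_date 7 15   # daily weekly
tiers_for_date 1 01   # daily monthly
tiers_for_date 3 12   # daily
```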

Real-world deployment:

# Daily website backup at 2 AM
0 2 * * * /usr/local/bin/backup_rotate.sh -s /var/www/html -n website -c
 
# Daily database dump + backup at 3 AM
# (the script validates that SOURCE is a directory, so dump into one)
0 3 * * * mkdir -p /var/backups/dbdump && pg_dump mydb > /var/backups/dbdump/db.sql && /usr/local/bin/backup_rotate.sh -s /var/backups/dbdump -n database -c

5. Web Server Log Analyzer

The Problem: You have gigabytes of Apache/Nginx access logs. You need to quickly answer: Which IPs are hitting us most? What pages are popular? Are there errors? When is peak traffic?

The Solution: Parse and analyze access logs with awk, generating a comprehensive report.

#!/usr/bin/env bash
set -euo pipefail
 
# log_analyzer.sh — Web server access log analyzer
#
# USAGE:
#   ./log_analyzer.sh ACCESS_LOG [ACCESS_LOG...]
#
# SUPPORTED FORMATS:
#   - Apache Combined Log Format
#   - Nginx default access log format
#
# OUTPUT:
#   - Total requests
#   - Requests by HTTP status code
#   - Top IP addresses
#   - Top requested URLs
#   - Traffic by hour
#   - Top user agents
#   - Bandwidth usage (if log includes bytes)
 
readonly SCRIPT_NAME="$(basename "$0")"
 
# Color output
if [[ -t 1 ]]; then
    readonly BOLD='\033[1m'
    readonly BLUE='\033[0;34m'
    readonly GREEN='\033[0;32m'
    readonly YELLOW='\033[0;33m'
    readonly RED='\033[0;31m'
    readonly NC='\033[0m'
else
    readonly BOLD='' BLUE='' GREEN='' YELLOW='' RED='' NC=''
fi
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME LOG_FILE [LOG_FILE...]
 
Analyze web server access logs and generate comprehensive report.
 
Supports:
  - Apache Combined Log Format
  - Nginx default access log format
 
Examples:
  $SCRIPT_NAME /var/log/apache2/access.log
  $SCRIPT_NAME /var/log/nginx/access.log*
  $SCRIPT_NAME access.log access.log.1
 
Report includes:
  - Total requests
  - Status code distribution (2xx, 3xx, 4xx, 5xx)
  - Top IP addresses
  - Top requested URLs
  - Traffic patterns by hour
  - Top user agents
  - Bandwidth usage
EOF
}
 
# Validate arguments
if [[ $# -eq 0 ]]; then
    usage
    exit 1
fi
 
for file in "$@"; do
    if [[ ! -f "$file" ]]; then
        echo "Error: File not found: $file" >&2
        exit 1
    fi
done
 
# Combine all log files
TEMP_LOG=$(mktemp)
trap "rm -f '$TEMP_LOG'" EXIT
 
cat "$@" > "$TEMP_LOG"
 
# Count total requests
total_requests=$(wc -l < "$TEMP_LOG")
 
# Print header
echo ""
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e "${BOLD}  Web Server Log Analysis${NC}"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e "  ${BLUE}Files analyzed:${NC} $#"
echo -e "  ${BLUE}Total requests:${NC} $(printf "%'d" "$total_requests")"
echo -e "  ${BLUE}Date range:${NC} $(head -1 "$TEMP_LOG" | awk -F'[\\[\\]]' '{print $2}' | cut -d: -f1) to $(tail -1 "$TEMP_LOG" | awk -F'[\\[\\]]' '{print $2}' | cut -d: -f1)"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo ""
 
# ==================== Status Code Distribution ====================
echo -e "${BOLD}${BLUE}📊 HTTP Status Code Distribution${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
# NOTE: the 3-argument match() and asorti() used below require GNU awk (gawk)
awk '{
    # Extract status code (field 9 in combined log format)
    match($0, /" [0-9]{3} /, arr)
    if (RSTART > 0) {
        code = substr($0, RSTART+2, 3)
        if (code >= 200 && code < 300) status["2xx"]++
        else if (code >= 300 && code < 400) status["3xx"]++
        else if (code >= 400 && code < 500) status["4xx"]++
        else if (code >= 500 && code < 600) status["5xx"]++
        codes[code]++
    }
}
END {
    # Print summary by category
    printf "  %-12s %10d  (%5.1f%%)  %s\n", "2xx (OK)", status["2xx"], (status["2xx"]/NR)*100, "'"${GREEN}■■■■■${NC}"'"
    printf "  %-12s %10d  (%5.1f%%)  %s\n", "3xx (Redir)", status["3xx"], (status["3xx"]/NR)*100, "'"${YELLOW}■■■${NC}"'"
    printf "  %-12s %10d  (%5.1f%%)  %s\n", "4xx (Client)", status["4xx"], (status["4xx"]/NR)*100, "'"${RED}■■■■${NC}"'"
    printf "  %-12s %10d  (%5.1f%%)  %s\n", "5xx (Server)", status["5xx"], (status["5xx"]/NR)*100, "'"${RED}■■■■■■${NC}"'"
    print ""
    
    # Top 5 specific codes (asorti with "@val_num_desc" requires GNU awk)
    print "  Top status codes:"
    n = asorti(codes, sorted_codes, "@val_num_desc")
    for (i = 1; i <= (n < 5 ? n : 5); i++) {
        code = sorted_codes[i]
        printf "    %s: %d requests (%.1f%%)\n", code, codes[code], (codes[code]/NR)*100
    }
}' "$TEMP_LOG"
 
echo ""
 
# ==================== Top IP Addresses ====================
echo -e "${BOLD}${BLUE}🌐 Top 10 IP Addresses${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk '{print $1}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
    awk -v total="$total_requests" '{
        pct = ($1/total)*100
        printf "  %8d requests  (%5.1f%%)  %s\n", $1, pct, $2
    }'
 
echo ""
 
# ==================== Top Requested URLs ====================
echo -e "${BOLD}${BLUE}πŸ“„ Top 10 Requested URLs${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk '{
    # The request line is quoted: "METHOD /url HTTP/x.x".
    # In combined log format that makes the URL whitespace field 7.
    if ($6 ~ /^"[A-Z]+$/ && $7 != "") {
        print $7
    }
}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
    awk -v total="$total_requests" '{
        count = $1
        $1 = ""
        url = $0
        sub(/^ */, "", url)
        pct = (count/total)*100
        printf "  %8d  (%5.1f%%)  %s\n", count, pct, url
    }'
 
echo ""
 
# ==================== Traffic by Hour ====================
echo -e "${BOLD}${BLUE}πŸ• Traffic Distribution by Hour${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk -F'[\\[:]' '{
    hour = $3 + 0   # force numeric: "00".."09" and "0".."9" must share one array key
    if (hour >= 0 && hour <= 23) {
        hours[hour]++
    }
}
END {
    max = 1         # floor of 1 avoids division by zero on an empty log
    for (h in hours) {
        if (hours[h] > max) max = hours[h]
    }
    
    for (h = 0; h < 24; h++) {
        count = hours[h]
        if (count == "") count = 0
        bar_len = int((count / max) * 30)
        bar = ""
        for (i = 0; i < bar_len; i++) bar = bar "'"${GREEN}β– ${NC}"'"
        printf "  %02d:00  %8d  %s\n", h, count, bar
    }
}' "$TEMP_LOG"
 
echo ""
 
# ==================== Top User Agents ====================
echo -e "${BOLD}${BLUE}πŸ€– Top 10 User Agents${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk -F'"' '{
    # User agent is typically the 6th quoted field
    if (NF >= 6) print $6
}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
    awk -v total="$total_requests" '{
        count = $1
        $1 = ""
        agent = $0
        sub(/^ */, "", agent)
        pct = (count/total)*100
        
        # Truncate long user agents
        if (length(agent) > 60) {
            agent = substr(agent, 1, 57) "..."
        }
        
        printf "  %8d  (%5.1f%%)  %s\n", count, pct, agent
    }'
 
echo ""
 
# ==================== Bandwidth Usage ====================
echo -e "${BOLD}${BLUE}πŸ“¦ Bandwidth Usage${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk '{
    # In combined log format, bytes sent is whitespace field 10
    # ("-" when no response body was returned)
    if ($10 ~ /^[0-9]+$/) {
        total_bytes += $10
    }
}
END {
    gb = total_bytes / (1024*1024*1024)
    mb = total_bytes / (1024*1024)
    printf "  Total:     %8.2f GB (%10.2f MB)\n", gb, mb
    if (NR > 0) {
        avg_bytes = total_bytes / NR
        avg_kb = avg_bytes / 1024
        printf "  Per request: %7.2f KB\n", avg_kb
    }
}' "$TEMP_LOG"
 
echo ""
 
# ==================== Footer ====================
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e "  ${GREEN}βœ“${NC} Analysis complete"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo ""

Advanced analysis features:

  1. Status code distribution: Shows 2xx/3xx/4xx/5xx breakdown
  2. Traffic patterns: Hourly distribution with bar chart
  3. Security insights: Top IPs (potential DoS), 4xx/5xx errors
  4. Performance data: Bandwidth usage, average request size
  5. Multi-file support: Analyzes rotated logs together

Real-world usage:

# Analyze current log
./log_analyzer.sh /var/log/nginx/access.log
 
# Analyze all rotated logs together (gzipped rotations must be decompressed first)
./log_analyzer.sh /var/log/nginx/access.log*
 
# Analyze specific date range (using grep)
grep "11/Feb/2026" /var/log/apache2/access.log | ./log_analyzer.sh /dev/stdin
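The /dev/stdin trick works because the analyzer only needs a readable path; bash process substitution gives you the same thing more directly, since `<(cmd)` expands to a file-like path (e.g. `/dev/fd/63`). A minimal sketch of the mechanism (`count_lines` is a stand-in for any script that takes file arguments):

```shell
#!/usr/bin/env bash
set -euo pipefail

# <(cmd) expands to a path whose contents are cmd's output, so any
# tool that expects a filename can read a pipeline's results.
count_lines() { wc -l < "$1"; }

result=$(count_lines <(printf 'GET /a\nGET /b\nGET /c\n'))
echo "$result"   # 3
```

Applied here, `./log_analyzer.sh <(grep ' 404 ' access.log)` restricts the report to 404s without a temporary file.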

Exercises

πŸ‹οΈ Exercise 1: Build a Duplicate File Finder

Task: Write duplicate_finder.sh that:

  • Finds duplicate files by content (using md5sum)
  • Groups duplicates together
  • Reports total wasted space
  • Optionally deletes duplicates (keeping one copy)

Bonus: Add size filter (only check files > 1MB) for speed.
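Before opening the solution, note that the core detection step fits in one pipeline; a minimal sketch using GNU uniq's `-w32` (compare only the first 32 characters, i.e. the MD5) and `-D` (print every member of each repeated group), with throwaway demo data:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo data: two files with identical content, one unique
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
echo "same content" > "$dir/a.txt"
echo "same content" > "$dir/b.txt"
echo "different"    > "$dir/c.txt"

# Hash everything, sort so equal hashes become adjacent, then keep
# only lines whose first 32 characters (the MD5 hash) repeat.
dupes=$(find "$dir" -type f -exec md5sum {} + | sort | uniq -w32 -D)
echo "$dupes"
```

The full solution replaces `uniq` with an awk pass so it can also group, count, and optionally delete.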

Show Solution
#!/usr/bin/env bash
set -euo pipefail
 
readonly SCRIPT_NAME="$(basename "$0")"
MIN_SIZE="${MIN_SIZE:-0}"  # Minimum file size (bytes)
DELETE=false
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] DIRECTORY
 
Find duplicate files by content (MD5 hash).
 
OPTIONS:
    -d          Delete duplicates (keep first, remove others)
    -m SIZE     Minimum file size in MB (default: 0)
    -h          Show help
 
ENVIRONMENT:
    MIN_SIZE=N  Minimum file size in bytes
 
EXAMPLES:
    $SCRIPT_NAME ~/Downloads
    $SCRIPT_NAME -m 1 ~/Documents          # Only files > 1MB
    $SCRIPT_NAME -d ~/Downloads            # Delete duplicates
 
OUTPUT:
    Lists duplicate groups with file paths and wasted space.
EOF
}
 
while getopts ":dm:h" opt; do
    case $opt in
        d) DELETE=true ;;
        m) MIN_SIZE=$((OPTARG * 1024 * 1024)) ;;
        h) usage; exit 0 ;;
        \?) echo "Unknown option: -$OPTARG" >&2; usage; exit 1 ;;
        :) echo "-$OPTARG requires an argument" >&2; exit 1 ;;
    esac
done
 
shift $((OPTIND - 1))
 
dir="${1:-.}"
[[ -d "$dir" ]] || { echo "Error: Directory not found: $dir" >&2; exit 1; }
 
echo "Scanning for duplicate files in: $dir"
[[ $MIN_SIZE -gt 0 ]] && echo "Minimum file size: $((MIN_SIZE / 1024 / 1024)) MB"
echo ""
 
temp_hashes=$(mktemp)
trap "rm -f '$temp_hashes'" EXIT
 
# Find files, compute hashes ("+" batches files into few md5sum calls)
find "$dir" -type f -size +"${MIN_SIZE}c" -exec md5sum {} + 2>/dev/null | \
    sort > "$temp_hashes"
 
# Scan sorted hashes for duplicates (counts are tracked inside awk)
 
awk '{
    hash = $1
    file = substr($0, 35)  # MD5 is 32 chars + 2 spaces
    
    if (hash == prev_hash) {
        if (dup_count == 0) {
            # First duplicate found - print header
            print "\nDuplicate group (hash: " hash "):"
            print "  [1] " prev_file
            group_files[1] = prev_file
        }
        dup_count++
        total_dupes++
        print "  [" (dup_count + 1) "] " file
        group_files[dup_count + 1] = file
        
        # Calculate wasted space (all but first copy)
        cmd = "stat -f%z \"" file "\" 2>/dev/null || stat -c%s \"" file "\""
        cmd | getline size
        close(cmd)
        wasted += size
    } else {
        # Process previous group if delete mode
        if (dup_count > 0 && delete_mode == "true") {
            print "  Deleting duplicates, keeping: " group_files[1]
            for (i = 2; i <= dup_count + 1; i++) {
                system("rm -f \"" group_files[i] "\"")
                print "    Deleted: " group_files[i]
            }
        }
        
        # Reset for new hash
        dup_count = 0
        delete group_files
    }
    
    prev_hash = hash
    prev_file = file
}
END {
    # Handle last group
    if (dup_count > 0 && delete_mode == "true") {
        print "  Deleting duplicates, keeping: " group_files[1]
        for (i = 2; i <= dup_count + 1; i++) {
            system("rm -f \"" group_files[i] "\"")
            print "    Deleted: " group_files[i]
        }
    }
    
    print "\n=========================================="
    if (total_dupes > 0) {
        print "Found " total_dupes " duplicate file(s)"
        printf "Wasted space: %.2f MB\n", wasted/1024/1024
    } else {
        print "No duplicates found"
    }
    print "=========================================="
}' delete_mode="$DELETE" "$temp_hashes"
πŸ‹οΈ Exercise 2: Build a Website Uptime Monitor

Task: Write uptime_monitor.sh that:

  • Checks a website every 30 seconds
  • Logs downtime with timestamp
  • When site recovers, logs total downtime duration
  • Sends email alert on downtime (optional)
  • Tracks uptime percentage
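The downtime bookkeeping in the solution turns both timestamps into epoch seconds before subtracting; a minimal sketch of that arithmetic (GNU date's `-d` flag assumed; timestamps fixed so the demo is deterministic):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert wall-clock timestamps to seconds since the epoch, then
# subtract to get an elapsed duration in seconds.
down_since="2026-02-11 10:00:00"
recovered_at="2026-02-11 10:05:30"

down_start=$(date -d "$down_since" +%s)
now=$(date -d "$recovered_at" +%s)
duration=$((now - down_start))
echo "downtime: ${duration}s"   # downtime: 330s
```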
Show Solution
#!/usr/bin/env bash
set -euo pipefail
 
readonly SCRIPT_NAME="$(basename "$0")"
URL="${1:-}"
INTERVAL="${2:-30}"
EMAIL="${EMAIL:-}"
LOGFILE="${LOGFILE:-/var/log/uptime_$(echo "$URL" | tr '/:' '_').log}"  # override LOGFILE when /var/log is not writable
 
[[ -z "$URL" ]] && {
    echo "Usage: $SCRIPT_NAME URL [INTERVAL_SECONDS]" >&2
    echo "Example: $SCRIPT_NAME https://example.com 60" >&2
    exit 1
}
 
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOGFILE"
}
 
send_alert() {
    local subject="$1"
    local message="$2"
    
    [[ -z "$EMAIL" ]] && return
    
    if command -v mail &>/dev/null; then
        printf '%b\n' "$message" | mail -s "$subject" "$EMAIL"  # %b expands the \n escapes in the message
        log "Alert sent to $EMAIL"
    fi
}
 
is_down=false
down_since=""
total_checks=0
failed_checks=0
 
log "=========================================="
log "Monitoring: $URL (interval: ${INTERVAL}s)"
log "=========================================="
 
while true; do
    ((total_checks += 1))   # += 1 rather than ++: ((x++)) returns status 1 when x was 0, which set -e treats as fatal
    
    if curl -sf --max-time 10 "$URL" >/dev/null 2>&1; then
        # Site is UP
        if $is_down; then
            # RECOVERY
            now=$(date +%s)
            down_start=$(date -d "$down_since" +%s 2>/dev/null || echo "$now")
            duration=$((now - down_start))
            
            log "βœ“ RECOVERY: $URL is back up (downtime: ${duration}s)"
            send_alert "[RECOVERY] $URL is back up" \
                "Site: $URL\nDowntime: ${duration}s\nRecovered: $(date)"
            
            is_down=false
        fi
    else
        # Site is DOWN
        ((failed_checks += 1))   # += 1, not ++, to stay safe under set -e
        
        if ! $is_down; then
            # INITIAL FAILURE
            down_since=$(date '+%Y-%m-%d %H:%M:%S')
            log "βœ— DOWN: $URL is not responding!"
            send_alert "[ALERT] $URL is down!" \
                "Site: $URL\nStatus: Not responding\nTime: $down_since"
            
            is_down=true
        else
            # STILL DOWN
            now=$(date +%s)
            down_start=$(date -d "$down_since" +%s 2>/dev/null || echo "$now")
            duration=$((now - down_start))
            log "βœ— STILL DOWN: $URL (${duration}s)"
        fi
    fi
    
    # Calculate uptime percentage
    uptime_pct=$(echo "$total_checks $failed_checks" | \
        awk '{printf "%.2f", (1 - $2/$1) * 100}')
    
    log "Stats: $total_checks checks, uptime: ${uptime_pct}%"
    
    sleep "$INTERVAL"
done
πŸ‹οΈ Exercise 3: Build a CSV Data Validator

Task: Write validate_csv.sh that processes a CSV file:

  • Input: name,email,age,score format
  • Validates:
    • No empty fields
    • Valid email format (regex)
    • Age: 1-150
    • Score: 0-100
  • Outputs:
    • _clean.csv with valid records
    • _errors.csv with invalid records and error descriptions
  • Prints summary statistics
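Bash's `[[ ... =~ ... ]]` operator does the heavy lifting for this exercise; a minimal sketch of the email and range checks (`check` and the sample values are illustrative, and the email regex is a deliberate simplification):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simplified email regex: local-part @ domain . TLD
email_regex='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

check() {
    local email=$1 age=$2
    [[ "$email" =~ $email_regex ]] || { echo "bad email"; return; }
    # 10# forces base 10 so an age like "08" is not parsed as octal
    [[ "$age" =~ ^[0-9]+$ ]] && (( 10#$age >= 1 && 10#$age <= 150 )) \
        || { echo "bad age"; return; }
    echo "ok"
}

check "ada@example.com" 36    # prints "ok"
check "not-an-email" 36       # prints "bad email"
check "ada@example.com" 999   # prints "bad age"
```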
Show Solution
#!/usr/bin/env bash
set -euo pipefail
 
INPUT="${1:?Usage: $0 INPUT.csv}"
CLEAN="${INPUT%.csv}_clean.csv"
ERRORS="${INPUT%.csv}_errors.csv"
 
[[ -f "$INPUT" ]] || { echo "File not found: $INPUT" >&2; exit 1; }
 
echo "Validating: $INPUT"
echo "Output: $CLEAN (valid), $ERRORS (invalid)"
echo ""
 
# Write headers
head -1 "$INPUT" > "$CLEAN"
echo "line,field,error,original_value" > "$ERRORS"
 
line_num=0
clean_count=0
error_count=0
 
declare -A error_types=()
 
while IFS=, read -r name email age score; do
    ((line_num += 1))   # += 1 rather than ++: ((x++)) returns status 1 when x was 0, which set -e treats as fatal
    [[ $line_num -eq 1 ]] && continue  # Skip header
    
    valid=true
    
    # Validate name (not empty)
    if [[ -z "$name" ]]; then
        echo "$line_num,name,empty,\"\"" >> "$ERRORS"
        ((error_types[empty_name] += 1))   # += 1, not ++, to stay safe under set -e
        valid=false
    fi
    
    # Validate email (basic regex)
    email_regex='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if [[ ! "$email" =~ $email_regex ]]; then
        echo "$line_num,email,invalid,\"$email\"" >> "$ERRORS"
        ((error_types[invalid_email] += 1))   # += 1, not ++, to stay safe under set -e
        valid=false
    fi
    
    # Validate age (1-150)
    if ! [[ "$age" =~ ^[0-9]+$ ]] || (( 10#$age < 1 || 10#$age > 150 )); then   # 10# forces base 10 so "08" is not parsed as bad octal
        echo "$line_num,age,out_of_range,\"$age\"" >> "$ERRORS"
        ((error_types[invalid_age] += 1))   # += 1, not ++, to stay safe under set -e
        valid=false
    fi
    
    # Validate score (0-100)
    if ! [[ "$score" =~ ^[0-9]+$ ]] || (( 10#$score < 0 || 10#$score > 100 )); then   # 10# forces base 10
        echo "$line_num,score,out_of_range,\"$score\"" >> "$ERRORS"
        ((error_types[invalid_score] += 1))   # += 1, not ++, to stay safe under set -e
        valid=false
    fi
    
    # Write to appropriate file
    if $valid; then
        echo "$name,$email,$age,$score" >> "$CLEAN"
        ((clean_count += 1))
    else
        ((error_count += 1))
    fi
done < "$INPUT"
 
# Summary
echo "=========================================="
echo "Validation Complete"
echo "=========================================="
echo "Total records:  $((line_num - 1))"
echo "Valid:          $clean_count (β†’ $CLEAN)"
echo "Invalid:        $error_count (β†’ $ERRORS)"
echo ""
 
if [[ $error_count -gt 0 ]]; then
    echo "Error breakdown:"
    for error_type in "${!error_types[@]}"; do
        printf "  %-20s %d\n" "$error_type:" "${error_types[$error_type]}"
    done
fi
 
echo "=========================================="

Summary

You've seen 5 production-quality scripts demonstrating:

Design Patterns:

  • Strict mode + error handling in every script
  • Clear usage messages and help text
  • Logging with timestamps
  • Exit codes for automation
  • Cleanup with trap
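The design patterns above compose into a compact skeleton you can start every script from; a minimal sketch (all names are placeholders):

```shell
#!/usr/bin/env bash
set -euo pipefail                          # strict mode: fail fast, no unset vars

readonly SCRIPT_NAME="$(basename "$0")"

usage() { echo "Usage: $SCRIPT_NAME TARGET"; }

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }

cleanup() { rm -f "${tmpfile:-}"; }        # safe even if tmpfile was never set
trap cleanup EXIT                          # runs on success, error, or Ctrl-C

tmpfile=$(mktemp)
log "Started"
# ... real work goes here ...
log "Done"
```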

Common Tasks:

  • File management (normalize, organize)
  • System monitoring (health checks, uptime)
  • Data processing (log analysis, CSV validation)
  • Backup automation (with rotation policies)

Key Techniques:

  • getopts for argument parsing
  • Associative arrays for categorization
  • awk for complex text processing
  • Date arithmetic for rotation policies
  • Colorized output for terminal vs log files
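The date-arithmetic bullet deserves a concrete sketch: an age check using epoch seconds (GNU date's `-d` flag and the `is_expired` helper are illustrative assumptions):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Anything modified before the cutoff falls outside the retention
# window and is a candidate for rotation or deletion.
retention_days=30
cutoff=$(( $(date +%s) - retention_days * 86400 ))

is_expired() {
    # $1: a file's mtime in epoch seconds
    (( $1 < cutoff ))
}

old=$(date -d '45 days ago' +%s)
new=$(date -d '2 days ago' +%s)
is_expired "$old" && echo "rotate"   # prints "rotate"
is_expired "$new" || echo "keep"     # prints "keep"
```

In a real rotation script the mtime would come from `stat -c %Y "$file"` on GNU systems.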

What's Next:

  1. Customize these scripts for your environment:

    • Adjust thresholds in health monitor
    • Add file categories to organizer
    • Extend log analyzer for your log format
  2. Deploy to production:

    • Install in /usr/local/bin/
    • Set up cron jobs
    • Configure email alerts
  3. Build your own:

    • Start with one of these templates
    • Add your specific logic
    • Follow the same patterns (strict mode, logging, error handling)
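For the cron step, typical crontab entries might look like this (paths, schedules, and script names are illustrative):

```
# m  h dom mon dow  command
*/5  * *   *   *    /usr/local/bin/health_monitor.sh >> /var/log/health_monitor.log 2>&1
0    2 *   *   *    /usr/local/bin/backup.sh /etc /var/www
0    3 *   *   1    /usr/local/bin/log_analyzer.sh /var/log/nginx/access.log.1
```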

You now have a complete toolkit for shell automation. Every problemβ€”file management, monitoring, data processing, backupsβ€”follows these same patterns. Master them and you can automate anything.