Speculative Ideas & Half-Baked Insights
Notes on connections and theories that aren’t fully developed but worth preserving
Discrete/Continuous Duality (2025-10-11)
The Observation
The spanning tree from Tracts clustering and the 1/d³ potential surface appear to be related as discrete/continuous views of the same underlying structure.
Potential Surface (continuous):
- Every point in space has a potential value: Σ(pop_i / d³)
- Smooth field showing gravitational landscape
- Saddle points = natural boundaries between regions
- Shows topology at all scales simultaneously
Spanning Tree (discrete):
- Tracts merge based on pairwise attraction: (pop₁ × pop₂) / d⁴
- Graph structure showing actual merge sequence
- Hierarchical - captures multi-scale structure through merge timing
- Tractable approximation using discrete census tracts
The Physical Connection (NOT ARBITRARY!)
The relationship between 1/d⁴ (clustering) and 1/d³ (potential) is not arbitrary - it’s physics:
Force (pairwise): F ∝ (m₁ × m₂) / d⁴
Potential (field): V = -∫ F·dr ∝ m / d³
The potential is the integral of the force. This is the same relationship as in gravitational physics (though real gravity is 1/d² and 1/d, we use steeper exponents for population).
Why this matters:
- Clustering algorithm = force-based dynamics (which tracts merge?)
- Potential surface = energy landscape those forces create
- NOT an analogy or approximation - literally two views of the same physical system
The discrete spanning tree (force-driven merges) and continuous potential surface (integrated field) are physically consistent representations.
The Connection
Similar to Voronoi/Delaunay duality, but:
- Weighted by population mass (not just geometric distance)
- Hierarchical spanning tree (not complete graph)
- Temporal - merge order encodes scale information
- Physically grounded - force ↔ potential relationship from classical mechanics
Key insight: Regional boundaries from clustering should align with saddle points in the potential surface.
- Within high-potential basins → fast merges → dense tree structure
- Across low-potential barriers → slow merges → sparse connections
- Late-merge boundaries ≈ high saddles in potential field
Why It Matters
Potential surface: Shows the “true” continuous structure - all topology, all saddles, all scales. But intractable for optimization (infinite dimensionality).
Tracts algorithm: A tractable discrete approximation that discovers structure aligning with the continuous topology through:
- Greedy merging (polynomial time)
- Physically consistent distance metrics (1/d⁴ force → 1/d³ potential)
- Local decisions → global structure emerges
Visualization Opportunity
Overlay spanning tree on potential surface:
- Tree edges should follow valley floors (high local potential)
- Regional boundaries should sit on ridges (saddle points)
- Makes the geometric relationship visible even if hard to formalize
Fractal Optimization Hypothesis (2025-10-11)
The Reach
Speculative claim: For problems with power-law interactions (1/d^n) and fractal spatial structure, greedy local optimization might achieve near-optimal multi-scale solutions in polynomial time.
Why This Might Work
The algorithm is:
- Single-pass: One greedy scan through merges
- Local decisions: Max pairwise potential at each step
- No scale parameter: No “neighborhood size” or “resolution” tuning
Yet it discovers global, multi-scale structure coherently.
Why? If underlying structure is fractal (self-similar across scales):
- Same physics governs all scales (1/d⁴)
- No characteristic scale to tune for
- Local structure reflects global structure (self-similarity)
- Greedy approximation error bounded by fractal properties
Generalization?
Hierarchical optimization is usually intractable (exponential in tree structures).
But maybe for fractal problems with power-law interactions:
- Greedy → near-optimal
- Polynomial time → all scales simultaneously
- Works because: fractal structure = local ≈ global at all scales
Possible applications:
- Galaxy clustering (gravitational, ~1/d²)
- Network routing (latency, power-law distance costs)
- Supply chain (transport costs, spatial fractality)
Evidence
So far: one algorithm (population clustering) that:
- Is greedy and fast
- Discovers coherent structure at all scales
- Results are robust across different resolutions
- “You can’t keep it from making sense” (scale-invariant)
Status: Interesting pattern, worth exploring. Not proven, possibly wrong, definitely speculative.
Surface Visualization Challenges (2025-10-11)
The Spikiness Problem
1/d³ potential surface is extremely spiky:
- Dynamic range: ~10² (Sachs Harbour, 104 people) to ~10⁸ (Tokyo metro)
- 6+ orders of magnitude
- Individual Arctic settlements (pop 104) are visible bumps!
- But spikes overwhelm regional structure
The issue: Trying to show Mount Everest and a speed bump on the same map.
Compression Approaches
Double-log: z = log(log(potential))
- Extreme compression of tall spikes
- Preserves relative structure at lower levels
- Range: 0.3 to 0.85 instead of 2 to 7
- Trade-off: loses interpretability of absolute magnitude
Percentile clipping: Cap at 95th percentile
- Chop off top 5% outliers
- Adaptive to data
- Shows “normal” structure, ignores extremes
Asinh transformation: Smooth linear → log transition
- Handles zero gracefully
- Needs scale factor tuning
Multi-scale views: Separate visualizations for different scales
- Continental: heavy compression
- Regional: medium compression
- Local: full detail
- Or adaptive: compress based on zoom level
The Detail is Real
The spikiness means the data captures:
- Continental structure (Eurasia vs Americas)
- Regional plateaus (India, China as distinct masses)
- Metro areas (NYC, LA spikes)
- Cities (Juneau ~32k visible)
- Settlements (Sachs Harbour 104 people = noticeable bump at 72°N, 125°W)
All present simultaneously. Challenge is controlling dynamic range without losing real structure.
Notes on Scale Invariance
The algorithm shows remarkable consistency across resolutions:
- 74k census tracts vs 220k block groups vs 1M world grid
- Results “deeply the same” - hierarchy nests coherently
- Can mix resolutions (fine detail in some areas, coarse in others) and it still works
Implication: Not tuned to specific resolution, captures actual structure.
Why it matters: Most clustering algorithms are brittle to resolution changes. This one isn’t.
Visualization: Voronoi Regions vs Network (2025-10-12)
The Problem with Network Viz
Spanning tree network visualizations (edges between merging tracts) have issues:
- Spaghetti problem: Overlapping lines become illegible
- Width/color variations help but limited: Even varying line width or highlighting key merges in red doesn’t fully solve it
- Hard to see territorial extent of regions
- Doesn’t convey the “crystal growth” intuition clearly
Better Approach: Boundary-Based Regions
Concept: Show regions that emerge, not the network that creates them.
Algorithm:
- Run clustering to target N regions (e.g., 50 for “discovered states”)
- Each census tract belongs to one cluster
- Draw boundaries between tracts in different clusters
- Color each region distinctly
Result: Clean map showing natural regions, like political maps but discovered not imposed.
Advantages:
- Clean boundaries (no line overlap)
- Color-coded territories (immediate visual extent)
- Familiar mental model (like state/country maps)
- Shows “control” of space clearly
Animation Opportunity: Crystal Growth
Time-lapse showing region formation:
- Start: All tracts separate (maximum fragmentation)
- As clustering progresses: Recolor tracts as they merge
- Boundaries retreat as regions grow
- Eventually: 6-50 macro-regions remain (continents/mega-regions)
This is the “watching crystals grow” visualization - regions nucleate around population centers and expand until they meet at natural boundaries (the saddle points in the potential field).
Technical Implementation:
- Export cluster assignments at logarithmically-spaced merge levels
- Use tract polygons with cluster IDs
- Generate boundary lines where cluster ID changes between adjacent tracts
- Animate boundary evolution over merge sequence
- Slider to scrub through scales
Why this works:
- Boundaries = saddle points in potential field = natural division lines
- Color continuity shows regional coherence
- Growth pattern reveals hierarchical structure intuitively
- Much clearer than network spaghetti for showing multi-scale organization
Status: Concept validated through static “50 regions” maps. Animation would make the scale-invariant property obvious and compelling.
Network Visualization: Dual Encodings (2025-10-12)
The Realization
Spanning tree edges have two distinct properties that tell different stories:
1. Merge Strength (physics): How strong is the attraction?
- Formula: (pop₁ × pop₂) / d⁴
- Shows which connections are physically important
- Union Square NYC: Huge strength (millions of people, meters apart)
- Rural connection: Weak strength (small populations, far apart)
2. Merge Order (temporal): When did this connection form?
- Early merges: Local neighborhoods connecting
- Late merges: Continental-scale regions joining
- El Paso/Juarez: Late merge (last bridge between East/West Americas)
- Shows hierarchical scale of connections
These are independent!
- Strong + Early: Dense urban cores forming (Manhattan)
- Weak + Late: Remote connections that only merge at the end (El Paso)
- Strong + Late: Major metro areas finally joining (rare, interesting!)
- Weak + Early: Close rural neighbors (common, uninteresting)
Visualization Approaches
Option 1: Opacity by strength, Color by order
- Faint lines: Weak connections (fade into background)
- Bold lines: Strong connections (pop out visually)
- Color gradient (blue→red): Early→Late merges
- Shows both “what matters” and “how it formed”
Option 2: Two separate views with toggle
- “Strength View”: Opacity/width encodes merge strength (shows skeleton)
- “Temporal View”: Color encodes merge order (shows growth sequence)
- Toggle between them or overlay
Option 3: Interactive zoom with adaptive encoding
- Zoomed out: Show only late/strong merges (continental structure)
- Zoomed in: Show early/local merges (neighborhood structure)
- Scale-dependent filtering
Key Examples
Union Square, NYC:
- Merge strength: Extremely high (local criticality)
- Merge order: Very early (#50 of 74,000)
- Story: “This is the dense urban core forming immediately”
El Paso/Juarez crossing:
- Merge strength: Moderate (decent-sized cities)
- Merge order: Very late (#73,950 of 74,000)
- Story: “This is where East and West Americas finally connect”
Both are “key” but for different reasons - one for local structure, one for global structure.
“Main Street” as Heaviest Path (2025-10-12)
The Insight
The spanning tree defines a natural “Main Street” or primary corridor through a region - the path that connects the most population or has the strongest connections.
Definition: The path through the spanning tree that maximizes some weight metric:
- Sum of node populations along path (total people touched)
- Minimum edge strength along path (bottleneck capacity)
- Product of edge strengths (multiplicative importance)
Historical Evolution: I-80 → I-10
Observation: Main Street USA has shifted south over 50 years.
1970s: I-80 corridor was Main Street
- SF → Sacramento → Salt Lake → Omaha → Chicago → NYC
- Served industrial heartland
- Chicago was major anchor
- Northern transcontinental route
2020s: I-10 corridor is Main Street
- LA → Phoenix → Tucson → El Paso → San Antonio → Houston → Jacksonville
- Serves Sun Belt boom
- Phoenix (100k→5M), Houston (600k→7M), Atlanta growth
- Southern transcontinental route
- Year-round operation (no snow)
- USMCA trade via Mexico crossings
Why LA won over SF/San Diego:
- SF Bay: Better natural harbor BUT trapped by Sierra Nevada (7,000 ft passes)
- San Diego: Excellent harbor BUT blocked by mountains (4,000+ ft passes east)
- LA: Mediocre harbor BUT Banning Pass at 2,600 ft = easy interior access
- Geography is destiny: “Gateway to interior” > “Natural harbor”
- LA became logistics hub because I-10 flows easily to Phoenix/Texas
The Algorithm Would Show:
- Heaviest path through spanning tree migrating from I-80 to I-10
- Transition probably 1990s-2000s
- Future: Continues south? Or stabilizes?
Infrastructure Planning Applications
Natural “Main Street” could inform:
- High-speed rail routing (follow the heavy path)
- Freight corridor investment (where demand actually is)
- Disaster resilience (which connections must never fail?)
- Future planning (where is Main Street moving?)
Historical validation:
- Does transcontinental railroad (1869) match 1870 heavy path?
- Does Interstate Highway System (1956) match 1950 heavy path?
- Is current freight traffic following discovered main streets?
Key insight: Infrastructure should follow population gravity, not arbitrary political decisions. The spanning tree reveals the optimal routes.
Redistricting as Tree Partition (2025-10-12)
The Problem
Given a spanning tree of census tracts with population weights, divide into k districts with equal population while minimizing unnatural cuts.
Ideal case (rarely exists):
- Cut one edge to split into 2 regions
- Both sides have ~50% population
- This edge is naturally the “main street” bottleneck
- Perfect answer!
Real case:
- No single edge gives 50/50 split
- Must cut multiple edges and rebalance
- Some natural regions get split awkwardly
Example: US into 2 districts
- Main cut: East vs West (might be 55/45)
- Need to rebalance by cutting a “spur”
- Florida spur: 7% population
- Cut Florida at Jacksonville to get 50/50
- Result: Respects main corridor but splits Florida unnaturally
Algorithm Framework
Proposed approach:
- Find single-edge cut closest to equal split (might be 45/55)
- If within tolerance (±2%), done
- Otherwise: Find smallest spur needed for rebalancing
- Cut that spur at the point that achieves equal population
- Document: “Main structure preserved, [region] split for balance”
Advantages:
- Objective and reproducible
- Preserves major structure (main corridors intact)
- Minimizes damage (splits smallest necessary spur)
- Cut location determined by math, not politics
Justification:
“This is the physics-based baseline. If you want to split differently, propose an alternative and justify why your cuts respect the structure better.”
Contiguity Approximation
Use Delaunay triangulation:
- Build Delaunay on census tract centroids
- Two tracts are “adjacent” if they share Delaunay edge
- Approximates geographic adjacency without polygon topology nightmares
- Handles islands naturally (connect to nearest neighbors)
Practical reality:
- Real contiguity has exceptions (“contiguous by water”, bridges, etc.)
- Delaunay is as principled as what humans do, more reproducible
- Edge cases (Long Island ↔ Manhattan) handled reasonably
The Neutral Baseline Principle
Framework:
- Algorithm provides physics-based structure (objective, neutral)
- Policy layers added on top (explicit, documented)
- “Protect this community”, “preserve county boundaries”, etc.
- Clear audit trail: baseline + justified modifications
Advantage over current practice:
- Current: Start with blank slate, humans draw lines, post-hoc rationalization
- Proposed: Start with structure, explicit deviations, transparent reasoning
Political stance:
- The structure is apolitical (just physics)
- How society chooses to modify it is policy
- Communities of interest may emerge naturally from spatial clustering
- But algorithm doesn’t optimize for any demographic outcome
Topographic Prominence for Population Potential (2025-10-25)
The Goal
Calculate proper topographic prominence for population potential peaks to identify which metro areas are genuinely distinct vs riding on shoulders of larger metros.
Example: Orange County vs LA
- OC peak potential: ~93K
- LA peak potential: ~169K
- Direct path OC→LA: Goes through dense Gateway Cities area (~120K)
- Ocean path OC→LA: Goes through ports/ocean (~0K)
- Question: Which is the “correct” key col for prominence?
The Problem
Standard BFS flooding finds ANY path from peak to higher ground, typically the lowest descent (ocean/desert). But prominence should use the saddle point = the highest of the low points across all routes.
Current naive BFS algorithm is wrong because:
- Floods outward equally in all directions
- Stops at first path to higher ground
- Finds minimum descent (ocean) not saddle (populated ridge)
- Result: prominence ≈ peak height for most metros (wrong!)
The Correct Algorithm
Watershed/saddle-point finding:
- For each peak pair (lower, higher)
- Find all possible paths between them
- For each path, identify its lowest point
- Key col = MAX(lowest points) = the best/highest saddle
- Prominence = peak - key_col
Equivalently (watershed):
- Flood from lowest elevations upward
- When two basins meet, that’s a saddle
- Height at meeting point = key col between those peaks
Implementation Challenges
For gridded data (GPW world):
- Data is on regular lat/lon grid
- Can reshape to 2D array
- Use scipy.ndimage or skimage.morphology watershed
- BUT: Need to exclude hex infill points (artificial)
- Need proper watershed implementation that finds saddles
For irregular data (US Census):
- Scattered points (tracts/blocks)
- Use Delaunay triangulation for mesh
- Need algorithms for triangulated irregular networks (TINs)
- Hydrology/terrain analysis tools handle this
- Possibly: HEC-RAS, GRASS GIS, or specialized Python libraries
Relevant Libraries to Investigate
For gridded data:
scipy.ndimage.watershed_ift - Image foresting transform
skimage.morphology.watershed - Watershed segmentation
skimage.feature.peak_local_max + watershed - Combined approach
For triangulated data:
- Terrain analysis libraries (GDAL, GRASS)
- Mesh-based watershed algorithms
- Graph-based saddle-point finding
- Possibly finite element libraries (deal with TINs)
Historical context:
- HEC-1 (Hydrologic Engineering Center) - 1970s Fortran
- Modern: HEC-RAS, HEC-HMS for watershed delineation
- These solve exact same problem (water flow = potential flow)
Next Steps (Parking Lot)
- For world GPW data:
- Exclude hex infill points from prominence calculation
- Convert to regular 2D grid
- Research proper scipy/skimage watershed for finding saddles
- Test on known examples (LA/OC, Delhi/Mumbai)
- For US Census data:
- Research TIN-based prominence algorithms
- Look into terrain analysis libraries
- Possibly collaborate with hydrology/GIS experts
- Alternative: Interpolate to grid first (loses resolution)
- Validation:
- Compare to known metro relationships
- SF/Oakland should show low prominence (separated by water)
- NYC boroughs should show high inter-connectivity
- LA/OC should show moderate separation via Gateway Cities
Why This Matters
Prominence distinguishes:
- True independent metros (high prominence) - Delhi, Tokyo
- Satellite cities (low prominence) - Newark to NYC, Oakland to SF
- Regional ambiguity (medium prominence) - OC/LA, Dallas/Fort Worth
Without correct prominence, we can’t answer questions like:
- “How many genuinely distinct population centers does the US have?”
- “Which global metros are truly isolated vs clustered?”
- “What are the natural mega-regions?”
Status: Conceptually understood, algorithmically unsolved. Need proper watershed/saddle-finding for geographic point clouds.
Multi-Scale Animation (2025-10-25)
The Idea
Create an animation showing how the population potential landscape transforms across different spatial scales. As min_distance increases, watch individual cities merge into metro regions, then megalopolis clusters.
Why This Matters
Scale-dependent rankings are a feature, not a bug:
- Different scales answer different questions
- 15-mile: “Which city has the densest core?” (commuting scale)
- 30-mile: “Which metro has the most accessible population?” (regional economy)
- 100-mile: “Which megacity dominates its continent?” (national influence)
Rankings legitimately change with scale - this isn’t measurement error, it’s revealing different aspects of urban structure.
Implementation Approach
Fibonacci scale sequence:
- Use existing Fibonacci scale experiments as template
- Scales: 5, 8, 13, 21, 34, 55, 89 miles (or similar)
- For each scale:
- Calculate potentials with that min_distance
- Render with consistent camera angle/lighting
- Use same color scale (normalized to max at each scale)
- Export as PNG frame
Animation assembly:
- Stitch frames together (ffmpeg or similar)
- Smooth transitions between scales
- Optional: Show current scale value on frame
- Consider: Log scale for time (faster at small scales, slower at large)
Observable Phenomena
What you’d see:
- Small scales (5-10mi):
- Very pointy landscape
- Individual neighborhoods visible
- Every small town is a distinct peak
- Medium scales (15-30mi):
- Cities merge into metro regions
- Peaks broaden and lower
- Satellite cities start merging with cores
- Large scales (50-100mi):
- Megalopolis regions emerge
- BosWash corridor becomes single feature
- Pearl River Delta merges
- Continental-scale structure dominates
- Ranking changes:
- Compact dense cores (Delhi, Dhaka) dominate at small scales
- Sprawling metros (Tokyo, LA) gain at medium scales
- Polycentric regions (Java, Eastern China) emerge at large scales
Technical Challenges
Computational:
- Need potentials at ~7-10 different scales
- Each calculation takes time (world data ~45 min at 30mi)
- Could parallelize: run all scales simultaneously
- Or: Pre-compute and cache
Visual consistency:
- Need fixed camera angle across all frames
- Color normalization: absolute (shows height changes) vs relative (emphasizes peaks)
- Z-scale might need adjustment by scale (taller peaks at small scales)
Data considerations:
- World GPW data: Good candidate (218K points, global coverage)
- CONUS block groups: Too large/detailed for animation (would be slow)
- Could do regional: California, Northeast US, etc.
Builds on:
- Fibonacci scale visualizations already in codebase
- Scale invariance documentation (README)
- HQ rendering mode just added to visualize_potential.py
Next Steps (When Ready)
- Create animation script similar to existing scale experiments
- Test on smaller region first (California or similar)
- Optimize rendering for batch generation
- Decide on frame rate and transition style
- Generate world-scale version
- Upload to YouTube/share for feedback
Status: Conceptual. Would be compelling visualization of scale-dependent urban structure. Good candidate for outreach/communication of the project’s insights.
Camera Path Animation / Flyover Video (2025-10-25)
The Idea
Generate a video that “flies” around or through the population potential landscape. Camera moves along a predefined path while the data/landscape stays fixed.
Why This Is Cool
- Engaging presentation: Much more compelling than static images
- Reveals 3D structure: Rotation/movement shows depth and relationships
- YouTube/conference ready: Professional video output
- Explores geography: Can highlight specific regions or show global overview
Possible Camera Paths
1. Global Orbit:
- 360° rotation around Earth at constant altitude
- Shows all continents in sequence
- Smooth, simple, comprehensive
2. Zoom Sequence:
- Start: Far view showing whole globe
- Zoom in: Focus on one region (e.g., South Asia peak)
- Pan: Move to another region (e.g., East Asia)
- Zoom out: Return to global view
3. Flyover / Great Circle:
- Low altitude pass along population corridor
- Route: Delhi → Kolkata → Bangkok → Shanghai → Tokyo
- Or: Western Europe → Eastern Europe → Central Asia
- Shows relative heights and transitions between metros
4. Comparative Split-Screen:
- Same camera path, different scales side-by-side
- Left: 15-mile (sharp peaks)
- Right: 30-mile (smoothed regions)
- Synchronized movement
5. Continent Focus:
- Regional tours (Asia, Europe, Americas)
- Slower, more detailed examination
- Could add city labels at peaks
Technical Implementation
Frame generation:
# Pseudo-code
for i, camera_pos in enumerate(camera_path):
fig = create_mesh_3d(
lons, lats, potentials,
camera=camera_pos, # Only thing that changes
# All other params constant
)
fig.write_image(f'frames/frame_{i:04d}.png')
Camera path calculation:
- Spherical coordinates for orbit
- Interpolate between keyframes for smooth motion
- Consider: Ease-in/ease-out for velocity
- Typical: 30-60 fps, 10-30 second video = 300-1800 frames
Video assembly:
ffmpeg -framerate 30 -i 'frames/frame_%04d.png' \
-c:v libx264 -pix_fmt yuv420p -crf 18 \
output.mp4
Parameters to Consider
Camera positioning:
eye: Camera position (x, y, z in 3D space)
center: Look-at point (usually origin)
up: Up vector (defines rotation)
Plotly camera dict:
camera = dict(
eye=dict(x=x_pos, y=y_pos, z=z_pos),
center=dict(x=0, y=0, z=0),
up=dict(x=0, y=0, z=1)
)
Path smoothing:
- Cubic spline interpolation between waypoints
- Constant angular velocity vs constant speed
- Banking/rotation for dramatic effect
Challenges
Rendering time:
- Each frame = full HQ render (~few seconds)
- 600 frames @ 3 sec each = 30 minutes total
- Parallelizable: Generate frames independently
- Could batch: 4-8 cores = 4-8x speedup
File size:
- 1920x1080 PNGs @ 600 frames = ~5-10 GB intermediate
- Final H.264 video: ~50-200 MB (depending on compression)
- Need adequate disk space
Visual consistency:
- Must lock color scale across all frames
- Fixed lighting parameters
- Consistent z-scale
- Otherwise: Jarring jumps between frames
Motion sickness:
- Too fast = disorienting
- Too slow = boring
- Need testing to find sweet spot
- Smooth acceleration/deceleration
Extensions
Audio:
- Background music
- Voiceover narration explaining features
- Sound effects (optional, probably cheesy)
Annotations:
- City labels appearing as camera passes
- Scale indicator
- Title cards between segments
Interactive:
- Upload to Sketchfab or similar platform
- Viewer can control camera themselves
- Combines video appeal with interactivity
Inspiration from:
- Earth at Night NASA visualizations
- Population density visualizations (Pudding, NY Times)
- Terrain flyovers (Google Earth, drone footage)
- Our existing 3D HTML viewers (but automated camera path)
Next Steps (When Ready)
- Start simple: Single 360° orbit, 10 seconds
- Test rendering pipeline and timing
- Refine camera path for smooth motion
- Generate full video
- Add music/annotations if desired
- Upload to YouTube or project page
Status: Conceptual. Technically straightforward using existing tools. Main cost is rendering time. Would make excellent outreach/presentation material.
Maximum Distance Calculation from Region Boundaries (2025-01-26)
The Goal
Instead of using arbitrary max_distance values (50 miles, 100 miles, etc.), calculate the theoretically correct maximum distance from the region’s actual boundaries or grid extent.
The Idea
“One calculation to rule them all” - compute population potential once with the correct max_distance derived from the data itself, then reuse that result forever without recalculating.
Implementation Approaches
For grid data (easy):
# Calculate bounding box diagonal
lon_span = max_lon - min_lon
lat_span = max_lat - min_lat
max_distance = haversine(min_lat, min_lon, max_lat, max_lon)
# Or half-diagonal for "influence radius"
max_distance = max_possible_distance / 2
For census tract data (hard):
- Scattered points with irregular shapes
- Bounding box is conservative (includes empty space)
- Option 1: Use bounding box anyway (simple, safe)
- Option 2: Calculate convex hull, use hull diagonal (complex, accurate)
- Option 3: Use shapefile boundaries if available (requires GIS processing)
- Option 4: Use empirical 99.9th percentile of actual distances
For global consistency:
- Alternative: Skip max_distance entirely for canonical calculation
- Let all contributions be included, even if tiny
- Results in “true” global potential field
- Can always filter/smooth later as needed
- Slow but only needs to run once
Trade-offs
| Approach |
Pros |
Cons |
| No max_distance |
Theoretically pure; single canonical result |
Slowest; includes negligible contributions |
| Bounding box |
Easy; works for grids |
Conservative; includes empty space |
| Convex hull |
More accurate for irregular shapes |
Complex; requires shapefile processing |
| Fixed large value |
Simple; fast enough |
Arbitrary; different for each region |
Status
Parked - Possibly YAGNI (You Ain’t Gonna Need It). Current approach with reasonable fixed cutoffs (100 miles, 500 miles, etc.) works well enough. Unclear if the added complexity of “perfect” max_distance calculation provides meaningful benefit.
Could revisit if:
- Need to compare results across wildly different regions
- Want single “authoritative” calculation for publications
- Discover that arbitrary cutoffs are affecting results
Theoretical Insight: Scale Invariance and the 1/d³ Exponent (2025-10-26)
The Discovery
The 1/d³ exponent for population potential is not empirical - it’s theoretically required for scale invariance.
This represents a genuine theoretical contribution beyond pure data visualization.
The Argument
Question: What exponent n makes population potential scale-invariant when you coarsen a 2D grid?
Setup:
- Fine grid: 4 cells, each with population P, separated by distance d
- Coarse grid: 1 cell with population 4P (sum of the 4 fine cells)
- External observer at distance D » d looking at the system
Requirement: Potential felt by observer must be the same whether you use fine or coarse representation.
Fine grid calculation:
Potential = Σ(P / distance^n) for 4 cells
≈ 4 × (P / D^n) [since D >> d, all cells ~same distance]
Coarse grid calculation:
For 2D grid coarsening:
When you coarsen by 2×, each cell contains 4× the population (2D area).
Scale invariance requires:
4 × (P / D^n) = (4P / D^n)
This is automatically true! The key insight is that for any exponent n, if population scales as area (∝ L²) and distance is linear (∝ L), the potential remains scale-invariant.
Wait, that’s too general. What constrains n = 3?
The constraint comes from requiring that the 1/d³ potential integrates correctly from the 1/d⁴ force law:
Force (pairwise): F ∝ (m₁ × m₂) / d⁴
Potential (field): V = -∫ F·dr ∝ m / d³
The 1/d⁴ force law itself comes from scale invariance under grid coarsening:
- 2D grid coarsening: 4 people at distance d → 1 point with 4 people
- For force between two such regions to be scale-invariant:
- Force ∝ (4P₁)(4P₂) / (2d)⁴ = 16P₁P₂ / 16d⁴ = (P₁P₂) / d⁴
- This gives the 1/d⁴ force law
Therefore: The entire framework (1/d⁴ force, 1/d³ potential) emerges from requiring physical consistency under 2D grid coarsening.
Empirical Validation
Tested across multiple resolutions:
- 74k census tracts
- 220k block groups
- ~40k California hex grid
- ~220k USA block groups
- 218k world GPW hex grid
Result: <1% variation in potential values across different grid resolutions when properly normalized.
The math predicts scale invariance, and the data confirms it.
Why This Matters
Not just data visualization:
- This is a mathematical derivation with predictive power
- Explains why population clustering is hierarchical (fractal-like)
- Provides theoretical foundation for the spanning tree algorithm
- Connects to physics (same math as gravitational/electrostatic potentials)
Publication venues:
- Physical Review E (statistical mechanics)
- Journal of Statistical Mechanics
- Papers in Regional Science
- Possible: Nature Physics (if framed well)
Key point: The exponent isn’t fitted or chosen empirically - it’s derived from first principles (scale invariance requirement). That’s what makes it theoretically interesting.
The scale invariance naturally leads to hierarchical/fractal-like population organization:
Multi-scale self-similarity:
- Neighborhoods organize into cities
- Cities organize into metros
- Metros organize into megalopolises
- Same physics (1/d³ potential) governs all scales
Empirical observation from visualizations:
- Population potential fields “fractally resolve” at all zoom levels
- New meaningful structure appears at every scale
- No characteristic scale (power-law distribution)
This is consistent with established urban geography (Zipf’s Law, rank-size distributions) but provides a physical mechanism explaining why cities organize this way.
Gemini/Opus Conversation Evaluation
External AI evaluations (from conversation transcript):
Gemini’s assessment:
- Visualization quality: Excellent for public engagement
- Academic novelty of 3D viz: Not new (established technique)
- But missed: The theoretical scale invariance contribution
- Suggested venues: Data viz showcases, educational tools
Opus’s initial assessment:
- Overly optimistic about visualization novelty
- Later corrected: “I should have been clearer about ‘excellent data visualization’ vs ‘novel academic research’”
- Also missed: The theoretical contribution initially
Correction after user clarification:
Both AIs acknowledged the scale invariance argument represents genuine theoretical work that “elevates the project from excellent data visualization to potential academic research.”
Takeaway: The visualization work is excellent outreach/education. The scale invariance derivation is the academic contribution. Both are valuable, but serve different purposes.
Status
Theory: Validated empirically across multiple datasets and resolutions. Ready to write up formally.
Applications:
- 3D printing visualizations (in progress)
- Interactive web viewers (working)
- Animation across scales (conceptual)
- Prominence analysis (needs proper watershed algorithm)
3D Printing Population Potential Fields (2025-10-26)
The Goal
Generate physical 3D-printed models of population potential landscapes using a Bambu Lab printer (P1S) with 4-color AMS system.
Discrete Color Mapping for Multi-Material Printing
Challenge: Bambu AMS has 4 filament slots. Need to map continuous potential values to 4 discrete color bands that correspond to printable materials.
Solution: Percentile-based discrete colorscale
- Blue (0-25%): Lowest potential (ocean, deserts)
- Cyan (25-50%): Low-mid potential (rural, small towns)
- Yellow (50-75%): Mid-high potential (cities, suburban areas)
- Red (75-100%): Highest potential (major metro peaks)
Implementation: Added --discrete-colors N flag to visualize_potential.py
Key Insights from Testing
1. Hexed data is essential:
- Block group data has irregular spacing → Delaunay triangulation creates artifacts (long skinny triangles across water)
- Hex grid has uniform spacing → clean, smooth triangulation
- No spurious connections across oceans or gaps
2. Ocean should be deep blue:
- Percentile-based coloring naturally maps ocean (~0 potential) to blue
- Land starts at cyan/yellow depending on regional context
- Visual intuition: blue = low/zero = water
3. Linear Z-mode preserves peaks:
--z-mode linear: Height directly proportional to potential → tall peaks visible
--z-mode log: Compresses everything → flattened, no detail
- Use linear Z for geometry, log color for drainage structure
4. Color shows relative ranking, not absolute:
- SF Bay Area: SF dominates red (1.9M peak), SJ is cyan/yellow (566k peak)
- This is mathematically correct - SF genuinely has 3× higher potential
- Within-metro texture requires finer color granularity (8+ colors) or continuous scale
5. Geographic scale matters:
- USA block groups: Excellent color diversity (NYC/LA/Chicago red, many metros yellow, lots of cyan detail)
- Single metro (SF): One city dominates, less color diversity
- California: Good balance - multiple metros show all color bands
Working Command for HQ Renders
python3 src/cli/visualize_potential.py <input.csv> \
--type mesh \
--discrete-colors 4 \
--color-mode log \
--z-mode linear \
--z-scale 0.05 \
--hq \
--png \
-o <output.png>
Key parameters:
--type mesh: Delaunay triangulation (smooth surfaces)
--discrete-colors 4: Four color bands for AMS printing
--color-mode log: Spreads low-end detail (shows drainage basins)
--z-mode linear: Preserves peak heights (doesn’t flatten)
--z-scale 0.05: Vertical exaggeration for 3D printing
--hq: High quality Plotly rendering
Next Steps (After Dinner)
- Generate STL files: Create actual 3D printable geometry from potential data
- Slice in Bambu Studio: Assign filament colors by Z-height layers
- Test print: Start with small region (SF or CA) to validate workflow
- Iterate: Adjust Z-scale, base thickness, color boundaries as needed
- Full prints: USA, California, World at different scales
Technical Requirements for STL Generation
Geometry:
- Base mesh from Delaunay triangulation (already working)
- Extrude base to add thickness (needed for structural stability)
- Close bottom surface (watertight mesh required for slicing)
Format:
- STL (binary or ASCII) - universal 3D printing format
- Compatible with Bambu Studio and all slicers
Libraries:
- Current: plotly for visualization (can export basic mesh)
- Need: numpy-stl or trimesh for proper STL generation
- Scipy Delaunay already in use (provides triangulation)
Status
Visualization pipeline: Working and validated
- Hexed datasets eliminate artifacts
- Discrete 4-color mapping functional
- HQ PNG rendering produces clean outputs
STL generation: Next task (after dinner)
- Need to write
generate_stl.py in src/cli/
- Take same CSV input as visualize_potential.py
- Output watertight STL mesh suitable for 3D printing
End of speculative section. These ideas are works in progress. Some may be profound, some may be nonsense. Time will tell.