javaMap

Speculative Ideas & Half-Baked Insights

Notes on connections and theories that aren’t fully developed but worth preserving


Discrete/Continuous Duality (2025-10-11)

The Observation

The spanning tree from Tracts clustering and the 1/d³ potential surface appear to be related as discrete/continuous views of the same underlying structure.

Potential Surface (continuous):

Spanning Tree (discrete):

The Physical Connection (NOT ARBITRARY!)

The relationship between 1/d⁴ (clustering) and 1/d³ (potential) is not arbitrary - it’s physics:

Force (pairwise):  F ∝ (m₁ × m₂) / d⁴
Potential (field): V = -∫ F·dr  ∝  m / d³

The potential is the integral of the force. This is the same relationship as in gravitational physics (though real gravity is 1/d² and 1/d, we use steeper exponents for population).

Why this matters:

The discrete spanning tree (force-driven merges) and continuous potential surface (integrated field) are physically consistent representations.

The Connection

Similar to Voronoi/Delaunay duality, but:

Key insight: Regional boundaries from clustering should align with saddle points in the potential surface.

Why It Matters

Potential surface: Shows the “true” continuous structure - all topology, all saddles, all scales. But intractable for optimization (infinite dimensionality).

Tracts algorithm: A tractable discrete approximation that discovers structure aligning with the continuous topology through:

Visualization Opportunity

Overlay spanning tree on potential surface:


Fractal Optimization Hypothesis (2025-10-11)

The Reach

Speculative claim: For problems with power-law interactions (1/d^n) and fractal spatial structure, greedy local optimization might achieve near-optimal multi-scale solutions in polynomial time.

Why This Might Work

The algorithm is:

Yet it discovers global, multi-scale structure coherently.

Why? If underlying structure is fractal (self-similar across scales):

Generalization?

Hierarchical optimization is usually intractable (exponential in tree structures).

But maybe for fractal problems with power-law interactions:

Possible applications:

Evidence

So far: one algorithm (population clustering) that:

Status: Interesting pattern, worth exploring. Not proven, possibly wrong, definitely speculative.


Surface Visualization Challenges (2025-10-11)

The Spikiness Problem

1/d³ potential surface is extremely spiky:

The issue: Trying to show Mount Everest and a speed bump on the same map.

Compression Approaches

Double-log: z = log(log(potential))

Percentile clipping: Cap at 95th percentile

Asinh transformation: Smooth linear → log transition

Multi-scale views: Separate visualizations for different scales

The Detail is Real

The spikiness means the data captures:

All present simultaneously. Challenge is controlling dynamic range without losing real structure.


Notes on Scale Invariance

The algorithm shows remarkable consistency across resolutions:

Implication: Not tuned to specific resolution, captures actual structure.

Why it matters: Most clustering algorithms are brittle to resolution changes. This one isn’t.


Visualization: Voronoi Regions vs Network (2025-10-12)

The Problem with Network Viz

Spanning tree network visualizations (edges between merging tracts) have issues:

Better Approach: Boundary-Based Regions

Concept: Show regions that emerge, not the network that creates them.

Algorithm:

  1. Run clustering to target N regions (e.g., 50 for “discovered states”)
  2. Each census tract belongs to one cluster
  3. Draw boundaries between tracts in different clusters
  4. Color each region distinctly

Result: Clean map showing natural regions, like political maps but discovered not imposed.

Advantages:

Animation Opportunity: Crystal Growth

Time-lapse showing region formation:

This is the “watching crystals grow” visualization - regions nucleate around population centers and expand until they meet at natural boundaries (the saddle points in the potential field).

Technical Implementation:

Why this works:

Status: Concept validated through static “50 regions” maps. Animation would make the scale-invariant property obvious and compelling.


Network Visualization: Dual Encodings (2025-10-12)

The Realization

Spanning tree edges have two distinct properties that tell different stories:

1. Merge Strength (physics): How strong is the attraction?

2. Merge Order (temporal): When did this connection form?

These are independent!

Visualization Approaches

Option 1: Opacity by strength, Color by order

Option 2: Two separate views with toggle

Option 3: Interactive zoom with adaptive encoding

Key Examples

Union Square, NYC:

El Paso/Juarez crossing:

Both are “key” but for different reasons - one for local structure, one for global structure.


“Main Street” as Heaviest Path (2025-10-12)

The Insight

The spanning tree defines a natural “Main Street” or primary corridor through a region - the path that connects the most population or has the strongest connections.

Definition: The path through the spanning tree that maximizes some weight metric:

Historical Evolution: I-80 → I-10

Observation: Main Street USA has shifted south over 50 years.

1970s: I-80 corridor was Main Street

2020s: I-10 corridor is Main Street

Why LA won over SF/San Diego:

The Algorithm Would Show:

Infrastructure Planning Applications

Natural “Main Street” could inform:

Historical validation:

Key insight: Infrastructure should follow population gravity, not arbitrary political decisions. The spanning tree reveals the optimal routes.


Redistricting as Tree Partition (2025-10-12)

The Problem

Given a spanning tree of census tracts with population weights, divide into k districts with equal population while minimizing unnatural cuts.

Ideal case (rarely exists):

Real case:

Example: US into 2 districts

Algorithm Framework

Proposed approach:

  1. Find single-edge cut closest to equal split (might be 45/55)
  2. If within tolerance (±2%), done
  3. Otherwise: Find smallest spur needed for rebalancing
  4. Cut that spur at the point that achieves equal population
  5. Document: “Main structure preserved, [region] split for balance”

Advantages:

Justification: “This is the physics-based baseline. If you want to split differently, propose an alternative and justify why your cuts respect the structure better.”

Contiguity Approximation

Use Delaunay triangulation:

Practical reality:

The Neutral Baseline Principle

Framework:

Advantage over current practice:

Political stance:


Topographic Prominence for Population Potential (2025-10-25)

The Goal

Calculate proper topographic prominence for population potential peaks to identify which metro areas are genuinely distinct vs riding on shoulders of larger metros.

Example: Orange County vs LA

The Problem

Standard BFS flooding finds ANY path from peak to higher ground, typically the lowest descent (ocean/desert). But prominence should use the saddle point = the highest of the low points across all routes.

Current naive BFS algorithm is wrong because:

The Correct Algorithm

Watershed/saddle-point finding:

  1. For each peak pair (lower, higher)
  2. Find all possible paths between them
  3. For each path, identify its lowest point
  4. Key col = MAX(lowest points) = the best/highest saddle
  5. Prominence = peak - key_col

Equivalently (watershed):

Implementation Challenges

For gridded data (GPW world):

For irregular data (US Census):

Relevant Libraries to Investigate

For gridded data:

For triangulated data:

Historical context:

Next Steps (Parking Lot)

  1. For world GPW data:
    • Exclude hex infill points from prominence calculation
    • Convert to regular 2D grid
    • Research proper scipy/skimage watershed for finding saddles
    • Test on known examples (LA/OC, Delhi/Mumbai)
  2. For US Census data:
    • Research TIN-based prominence algorithms
    • Look into terrain analysis libraries
    • Possibly collaborate with hydrology/GIS experts
    • Alternative: Interpolate to grid first (loses resolution)
  3. Validation:
    • Compare to known metro relationships
    • SF/Oakland should show low prominence (separated by water)
    • NYC boroughs should show high inter-connectivity
    • LA/OC should show moderate separation via Gateway Cities

Why This Matters

Prominence distinguishes:

Without correct prominence, we can’t answer questions like:

Status: Conceptually understood, algorithmically unsolved. Need proper watershed/saddle-finding for geographic point clouds.


Multi-Scale Animation (2025-10-25)

The Idea

Create an animation showing how the population potential landscape transforms across different spatial scales. As min_distance increases, watch individual cities merge into metro regions, then megalopolis clusters.

Why This Matters

Scale-dependent rankings are a feature, not a bug:

Rankings legitimately change with scale - this isn’t measurement error, it’s revealing different aspects of urban structure.

Implementation Approach

Fibonacci scale sequence:

Animation assembly:

Observable Phenomena

What you’d see:

  1. Small scales (5-10mi):
    • Very pointy landscape
    • Individual neighborhoods visible
    • Every small town is a distinct peak
  2. Medium scales (15-30mi):
    • Cities merge into metro regions
    • Peaks broaden and lower
    • Satellite cities start merging with cores
  3. Large scales (50-100mi):
    • Megalopolis regions emerge
    • BosWash corridor becomes single feature
    • Pearl River Delta merges
    • Continental-scale structure dominates
  4. Ranking changes:
    • Compact dense cores (Delhi, Dhaka) dominate at small scales
    • Sprawling metros (Tokyo, LA) gain at medium scales
    • Polycentric regions (Java, Eastern China) emerge at large scales

Technical Challenges

Computational:

Visual consistency:

Data considerations:

Builds on:

Next Steps (When Ready)

  1. Create animation script similar to existing scale experiments
  2. Test on smaller region first (California or similar)
  3. Optimize rendering for batch generation
  4. Decide on frame rate and transition style
  5. Generate world-scale version
  6. Upload to YouTube/share for feedback

Status: Conceptual. Would be compelling visualization of scale-dependent urban structure. Good candidate for outreach/communication of the project’s insights.


Camera Path Animation / Flyover Video (2025-10-25)

The Idea

Generate a video that “flies” around or through the population potential landscape. Camera moves along a predefined path while the data/landscape stays fixed.

Why This Is Cool

Possible Camera Paths

1. Global Orbit:

2. Zoom Sequence:

3. Flyover / Great Circle:

4. Comparative Split-Screen:

5. Continent Focus:

Technical Implementation

Frame generation:

# Pseudo-code
for i, camera_pos in enumerate(camera_path):
    fig = create_mesh_3d(
        lons, lats, potentials,
        camera=camera_pos,  # Only thing that changes
        # All other params constant
    )
    fig.write_image(f'frames/frame_{i:04d}.png')

Camera path calculation:

Video assembly:

ffmpeg -framerate 30 -i 'frames/frame_%04d.png' \
  -c:v libx264 -pix_fmt yuv420p -crf 18 \
  output.mp4

Parameters to Consider

Camera positioning:

Plotly camera dict:

camera = dict(
    eye=dict(x=x_pos, y=y_pos, z=z_pos),
    center=dict(x=0, y=0, z=0),
    up=dict(x=0, y=0, z=1)
)

Path smoothing:

Challenges

Rendering time:

File size:

Visual consistency:

Motion sickness:

Extensions

Audio:

Annotations:

Interactive:

Inspiration from:

Next Steps (When Ready)

  1. Start simple: Single 360° orbit, 10 seconds
  2. Test rendering pipeline and timing
  3. Refine camera path for smooth motion
  4. Generate full video
  5. Add music/annotations if desired
  6. Upload to YouTube or project page

Status: Conceptual. Technically straightforward using existing tools. Main cost is rendering time. Would make excellent outreach/presentation material.


Maximum Distance Calculation from Region Boundaries (2025-01-26)

The Goal

Instead of using arbitrary max_distance values (50 miles, 100 miles, etc.), calculate the theoretically correct maximum distance from the region’s actual boundaries or grid extent.

The Idea

“One calculation to rule them all” - compute population potential once with the correct max_distance derived from the data itself, then reuse that result forever without recalculating.

Implementation Approaches

For grid data (easy):

# Calculate bounding box diagonal
lon_span = max_lon - min_lon
lat_span = max_lat - min_lat
max_distance = haversine(min_lat, min_lon, max_lat, max_lon)

# Or half-diagonal for "influence radius"
max_distance = max_possible_distance / 2

For census tract data (hard):

For global consistency:

Trade-offs

Approach Pros Cons
No max_distance Theoretically pure; single canonical result Slowest; includes negligible contributions
Bounding box Easy; works for grids Conservative; includes empty space
Convex hull More accurate for irregular shapes Complex; requires shapefile processing
Fixed large value Simple; fast enough Arbitrary; different for each region

Status

Parked - Possibly YAGNI (You Ain’t Gonna Need It). Current approach with reasonable fixed cutoffs (100 miles, 500 miles, etc.) works well enough. Unclear if the added complexity of “perfect” max_distance calculation provides meaningful benefit.

Could revisit if:


Theoretical Insight: Scale Invariance and the 1/d³ Exponent (2025-10-26)

The Discovery

The 1/d³ exponent for population potential is not empirical - it’s theoretically required for scale invariance.

This represents a genuine theoretical contribution beyond pure data visualization.

The Argument

Question: What exponent n makes population potential scale-invariant when you coarsen a 2D grid?

Setup:

Requirement: Potential felt by observer must be the same whether you use fine or coarse representation.

Fine grid calculation:

Potential = Σ(P / distance^n) for 4 cells
         ≈ 4 × (P / D^n)  [since D >> d, all cells ~same distance]

Coarse grid calculation:

Potential = (4P / D^n)

For 2D grid coarsening: When you coarsen by 2×, each cell contains 4× the population (2D area).

Scale invariance requires:

4 × (P / D^n) = (4P / D^n)

This is automatically true! The key insight is that for any exponent n, if population scales as area (∝ L²) and distance is linear (∝ L), the potential remains scale-invariant.

Wait, that’s too general. What constrains n = 3?

The constraint comes from requiring that the 1/d³ potential integrates correctly from the 1/d⁴ force law:

Force (pairwise):  F ∝ (m₁ × m₂) / d⁴
Potential (field): V = -∫ F·dr  ∝  m / d³

The 1/d⁴ force law itself comes from scale invariance under grid coarsening:

Therefore: The entire framework (1/d⁴ force, 1/d³ potential) emerges from requiring physical consistency under 2D grid coarsening.

Empirical Validation

Tested across multiple resolutions:

Result: <1% variation in potential values across different grid resolutions when properly normalized.

The math predicts scale invariance, and the data confirms it.

Why This Matters

Not just data visualization:

Publication venues:

Key point: The exponent isn’t fitted or chosen empirically - it’s derived from first principles (scale invariance requirement). That’s what makes it theoretically interesting.

The scale invariance naturally leads to hierarchical/fractal-like population organization:

Multi-scale self-similarity:

Empirical observation from visualizations:

This is consistent with established urban geography (Zipf’s Law, rank-size distributions) but provides a physical mechanism explaining why cities organize this way.

Gemini/Opus Conversation Evaluation

External AI evaluations (from conversation transcript):

Gemini’s assessment:

Opus’s initial assessment:

Correction after user clarification: Both AIs acknowledged the scale invariance argument represents genuine theoretical work that “elevates the project from excellent data visualization to potential academic research.”

Takeaway: The visualization work is excellent outreach/education. The scale invariance derivation is the academic contribution. Both are valuable, but serve different purposes.

Status

Theory: Validated empirically across multiple datasets and resolutions. Ready to write up formally.

Applications:


3D Printing Population Potential Fields (2025-10-26)

The Goal

Generate physical 3D-printed models of population potential landscapes using a Bambu Lab printer (P1S) with 4-color AMS system.

Discrete Color Mapping for Multi-Material Printing

Challenge: Bambu AMS has 4 filament slots. Need to map continuous potential values to 4 discrete color bands that correspond to printable materials.

Solution: Percentile-based discrete colorscale

Implementation: Added --discrete-colors N flag to visualize_potential.py

Key Insights from Testing

1. Hexed data is essential:

2. Ocean should be deep blue:

3. Linear Z-mode preserves peaks:

4. Color shows relative ranking, not absolute:

5. Geographic scale matters:

Working Command for HQ Renders

python3 src/cli/visualize_potential.py <input.csv> \
  --type mesh \
  --discrete-colors 4 \
  --color-mode log \
  --z-mode linear \
  --z-scale 0.05 \
  --hq \
  --png \
  -o <output.png>

Key parameters:

Next Steps (After Dinner)

  1. Generate STL files: Create actual 3D printable geometry from potential data
  2. Slice in Bambu Studio: Assign filament colors by Z-height layers
  3. Test print: Start with small region (SF or CA) to validate workflow
  4. Iterate: Adjust Z-scale, base thickness, color boundaries as needed
  5. Full prints: USA, California, World at different scales

Technical Requirements for STL Generation

Geometry:

Format:

Libraries:

Status

Visualization pipeline: Working and validated

STL generation: Next task (after dinner)


End of speculative section. These ideas are works in progress. Some may be profound, some may be nonsense. Time will tell.