Networking and Tailscale: Architecture and Connectivity

Network Architecture Overview

Your snek.cc server operates as an edge node in a larger network topology. Understanding this architecture helps explain why certain configuration choices matter—from firewall rules to Tailscale integration.

The three network zones:

1. *Public Internet:* Untrusted, where user requests come from

2. *Tailscale Mesh:* Encrypted, authenticated VPN connecting your machines

3. *Localhost (127.0.0.1):* Trusted, where backend services communicate

Traffic flows from public (untrusted) through Caddy (your security boundary) to localhost services, or through Tailscale to your private infrastructure.

The Edge Proxy Pattern

Caddy is your only service directly exposed to the internet on ports 80 (HTTP) and 443 (HTTPS). This is a security best practice called the *edge proxy pattern or reverse proxy architecture*.

Why This Pattern?

*Single security boundary:* You implement TLS, rate limiting, and access control in one place (Caddy) rather than in every service.

*Service isolation:* Backend services bind to localhost only. Even if someone discovers a service is running on port 3000, they can't access it directly from the internet—they can only reach it through Caddy.

*Simplified service configuration:* Services don't need to know about TLS certificates, domain names, or public networking. They just serve HTTP on localhost.

*Easy maintenance:* You can take a service down, upgrade it, or replace it entirely without touching public-facing DNS or SSL configuration.

How It Works

1. User makes HTTPS request to https://knot.snek.cc

2. DNS resolves knot.snek.cc to your server's public IP

3. Request arrives at your server on port 443

4. Firewall allows it (port 443 is open)

5. Caddy receives the request

6. Caddy matches the Host header to the knot.snek.cc virtual host configuration

7. Caddy applies rate limiting and security headers

8. Caddy forwards the request to http://127.0.0.1:5555

9. Tangled Knot (listening on localhost:5555) receives the request

10. Knot processes it and returns a response

11. Caddy receives the response and forwards it to the user

The user never communicates directly with Knot. From their perspective, they're talking to Caddy, which happens to proxy to Knot.
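
To make steps 5 through 9 concrete, here is a minimal sketch of what the knot.snek.cc virtual host could look like as a NixOS Caddy definition. The headers are illustrative and rate limiting (step 7) is omitted because it depends on which Caddy plugin or matcher you use; this is not your exact configuration.

{{{
services.caddy = {
  enable = true;
  virtualHosts."knot.snek.cc".extraConfig = ''
    # Security headers applied once, at the edge, for every backend.
    header {
      Strict-Transport-Security "max-age=31536000; includeSubDomains"
      X-Content-Type-Options "nosniff"
    }
    # Forward to the Knot instance that listens on localhost only.
    reverse_proxy 127.0.0.1:5555
  '';
};
}}}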

Firewall Philosophy

NixOS uses a default-deny firewall model: if you don't explicitly open a port, it's closed. This is the principle of *least privilege*—only expose what's necessary.

Your Firewall Rules

You explicitly open three ports: 80 (HTTP, so Caddy can redirect and answer ACME challenges), 443 (HTTPS, terminated by Caddy), and 8776 (the Radicle seed proxy described below).

Everything else is blocked. Even though services are listening on other ports (2583 for PDS, 5555 for Knot, etc.), the firewall prevents external access.

*Why this matters:* If a service accidentally binds to 0.0.0.0 (all interfaces) instead of 127.0.0.1 (localhost), the firewall still protects it. Defense in depth.
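
Expressed as NixOS options, the default-deny firewall with just those three ports open looks roughly like this (the port list is inferred from this guide, not copied from your configuration):

{{{
networking.firewall = {
  enable = true;                       # default-deny: everything not listed is closed
  allowedTCPPorts = [ 80 443 8776 ];   # HTTP, HTTPS, Radicle seed proxy
};
}}}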

Firewall Implementation

Under the hood, NixOS uses iptables (or nftables on newer systems). When you set networking.firewall.allowedTCPPorts, NixOS generates iptables rules.

The rules are organized into chains: INPUT handles traffic addressed to this host, FORWARD handles traffic routed through it, and OUTPUT handles traffic it originates. NixOS also creates its own chain (nixos-fw) that INPUT jumps into.

Your opened ports are accepted in the INPUT path, allowing external traffic to reach those specific services.

IPv6 Considerations

You've disabled IPv6 (networking.enableIPv6 = false). Why?

Your hosting provider has broken IPv6 routing. Attempts to use IPv6 result in timeouts or routing failures. By disabling it entirely, you force all traffic through IPv4, which works correctly.

*Trade-offs:* clients on IPv6-only networks can only reach you through their provider's NAT64/translation layer, and you give up any readiness for an IPv6-first future.

If your hosting provider fixes IPv6, you could re-enable it. Until then, IPv4-only is the pragmatic choice.

Tailscale: Your Private Mesh Network

Tailscale creates an encrypted mesh VPN using WireGuard. It's the glue that connects your VPS to your homelab and other machines.

What Tailscale Provides

*Encrypted tunnels:* All traffic between your Tailscale nodes is encrypted with WireGuard's modern cryptography (Curve25519, ChaCha20, Poly1305).

*Automatic NAT traversal:* Tailscale uses various techniques to punch through NAT (Network Address Translation), so your homelab (behind a router) can communicate with your VPS (public IP) directly.

*Stable IP addresses:* Each node gets a stable IP in the 100.64.0.0/10 range (CGNAT space). These IPs don't change even if your public IPs do.

*Identity-based access:* Authentication is via your Tailscale account. Only your devices can join your network.
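
On NixOS, joining the tailnet is mostly one option; the sketch below adds the usual firewall niceties, assuming Tailscale's default UDP port and interface name:

{{{
services.tailscale.enable = true;

networking.firewall = {
  # Let WireGuard/NAT-traversal traffic in so nodes can connect directly.
  allowedUDPPorts = [ config.services.tailscale.port ];
  # Tailscale routes can look asymmetric to a strict reverse-path filter.
  checkReversePath = "loose";
};
}}}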

Your Tailscale IPs

Your homelab sits at 100.112.12.44 and your nixos server at 100.113.12.42 on the tailnet. These stable addresses are how your services communicate privately.

Use Case 1: Radicle P2P Proxy

Radicle uses a custom TCP protocol for P2P communication. Your setup:

1. Radicle clients connect to seed.snek.cc:8776

2. Connection arrives at your VPS firewall

3. Firewall allows port 8776

4. Systemd service (radicle-proxy) accepts the connection

5. The proxy uses socat to forward the TCP connection to 100.112.12.44:8776

6. Socat establishes connection to your homelab over Tailscale

7. Traffic flows: Client → VPS → Tailscale → Homelab

*Why this architecture:* the homelab sits behind NAT with no public IP, so the VPS acts as the public Radicle endpoint and relays the raw TCP stream over the encrypted Tailscale tunnel. Clients only ever see seed.snek.cc; the homelab itself is never exposed.
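
A minimal sketch of how that radicle-proxy unit might be written in NixOS. The socat invocation matches the flow described above; the ordering and hardening details are assumptions, not your exact unit:

{{{
systemd.services.radicle-proxy = {
  description = "Relay public Radicle traffic to the homelab over Tailscale";
  after = [ "network-online.target" "tailscaled.service" ];
  wants = [ "network-online.target" ];
  wantedBy = [ "multi-user.target" ];
  serviceConfig = {
    # Listen on the public port, fork per connection, relay to the homelab node.
    ExecStart = "${pkgs.socat}/bin/socat TCP-LISTEN:8776,fork,reuseaddr TCP:100.112.12.44:8776";
    Restart = "always";
    DynamicUser = true;
  };
};
}}}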

Use Case 2: HTTP Service Proxying

For HTTP services on your homelab or nixos server:

1. User requests https://constellation.snek.cc

2. Caddy receives the request (port 443)

3. Caddy proxies to http://100.113.12.42:6795 (nixos server via Tailscale)

4. Tailscale routes the request through the encrypted tunnel

5. Constellation service on nixos server responds

6. Response flows back through the same path

*Why not just expose constellation publicly?* Exposing it would mean another machine with open ports to harden. Routing it through the VPS keeps TLS, rate limiting, and access control centralized at the Caddy edge, and the nixos server never needs to face the internet at all.
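
In Caddy terms the only difference from a localhost backend is the upstream address, which is a Tailscale IP; a sketch:

{{{
services.caddy.virtualHosts."constellation.snek.cc".extraConfig = ''
  # TLS terminates here on the VPS; the hop to the nixos server rides the tailnet.
  reverse_proxy 100.113.12.42:6795
'';
}}}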

TCP vs HTTP Proxying

You have two types of proxying:

*TCP proxying (Radicle):* socat relays raw bytes. It has no idea what protocol is inside the stream, can't inspect or rewrite anything, and therefore works for any TCP service.

*HTTP proxying (Caddy):* Caddy parses each request, routes on the Host header, terminates TLS, injects headers, and can apply per-request policies such as rate limiting.

Radicle needs TCP proxying because its peer protocol isn't HTTP. Constellation speaks plain HTTP, so it goes through Caddy.

Network Security Model

Your security model has three trust zones:

Untrusted (Internet)

Anything can connect. Every public request passes through Caddy, which terminates TLS, applies rate limiting and security headers, and only then forwards traffic inward.

Semi-Trusted (Tailscale)

Only devices authenticated to your tailnet can send traffic here, and all of it is encrypted. A compromised device on the mesh could still probe services listening on the Tailscale interface, so it isn't treated as fully trusted.

Trusted (Localhost)

Only processes on the machine itself can reach 127.0.0.1. Backend services bound there are invisible to both the internet and the tailnet.

This layered approach means that even if Tailscale is compromised, an attacker still can't access localhost services directly. They'd need to compromise the server itself.

Rate Limiting at Multiple Layers

You implement rate limiting in two places:

*Caddy (Application layer):* per-client request limits on HTTP traffic, applied after TLS termination, so abusive clients get clean error responses while the backends stay idle. (In Caddy this typically requires a rate-limit plugin or matcher-based handling.)

*iptables (Network layer):* connection-level limits per source IP, enforced before any request is parsed, so floods are shed cheaply in the kernel.

*Why both?* Defense in depth. The network layer stops volumetric abuse before it costs CPU or memory; the application layer makes fine-grained, per-route decisions the kernel can't.
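
The exact rules aren't reproduced here, but one plausible shape for the network-layer side uses connlimit through networking.firewall.extraCommands. The threshold below is an invented example, not your real value:

{{{
networking.firewall.extraCommands = ''
  # Refuse new connections to 443 from any single IP that already holds more
  # than 50, inserted ahead of the normal accept rule for port 443.
  iptables -I nixos-fw -p tcp --syn --dport 443 \
    -m connlimit --connlimit-above 50 --connlimit-mask 32 \
    -j DROP
'';
}}}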

Connection Tracking and Timeouts

Your kernel sysctl settings tune TCP behavior:

{{{
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
}}}

This means:
- After 30 seconds of idle, send a keepalive probe
- If no response, probe again every 10 seconds
- After 3 failed probes (roughly 60 seconds in total), consider the connection dead

**Why aggressive keepalives?**
- WebSocket connections (firehose) need to stay open
- NAT timeouts can drop idle connections
- Faster detection of dead connections prevents resource waste
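
On NixOS these values live in boot.kernel.sysctl; the same three settings from the block above, for reference:

{{{
boot.kernel.sysctl = {
  "net.ipv4.tcp_keepalive_time" = 30;    # idle seconds before the first probe
  "net.ipv4.tcp_keepalive_intvl" = 10;   # seconds between unanswered probes
  "net.ipv4.tcp_keepalive_probes" = 3;   # failed probes before the connection is reset
};
}}}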

== Tailscale Exit Nodes and Subnet Routes ==

You could extend your Tailscale setup:

**Exit nodes:** Route all internet traffic through a specific node (e.g., route through VPS for consistent public IP)

**Subnet routes:** Advertise entire subnets (e.g., your home LAN) so Tailscale nodes can reach any device on that network

**MagicDNS:** Access nodes by name instead of IP (e.g., `ping nixos` instead of `ping 100.113.12.42`)

These aren't configured in your current setup but are available if needed.
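
If you ever want them, a rough sketch of the NixOS side (this assumes a recent nixpkgs with `services.tailscale.extraUpFlags`; the LAN subnet is only an example):

{{{
services.tailscale = {
  enable = true;
  useRoutingFeatures = "server";   # enables the IP forwarding that routes/exit nodes need
  extraUpFlags = [
    "--advertise-exit-node"
    "--advertise-routes=192.168.1.0/24"
  ];
};
# MagicDNS is toggled in the Tailscale admin console rather than in NixOS.
}}}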

== DNS Considerations ==

Your domain `snek.cc` has DNS records pointing to your VPS's public IP. When someone accesses a subdomain:

1. DNS resolver looks up `knot.snek.cc`
2. Gets your VPS IP address
3. Connects to that IP
4. Caddy routes based on Host header

**Per-user subdomains:** a wildcard DNS record for `*.pds.snek.cc` points at the VPS, and Caddy's on-demand TLS obtains an individual certificate for each user's hostname the first time it is requested. No need to manually create DNS records or certificates for each user.
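
A sketch of how that can be wired up in NixOS. The `ask` endpoint is an assumption; point it at whatever hostname-validation endpoint your PDS actually exposes:

{{{
services.caddy = {
  globalConfig = ''
    on_demand_tls {
      # Caddy asks this endpoint whether a requested hostname deserves a cert.
      ask http://127.0.0.1:2583/tls-check
    }
  '';
  virtualHosts."*.pds.snek.cc".extraConfig = ''
    tls {
      on_demand
    }
    reverse_proxy 127.0.0.1:2583
  '';
};
}}}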

== Troubleshooting Network Issues ==

*"Can't reach service":*
1. Check if service is listening: `ss -tlnp | grep PORT`
2. Check firewall: `iptables -L -n | grep PORT`
3. Check Caddy proxy: `curl -v http://localhost:PORT` from server
4. Check from outside: `curl -v https://domain.com` from another machine

*"Tailscale connection failing":*
1. Check Tailscale status: `tailscale status`
2. Verify both nodes are authenticated: `tailscale ping NODE`
3. Check firewall isn't blocking UDP (Tailscale uses UDP for WireGuard)

*"SSL certificate errors":*
1. Check Caddy logs: `journalctl -u caddy -f`
2. Verify domain DNS resolves to this server
3. Check if on-demand TLS endpoint is working
4. Ensure ports 80/443 are accessible (Let's Encrypt needs port 80 for HTTP challenge)

*"Connection timeouts":*
1. Check if IPv6 is causing issues (try disabling)
2. Verify firewall rules allow the traffic
3. Check service is actually running and listening
4. Look at network latency: `ping domain.com`

== The Complete Data Flow ==

Let's trace a complete request to your PDS:

1. **Client** (somewhere on the internet) opens browser
2. Types `https://user.pds.snek.cc/xrpc/app.bsky.actor.getProfile`
3. **DNS** resolves to your VPS IP
4. **Client** initiates TCP connection to your VPS port 443
5. **Firewall** allows (port 443 is open)
6. **Caddy** accepts the connection
7. **Caddy** initiates TLS handshake, obtains certificate for `user.pds.snek.cc` via on-demand TLS
8. **Caddy** proxies to `http://127.0.0.1:2583`
9. **PDS** receives the request
10. **PDS** queries its database for the profile
11. **PDS** returns JSON response
12. **Caddy** forwards response to client
13. **Client** displays the profile

Every step is essential. If any component fails, the request fails. Understanding this flow helps debug issues when they occur.