Building a complex NixOS configuration from scratch isn't about writing 700 lines of code and deploying it all at once. It's about starting small, getting something working, then adding complexity layer by layer. Each layer should be tested and verified before moving to the next.
This approach has several advantages:
*Immediate feedback:* You find out right away whether a change works
*Easier debugging:* When something breaks, you know exactly what changed
*Learning at each step:* You understand each component before adding the next
*Safety:* A broken intermediate step is easier to recover from than a broken final system
*Motivation:* Seeing working results keeps you motivated to continue
Before writing any configuration, you need:
*A machine to configure:* A VPS, a VM, or physical hardware with NixOS installed
*Basic access:* SSH working with key authentication
*A domain name:* DNS pointing to your server
*A plan:* Know what services you eventually want to run
You don't need the full plan immediately. Start with the core infrastructure, then add services one by one.
Your first goal is the smallest possible working configuration:
*What to configure:*
*Why so minimal:*
*Testing:*
1. Build: nixos-rebuild build --flake .#snek
2. Switch: nixos-rebuild switch --flake .#snek
3. Verify: Can you still SSH in? Check that systemctl status reports no failed units (systemctl --failed lists any that did fail)
*Rollback plan:* If this breaks, you can boot from the previous generation in GRUB, or use the hosting provider's console to fix things.
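The build, switch, verify cycle can be wrapped in a small helper so you run it the same way every time. This is a minimal sketch, not a definitive script: it assumes you run it as root from the directory that holds your flake, and the function names `count_failed_units` and `deploy_base` are made up for illustration. `.#snek` is the flake target used in the steps above.

```shell
# Hypothetical helpers; the function names are not part of NixOS.

# Print the number of failed systemd units.
count_failed_units() {
    # --no-legend drops the header and footer, --plain drops the status dots,
    # so every non-empty line that remains is one failed unit.
    systemctl --failed --no-legend --plain | grep -c . || true
}

# Build first, switch only if the build succeeds, then report failed units.
deploy_base() {
    flake=${1:-".#snek"}
    nixos-rebuild build --flake "$flake" || return 1
    nixos-rebuild switch --flake "$flake" || return 1
    failed=$(count_failed_units)
    if [ "$failed" -eq 0 ]; then
        echo "ok: no failed units"
    else
        echo "warning: $failed failed unit(s), see: systemctl --failed"
        return 1
    fi
}
```

Calling `deploy_base` with no argument uses `.#snek`. The helper cannot tell you whether SSH still works, so keep your current session open and log in from a second terminal before you close it.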
Once the base system works, add your edge proxy:
*What to configure:*
*Why this next:*
*Testing:*
1. Rebuild and switch
2. Create /var/www/snek.cc/index.html with some content
3. Visit https://snek.cc in a browser
4. Verify: Do you see your content? Is the certificate valid (padlock shown, no browser warning)?
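The browser check in steps 3 and 4 can also be done from a terminal. The sketch below assumes `curl` is installed. The function name `check_site` and the sample string are hypothetical. It relies on the fact that curl verifies TLS certificates by default, so an invalid certificate makes the fetch fail.

```shell
# Hypothetical helper: fetch a URL over HTTPS and look for an expected string.
# curl verifies the TLS certificate by default, so an invalid certificate
# makes the fetch (and therefore this check) fail.
check_site() {
    url=$1
    expected=$2
    # -f: treat HTTP 4xx/5xx as failure, -sS: quiet but still print errors
    body=$(curl -fsS --max-time 10 "$url") || {
        echo "fetch failed: $url (HTTP error, TLS problem, or DNS)"
        return 1
    }
    case $body in
        *"$expected"*) echo "ok: $url serves the expected content" ;;
        *) echo "unexpected content from $url"; return 1 ;;
    esac
}
```

For example, after writing `hello from snek` into /var/www/snek.cc/index.html, `check_site https://snek.cc 'hello from snek'` should report ok. Run it from a machine other than the server so DNS and the firewall are exercised too.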
*Common issues:*
dig)
Before adding services that need secrets, set up sops-nix:
*What to configure:*
.sops.yaml with your age key
*Why before services:*
*Testing:*
1. Create secrets/test.yaml with a dummy secret
2. Edit with sops secrets/test.yaml, verify you can encrypt/decrypt
3. Configure NixOS to use that secret
4. Deploy and verify secret appears in /run/secrets/
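Step 4 can be scripted as a quick check on the server. This is a sketch under assumptions: the function name `secret_present` is made up, and the optional second argument exists only so you can point the check at another directory when trying it out. It tests that the file exists and is non-empty, not that its content is correct.

```shell
# Hypothetical helper: check that a decrypted secret exists and is non-empty.
# The directory defaults to /run/secrets, where step 4 expects the secret to
# appear; pass a second argument to look somewhere else.
secret_present() {
    name=$1
    dir=${2:-/run/secrets}
    if [ -s "$dir/$name" ]; then
        echo "ok: $dir/$name exists and is non-empty"
    else
        echo "missing or empty: $dir/$name"
        return 1
    fi
}
```

On the machine that holds your age key, `sops -d secrets/test.yaml` prints the decrypted file and confirms the key works before you deploy. After deploying, call `secret_present` with whatever name you gave the dummy secret in your NixOS configuration.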
Now add your first real service. Choose one that's:
Good candidates:
*What to configure:*
*Testing:*
1. Rebuild with nixos-rebuild test first (activates the new configuration without making it the boot default, so a reboot reverts it)
2. Check service status: systemctl status service-name
3. Check logs: journalctl -u service-name -f
4. Test functionality: Make a request to the service
5. If all good, switch permanently: nixos-rebuild switch
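The five steps above follow a fixed pattern, so they can be sketched as one function. Treat this as an illustration rather than a finished tool: `try_service` is a hypothetical name, it assumes you run it as root from the flake directory, and it uses the `.#snek` flake target from earlier. It covers steps 1, 2, 3 and 5. The functional test in step 4 depends on the service and is left to you.

```shell
# Hypothetical helper: activate temporarily, check one unit, then switch.
# Assumes it runs as root from the flake directory; ".#snek" is the flake
# target used in this guide, and the unit name is whatever you just added.
try_service() {
    unit=$1
    flake=${2:-".#snek"}

    # "test" activates the new configuration without adding a boot entry.
    nixos-rebuild test --flake "$flake" || return 1

    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active, switching permanently"
        nixos-rebuild switch --flake "$flake"
    else
        echo "$unit is not active, recent log lines:"
        journalctl -u "$unit" -n 20 --no-pager
        return 1
    fi
}
```

A service that is slow to start may still be "activating" when the check runs. If that happens, wait a moment and check it by hand with systemctl status before deciding the configuration is broken.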
*Verification checklist:*
Repeat Phase 4 for each service:
*Order matters:* Consider dependencies:
*One at a time:* Don't add multiple new services in one rebuild. If something breaks, you won't know which service caused it.
*Verify before adding next:* Make sure the current service is fully working before moving on.
*Build testing:* nixos-rebuild build
*Test activation:* nixos-rebuild test
*Switch activation:* nixos-rebuild switch
test
*Use build when:*
*Use test when:*
*Use switch when:*
test and everything works
NixOS generations are your safety net. Every switch creates a new generation.
*To see generations:*
{{{
nixos-rebuild list-generations
}}}
*To roll back immediately:*
{{{
nixos-rebuild switch --rollback
}}}
*To boot a previous generation:*
switch to make it permanent, or troubleshoot
*When to roll back:*
When a service isn't working, create the simplest possible configuration that should work:
When something breaks:
1. Check service logs: journalctl -u service-name -f
2. Check system logs: journalctl -xe
3. Check Caddy logs: journalctl -u caddy
Most problems are obvious in the logs.
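When a unit logs a lot, filtering by priority makes the relevant lines easier to spot. The following is a small sketch. `unit_errors` is a hypothetical name and the default of 50 lines is arbitrary.

```shell
# Hypothetical helper: show only error-level messages for one unit.
unit_errors() {
    unit=$1
    lines=${2:-50}
    # -p err keeps priority "err" and worse; -b limits output to this boot.
    journalctl -u "$unit" -p err -b -n "$lines" --no-pager
}
```

`unit_errors caddy` prints up to the last 50 error-level entries Caddy has logged since boot. If nothing shows up, go back to the unfiltered log, since some services report failures at a lower priority.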
Don't assume the whole chain works. Test each link:
1. Is the service running? systemctl status
2. Is it listening on the right port? ss -tlnp | grep PORT
3. Can you reach it locally? curl http://localhost:PORT
4. Can you reach it through Caddy? curl https://domain.com
5. Can you reach it from outside? Test from another machine
Wherever the chain breaks, that's your problem.
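The first four links can be checked in order by a short function that stops at the first failure. This is a sketch, assuming an HTTP service and that `ss` and `curl` are available. `check_chain` and its argument order are invented for this example.

```shell
# Hypothetical helper: walk the chain and stop at the first broken link.
# Arguments: systemd unit, local port, public URL.
check_chain() {
    unit=$1
    port=$2
    url=$3

    systemctl is-active --quiet "$unit" \
        || { echo "broken: $unit is not running"; return 1; }

    # -t TCP, -l listening sockets, -n numeric ports
    ss -tln | grep -q ":${port}[[:space:]]" \
        || { echo "broken: nothing listens on port $port"; return 1; }

    # -f treats HTTP 4xx/5xx as failure; drop it if the service legitimately
    # answers "/" with an error code.
    curl -fsS -o /dev/null --max-time 5 "http://localhost:$port" \
        || { echo "broken: no local HTTP response on port $port"; return 1; }

    curl -fsS -o /dev/null --max-time 10 "$url" \
        || { echo "broken: $url is not reachable through the proxy"; return 1; }

    echo "ok: $unit answers locally and at $url"
}
```

Link 5 cannot be tested from the server itself. Repeat the final curl from another machine to confirm that outside traffic gets through.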
Use git to track changes:
This lets you diff configurations and understand exactly what changed.
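One way to keep that history clean is to record each working step as its own commit. The helper below is only a sketch: `commit_step` is a hypothetical name, and it assumes you run it inside the git repository that holds your configuration.

```shell
# Hypothetical helper: record one working step as one commit.
# Does nothing when there are no changes, so it is safe to call repeatedly.
commit_step() {
    message=$1
    git add -A
    # --quiet makes git diff exit 0 when nothing is staged, 1 otherwise.
    if git diff --cached --quiet; then
        echo "nothing to commit"
    else
        git commit -q -m "$message"
    fi
}
```

After each successful switch, run something like `commit_step "Add Caddy for snek.cc"`. Later, `git log --oneline` lists the steps and `git diff HEAD~1` shows exactly what the most recent one changed.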
When something doesn't work:
1. *Don't panic:* You can always roll back
2. *Check status:* What's failing? systemctl --failed
3. *Read logs:* What do the logs say? journalctl -u service
4. *Simplify:* Remove complexity, get back to a known state
5. *Test individually:* Test each component separately
6. *Search:* Look up error messages, check documentation
7. *Ask:* If stuck, ask for help with specific error messages
8. *Document:* When you solve it, write down what you learned
As you add more services:
*Watch resource usage:*
top, htop
free -h
df -h
*Signs you need to scale:*
*Scaling strategies:*
Don't wait until the end to document:
This guide is that documentation—written as the system is built, explaining the reasoning behind each choice.
Before considering a service "done":
Building this system isn't a one-time task—it's an ongoing process:
*Services evolve:* You'll update configurations, add features, optimize
*Nixpkgs updates:* You'll run nix flake update to get security patches
*Requirements change:* You'll add new services, remove old ones
*Learning continues:* You'll discover better ways to do things
The incremental approach isn't just for the initial build; it's how you maintain and evolve the system over time. Small, tested changes are safer than big-bang updates.
Embrace the process. Each working service is a victory. Each problem solved is knowledge gained. The goal isn't just a working system, but understanding how and why it works.