Consolidating Milvus Across AZs
I migrated a Milvus (standalone) deployment from a three-AZ Kubernetes setup (us-east-1a/b/c) to a single dedicated node in us-east-1b. I preserved all vector data, reduced infra to one m6a.xlarge node, consolidated all PVCs to 1b, and restored four collections with full integrity while untangling a handful of AWS, Kubernetes, Helm, and etcd knots.
This post documents the end-to-end path: decisions, traps, exact commands, and final checks. No fluff.
Context
- Cluster: `cm.montai.k8s.local` (kops)
- Namespace: `milvus`
- Milvus: standalone, deployed via Helm (`zilliztech/milvus` chart)
- Initial pain: PVCs spanned three AZs, forcing nodes in all three to satisfy volume affinity. Standalone Milvus didn’t need multi-AZ.
- Goal: one dedicated, tainted nodegroup in us-east-1b, all PVCs in 1b, zero data loss.
Strategy in One Page
- Create a dedicated instance group with taints, pinned to us-east-1b.
- Add nodeSelectors/tolerations for Milvus, etcd, MinIO via Helm values.
- Snapshot the PVCs; restore into 1b (EBS volumes can’t cross AZs; snapshots can).
- Repair etcd membership from snapshot using `ETCD_FORCE_NEW_CLUSTER=true`.
- Bring up MinIO (object storage with the vector data), then Milvus.
- Validate collections and segments; clean up.
Design calls:
- Snapshots over cloning to cross AZ boundaries.
- Preserve MinIO (data); rebuild etcd (metadata) from snapshot with `FORCE_NEW_CLUSTER`.
- Single AZ for simplicity and cost (dev/test trade-off accepted).
What Went Wrong (and How I Fixed It)
1) AZ mismatch blocking scheduling
- Symptom: `volume node affinity conflict`.
- Cause: Node still in 1a; PVCs bound to 1b.
- Fix: Move the IG to 1b, apply, delete old node(s) to force recreation in 1b.
2) StatefulSet PVCs stuck in old AZs
- Reality: PVC zone affinity is immutable.
- Fix: VolumeSnapshot → delete PVC → recreate PVC from snapshot; let CSI bind it in 1b.
3) Missing IAM for snapshot restores
- Symptom: `UnauthorizedOperation` on `ec2:CreateVolume` from snapshot.
- Fix: Add the following statement to the EBS CSI controller role, then restart the controller:
{ "Effect": "Allow", "Action": "ec2:CreateVolume", "Resource": "arn:aws:ec2:*:*:snapshot/*" }
4) etcd membership deadlock after restore
- Symptom: CrashLoop, “No active endpoints in cluster”.
- Cause: Restored data contained old member IPs.
- Fix (disaster recovery):
  - Restore only the etcd-0 PVC from snapshot.
  - Start one replica with:
kubectl set env statefulset/milvus-release-etcd \
  ETCD_FORCE_NEW_CLUSTER=true ETCD_INITIAL_CLUSTER_STATE=new -n milvus
kubectl scale statefulset milvus-release-etcd -n milvus --replicas=1
  - Scale to 3; etcd-1/2 join fresh.
5) “Missing collections” scare
- Reality: Milvus stores metadata in etcd and vectors in MinIO.
- Fix: Once etcd metadata was restored from snapshot, Milvus mapped names→IDs and loaded segments. Data intact.
6) PVC selector immutability
- Lesson: Don’t try to patch PVC zone/selector. Use snapshot→recreate. With `WaitForFirstConsumer`, node placement determines the AZ (see the StorageClass sketch below).
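For reference, a minimal StorageClass sketch with delayed binding; the class name and gp3 parameters are illustrative, not necessarily the class used here:
# Hypothetical StorageClass; name/parameters are examples
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # PV is provisioned in the AZ of the node the pod lands on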
Step-By-Step Execution
Phase 1 - Prep
Dedicated nodegroup (1b, tainted):
kops edit ig milvus --state s3://kops-cm-montai-com-state-store
# Set subnets: [us-east-1b], taint: milvus.io/node=cpu:NoSchedule
kops update cluster cm.montai.k8s.local --state s3://... --yes

Helm values with selectors/tolerations (Milvus/etcd/MinIO):
# /tmp/milvus-migration-values.yaml
standalone: { nodeSelector: { kops.k8s.io/instancegroup: milvus }, tolerations: [{ key: milvus.io/node, operator: Equal, value: cpu, effect: NoSchedule }] }
etcd: { nodeSelector: { kops.k8s.io/instancegroup: milvus }, tolerations: [{ key: milvus.io/node, operator: Equal, value: cpu, effect: NoSchedule }], replicaCount: 3 }
minio: { nodeSelector: { kops.k8s.io/instancegroup: milvus }, tolerations: [{ key: milvus.io/node, operator: Equal, value: cpu, effect: NoSchedule }], replicaCount: 4, persistence: { size: 500Gi } }
Create VolumeSnapshots:
kubectl apply -f <VolumeSnapshotClass manifest>    # driver: ebs.csi.aws.com
kubectl apply -f <VolumeSnapshot manifests>        # snapshots for etcd-0, etcd-2, minio-0, minio-1
kubectl wait volumesnapshot/<name> -n milvus --for=jsonpath='{.status.readyToUse}'=true --timeout=300s
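For illustration, a minimal sketch of what those manifests look like, assuming the EBS CSI driver; the VolumeSnapshotClass name is an example, not the real file contents:
# Hypothetical manifests; the class name is an example
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshot-class
driver: ebs.csi.aws.com
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-etcd-0
  namespace: milvus
spec:
  volumeSnapshotClassName: ebs-snapshot-class
  source:
    persistentVolumeClaimName: data-milvus-release-etcd-0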
Phase 2 - IAM
- Add `ec2:CreateVolume` on `arn:aws:ec2:*:*:snapshot/*`; restart the EBS CSI controller (sketch below).
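A hedged sketch of that step, assuming the driver's controller runs as ebs-csi-controller in kube-system; the role name, policy name, and policy file are placeholders:
# Attach the snapshot-restore statement to the CSI controller's IAM role (names are placeholders)
aws iam put-role-policy \
  --role-name <ebs-csi-controller-role> \
  --policy-name allow-create-volume-from-snapshot \
  --policy-document file://snapshot-restore-policy.json
# Restart the controller so the new permission takes effect
kubectl rollout restart deployment/ebs-csi-controller -n kube-system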
Phase 3 - Scale down & delete old PVCs
kubectl scale sts milvus-release-etcd -n milvus --replicas=0
kubectl scale sts milvus-release-minio -n milvus --replicas=0
kubectl delete deploy milvus-release-standalone -n milvus
kubectl delete pvc data-milvus-release-etcd-{0,2} export-milvus-release-minio-{0,1} -n milvus
Phase 4 - Restore PVCs into 1b
# Recreate PVCs from snapshots (no selector); CSI will place in 1b
kubectl apply -f restored-PVCs.yaml
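A minimal sketch of one restored PVC, assuming the snapshot names above; the storage class and size are examples and should match the original volume:
# Hypothetical restored PVC; storageClassName and size are examples
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-milvus-release-etcd-0
  namespace: milvus
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3               # class must use volumeBindingMode: WaitForFirstConsumer
  resources:
    requests:
      storage: 10Gi                   # example; match the source volume's size
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: snapshot-etcd-0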
Phase 5 - Lock IG to 1b & replace nodes
kops edit ig milvus # ensure only us-east-1b
kops update cluster --yes
kubectl delete node <nodes in 1a/1c>
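To confirm which nodes sit in 1a/1c (and that the replacement landed in 1b), list nodes with their zone and instance-group labels:
kubectl get nodes -L topology.kubernetes.io/zone -L kops.k8s.io/instancegroup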
Phase 6 - Bring up MinIO
kubectl scale sts milvus-release-minio -n milvus --replicas=4
kubectl wait -n milvus -l app.kubernetes.io/name=minio --for=condition=Ready pod --timeout=300s
Phase 7 - etcd recovery
kubectl delete pvc data-milvus-release-etcd-1 -n milvus
kubectl set env sts/milvus-release-etcd ETCD_FORCE_NEW_CLUSTER=true ETCD_INITIAL_CLUSTER_STATE=new -n milvus
kubectl scale sts milvus-release-etcd -n milvus --replicas=1
kubectl wait pod/milvus-release-etcd-0 -n milvus --for=condition=Ready --timeout=120s
kubectl delete pvc data-milvus-release-etcd-2 -n milvus
kubectl scale sts milvus-release-etcd -n milvus --replicas=3
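Not part of the sequence above, but once all three members are healthy it is typical to clear the recovery flags so a later restart doesn’t force a new cluster again; a sketch:
# Trailing '-' unsets an env var with kubectl set env
kubectl set env sts/milvus-release-etcd ETCD_FORCE_NEW_CLUSTER- ETCD_INITIAL_CLUSTER_STATE- -n milvus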
Phase 8 - Start Milvus
helm upgrade milvus-release zilliztech/milvus -n milvus --reuse-values -f /tmp/milvus-migration-values.yaml
kubectl wait -n milvus -l app.kubernetes.io/name=milvus --for=condition=Ready pod --timeout=300s
Phase 9 - Cleanup extra nodes
- Verify all pods on the single node; cordon/drain/delete any stragglers.
Validation Checklist
Infra:
kubectl get nodes -l kops.k8s.io/instancegroup=milvus -o custom-columns='NODE:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone,STATUS:.status.conditions[-1].type'
kubectl get pods -n milvus -o wide
for pvc in $(kubectl get pvc -n milvus -o jsonpath='{.items[*].metadata.name}'); do
vol=$(kubectl get pvc $pvc -n milvus -o jsonpath='{.spec.volumeName}')
zone=$(kubectl get pv $vol -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}')
echo "$pvc | $zone"
done
# Expect all zones = us-east-1b
Milvus health & data:
kubectl get pods -n milvus
kubectl logs -l app.kubernetes.io/name=milvus -n milvus --tail=200 | grep -i "Auditor loaded segment metadata"
kubectl port-forward -n milvus svc/milvus-release 19530:19530 &
python - <<'EOF'
from pymilvus import connections, utility
connections.connect(host="localhost", port="19530")
print(utility.list_collections())
EOF
pkill -f "port-forward.*milvus"
MinIO contents (sanity):
kubectl exec -it milvus-release-minio-0 -n milvus -- ls /export/milvus-bucket/file/index_files/
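A rough size check on the same path can also flag obviously missing data:
kubectl exec -it milvus-release-minio-0 -n milvus -- du -sh /export/milvus-bucket/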
Lessons You Can Reuse
Kubernetes
- Snapshot first. It’s the only sane way to cross AZs with EBS.
- With CSI `WaitForFirstConsumer`, node placement → AZ. Don’t fight PVC immutability.
- Use values files for Helm upgrades; avoid subchart auth traps.
AWS
- EBS volumes don’t cross AZs; snapshots do (within region).
- IAM for CSI is granular: creating a volume from a snapshot needs explicit rights.
Distributed Milvus
- In Milvus: MinIO = data, etcd = metadata. Protect MinIO PVCs; snapshot etcd.
- etcd DR: Start one restored member with `ETCD_FORCE_NEW_CLUSTER=true`, then scale out.
Final State
- Single node (m6a.xlarge) in us-east-1b, tainted and isolated.
- All PVCs consolidated in 1b (2,070 Gi total).
- Services: Milvus standalone, etcd (3), MinIO (4).
- Data: Four collections, 12 segments, full integrity.
Optional Cleanup & Monitoring
Watch stability and resources:
kubectl top node
kubectl top pods -n milvus
kubectl logs -l app=milvus-release -n milvus --since=24h | grep -i error
Remove snapshots if policy allows:
kubectl delete volumesnapshot -n milvus snapshot-etcd-{0,2} snapshot-minio-{0,1}
Appendix - Minimal IAM Addition for Snapshot Restore
{
"Effect": "Allow",
"Action": "ec2:CreateVolume",
"Resource": "arn:aws:ec2:*:*:snapshot/*"
}
Add to your EBS CSI controller role, then restart the controller.
Outcome: single-AZ Milvus, clean scheduling, lower cost, no data loss, reproducible steps.