AI/ML Operations Associate
Key Responsibilities
- Deploy supervised and unsupervised models from staging to production using containerized workflows.
- Build CI/CD pipelines that lint, test, and version models, datasets, and feature stores.
- Instrument robust monitoring (Prometheus, Grafana, custom Python probes) to surface drift, bias, and SLA breaches in real time; see the probe sketch after this list.
- Automate rollback and canary strategies for zero-downtime releases.
- Debug GPU/CPU bottlenecks; optimize resource allocation across multi-tenant clusters.
- Document runbooks, architecture diagrams, and post-mortems for institutional knowledge.
- Collaborate with data scientists, software engineers, and product analysts to tighten feedback loops.
- Advocate for security best practices: role-based access control, secrets management, and network policies.
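The monitoring duty above is where much of the day-to-day coding lives. As a hedged illustration only (the metric name, port, polling interval, and mean-shift drift measure are assumptions, not a prescribed implementation), a minimal custom probe built on the prometheus_client library could look like this:

```python
# Minimal sketch of a custom drift probe that exposes a Prometheus gauge.
# Assumptions: feature values arrive as NumPy arrays, and a simple mean-shift
# score stands in for whatever drift metric the team actually uses.
import time

import numpy as np
from prometheus_client import Gauge, start_http_server

# Gauge scraped by Prometheus; the metric name and label are illustrative.
DRIFT_SCORE = Gauge("feature_drift_score", "Mean-shift drift score per feature", ["feature"])


def drift_score(reference: np.ndarray, live: np.ndarray) -> float:
    """Absolute difference in means, scaled by the reference standard deviation."""
    ref_std = float(reference.std()) or 1.0
    return abs(float(live.mean()) - float(reference.mean())) / ref_std


def run_probe(reference_batches: dict[str, np.ndarray], fetch_live_batch) -> None:
    """Poll live data and update the gauge; fetch_live_batch is a placeholder hook."""
    start_http_server(9100)  # port is an assumption; use whatever the cluster allots
    while True:
        for feature, reference in reference_batches.items():
            live = fetch_live_batch(feature)
            DRIFT_SCORE.labels(feature=feature).set(drift_score(reference, live))
        time.sleep(60)  # keep the polling cadence scrape-friendly
```

In practice the drift measure, thresholds, and alert rules are tuned per model, with Grafana dashboards and alerting layered on top of the exposed gauge.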
Essential Qualifications
- Bachelor’s degree in Computer Science, Data Engineering, or a related discipline.
- 0-2 years’ professional experience (internships count) with Python for data pipelines or backend services.
- Hands-on exposure to Docker images, Kubernetes manifests, and Helm charts.
- Familiarity with at least one cloud platform (AWS, GCP, or Azure) and its ML toolchain.
- Knowledge of Git workflows, unit testing, and basic DevOps principles.
- Solid grasp of the machine-learning lifecycle: training, validation, deployment, and monitoring (sketched briefly after this list).
- Clear written and spoken communication; you explain complex ideas to non-technical stakeholders.
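To make the lifecycle bullet above concrete for early-career candidates, here is a deliberately compressed sketch using scikit-learn and joblib; the dataset, accuracy gate, and artifact path are illustrative assumptions rather than an actual pipeline:

```python
# Compressed sketch of train -> validate -> deploy (serialize) -> monitor.
# The dataset, 0.9 accuracy gate, and artifact path are illustrative assumptions.
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Training
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Validation: gate the artifact on a minimum accuracy before it ships
accuracy = accuracy_score(y_val, model.predict(X_val))
assert accuracy >= 0.9, f"validation gate failed: accuracy={accuracy:.3f}"

# Deployment: persist an artifact a serving container can load
joblib.dump(model, "model-v1.joblib")

# Monitoring hook: in production this value feeds a dashboard or alert, not stdout
print(f"validated accuracy={accuracy:.3f}; artifact written to model-v1.joblib")
```

A real pipeline splits these stages across CI jobs and replaces the final print with the kind of monitoring probe described under Key Responsibilities.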
Preferred Extras
- Experience with Kubeflow, MLflow, or Vertex AI; a short MLflow tracking sketch follows this list.
- Comfort reading TensorFlow or PyTorch code to trace runtime errors.
- Participation in Kaggle competitions or open-source MLOps projects.
- Understanding of data privacy regulations (HIPAA, GDPR, CCPA).
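If MLflow is new to you, a rough sketch of the experiment tracking referenced above (the run name, parameters, and metric values are illustrative, not a production configuration) looks like:

```python
# Rough sketch of MLflow experiment tracking; all names and values are illustrative.
import mlflow

with mlflow.start_run(run_name="baseline-logreg"):
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("max_iter", 5000)
    mlflow.log_metric("val_accuracy", 0.95)  # would come from a real validation step
    mlflow.log_artifact("model-v1.joblib")   # e.g. the artifact from the sketch above
```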