AI/ML Operations Associate
Key Responsibilities
- Deploy supervised and unsupervised models from staging to production using containerized workflows.
- Build CI/CD pipelines that lint, test, and version models, datasets, and feature stores.
- Instrument robust monitoring (Prometheus, Grafana, custom Python probes) to surface drift, bias, and SLA breaches in real time; see the probe sketch after this list.
- Automate rollback and canary strategies for zero-downtime releases.
- Debug GPU/CPU bottlenecks; optimize resource allocation across multi-tenant clusters.
- Document runbooks, architecture diagrams, and post-mortems for institutional knowledge.
- Collaborate with data scientists, software engineers, and product analysts to tighten feedback loops.
- Advocate for security best practices: role-based access control, secrets management, and network policies.
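The monitoring duty above is where much of the day-to-day coding lives. As a hedged illustration only (the metric name, port, polling interval, and mean-shift drift measure are assumptions, not a prescribed implementation), a minimal custom probe built on the prometheus_client library could look like this:

```python
# Minimal sketch of a custom drift probe that exposes a Prometheus gauge.
# Assumptions: feature values arrive as NumPy arrays, and a simple mean-shift
# score stands in for whatever drift metric the team actually uses.
import time

import numpy as np
from prometheus_client import Gauge, start_http_server

# Gauge scraped by Prometheus; the metric name and label are illustrative.
DRIFT_SCORE = Gauge("feature_drift_score", "Mean-shift drift score per feature", ["feature"])


def drift_score(reference: np.ndarray, live: np.ndarray) -> float:
    """Absolute difference in means, scaled by the reference standard deviation."""
    ref_std = float(reference.std()) or 1.0
    return abs(float(live.mean()) - float(reference.mean())) / ref_std


def run_probe(reference_batches: dict[str, np.ndarray], fetch_live_batch) -> None:
    """Poll live data and update the gauge; fetch_live_batch is a placeholder hook."""
    start_http_server(9100)  # port is an assumption; use whatever the cluster allots
    while True:
        for feature, reference in reference_batches.items():
            live = fetch_live_batch(feature)
            DRIFT_SCORE.labels(feature=feature).set(drift_score(reference, live))
        time.sleep(60)  # keep the polling cadence scrape-friendly
```

In practice the drift measure, thresholds, and alert rules are tuned per model, with Grafana dashboards and alerting layered on top of the exposed gauge.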
Essential Qualifications
- Bachelor’s degree in Computer Science, Data Engineering, or a related discipline.
- 0-2 years’ professional experience (internships count) with Python for data pipelines or backend services.
- Hands-on exposure to Docker images, Kubernetes manifests, and Helm charts.
- Familiarity with at least one cloud platform (AWS, GCP, or Azure) and its ML toolchain.
- Knowledge of Git workflows, unit testing, and basic DevOps principles.
- Solid grasp of the machine-learning lifecycle: training, validation, deployment, and monitoring (sketched briefly after this list).
- Clear written and spoken communication; you explain complex ideas to non-technical stakeholders.
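To make the lifecycle bullet above concrete for early-career candidates, here is a deliberately compressed sketch using scikit-learn and joblib; the dataset, accuracy gate, and artifact path are illustrative assumptions rather than an actual pipeline:

```python
# Compressed sketch of train -> validate -> deploy (serialize) -> monitor.
# The dataset, 0.9 accuracy gate, and artifact path are illustrative assumptions.
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Training
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Validation: gate the artifact on a minimum accuracy before it ships
accuracy = accuracy_score(y_val, model.predict(X_val))
assert accuracy >= 0.9, f"validation gate failed: accuracy={accuracy:.3f}"

# Deployment: persist an artifact a serving container can load
joblib.dump(model, "model-v1.joblib")

# Monitoring hook: in production this value feeds a dashboard or alert, not stdout
print(f"validated accuracy={accuracy:.3f}; artifact written to model-v1.joblib")
```

A real pipeline splits these stages across CI jobs and replaces the final print with the kind of monitoring probe described under Key Responsibilities.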
Preferred Extras
- Experience with Kubeflow, MLflow, or Vertex AI; a short MLflow tracking sketch follows this list.
- Comfort reading TensorFlow or PyTorch code to trace runtime errors.
- Participation in Kaggle competitions or open-source MLOps projects.
- Understanding of data privacy regulations (HIPAA, GDPR, CCPA).
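If MLflow is new to you, a rough sketch of the experiment tracking referenced above (the run name, parameters, and metric values are illustrative, not a production configuration) looks like:

```python
# Rough sketch of MLflow experiment tracking; all names and values are illustrative.
import mlflow

with mlflow.start_run(run_name="baseline-logreg"):
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("max_iter", 5000)
    mlflow.log_metric("val_accuracy", 0.95)  # would come from a real validation step
    mlflow.log_artifact("model-v1.joblib")   # e.g. the artifact from the sketch above
```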