AI/ML Operations Associate

Remote
Full-time

Key Responsibilities  

- Deploy supervised and unsupervised models from staging to production using containerized workflows.  

- Build CI/CD pipelines that lint, test, and version models, datasets, and feature stores.  

- Instrument robust monitoring—Prometheus, Grafana, custom Python probes—to surface drift, bias, and SLA breaches in real time.  

- Automate rollback and canary strategies for zero-downtime releases.  

- Debug GPU/CPU bottlenecks; optimize resource allocation across multi-tenant clusters.  

- Document runbooks, architecture diagrams, and post-mortems for institutional knowledge.  

- Collaborate with data scientists, software engineers, and product analysts to tighten feedback loops.  

- Advocate for security best practices—role-based access control, secrets management, network policies.  
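
To give a concrete flavor of the "custom Python probes" for drift detection mentioned above, here is a minimal, stdlib-only sketch of a Population Stability Index (PSI) check, one common way to quantify feature drift between a training baseline and live traffic. The function name and threshold conventions are illustrative, not a description of our internal tooling; in production the score would typically be exported as a Prometheus metric rather than returned directly:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Bins are derived from the range of `expected` (the baseline);
    out-of-range `actual` values are clamped into the edge bins.
    Scores above ~0.2 are commonly treated as significant drift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def fractions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # floor each fraction at a tiny epsilon to keep the log finite
        return [max(c / len(xs), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring probe would run this periodically per feature and alert (or page) when the score crosses the chosen threshold.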


Essential Qualifications  

- Bachelor’s degree in Computer Science, Data Engineering, or a related discipline.  

- 0–2 years of professional experience (internships count) using Python for data pipelines or backend services.  

- Hands-on exposure to Docker images, Kubernetes manifests, and Helm charts.  

- Familiarity with at least one cloud platform (AWS, GCP, or Azure) and its ML toolchain.  

- Knowledge of Git workflows, unit testing, and basic DevOps principles.  

- Solid grasp of the machine-learning lifecycle: training, validation, deployment, and monitoring.  

- Clear written and spoken communication; you can explain complex ideas to non-technical stakeholders.  
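
As an illustration of the Python pipeline and unit-testing fluency described above, here is a small, hypothetical transform with its test (the function and field names are made up for this example, not part of any specific stack):

```python
def normalize_features(rows, keys):
    """Min-max scale the named numeric columns of a list-of-dicts batch."""
    out = [dict(r) for r in rows]  # copy so the input batch is not mutated
    for k in keys:
        vals = [r[k] for r in rows]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0  # avoid divide-by-zero on constant columns
        for r in out:
            r[k] = (r[k] - lo) / span
    return out

def test_normalize_features():
    rows = [{"age": 20}, {"age": 30}, {"age": 40}]
    scaled = normalize_features(rows, ["age"])
    assert [r["age"] for r in scaled] == [0.0, 0.5, 1.0]
    assert rows[0]["age"] == 20  # original rows untouched
```

Candidates should be comfortable writing and reviewing code and tests at roughly this level of rigor.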


Preferred Extras  

- Experience with Kubeflow, MLflow, or Vertex AI.  

- Comfort reading TensorFlow or PyTorch code to trace runtime errors.  

- Participation in Kaggle competitions or open-source MLOps projects.  

- Understanding of data privacy regulations (HIPAA, GDPR, CCPA).