Location: District 7, HCMC
Working hours: Mon – Fri (9AM – 6PM)
Get to Know the Team:
The ML Platform team empowers teams across the company to harness the power of machine learning. We're building cutting-edge tools and infrastructure to drive innovation and automation across the business.
Get to Know the Role:
As a DevOps Engineer on the ML Platform team, you will contribute to the creation and maintenance of our machine learning infrastructure. This is a heavily Infra/SRE-focused role, embedded in an ML Platform team. You will help us maintain, upgrade, and improve our infrastructure and provide operational support.
The Day-to-Day Activities:
- Deliver high-quality AI infrastructure solutions: You will work with the Machine Learning Platform team to design and develop the infrastructure that supports distributed data processing and model training. You will use GitOps to ensure the system's cloud infrastructure is reproducible across different Kubernetes clusters.
- Develop observability solutions for Machine Learning pipelines: You will be responsible for developing and integrating monitoring and alerting within the company’s monitoring stack powered by Datadog, Prometheus, and Grafana. You will also contribute to the creation of runbooks and DevOps guides.