Jobify’s mission is to build the most efficient tool for machine learning and computer vision teams to turn assets into high quality training data. We aim to develop a performant, efficient, and beautifully designed browser interface served to a global workforce.
We at Jobify are looking to hire a Senior Operations Engineer with the following qualifications:
- 4+ years of relevant experience in an Operations, SRE, or DevOps role
- Experience with modern Linux systems and running services in production
- Experience managing infrastructure in a major public cloud (AWS, GCP, Azure)
- Experience with Kubernetes or other container orchestration systems
- Experience with and an understanding of complex distributed systems
- Experience with database technologies such as PostgreSQL, MySQL, or other RDBMS
- Experience with other open source technologies such as Redis, Elasticsearch, and RabbitMQ
- Experience working under Agile / Scrum methodologies
- Shell scripting or Python skills
Job Description
- Executing an infrastructure roadmap in collaboration with team members across engineering, product, and design
- Taking ownership of Operations Engineering as the first official Operations Engineer
- Responding to day-to-day interrupt requests from business, support, and software development teams
- Supporting developers and support engineers in solving blocking issues
- Writing and maintaining documentation and playbooks for common or known problems
- Acting as a mentor to support engineers to help them become more self-sufficient in their day-to-day responsibilities
- Deploying, maintaining, and troubleshooting instances in development and production environments, both in the cloud and on-premises
- Performing root cause analysis and providing potential solutions when problems arise
- Working closely with DevOps Engineers and SREs to develop permanent solutions to problems
- Working with database technologies such as PostgreSQL, MySQL, or other RDBMS
- Working with other open source technologies such as Redis, Elasticsearch, and RabbitMQ
- Identifying and measuring key performance metrics for our infrastructure and defining service-level objectives (SLOs)
Bonus
- Experience with log and metrics collection using tools such as the Elastic Stack and Datadog
- Experience with automation tools and technologies such as Terraform and Helm
- Coding skills in languages such as Java or Golang
- Experience with SOC 2, FedRAMP, HIPAA, and other compliance-related programs
- Experience managing multiple Kubernetes clusters or clusters spanning multiple cloud providers
- Advanced knowledge of infrastructure management in GCP