Sally Roth
sally@sallyroth.dev
Professional Summary
Seasoned Site Reliability Engineer with 15+ years of experience in DevOps, SRE, and infrastructure/platform engineering. Proven track record designing and maintaining large-scale, highly available cloud systems for companies including GoDaddy, Ripple, Oracle, Auth0, and GitHub. Expert in AWS and multi-cloud environments, automation (Terraform, Kubernetes, CI/CD pipelines), and observability tools (Prometheus, Grafana, Datadog) to ensure reliable and secure services. Strong leadership in cross-team collaboration and incident management, with a passion for improving developer experience through robust internal platforms and automation.
Skills
- Cloud & Platforms: AWS (EC2, S3, Fargate, CloudWatch), Google Cloud Platform, Microsoft Azure
- Containers & Orchestration: Docker, Kubernetes
- Infrastructure as Code & Configuration: Terraform, CloudFormation, Puppet, Chef, SaltStack
- CI/CD & Automation: Jenkins, GitHub Actions, GitLab CI, Slack (ChatOps), Capistrano, Hubot
- Monitoring & Observability: Prometheus, Grafana, ELK Stack (Elasticsearch/Logstash/Kibana), Loki, Datadog, Sumo Logic, Splunk
- Programming & Scripting: Python, Ruby (Rails), Go, Bash/Shell
- Databases: MySQL, MongoDB
- Operating Systems & Tools: Linux (Ubuntu, CentOS), Git, Vault, Okta, LDAP
Work Experience
GoDaddy
Principal Site Reliability Engineer (Nov 2022 – Present)
- Develop and maintain highly available Infrastructure as Code for GoDaddy's high-volume SMTP API platform, ensuring reliability and scalability.
- Leverage a hybrid cloud environment with AWS services (e.g., Fargate, Kinesis) and on-premises resources to build a secure, well-architected application stack (Rails, Node.js, MySQL).
- Implement and enhance CI/CD pipelines using GitHub Actions to automate deployments and infrastructure updates, improving deployment speed and consistency.
- Collaborate with development teams to embed robust security and observability (metrics, logging, tracing) into the platform, following best practices for cloud architecture.
Ripple
Staff Technical Operations Engineer (Sep 2019 – Nov 2022)
- Led the design, deployment, and maintenance of a multi-cloud observability platform (AWS and GCP) using Kubernetes, implementing monitoring and logging tools (Prometheus, Grafana, Loki) with custom automation in Go, Python, and Bash. This ensured comprehensive visibility across all services and reduced alert fatigue.
- Consulted and educated development teams on instrumentation best practices for building observable and maintainable systems, improving application telemetry and reliability.
- Consolidated and administered multiple SaaS monitoring tools (Grafana Cloud, Amazon CloudWatch, Datadog, Sumo Logic) into a unified observability stack, streamlining alerting and reducing tool overlap.
- Led a major RBAC redesign for all customer-facing infrastructure, refactoring a large legacy access management stack (Terraform, HashiCorp Vault, LDAP, Okta) to improve security and simplify user provisioning.
- Served as an escalation point in the SRE on-call rotation, diagnosing and resolving incidents on mission-critical services. Improved incident response and prevention, helping achieve high uptime and reliability.
Oracle Corporation
Site Reliability Engineer (May 2017 – Sep 2019)
- Built and automated a wide range of developer infrastructure tools and provisioning systems for Oracle Data Cloud using Python, Ruby, shell scripting, and configuration management (Chef) on CentOS, ensuring high availability and scalability.
- Designed and launched an internal Platform-as-a-Service (PaaS) for development teams, incorporating Prometheus for monitoring, an ELK stack for centralized logging, and GitLab for source control/CI. This platform improved deployment consistency and developer productivity.
- Onboarded multiple engineering teams to the new PaaS, guiding them in deploying their applications on the platform and adopting best practices for reliability and performance.
- Monitored and managed operations at massive scale (tens of thousands of nodes), troubleshooting and preventing service interruptions across diverse services as part of the core infrastructure team.
Auth0
Production Engineer (Jun 2016 – May 2017)
- Implemented a ChatOps-driven CI/CD pipeline using Jenkins (integrated with Slack and Hubot) to automate deployments on AWS and Azure, enabling rapid and reliable release cycles.
- Provisioned and managed scalable MongoDB database clusters in AWS and Azure, using Terraform for infrastructure as code and SaltStack for configuration management to ensure consistency across environments.
- Developed monitoring dashboards and alerting solutions using Datadog and Kibana, improving visibility into system performance and reducing mean time to detect issues.
- Participated in 24/7 on-call rotations and resolved production incidents, quickly diagnosing issues to minimize downtime and ensure service continuity.
GoDaddy
Systems Engineer (Feb 2015 – Jun 2016)
- Designed and built a ChatOps-driven CI/CD pipeline (Jenkins, Slack, Hubot, Capistrano, Rails on CentOS) to support the launch of a new GoDaddy Email Marketing product, enabling developers to deploy and test features rapidly.
- Automated server provisioning and configuration with Puppet for the product's launch, ensuring consistent environments and maintaining high uptime post-launch.
- Developed a staged deployment pipeline for configuration management code and log aggregation across 4,000+ legacy MySQL instances, utilizing Jenkins, shell scripts, Puppet, Python, and Elasticsearch to improve manageability of a large server fleet.
- Collaborated with database and operations teams to normalize configurations on legacy servers via Puppet, reducing configuration drift and operational errors.
GitHub, Inc.
Email Infrastructure Engineer ("Mail Guru") (Oct 2013 – Dec 2014)
- Engineered scalable, highly available email infrastructure as part of the Operations team, using Puppet for automation, Ubuntu Linux administration, and custom scripts and tools (Ruby on Rails, Bash) to support GitHub's email services.
- Implemented and managed bulk email delivery systems (PowerMTA and Postfix), delivering reliable email functionality for multiple internal applications. Worked closely with cross-functional teams (sales, marketing, training) to tailor email solutions to their needs, utilizing internal infrastructure and external Email Service Providers as appropriate.
- Investigated and resolved customer-facing email issues (e.g. webhook or notification problems) by analyzing logs with Splunk and internal tools, improving the email delivery success rate and user satisfaction.
Mad Mimi, LLC
Chief of Email Infrastructure & Delivery (Nov 2009 – Oct 2013)
- Led email deliverability and anti-abuse efforts for a marketing email platform, designing anti-abuse tools and strategies to prevent spam and manage sending reputation. Managed a team of two specialists to enforce best practices and maintain high deliverability rates.
- Managed and optimized a high-volume email infrastructure (Ruby on Rails application with PowerMTA and Postfix servers) to reliably send bulk emails to customers, scaling to meet growth while maintaining performance.
- Built internal analytics tools in Ruby on Rails for the anti-abuse and customer support teams, enabling better monitoring of email sending patterns and quicker response to issues.
- Collaborated with clients and Internet Service Providers to coordinate responsible bulk email sending practices, resolving deliverability challenges and ensuring compliance with ISP policies.
Education
Walla Walla University - B.S.E., Electrical Engineering concentration, Minor in Mathematics (’09)
Additional Information
- Languages: Native English; Proficient in Spanish (B1); Basic Croatian (A2)
- Public Speaking: Presented at Ruby and DevOps meetups and international conferences, including DevOps DC, PuppetConf 2015–2017, ScaleConf Colombia, RubyFuza (South Africa), and SendGrid Training events
- Community Leadership: Founded an Engineers Without Borders chapter at Walla Walla University, and founded the Phoenix Puppet Users Group to foster local DevOps community engagement
- Media & Recognition: Interviewed in Developer Hegemony by Erik Dietrich, and featured on tech podcasts "Developer on Fire" and "Hanging Out With Mad Mimi"
- Travel & Culture: Ultra-minimalist traveler who has visited 16 countries and lived in 4, often with just a 20-liter backpack
- Music & Arts: 16 years of classical training in piano, flute, voice, and oboe
- Athletics: Certified USA Weightlifting Level 2 Coach and 8-time USAW Open Series qualifier (59kg/64kg class); NASM Certified Personal Trainer; Catalyst Weightlifting Level 2 & CrossFit Level 2 Trainer; competitive powerlifter and strongwoman