Rayene BEN RAYANA

Technical Team Manager ● Site Reliability Engineer (SRE) ● CI/CD Expert ● Large Scale System Designer ● Ph.D. in Mobile IPv6 Networking ● Change Management at Scale ● Rusty Pythonist

Note: You are reading the printed version of https://rayene.benrayana.net. Please visit that page for updates and further details.

  Currently looking for a mission requiring strong leadership skills to drive technically challenging changes. I can bring to your business valuable experience built on a perfect academic and scientific record and 15 years of professional experience improving some of the world's largest products and most skilled teams.


  • Zurich, Paris or full remote
  • Open to frequent travel
  • Open to entrepreneurship
  • Flexible hours

Work Experience - Google/Alphabet (Zürich)

Manager of Gmail SRE Team - Zürich shard

Dec. 2021 - Today

In short: I manage 50% of the Gmail SRE team (11 to 17 engineers). My team holds the pager for Gmail, a product with multiple billions of daily active users, from 7 AM to 7 PM CET. My team is trained to lead the response during large outages, to learn from them and to improve the infrastructure to prevent recurrence.
Team Contribution: Sharp drop (-55%) in Gmail's major and huge outages over the last 3 years. Under the hood, we silently migrated Gmail from a monolithic architecture to our cutting-edge "microservices" frameworks and collaborated on one of the world's largest database migrations. These huge changes rolled out smoothly despite 20 years of technical debt (YouTube video). Successful launch of Gmail's AI features (Help me write, Summarize, ...).
Personal contribution: Got the team back on track and productive after stressful reorgs and layoffs. Shifted the team culture from reluctance and conservatism to embracing change. Handled cross-team conflict resolution, roadmap alignment and staffing commitments, including VP-level escalations when needed. Salvaged and landed a couple of critical projects that were stuck and required a structured approach (microservice migration and statefulness removal). Grew 5 senior leads, including junior managers, by handing off large chunks of responsibility.

Manager of Google Calendar SRE Team - Zürich shard

Oct. 2019 - Dec. 2021

In short: Google Calendar has slightly fewer daily active users than Gmail but is still one of the world's most critical business tools. I took the lead of the Zürich SRE team in 2019 (6 engineers) in a tense context, right after the worldwide 4-hour outage. This outage triggered a large Google-wide effort to refocus on reliability as a fundamental. We not only managed to sustainably stabilize Calendar, but we also migrated this old product to Google's cutting-edge infrastructure, tools and frameworks.
Personal contribution: My contributions were very similar to what I did for Gmail. In fact, the Workspace director asked me to lead the Gmail SRE team after observing how Calendar's team and product health improved under my lead.

Lead of Google Workspace Capacity Planning Program

Mar. 2021 - Sept. 2024

In short: When I was managing Calendar SRE, I noticed that the capacity planning stack needed a "refresh". It was wasteful and the processes were both manual and error-prone. With my Tech Lead, we co-authored a vision document and started a project to automate forecasts and capacity changes and to add elasticity to the footprint to gain efficiency. Over the next 3 years, the project evolved to encompass all Google Workspace products. It had up to 4 concurrent workstreams led by either peer managers or senior SREs, totalling more than 30 contributors. Today, this project has been deployed to 82% of Workspace's binaries and has landed multiple tens of millions of dollars in sustainable yearly savings. I handed over the project leadership to my TL in Sept. 2024.

Work Experience - OVH Cloud (Paris)

Engineering Director

Dec. 2017 - Today

OVH Cloud is a European cloud computing leader that offers VPS, dedicated servers and other web services. As of 2018, OVH has 27 datacenters in 19 countries hosting 300,000 servers.

My role is to improve the internal Information System by fostering best practices through tooling.

Observability Team Manager
In short: Set up and operate an internal Logs/Metrics/Traces platform. The first step is to integrate and automate an incident management solution (OpsGenie) in order to onboard about 1000 users.
Team Contribution: Formal team requirements gathering. Terraform plugin for OpsGenie to be able to manage the tool in infrastructure-as-code mode.
Personal contribution: Hiring. Internal marketing and help with team requirements gathering. Proofs of concept. Commercial negotiations with service providers. Deprecation roadmaps for the tools we're replacing.
Keywords: SaaS integration, Terraform, Golang, RFP.
CI/CD Team Manager
In short: Leading a team of 5 engineers to develop and operate CDS: An Enterprise-Grade Continuous Delivery & DevOps Automation Open Source Platform.
Team Contribution: CDS is the main tool used internally at OVH to build, test and deploy. In August 2018, CDS was running a few thousand builds a day.
Personal contribution: Improved team communication with the rest of the company. Better documentation & better visibility outside of the company (meetup).
Keywords: Golang, Testing, CI/CD.
Urban Planning Team Manager
In short: Leading a team of 2 software architects to draw a map of the microservices information system at OVH and gamify its continuous improvement. To do so, we've designed, developed and open-sourced Lhasa.
Team Contribution: Lhasa allowed us to publish an up-to-date map of the microservices at OVH. We did not reach the gamification part. The project has been paused.
Personal contribution: Actively participated in the design and development. I've also worked on the internal promotion of the tool (blog posts, meetings, etc.).
Keywords: Golang, REST API, Urban Planning, Impact Analysis, Gamification.

Work Experience - Scality (Paris)

Engineering Director

Sept. 2015 - Dec 2017

Scality is a global market leader in Distributed File Systems and Object Storage according to both Gartner and IDC.

My role was to lead a couple of Python development teams (11 engineers).

Core Engineering Team Manager
In short: The supervisor is the command-and-control center of the storage cluster. In 2016, the team decided to revamp this component, switching from a traditional WebUI to a REST API.
Team Contribution: The RING v6 delivery contained the live monitoring tool and the most useful API routes.
Personal contribution: Reshaped the team to face this technical challenge.
Keywords: Python, Elasticsearch, Grafana, Swagger, SaltStack.
Release Engineering Team Manager
In short: I created the Release Engineering team in October 2015 to tackle serious delivery issues that the engineering organization was facing at that time. The team's purpose: streamline the delivery process and move the engineering organization from a waterfall model to Continuous Delivery.
Team Contribution:
  • Bert-E: a novel merging model and gatekeeper bot to replace the former manual merge process. This was a game changer. We wrote an ACM paper about it and open-sourced it.
  • Eve: the build service, which executes up to 200 builds/day. It is a layer of code on top of Buildbot that adds support for pipeline-as-code.
Personal contribution:
  • Infrastructure-as-code mindset shift.
  • Test automation and stabilization.
  • Data-driven engineering organization (KPI dashboards).
  • Promotion of the core principles of "agility".
Results: The QA phase has (almost) been removed. The interval between deliveries decreased from 6 months to about 6 weeks, while the number of features increased significantly.
Engineering Council Member
The engineering council was a temporary organization (Jan. to Sept. 2016). Its purpose was to drive the whole engineering organization, temporarily fulfilling the VP of Engineering's role.
Results: On schedule delivery of the RING v6 LTS version, 50+ features.

Work Experience - Olfeo (Paris)

Quality Assurance Team Leader

Apr. 2015 - Sept. 2015

Olfeo is the French market-leading URL filtering solution (protocol filtering, network antivirus, proxying and detailed activity logging and reporting).

The mission consisted of leading a small team to design and build a fully automated continuous delivery pipeline.

Skills acquired: Linux/Debian, Python, Selenium, Vagrant, libvirt, Docker, git, Scrum, DevOps, management training.

Work Experience - Ucopia Communications (Paris)

Performance & Quality Assurance Team Leader

Oct. 2013 - April 2015

Ucopia Communications is the French market leader in network controllers and WiFi guest access. My role was to lead a small QA team to implement performance benchmarks and automate functional tests.


Benchmarking: designing and implementing realistic stress tests that both scale out and scale up. Such tests detect performance bottlenecks and ensure that the product copes with the growing size and complexity of customer architectures. In the end, we were able to simulate several tens of thousands of users interacting with a Ucopia Controller Cluster.
Automating tests: In our race to a continuous delivery process, we found that creating and maintaining automated tests is extremely time-consuming. This conflicts with the need for frequent software updates and with the increasing system complexity. To address this, I designed and developed a library based on Python, Selenium WebDriver and VMware that makes creating and maintaining automated tests a breeze, even for newcomers.
Quality metrics: We settled on concrete metrics to measure software quality and tracked their evolution. These metrics are extremely useful when it comes to predicting delivery dates or making strategic decisions.

Junior then Senior R&D Engineer

Sept. 2010 - Dec. 2013

I've worked on a wide variety of projects such as MySQL optimization, Debian packaging, frontend, performance and scalability.

Database & LDAP expert

I have been involved in every project involving advanced MySQL design and fine-tuning. My work to ensure the scalability of the database system for the largest clusters, as well as my efforts to design an autonomous database (self-monitoring, self-repairing), were highly appreciated by customers. I also implemented a similar solution to make an OpenLDAP system more fault-tolerant and scalable. These developments made it possible to validate the product for large deployments (e.g., EDF, Stade de France).

Software architect

I introduced, and vigorously defended, the use of MVC/MVT patterns and unit testing to enhance the readability and maintainability of the source code. I am especially keen on software best practices (DRY, loose coupling, etc.).

Versatile developer

I used Python, PHP, Java and shell on a daily basis. Less often, I had to maintain C and Perl code bases.

Linux/Debian specialist

The product is based on Debian. I worked with nearly every aspect of this distribution: distributing software as DEB packages, overcoming the system limitations and scalability hurdles (C10K problem), etc.

Web frontend designer

I also led several frontend projects. At the time, I mastered HTML5/CSS3/jQuery and responsive design.

Performance & scalability expert

During the last couple of years, I used Python and gevent to develop Ucopia Labs: a high-performance asynchronous networking testbed. It is used to simulate tens of thousands of devices authenticating and accessing the web through the Ucopia Access Controller. I also supervised a trainee who developed a tool to visualize the simulator data using Django & jQuery Mobile (see screenshots). These tools were of great help when we had to evaluate the behavior of our product under high load.

Ucopia Labs

Education - Ph.D.

Ph.D. in Mobile IPv6 Networking

Telecom-Bretagne (now IMT), France 2006 - 2009

Distinction: Highly Honorable (equivalent to Cum Laude).

In short: Designing and implementing a smart algorithm capable of dispatching the network traffic of a vehicle across different wireless technologies (WiFi/WiMax/3G), taking into account their availability and cost as well as the importance of the network flows to be conveyed.

An interaction layer with the application level made it possible to ask applications to reduce their bandwidth usage whenever bandwidth becomes scarce or too expensive (e.g., stop a video stream, change the bitrate, delay an email download, etc.). To validate the results and demonstrate the algorithm's capabilities, I developed an open source network simulator/emulator, NetPyLab (see screencast below). This was my first experience with Python.

full abstract

Project partners: Orange R&D, Thalès Communications

Skills acquired: Linux firewalling suite (netfilter), C coding for the core algorithm, Python for the network emulator, improved communication and English skills, ...

Publications: I have written several papers and a book chapter.

Education - Master of Science

Master of Science, Networking

ENSI/Telecom-Bretagne (now IMT), France 2005 - 2006

Awards: The results of my research during this Master have been patented by SFR. I am a co-author of the patent EP1940119.

In short: Designing and implementing an algorithm to dynamically select the best compression method for a network flow based on its contents and on the network characteristics.

Education - Engineering in Computer Science

Engineer's degree

ENSI, Tunisia 2002 - 2005

The school: ENSI is the second most reputable engineering school in Tunisia. It offers an excellent curriculum and has highly selective admission.

Awards: Ranked 1st out of 250 students during the first two years. Ranked 2nd during the third year.

Skills

Important Note: The star ratings below are partially obsolete. They are a snapshot of my self-evaluation of my skills in 2019 (before joining Google). As a senior manager at Google, despite being regularly on call, I did not have the opportunity to be as hands-on as I used to be. I decided to keep these self-ratings anyway, as they give useful hints about my pre-Google experience.

Networking Skills

TCP/IP-IPv6 stack

Netfilter (iptables, ipset)

Diagnostic tools (Wireshark, etc.)

Developer Skills

Python

Golang

PHP

Java

C/C++

Shell

HTML(5), CSS(3)

JavaScript & Angular

Git

Virtualization Skills

VirtualBox

VMware

Vagrant

Docker / Kubernetes

OpenStack

KVM/QEMU/libvirt

Data Storage Skills

MySQL

PostgreSQL

OpenLDAP

NoSQL (Redis, Memcached)

Spoken Languages

French

English

Spanish

Arabic

Personal Information

Birthdate January 4, 1982

Contact Me

Phone +33 (0)6.29.98.76.05

Email rayene@benrayana.net

Social

LinkedIn