Work Experience - Google/Alphabet (Zürich)
Manager of Gmail SRE Team - Zürich shard
Dec. 2021 - Today
In short: I manage 50% of the Gmail SRE team (11 to 17 engineers). My team holds the pager for Gmail, a product with billions of daily active users, from 7 AM to 7 PM CET. My team is trained to lead the response during large outages, to learn from them and to improve the infrastructure to prevent recurrence.
Team Contribution: Sharp drop (-55%) in Gmail's major and huge outages over the last 3 years. Under the hood, we silently migrated Gmail from a monolithic architecture to our cutting-edge "microservices" frameworks and collaborated on [one of] the world's largest database migrations. These huge changes rolled out smoothly despite 20 years of technical debt (YouTube video). Successful launch of Gmail's AI features (Help me write, Summarize, ...).
Personal contribution: Got the team back on track and productive after stressful reorgs and layoffs. Shifted the team culture from reluctance and conservatism to embracing change. Cross-team conflict resolution, roadmap alignment and staffing commitments, including VP-level escalations when needed. Salvaged and landed a couple of critical projects that were stuck and required a structured approach (microservice migration and statefulness removal). Grew 5 senior leads, including junior managers, by handing off large chunks of responsibility.
Manager of Google Calendar SRE Team - Zürich shard
Oct. 2019 - Dec. 2021
In short: Google Calendar has slightly fewer daily active users than Gmail but is still one of the world's most critical business tools.
I took the lead of the Zürich SRE team in 2019 (6 engineers) in a tense context, right after the worldwide 4-hour outage. This outage triggered a large Google-wide effort to refocus on reliability as a fundamental.
We not only managed to sustainably stabilize Calendar but also migrated this old product to Google's cutting-edge infrastructure, tools and frameworks.
Personal contribution: My contributions were very similar to what I did for Gmail. In fact, the Workspace director asked me to lead the Gmail SRE team after they observed how Calendar's team and product health improved under my lead.
Lead of Google Workspace Capacity Planning Program
Mar. 2021 - Sept. 2024
In short: When I was managing Calendar SRE, I noticed that the capacity planning stack needed a "refresh". It was wasteful and the processes were both manual and error-prone. With my Tech Lead, we co-authored a vision document and started a project to automate forecasts and capacity changes and to add elasticity to the footprint to gain efficiency.
Over the next 3 years, the project evolved to encompass all Google Workspace products. It had up to 4 concurrent workstreams led by either peer managers or senior SREs, totalling more than 30 contributors. Today, this project has been deployed to 82% of Workspace's binaries and has landed multiple tens of M$ of sustainable yearly savings. I handed over the project leadership to my TL in Sept. 2024.
Work Experience - OVH Cloud (Paris)
Engineering Director
Dec. 2017 - Today
OVH Cloud is a European cloud computing leader that offers VPS, dedicated servers and other web services.
As of 2018, OVH has 27 datacenters in 19 countries hosting 300,000 servers.
My role is to improve the internal Information System by fostering best practices through tooling.
Observability Team Manager
In short: Set up and operate an internal Logs/Metrics/Traces platform. The first step is to integrate and automate an incident management solution, OpsGenie, so that about 1000 users can be onboarded.
Team Contribution: Formal team requirements gathering. Terraform plugin for OpsGenie to be able to manage the tool in infrastructure-as-code mode.
Personal contribution: Hiring. Internal marketing and help with team requirements gathering. Proofs of concept. Commercial negotiations with service providers. Deprecation roadmaps for the tools we're replacing.
Keywords: SaaS integration, Terraform, Golang, RFP.
CI/CD Team Manager
In short: Leading a team of 5 engineers to develop and operate CDS: An Enterprise-Grade Continuous Delivery & DevOps Automation Open Source Platform.
Team Contribution: CDS is the main tool used internally at OVH to build, test and deploy. In August 2018, CDS was running a few thousand builds a day.
Personal contribution: Improved team communication with the rest of the company. Better documentation & better visibility outside of the company (meetup).
Keywords: Golang, Testing, CI/CD.
Urban Planning Team Manager
In short: Leading a team of 2 software architects to draw a map of the microservices information system at OVH and gamify its continuous improvement. To do so, we've designed, developed and open-sourced Lhasa.
Team Contribution: Lhasa allowed us to publish an up-to-date map of the microservices at OVH. We did not reach the gamification part. The project has been paused.
Personal contribution: Actively participated in the design and development. I've also worked on the internal promotion of the tool (blog posts, meetings, etc.).
Keywords: Golang, REST API, Urban Planning, Impact Analysis, Gamification.
Work Experience - Scality (Paris)
Engineering Director
Sept. 2015 - Dec. 2017
Scality is a global market leader in Distributed File Systems and
Object Storage according to both Gartner and IDC.
My role was to lead a couple of Python development teams (11 engineers).
Core Engineering Team Manager
In short: The supervisor is the command-and-control center of the
storage cluster.
In 2016, the team decided to revamp this component, switching from a
traditional WebUI to a REST API.
Team Contribution: The RING v6 delivery contained the live
monitoring tool and the most useful API routes.
Personal contribution: Reshaped the team to face this technical
challenge.
Keywords: Python, Elasticsearch, Grafana, Swagger, SaltStack.
Release Engineering Team Manager
In short:
I created the Release Engineering team in October 2015
to tackle serious delivery issues that the engineering
organization was facing at that time. The team's purpose:
streamline the delivery process and move the engineering
organization from a waterfall model to Continuous Delivery.
Team Contribution:
- Bert-E: a novel merging model and gatekeeper bot to replace the
former manual merge process. This was a game changer. We wrote an
ACM paper about it and open-sourced it.
- Eve: the build service, which executes up to 200 builds/day. It is a
layer of code on top of Buildbot that adds support for
pipeline-as-code.
Personal contribution:
- Infrastructure-as-code mindset shift.
- Test automation and stabilization.
- Data-driven engineering organization (KPI dashboards).
- Promotion of the core principles of "agility".
Results: The QA phase has (almost) been removed. The delay between
deliveries has decreased from 6 months to about 6 weeks, while the
number of features delivered has increased significantly.
Engineering Council Member
The engineering council was a temporary organization (Jan. to
Sept. 2016). Its purpose was to drive the whole engineering
organization, temporarily fulfilling the VP of engineering's
role.
Results: On-schedule delivery of the RING v6 LTS version, 50+
features.
Work Experience - Olfeo (Paris)
Quality Assurance Team Leader
Apr. 2015 - Sept. 2015
Olfeo is the French market-leading URL filtering solution (protocol filtering,
network antivirus, proxying and detailed activity logging and reporting).
The mission consisted of leading a small team to design and
build a fully-automated continuous delivery pipeline.
Skills acquired: Linux/Debian, Python,
Selenium, Vagrant, libvirt, Docker, git, Scrum, DevOps, management
training.
Work Experience - Ucopia Communications (Paris)
Performance & Quality Assurance Team Leader
Oct. 2013 - Apr. 2015
Ucopia Communications is the French market leader in network
controllers and WiFi guest access.
My role was to lead a small QA team to
implement performance benchmarks and automate functional tests.
Benchmarking: designing and implementing
realistic stress tests that both scale out and scale up.
Such tests detect performance bottlenecks and ensure that the
product keeps up with the growing size and complexity of customer
architectures. In the end, we were able to simulate several tens of
thousands of users interacting with a Ucopia Controller Cluster.
Automating tests: In our race to a
continuous delivery process, we found that creating and
maintaining automated tests is extremely time-consuming, which is
at odds with the need for frequent software updates and with
increasing system complexity. To address this, I designed and
developed a library based on Python, Selenium WebDriver and VMware
that makes creating and maintaining automated tests a breeze, even
for newcomers.
Quality metrics: We settled on concrete metrics
that measure software quality and tracked their evolution.
These metrics are extremely useful when it comes to predicting
delivery dates or making strategic decisions.
Junior then Senior R&D Engineer
Sept. 2010 - Dec. 2013
I've worked on a wide variety of projects such as
MySQL optimization, Debian packaging, frontend, performance and
scalability.
Database & LDAP expert
I was involved in every project requiring advanced MySQL
design and fine-tuning. My work to ensure the scalability of
the database system for the largest clusters, as well as my
efforts to design an autonomous database
(self-monitoring, self-repairing), were highly appreciated by
customers.
I also implemented a similar solution to make an OpenLDAP
system more fault-tolerant and scalable. These developments
allowed the product to be validated for large deployments (e.g.,
EDF, Stade de France).
Software architect
I introduced, and vigorously defended, the use of MVC/MVT patterns
and unit testing to enhance the readability and the
maintainability of the source code.
I am especially keen on software best practices (DRY,
loose coupling, etc.).
Versatile developer
I used Python, PHP, Java and shell on a daily basis. Less often,
I had to maintain a C and Perl code base.
Linux/Debian specialist
The product is based on Debian. I worked with nearly every
aspect of this distribution: distributing software as DEB
packages, overcoming the system limitations and scalability
hurdles (C10K problem), etc.
Web frontend designer
I also led several frontend projects. I used to master
HTML5, CSS3, jQuery and responsive design.
Performance & scalability referent
During the last couple of years, I used Python and gevent to
develop Ucopia Labs: a high-performance asynchronous
networking testbed.
It is used to simulate tens of thousands of devices
authenticating and accessing the web through the Ucopia Access
Controller.
I also supervised a trainee who developed a visualization tool
for the simulator data using Django & jQuery Mobile (see
screenshots).
These tools were of great help when we had to evaluate the
behavior of our product under high load.
Ucopia Labs