HubSpot

To support ‘immense scaling,’ HubSpot turns to Vitess for sharding and automation

Challenge

HubSpot has seen enormous growth since it was founded in 2006; today it serves more than 64,500 customers in over 100 countries. Behind the scenes, that meant that “we had to deal with larger databases, larger loads, bigger amounts of data,” says Mali Akmanalp, a Senior Software Engineer on the Vitess Infrastructure Team. “We needed to shard our databases, and sharding MySQL manually is extremely difficult.” In mid-2016, HubSpot engineers began looking for another solution.

Solution

“We didn’t want to make any huge radical changes that would be impactful to the whole business and slow down product engineering,” says Senior Software Engineer Alex Charis. “We wanted something that would be similar to what we already had so we could leverage our MySQL operational knowledge, but was also battle-hardened.” Vitess fulfilled these requirements for HubSpot, which was also adopting Kubernetes.

Impact

Developers can now “press a button and get a database within minutes,” says Charis; before, it would take days. Upgrades for the whole cluster went from days to two hours. With Vitess automation, downtime caused by an impairment or a crash is measured in seconds rather than minutes. Plus, “we’ve been able to almost double our production load and the number of databases [from about 400 MySQL clusters to 700] while keeping the size of the team relatively static, between 3 and 5 people,” says Leo Lin, Technical Lead of the Vitess Infrastructure Team.

Challenges:

Automation, Scaling

Industry:

Software

Location:

United States

Cloud Type:

Public

Product Type:

Installer

Published:

November 8, 2019


By the numbers

Upgrades

Went from days to two hours

Downtime

Now lasts seconds instead of minutes

Doubled production load and number of databases with same team size

A company that offers software for inbound marketing, sales, and customer success, HubSpot has seen enormous growth since it was founded by two MIT grad students in 2006.

It went public eight years later, and today serves more than 64,500 customers in over 100 countries.

Behind the scenes, that meant that “the company went through an immense scaling process,” says Mali Akmanalp, a Senior Software Engineer on the four-member Vitess Infrastructure Team. “We had to deal with larger databases, larger loads, bigger amounts of data. We needed to shard our databases, and sharding MySQL manually is extremely difficult.”

To try to improve the process, another team had built custom sharding code of its own, but it “ran into a lot of trouble,” Akmanalp says. “It required a lot of expertise, and live resharding was very difficult.”

By mid-2016, “we realized that the current trajectory of relational data at HubSpot is not sustainable,” says Senior Software Engineer Alex Charis. HubSpot engineers began looking for another solution. There were certain requirements, he adds: “We didn’t want to make any radical changes that would be impactful to the whole business and slow down product engineering. We wanted something that would be similar to what we already had so we could leverage our MySQL operational knowledge, but was also battle-hardened.”

Vitess fit the bill. “We liked that it was open source and that it was battle-tested at YouTube,” Charis says. Also appealing: all traffic between Vitess components, as well as traffic from HubSpot’s Java microservices into Vitess, runs over gRPC. “That means that we can secure it via TLS, and for a publicly traded company with Sarbanes-Oxley and SOC2 compliance concerns, that’s a powerful security story,” he adds.

Plus, another group was concurrently considering adopting Kubernetes, and both teams were excited about how the two technologies would work together. “We thought, ‘Well, Kubernetes definitely works for stateless services, but can we run our data storage systems on it as well?’ We went after it, and the results have been pretty positive,” Charis says.

From the adoption process on, the Kubernetes team and the Vitess team worked hand-in-hand, even sitting next to each other in the office. With the decision-making and development so aligned, “Vitess has paved the way for us to unify all of our data storage infrastructure and our microservice infrastructure onto Kubernetes, and it’s giving us a blueprint for what the rest of our data stores might look like on Kubernetes,” says Charis. “That’s been a great win for us as an infrastructure team.”


Charis points out that HubSpot was such an early adopter of Vitess that the team had to do quite a lot of work to make the technology work within the organization. They tweaked their set of backend libraries to make them compatible with Vitess, and also invested time in improving Vitess itself, including its SQL dialect support. They wrote the first-ever Vitess operator, which has been in production use for two years.

At HubSpot, Vitess has, of course, had a great impact on sharding. “Horizontally scaling MySQL without this kind of middleware is a real pain,” says Charis. Developers can now “press a button and get a database within minutes. Before, somebody filed a Jira and then one of the engineers on the team would have to run a manual Ansible play to set up the cluster when they had time. Maybe it takes a couple of days, maybe there’s a weekend in there. Now it’s literally less than 10 minutes and you’ve got a fully functioning database that you can address with whatever code that you’re trying to prototype.”

Because Vitess makes it so much easier, “people are a lot less hesitant with sharding,” says Leo Lin, Technical Lead of the Vitess Infrastructure Team. “Previously, if you have to wait half a month to get a new database, you might say, even though this is a bad idea, ‘Let me just mix these two applications’ data on one single database just because I don’t want to wait around.’ With Vitess, we’re seeing more databases created, and it’s a good thing we’re really empowering the product engineers to decide where these logical vertical shards begin on their own without really having to worry about the long turnaround time for the Jira.”

While sharding is what Vitess is most commonly used for, the HubSpot team has also leveraged it for operational scalability. “Vitess gives us a lot of tools and gives us a platform for us to build our own tools to be able to automate away a bunch of the daily difficulties of being a data infrastructure engineer,” says Charis. “Since we’re such a small team, we don’t have every moment to deal with every single individual problem that’s happening. As much as we can automate that stuff the better.”


For instance, deployments and version upgrades are a whole lot easier. “Right now, we press a button and it takes two hours to roll out a new Vitess version across all the clusters that we have,” says Lin. “It’s mostly a pretty straightforward, hands-off process, versus the much more involved manual procedure we had previously that would take days.”
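One common way to keep a fleet-wide upgrade hands-off is to work shard by shard: upgrade the replicas first, hand the primary role to an already-upgraded replica with a graceful, planned switchover (Vitess exposes this as a planned reparent, `PlannedReparentShard`), and only then upgrade the old primary. The article does not describe HubSpot’s exact procedure, so the sketch below is a hypothetical illustration: `Shard`, `rollout_plan`, and the tablet aliases are invented names, and the steps are plain tuples rather than calls to any real Vitess tool.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    # Hypothetical model of one shard: a primary tablet plus its replicas.
    name: str
    primary: str
    replicas: list[str] = field(default_factory=list)


def rollout_plan(shards: list[Shard]) -> list[tuple[str, str, str]]:
    """Order hypothetical (action, shard, tablet) steps for a rolling upgrade."""
    steps = []
    for shard in shards:
        # 1. Upgrade replicas first; restarting them does not interrupt writes.
        for replica in shard.replicas:
            steps.append(("upgrade", shard.name, replica))
        if shard.replicas:
            # 2. Gracefully hand the primary role to an already-upgraded replica.
            steps.append(("reparent", shard.name, shard.replicas[0]))
        # 3. Upgrade the old primary last. Without replicas this step
        #    restarts the only writable tablet, so it implies a short outage.
        steps.append(("upgrade", shard.name, shard.primary))
    return steps


plan = rollout_plan([Shard("commerce/-80", "zone1-100", ["zone1-101", "zone1-102"])])
for step in plan:
    print(step)
# ('upgrade', 'commerce/-80', 'zone1-101')
# ('upgrade', 'commerce/-80', 'zone1-102')
# ('reparent', 'commerce/-80', 'zone1-101')
# ('upgrade', 'commerce/-80', 'zone1-100')
```

Because each shard only ever loses its primary through a planned handoff, the same plan can be applied across hundreds of clusters without paging anyone, which is the property the quote above describes.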

Before Vitess, an impairment or crash would mean an on-call engineer had to be paged to log in and execute an Ansible play to do a failover, which could result in 5-10 minutes of downtime. With Vitess, such downtime “is measured in seconds,” says Akmanalp. “In the last few months, we’ve had 50-100 master impairments, which previously would have been difficult and unthinkable. Now it gets dealt with automatically.”
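The core decision an automated failover has to make, which a paged engineer used to make by hand, is which replica to promote. The sketch below is a simplified, hypothetical model, not Vitess’s actual reparenting code: it assumes each replica reports a health flag and a single integer replication position, whereas real MySQL replication tracks GTID sets. `Tablet`, `choose_new_primary`, `fail_over`, and the aliases are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    alias: str
    healthy: bool
    # Simplification: a single counter standing in for how much of the
    # primary's binary log this tablet has applied. Real MySQL replication
    # tracks GTID sets, which need a proper set comparison.
    applied_position: int


def choose_new_primary(replicas: list[Tablet]) -> Tablet:
    candidates = [t for t in replicas if t.healthy]
    if not candidates:
        # Nothing safe to promote automatically: this is when a human is paged.
        raise RuntimeError("no healthy replica available for promotion")
    # Promote the most caught-up replica so the fewest transactions are lost.
    return max(candidates, key=lambda t: t.applied_position)


def fail_over(primary: Tablet, replicas: list[Tablet]) -> tuple[Tablet, list[Tablet]]:
    """Return (primary, replicas) after a failover, or unchanged if the primary is fine."""
    if primary.healthy:
        return primary, replicas
    new_primary = choose_new_primary(replicas)
    # Every other replica is repointed at the new primary; the impaired
    # old primary is dropped from the topology until it has been repaired.
    followers = [t for t in replicas if t is not new_primary]
    return new_primary, followers


old_primary = Tablet("zone1-100", healthy=False, applied_position=130)
replicas = [
    Tablet("zone1-101", healthy=True, applied_position=118),
    Tablet("zone1-102", healthy=True, applied_position=125),
    Tablet("zone1-103", healthy=False, applied_position=129),
]
new_primary, followers = fail_over(old_primary, replicas)
print(new_primary.alias)             # zone1-102
print([t.alias for t in followers])  # ['zone1-101', 'zone1-103']
```

Choosing the most advanced healthy replica is what keeps an automated failover safe enough to run unattended; a production system also has to fence the old primary so it cannot keep accepting writes.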

The bottom line: “We’ve been able to almost double our production load and the number of databases [from about 400 MySQL clusters to 700] while keeping the size of the infrastructure team relatively static, between 3 and 5 people,” says Lin.

Looking ahead, the HubSpot team is working on horizontal sharding, having so far focused mostly on vertical sharding. Right now, the process is still very hands-on. Ultimately, the goal is to have it fully automated, with a built-in solution for managing schema changes across shards. (It’s something they’re working on using Vitess and GitHub’s open source project gh-ost.) “We’ll have a UI for our product teams to say, ‘Oh, I want to increase this database from two shards to four shards,’” says Charis. “And just a couple clicks of a button and then they get some notification later that it’s done. We want to give product engineers a way to use it with ease.”
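Going “from two shards to four shards” is tractable in Vitess because a horizontally sharded keyspace is split by key range: each row’s sharding key maps to a keyspace ID, and each shard owns a contiguous range of IDs, named like `-80` and `80-` for two shards. The sketch below illustrates that idea under stated assumptions: keyspace IDs are treated as 64-bit integers, and SHA-256 stands in for the hash (Vitess’s own `hash` vindex uses a different function). The helper names are hypothetical.

```python
import hashlib

SPACE = 1 << 64  # keyspace IDs are treated as unsigned 64-bit integers here


def keyspace_id(sharding_key: int) -> int:
    # Stand-in hash (first 8 bytes of SHA-256); NOT the function Vitess's
    # `hash` vindex uses. It only needs to spread keys evenly for this demo.
    digest = hashlib.sha256(str(sharding_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def key_ranges(num_shards: int) -> list[tuple[int, int]]:
    # Equal-width [start, end) ranges covering the whole keyspace-ID space.
    step = SPACE // num_shards
    return [(i * step, SPACE if i == num_shards - 1 else (i + 1) * step)
            for i in range(num_shards)]


def bound_hex(value: int) -> str:
    # Render a range bound the way Vitess shard names do: the shortest
    # byte-aligned hex prefix, and an empty string for either end of the space.
    if value in (0, SPACE):
        return ""
    digits = format(value, "016x")
    while digits.endswith("00"):
        digits = digits[:-2]
    return digits


def shard_name(key_range: tuple[int, int]) -> str:
    start, end = key_range
    return f"{bound_hex(start)}-{bound_hex(end)}"


def shard_for(ksid: int, ranges: list[tuple[int, int]]) -> tuple[int, int]:
    # Return the single range that contains this keyspace ID.
    for start, end in ranges:
        if start <= ksid < end:
            return (start, end)
    raise ValueError(f"keyspace ID {ksid:#x} is not covered by any shard")


two_shards = key_ranges(2)
four_shards = key_ranges(4)
print([shard_name(r) for r in two_shards])   # ['-80', '80-']
print([shard_name(r) for r in four_shards])  # ['-40', '40-80', '80-c0', 'c0-']

# Every row's new shard lies inside its old shard, so a 2 -> 4 split only
# copies data from each parent shard into its two children.
for key in range(10_000):
    ksid = keyspace_id(key)
    old_start, old_end = shard_for(ksid, two_shards)
    new_start, new_end = shard_for(ksid, four_shards)
    assert old_start <= new_start and new_end <= old_end
```

The nesting check at the end is the design point: because new ranges subdivide old ones, each source shard can be split independently, which is what makes a push-button resharding workflow plausible.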