AlphaSense: A Hot AI Startup Relies on Kubernetes to Support Its Exponential Growth
Named one of the top AI startups by Fortune in 2017, AlphaSense experienced exponential growth. “We needed to deliver more features much faster, and make the system more stable and reliable,” says Yuri Bushnev, Technical Product Manager. Teams needed the ability to spin up new AlphaSense environments quickly and easily in order to work on many features simultaneously; those working with AI needed a simpler way to spin up experiments and run models on these environments. “We realized that we need to change our infrastructure because it cannot keep up with the pace of development we need,” adds Bushnev.
The company, which had already broken down most of its monolith into microservices, decided to adopt Kubernetes for orchestration, along with Spinnaker for continuous delivery and Prometheus and Grafana for monitoring and alerting.
Deployment time went from an hour to minutes, releases increased from once a week to 30+ times a week, and MTTR dropped from hours to minutes. Reliability improved from a 95% SLA to close to 99.9%.
CLOUD TYPE: Multi, Public
CHALLENGES: Automation, Reliability, Scaling, Velocity
“From this point, we realized that we needed to be prepared for exponential growth,” says Yuri Bushnev, Technical Product Manager. AlphaSense develops search technology to provide business insights using artificial intelligence, natural language processing, and machine learning, and its goal, he adds, is to “deliver this content to our users with the speed of light.”
But the company’s infrastructure wasn’t quite as fast. “We were always in the cloud in AWS, but at the same time, we weren’t so much cloud native,” Bushnev says.
The challenges they faced were familiar ones: “We needed to deliver more features much faster, and make the system more stable and reliable,” says Bushnev. “We always had a problem when something went down, because nobody had any ideas, and we’d spend an hour to debug.” Plus, teams needed the ability to spin up new AlphaSense environments quickly and easily in order to work on many features simultaneously; those working with AI needed a simpler way to spin up experiments and run models on these environments.
“We realized that we need to change our infrastructure because it cannot keep up with the pace of development we need,” Bushnev says. “At that time, Tommi Ripatti, Head of Applications at AlphaSense, proposed making a drastic but long-term-focused step for our company: to go into the cloud native world by adopting Kubernetes.”
One team at the company did a proof of concept using Amazon ECS to solve the problem, but “Kubernetes was a much better choice from a long-term perspective,” says Senior DevOps Engineer Khoa Mai. The ECS project was soon abandoned, and AlphaSense focused all its efforts on adopting Kubernetes instead. “We liked that it was vendor agnostic, just for being able to utilize any cloud we want in the future,” says Bushnev. “And the community is so strong, so you don’t need to worry. It will just work.”
Mai, who joined AlphaSense at the beginning of the cloud native transition, and Ripatti led the main efforts to adopt the new technology. One of the first priorities was to figure out how best to do CI/CD with Kubernetes. AlphaSense chose Spinnaker for the continuous delivery tool and added Prometheus and Grafana in order to get better observability in the system.
The results were just what the team had hoped for. “Velocity has been increased dramatically from the moment when you have your code developed on a local machine to the moment when you release to production,” says Bushnev. Thanks to ongoing improvements in development practices and the adoption of new cloud native technologies, deployments went from hours to minutes, and releases increased from once to 30+ times a week. “During this time, we scaled our teams quite heavily, and we had a lot of dependencies between the teams,” Bushnev points out. “But using cloud native, we can just cut these dependencies between each of the teams, and now anybody can release almost anytime.”
To support AI and machine learning projects that are better suited to other clouds such as GCP, the AlphaSense team is planning to build hybrid cloud solutions, a move that Kubernetes makes possible. “Our next target is to allow our developers to use any cloud solution,” says Bushnev. “That’s why we are happy to be on Kubernetes, as everything we’re building lately is targeting a cloud-agnostic nature.”
“We finally see gradual adoption of the DevOps mindset in our company. For example, instead of doing some manual things, now you’re thinking how to automate that. And this kind of mindset is for sure what the Kubernetes transition brings to you.”
— Yuri Bushnev, Technical Product Manager at AlphaSense
Cloud native has also had a positive impact on reliability, which has increased from 95% SLA to close to 99.9%, and MTTR, which went from hours to minutes. That’s largely because the company’s observability story has been greatly improved. “Before, only 10% of our system was really covered by some monitoring and observability,” says Bushnev. “Now we feel that for sure, if anything happens in 85% of our infrastructure, right away we know about it, and right away the proper people are pinged.”
One wow moment was when the team migrated one of the most problematic services from Beanstalk to Kubernetes. “We forgot about this service for a while,” says Bushnev. “After three weeks, we finally had our Prometheus and Grafana setup in place, and we added alerting on top. Right away, we started to get different alerts. It turned out that this service was crashing hundreds of times a day.”
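The case study doesn’t include AlphaSense’s actual alerting rules, but a minimal Prometheus rule of the kind that would surface a crash-looping service looks like the sketch below. It assumes kube-state-metrics is running in the cluster (the metric name is standard; the threshold, durations, and labels are illustrative):

```yaml
groups:
  - name: pod-stability
    rules:
      - alert: PodCrashLooping
        # kube-state-metrics counter of container restarts;
        # fires if a container restarted more than 3 times in 15 minutes
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is restarting repeatedly"
```

A rule like this, loaded into Prometheus and routed through Alertmanager, is what turns hundreds of silent restarts a day into an immediate page for the right team.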
But with Kubernetes in place, they had been blissfully unaware that was happening for those three weeks. “We didn’t even know about that, just because availability was much better,” Bushnev says. “If anything goes down, it goes back up much faster, and that’s why we didn’t even notice it.”
Another benefit for a company with exponential growth is the ease of configuration for all the different tools that teams want to use. “On a weekly basis, we have a need for some new tool,” says Bushnev. “Today it’s InfluxDB, tomorrow it’s something for tracing, and then it’s something for machine learning. Since we implemented Kubernetes, adding any new tool is just a matter of hours instead of a matter of weeks. Before, you always needed to have somebody who knows exactly where and how the configuration is running. Now we have a unified place where we have configuration for all our tools, so if we want to add a new tool, it’s really only about finding the proper Helm chart and deploying it in one command. It’s just as clear as it should be.”
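The “one command” workflow Bushnev describes maps directly to Helm’s CLI. Using InfluxDB as the example he mentions, the sketch below shows what that looks like; the release name and namespace are illustrative, and the repository is InfluxData’s public chart repository:

```shell
# Register the chart repository and refresh the local index
helm repo add influxdata https://helm.influxdata.com/
helm repo update

# A single command deploys the tool into the cluster
helm install influxdb influxdata/influxdb \
  --namespace monitoring --create-namespace
```

Because every tool is installed the same way from a central set of charts, there is no longer a single person who “knows exactly where and how the configuration is running.”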
“I think the biggest challenge when migrating to Kubernetes is actually more about people. We try to let them know how powerful it is and how it can solve their problems. We work with people to try to open their minds.”
— Khoa Mai, Senior DevOps Engineer at AlphaSense
As a result, it’s not just DevOps engineers who are using these tools now. “Our QA team and our developers are contributing to alerting and monitoring,” says Bushnev. “Anybody can easily contribute.”
In fact, the challenges of the migration had more to do with preparing developers for it. “We try to let them know how powerful it is and how it can solve their problems,” says Mai. “We work with people to try to open their minds.” The full pipeline can be generated with just one click. The infrastructure team provides a basic template for the Helm chart for bucket management. There’s documentation if teams need customization, and workshops and demos to spread the information.
“When we introduced the template, it looked like magic for them at first,” says Mai. “Now, I think people actually understand it is quite simple, and adoption is much broader.”
Two years in, Bushnev says, “we finally see gradual adoption of the DevOps mindset in our company. For example, instead of doing some manual things, now you’re thinking how to automate that. And this kind of mindset is for sure what the Kubernetes transition brings to you.”