The European Organization for Nuclear Research (CERN) relies on open source technology to process the massive amount of data generated by the Large Hadron Collider. ATLAS (A Toroidal LHC ApparatuS, shown) is a general-purpose detector for probing fundamental particles.
CERN needs little introduction. It is the birthplace of the World Wide Web (WWW) and home to the Large Hadron Collider (LHC), the world’s largest particle accelerator, which was used to discover the Higgs boson. Tim Bell, who is responsible for the organization’s IT operating systems and infrastructure, says his team’s goal is “to provide computing facilities for 13,000 physicists worldwide to analyze these collisions and understand what the universe is made of and how it works.”
CERN conducts fundamental scientific research at enormous scale, and the Large Hadron Collider generates massive amounts of data when running. “CERN currently stores about 200 petabytes of data, and when the accelerator is running, more than 10 petabytes are generated each month. This inevitably poses enormous challenges for the computing infrastructure: storing large amounts of data and processing it within a reasonable time frame puts great pressure on networks, storage technologies, and efficient computing architectures,” Bell said.

Tim Bell, CERN
The scale of the Large Hadron Collider’s operation and the amount of data it generates present severe challenges, but CERN is no stranger to such issues. Founded in 1954, the organization has been tackling them for over 60 years. “We’ve always faced daunting computing challenges, but we’ve been working with the open source community to solve these problems,” Bell said. “Even in the ’90s, when we invented the World Wide Web, we wanted to share it with people so they could benefit from CERN’s research, and open source was the perfect tool for doing that.”
Using OpenStack and CentOS
Today, CERN is a heavy user of OpenStack, and Bell is a member of the OpenStack Foundation’s board of directors. But CERN’s open source story predates OpenStack; over the years, the organization has used a wide range of open source technologies to deliver services on Linux servers.
“Over the past decade, we’ve found that rather than creating and maintaining everything ourselves, it’s better to find upstream open source communities facing similar challenges, collaborate with them, and contribute to those projects together,” Bell said.
A good example is Linux itself. CERN was once a Red Hat Enterprise Linux customer, and as early as 2004 it partnered with Fermilab to create its own Linux distribution, Scientific Linux. Eventually, the team realized that since they weren’t modifying the kernel, it didn’t make sense to spend time building their own distribution, so they migrated to CentOS. Because CentOS is a fully open source, community-driven project, CERN can collaborate with the project and contribute to CentOS’s build and distribution.
CERN helps provide infrastructure for CentOS, and it also organizes CentOS Dojo events, one-day gatherings that bring the CentOS community together to share system administration experience, best practices, and emerging technologies, where engineers can work side by side on improving CentOS packaging.
In addition to OpenStack and CentOS, CERN is a heavy user of other open source projects, including Puppet for configuration management and Grafana and InfluxDB for monitoring.
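To give a flavor of what a Grafana-plus-InfluxDB monitoring pipeline ingests, here is a minimal sketch that formats a server metric as an InfluxDB line-protocol point. The measurement, tag, and field names are hypothetical examples, not CERN's actual schema:

```python
# Minimal sketch: format one metric as an InfluxDB line-protocol entry,
# i.e. "measurement,tag=val,... field=val,... timestamp".
# Measurement/tag/field names are hypothetical, not CERN's real schema.

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build a single InfluxDB line-protocol string."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # Integers get the "i" suffix in line protocol; floats are written as-is.
    field_part = ",".join(
        f"{k}={v}i" if isinstance(v, int) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

point = to_line_protocol(
    "cpu_load",
    {"host": "node-0042", "dc": "geneva"},
    {"load1": 0.85, "cores": 40},
    1_700_000_000_000_000_000,
)
print(point)
```

A collector would batch such lines and POST them to InfluxDB's write endpoint; Grafana then queries the stored series for dashboards.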
“We collaborate with about 170 laboratories worldwide, so whenever we improve an open source project, other laboratories can easily adopt those improvements,” Bell said. “At the same time, we also learn from other projects. When large-scale deployments like eBay and Rackspace improve the scalability of these solutions, we benefit from that and can scale up too.”
Solving Real-World Problems
Around 2012, CERN was researching how to scale computing power for the Large Hadron Collider, but the challenge was people, not technology: CERN’s headcount is fixed. “We had to find a way to scale computing power without requiring a lot of additional people to manage it,” Bell said. “OpenStack gave us an automated, API-driven, software-defined infrastructure.” OpenStack also helped CERN examine how services were delivered and then automate that delivery without adding staff.
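The essence of API-driven, software-defined infrastructure can be sketched in a few lines: instead of an operator racking and configuring machines by hand, a program inspects demand and calls a provisioning API. Everything below is hypothetical illustration; the toy `ProvisioningAPI` class stands in for a real service such as OpenStack's compute API, and the thresholds are invented:

```python
# Hypothetical sketch of API-driven capacity management: a program, not a
# person, decides when to provision more servers. ProvisioningAPI is a toy
# stand-in for a real infrastructure API like OpenStack's compute service.

class ProvisioningAPI:
    """In-memory stand-in for a software-defined infrastructure API."""

    def __init__(self, servers):
        self.servers = servers

    def create_server(self):
        # A real implementation would call the cloud API here.
        self.servers += 1

def reconcile(api, pending_jobs, jobs_per_server=100):
    """Add servers until capacity covers the pending workload."""
    needed = -(-pending_jobs // jobs_per_server)  # ceiling division
    while api.servers < needed:
        api.create_server()
    return api.servers

api = ProvisioningAPI(servers=5)
print(reconcile(api, pending_jobs=1250))  # 1250 jobs need 13 servers
```

The point of the sketch is the operating model: because provisioning is an API call, a reconciliation loop like this can keep capacity matched to demand without growing the operations team.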
“We currently run about 280,000 processor cores and 7,000 servers in two data centers in Geneva and Budapest. We’re using software-defined infrastructure to automate everything, which allows us to continue adding more servers while keeping the number of employees constant,” Bell said.
Over time, CERN will face even greater challenges. The Large Hadron Collider has a roadmap through 2035 that includes some major upgrades. “Our accelerator runs for three to four years, then we spend 18 months to two years upgrading the infrastructure. During these maintenance periods, we do our computing capacity planning,” Bell said. CERN also plans to upgrade to the High-Luminosity Large Hadron Collider, which will deliver higher-luminosity beams; the upgrade means computing needs will grow to about 60 times CERN’s current scale.
“According to Moore’s Law, we may only be able to meet a quarter of that demand, so we must find ways to scale computing power and storage infrastructure accordingly, and automation and solutions like OpenStack will help with this,” Bell said.
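Bell's "quarter of the demand" remark can be checked with rough arithmetic: if computing needs grow about 60-fold and Moore's Law doubles capacity every two years, then over roughly an eight-year horizon (my assumption for illustration, not a figure from the article) hardware alone delivers about 16x, close to a quarter of the requirement:

```python
# Rough arithmetic behind the "quarter of the demand" remark.
# Assumption (not from the article): ~8 years until HL-LHC peak demand.

required_growth = 60        # HL-LHC computing needs vs. today (from the article)
years = 8                   # assumed planning horizon
doubling_period = 2         # Moore's Law: capacity doubles every ~2 years

moore_growth = 2 ** (years / doubling_period)   # hardware improvement alone
fraction_met = moore_growth / required_growth

print(f"Moore's Law delivers {moore_growth:.0f}x, "
      f"covering {fraction_met:.0%} of the {required_growth}x requirement")
```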
“When we started up the Large Hadron Collider and looked at how to provide the computing power, it was clear that we couldn’t fit everything into CERN’s own data centers, so we designed a distributed grid: CERN at the center with a cascade of tiers around it,” Bell said. “There are about 12 large Tier 1 data centers worldwide, followed by about 150 smaller sites at universities and laboratories. They take samples of the Large Hadron Collider’s data to help physicists understand and analyze it.”
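The cascade Bell describes is a simple fan-out tree: a Tier 0 root (CERN) feeding Tier 1 centers, which in turn feed Tier 2 sites. The sketch below models that shape with illustrative site names and counts, not the real grid topology:

```python
# Hypothetical sketch of the tiered grid: Tier 0 (CERN) fans data out to
# Tier 1 centers, which fan subsets out to Tier 2 university/lab sites.
# Names and counts are illustrative, not the actual grid topology.

from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int
    children: list = field(default_factory=list)

def build_grid(n_tier1=12, t2_per_t1=12):
    """Build a toy three-tier grid rooted at CERN."""
    root = Site("CERN", tier=0)
    for i in range(n_tier1):
        t1 = Site(f"T1-{i}", tier=1)
        t1.children = [Site(f"T2-{i}-{j}", tier=2) for j in range(t2_per_t1)]
        root.children.append(t1)
    return root

def count_sites(site):
    """Count a site and everything cascaded below it."""
    return 1 + sum(count_sites(c) for c in site.children)

grid = build_grid()
print(count_sites(grid) - 1)  # 12 Tier 1 + 144 Tier 2 = 156 sites below CERN
```

With 12 Tier 1 centers each feeding a dozen Tier 2 sites, the toy topology lands near the article's figure of roughly 150 smaller sites.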
This structure makes CERN part of a massive international collaboration, with hundreds of institutions around the world working to analyze this data. It comes down to a fundamental principle: open source is not just about sharing code; it is about collaboration among people and sharing knowledge to achieve goals that no individual, organization, or company could achieve alone. That is the Higgs boson of the open source world.
Reprinted with permission: Developer Relations