Small Team, Big Impact: UC Merced’s Research Computing Journey

Located in California’s Central Valley, the University of California, Merced is the newest campus in the UC system and is rapidly growing toward R1 research status. CaRCC spoke with Sarvani Chadalapaka, Director of Cyber Infrastructure and Research Technologies (CIRT) at UC Merced, to learn how her team supports computational research while making HPC accessible to a diverse student population. The following Q&A has been edited for brevity and clarity.

Can you tell us about CIRT and how it got started?

The program started when our faculty received NSF Major Research Instrumentation (MRI) grant funding for HPC clusters. It was decided that this unit would report to the Office of Information Technology while closely collaborating with the Office of Research.

The research computing team has always been small. It started about eight years ago with one director and one systems administrator. Over that time, our team has grown to four full-time employees plus two or more student technology consultants.

We have a faculty governance committee called the Committee on Research Computing (CORC), which includes faculty representatives from all three schools at UC Merced. The charter for CORC is to govern research computing policy and steer research computing priorities. This faculty group also acts as liaisons with their peers within their departments and with faculty senate committees.

What kinds of services and support does CIRT provide?

Over the past couple of years, we’ve focused on becoming fiscally sustainable by creating multiple revenue streams. We offer baseline services at no charge to users, such as HPC account creation, general research computing consultation, HPC office hours, and installation of software packages on the cluster.

We’ve also established services where we do charge. These fall into two categories: pay-per-use services (our chargeback or recharge model), where we charge for core hours and personnel time for special projects; and service level agreements (SLAs), which cover condo storage purchases, condo compute node purchases, and annual maintenance fees.

My team manages high-performance computing, the Wide Area Visualization Environment, the Science DMZ research network, and JupyterHub infrastructure that’s used for data science curriculum delivery. The JupyterHub infrastructure is past the pilot phase and currently supports approximately 500 users and more than seven data science courses.

Do you offer training or workshops for your researcher clients?

Yes, we have weekly HPC and JupyterHub office hours. We provide Carpentry sessions in joint consultation with the libraries, and we offer ad hoc research computing training. If a researcher wants their research group to learn more about HPC, we provide consultations tailored to their research problems.

We give them a starting point with an introduction to HPC and show them how to navigate around the cluster. During office hours, we offer some level of support to optimize workflows or parallelize code, though it’s not highly structured yet. We also maintain an HPC wiki page and a documentation GitHub page to share information.

Who are your clients across campus?

We are a central service provider serving all schools and researchers at UC Merced. Not surprisingly, we see more demand for HPC systems from the natural sciences and School of Engineering than from humanists. We support humanists too, but that requires different types of consultations beyond just HPC.

Beyond the campus community, we recently received a new NSF CyberTeam for Research Computing and Data regional computer grant. We’re collaborating with California State Universities in the underserved Central Valley region. We share compute resources through the Open Science Grid (OSG). CENVAL-ARC (Central Valley Accessible Research and Computational Hub) is an NSF-funded regional compute CC* grant designed to expand access to advanced research computing resources across California’s Central Valley. Recognizing the region’s limited access to large-scale computing infrastructure, CENVAL-ARC brings together institutions such as UC Merced, CSU Sacramento, CSU Stanislaus, and CSU Fresno to share resources, expertise, and training opportunities.

How did you implement the transition to a chargeback model?

It’s not been an easy transition, but faculty understand that this is a necessary service that needs to continue operating. When we had ad hoc funding—what I call “hat in hand funding”—a head node would go down, and we would have to ask different units for funding.

A major concern from faculty was that they couldn’t switch to a fee-based model because they hadn’t included these costs in their grants. So I talked to the provost’s office and secured funding to offset costs for faculty for one year. Starting this July, we’ll actually begin charging faculty members.

In the meantime, we’ve done extensive outreach. We’ve created templates showing how faculty can include these services in their grant funding proposals, including sample budget sheets. We offer consultation services for anyone who wants to use our resources but doesn’t know how to include them in a funding proposal. With these measures in place, I’m confident the transition will go well.

For SLA (Service Level Agreement) services like annual maintenance charges for condo purchases, we provide cost breakdowns to help faculty understand there are annual costs to maintaining these servers. It’s a shift in perception that we’re working on. I have many meetings to educate faculty about why charging is important, and I leverage the faculty committee (CORC) so they can talk to their peers about it.

We’re also trying to use our usage data to show departments the value and see if we can get some cost offsets from them, reducing the burden on individual faculty while ensuring we have the funding to continue operations.

How many people are involved in delivering services at CIRT?

We are a small team. Besides myself, we have a research facilitator, an HPC systems administrator, and a research and instruction systems integration engineer who specializes in JupyterHub. These four are full-time employees. We also have two student technology consultants right now, but we’re planning to grow that area of our team.

How does the student program work?

We get funding for student technology consultants from the IT department, but it’s becoming competitive, and we’re feeling the pinch of not getting the number of hours we need. We’re tapping into different funding sources, including fellowship opportunities the campus provides for students.

There’s a Student Success Internship Program funded by the campus rather than IT, which allows us to work with graduate students on special projects like one-time analysis tasks. We’ve also partnered with schools to work with their graduate students who are already heavy users of the cluster. They help us refine scripts and processes, similar to how teaching assistants would work during the summer.

What other providers on or off campus does your team work with?

We work with libraries to offer Carpentry trainings and data management consultations with faculty. Since I’m part of IT, it’s easy for me to work with the information security officer for high-security data consultations. I also consult with faculty on cloud resources, serving as a translator between them and the Amazon Web Services (AWS) resources they need. We also work with data compliance teams and the Office of Research.

We’re trying to educate our researchers about using ACCESS, and our team is closely involved with CaRCC and ACCESS working groups, but we don’t yet have many users utilizing national resources.

What would you say your program does really well?

We provide good HPC service to our faculty members through HPC core hours and storage. Faculty know we’re the primary point of contact for anything computational. Not just faculty—we’ve established working relationships with procurement too, so if faculty members submit orders for huge servers, procurement knows to connect us with them. Enough of our campus community knows who we are, what we do, and where to find us.

How do you keep your small team organized with so many different responsibilities?

Over the last year, we all worked on defining our mission, vision, standard operating processes, who we are, what we do for campus, and how we hold each other accountable. We use Slack as a communication channel between our teams with daily check-ins about what we’re working on.

We have weekly team meetings and monthly team lunches at various campus destinations—we find a cool, shady spot to sit, talk, and work. These activities have been helpful in keeping up with the team and jumping in where help is needed.

How do you set priorities for CIRT?

Our institution has a strategic plan with the ambitious goal of becoming an R1 institution by 2030. A lot of what we do directly serves this strategic goal, making it easy to align our work with institutional priorities.

We also fall under the IT umbrella, where priorities include fiscal sustainability and transparency. The chargeback model and revenue generation align with those IT goals.

To set priorities, I work on the technical aspects with the team, get input from CORC about what faculty needs are, and align our goals with IT’s broader strategy. This collaborative approach helps us determine what to focus on for the next year.

We’ve spent the past two years building automation and infrastructure that allows us to focus on current priorities without constant maintenance concerns. We’ve consolidated clusters into a federated Slurm environment, updated older hardware to reduce outages, and created documentation and templates for our training offerings. Now we can focus on process improvements and fine-tuning our service level agreements.

How often do you meet with your governance groups?

With the faculty Committee on Research Computing (CORC), we just decided to move from a monthly cadence to a quarterly cadence based on where we are in our operations. With the IT teams, I meet weekly with other IT leads, including networking, security, and service leads. I report to the Vice Chancellor of Information Technology, and he convenes an IT cabinet meeting every two weeks where we strategize and seek support from the peer IT teams.

With the Office of Research or the Office of the Provost, I meet by invitation, typically once a quarter or every six months, especially with recent leadership changes including a new provost.

What would you like to see in CIRT’s future?

My plan is to lower the barriers to HPC, especially for the population we serve. UC Merced has many first-generation college students and serves underrepresented populations. I want them to feel comfortable using an HPC environment and leave our training with a level of confidence they didn’t have coming in.

I want us to be a leader in providing computational research services and training, not just for UC Merced but for the whole Central Valley region of California.

What advice would you give to new leaders of research computing teams about making difficult decisions?

Having faculty governance is important—having them back you up in whatever decision you make is crucial. When faculty make many requests, the way to prioritize is to show them the cost of ownership for whatever they’re asking for. Once you show the cost, faculty are quick to identify what they truly need versus what would just be nice to have.

I’d also recommend establishing relationships early on. As someone new, you have the advantage of being able to reach out to key players—libraries, research offices, provost offices, departments, key users—and get them on your side sooner rather than later.

What’s a success story you’re proud of?

This year alone, we’ve received two big NSF grants, one as PI and one as co-PI, in addition to everything else we’ve managed operationally. That represents significant work.

I’m also proud of reaching our current level of operational maturity. A few years ago, we had head nodes failing and maintenance entering a question mark phase because we didn’t have funding to order new hardware. Now we’re thinking about sustainability, exploration, and discovery, which would have seemed like a luxury back then.

In the last year, we were the only department on campus to receive a brand new staff position allocation, which supports the data science program with JupyterHub infrastructure. That makes me extremely happy about how things are progressing.