RCD Program Story: The College of New Jersey

The Power of One: How TCNJ’s ELSA Cluster Bridges the Gap Between Classroom and Research

Shawn Sivy gives students a tour of the ELSA cluster.

Located in Ewing, New Jersey, The College of New Jersey (TCNJ) is a highly ranked public college known for its rigorous undergraduate programs and commitment to student-faculty research. While primarily a teaching-focused institution with approximately 7,000 students, TCNJ was recently designated a Research Colleges and Universities (RCU) institution in the Carnegie Classification framework, reflecting its average annual research expenditure of approximately $2.7 million. Central to this research mission is the Electronic Laboratory for Science and Analysis (ELSA) High-Performance Computing (HPC) cluster.

CaRCC spoke with Shawn Sivy, HPC System Administrator at TCNJ, to learn how a “one-man show” supports a diverse research community while preparing undergraduates for the modern STEM workforce.

The following Q&A has been edited for brevity and clarity.


Shawn Sivy, ELSA cluster HPC System Administrator, has provided advanced computing support to the TSC/TCNJ community for more than 30 years (TSC is Trenton State College, TCNJ’s former name).

How did research computing move from isolated faculty labs to a centralized resource at TCNJ?

Shawn: Prior to 2016, research computing at TCNJ existed in separate silos. Faculty in Chemistry might have had their own mini-cluster for computational chemistry, while others relied on high-end workstations that were sometimes supported by central Information Technology (IT) and sometimes not.

The real catalyst for change came in 2014, when the College received funds from the 2012 New Jersey Building Our Future Bond Act (S2500). This allowed for the construction of a new STEM building, which included a modest 400-square-foot space designed to house up to eight server racks. It was a moment of incredible foresight by our dean at the time; he saw ten years ago that this type of computing was going to be an absolute requirement for undergraduate students.

While some of the funding went toward new servers, the bulk of our initial nodes actually came from a donation of decommissioned systems from Linode, a cloud-computing platform. I was hired into the HPC system administrator role in 2016. At the time, I was the director of networking and technical services in TCNJ’s central IT department. I had over 25 years of Unix and Linux experience and had worked closely with science faculty before, so I was able to step in as both the systems person and the research computing facilitator.

You’ve mentioned that ELSA is a major tool for faculty recruitment. How does an HPC cluster help a Primarily Undergraduate Institution (PUI) attract top talent?

Shawn: We use it as a primary hiring mechanism. When we are interviewing potential faculty, we can tell them, “Yes, we are an undergraduate institution, but we have an HPC cluster with these specific CPU and GPU resources.” It has allowed us to hire faculty with exceptional computational needs who might otherwise have gone to a large Research 1 (R1) university. By bringing those researchers to campus, we create opportunities for our students—often as early as their sophomore year—to participate in research that they simply wouldn’t have access to elsewhere.

Where does your role fit organizationally, and how does that help you navigate a “one-person show”?

Shawn: I report directly to the Dean of the School of Science. This reporting structure is vital for a program of our size. While I work closely with the central IT division for networking and security, working under the Dean means I am directly aligned with the research and pedagogical mission of the faculty. I’m not seen as just another “IT cost” but as a facilitator of the science itself. When a faculty member has a grant deadline, I can pivot quickly because I understand the academic urgency in a way that a centralized, ticket-based IT system might not.

Students learning about TCNJ’s ELSA cluster.

What does “end-to-end service” look like when you are the only person managing the program?

Shawn: It means I wear every hat. On the hardware side, I handle the break-fix tasks, upgrades, and future design planning. I also coordinate the “facilities” side—cooling, power, and Uninterruptible Power Supply (UPS) systems. On the software side, I manage the Operating System (OS) patches, OpenHPC updates, and the Slurm workload manager.

The most important part, though, is the facilitation. I provide training on how to access the cluster and help faculty get their code running efficiently. I often install software centrally for them or help them configure it in their own accounts. I have to know just enough about their specific research—whether it’s astrophysics or genetics—to help them optimize their code, while relying on the faculty to be the subject matter experts in their fields.

Beyond just technical instructions, you also teach what you call “resource ethics” to students. Why is that part of the training?

Shawn: When we do training, I focus on the “do’s and don’ts” of being a good cluster citizen. We show students how to estimate the proper resources for a job. For example, if a student requests 100 CPUs but their code only uses one, I’ll show them the post-mortem job statistics: “You used one CPU at 100%, but the other 99 were idle.”
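
As an illustration of the post-mortem arithmetic Shawn describes, here is a minimal Python sketch. It is not ELSA's actual tooling; the function names and the job figures (100 CPUs, a one-hour run, one busy core) are hypothetical. It assumes three numbers of the kind a scheduler's accounting records typically provide: CPUs allocated, wall-clock time elapsed, and total CPU time consumed.

```python
def cpu_efficiency(alloc_cpus: int, elapsed_seconds: float, total_cpu_seconds: float) -> float:
    """Fraction of the allocated CPU time that the job actually used."""
    if alloc_cpus <= 0 or elapsed_seconds <= 0:
        raise ValueError("alloc_cpus and elapsed_seconds must be positive")
    return total_cpu_seconds / (alloc_cpus * elapsed_seconds)


def idle_cpus(alloc_cpus: int, elapsed_seconds: float, total_cpu_seconds: float) -> float:
    """Allocated CPUs that were, on average, sitting idle during the run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    busy = total_cpu_seconds / elapsed_seconds  # average number of busy cores
    return alloc_cpus - busy


# Hypothetical job: 100 CPUs requested, one hour of wall time,
# and one hour of CPU time consumed (a single core kept busy).
print(f"CPU efficiency: {cpu_efficiency(100, 3600, 3600):.1%}")  # CPU efficiency: 1.0%
print(f"Idle CPUs: {idle_cpus(100, 3600, 3600):.0f}")            # Idle CPUs: 99
```

An efficiency near 1% is the signal to shrink the request: the same job would have finished just as quickly with a single CPU, leaving the other 99 for someone else.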

We also teach them about Input/Output (I/O) optimization—when to use “scratch” storage versus their home directory. At an institution like TCNJ, where we have 68 nodes and 4,000 cores, one inefficient script can impact a dozen other researchers. It’s about teaching them to think about their workload before they submit the script, which is a vital skill in professional research.
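
One common way to put the scratch-versus-home advice into practice is to stage files: copy the input to scratch, do the heavy reading and writing there, and copy only the final result back. The Python sketch below shows that pattern under stated assumptions. The function name, the `job_` directory prefix, and the uppercase-the-file "workload" are placeholders, and nothing here describes how ELSA's storage is actually laid out.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(input_file: Path, results_dir: Path, scratch_root: Path) -> Path:
    """Stage input into a private scratch directory, work there, copy the result back."""
    results_dir.mkdir(parents=True, exist_ok=True)
    # A unique per-job directory keeps concurrent jobs from overwriting each other.
    workdir = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        local_input = workdir / input_file.name
        shutil.copy2(input_file, local_input)

        # Placeholder for the I/O-heavy step: all intermediate reads and
        # writes happen on scratch, not in the home directory.
        local_output = workdir / "output.txt"
        with local_input.open() as src, local_output.open("w") as dst:
            for line in src:
                dst.write(line.upper())

        # Only the final result is copied back to longer-lived storage.
        final_output = results_dir / local_output.name
        shutil.copy2(local_output, final_output)
        return final_output
    finally:
        # Clean up scratch so the space is free for other users.
        shutil.rmtree(workdir, ignore_errors=True)
```

A caller would pass whatever scratch location the site provides, for example `run_in_scratch(Path("data.txt"), Path.home() / "results", Path("/tmp"))`, where `/tmp` stands in for the real scratch path.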

How does TCNJ’s “open” approach to the cluster differ from the experience students might have at larger universities?

Shawn: We try to keep the red tape to a minimum. At many large institutions, you have to submit formal proposals just to get time on a cluster. We don’t do that. We make it available and just ask people to talk to me first to make sure their project won’t overload the system.

I’ve had students come back from graduate school at major universities and tell me, “I knew how to use their cluster on day one because of my time at TCNJ—and honestly, the cluster at TCNJ was easier to use.” We even have Computer Science students who jump on the cluster just because they want to learn AI on their own. As long as they aren’t causing problems for others, we encourage that. It gives them a skill they wouldn’t have had otherwise.

Undergraduate research is a pillar of the School of Science. How do programs like MUSE and COSA utilize these resources?

Shawn: MUSE (Mentored Undergraduate Summer Experience) is essentially a full-time research job for students over the summer. We provide four or five half-day workshops during MUSE to get them started with HPC, visualization tools, and data science.

Then there is COSA (the Celebration of Student Achievement) at the end of the semester. Students present posters or give presentations on their research. For those who used ELSA, we make sure the cluster is cited. It’s part of the STEM workforce development; they aren’t just doing the science, they are learning how to credit the infrastructure that made the science possible.

While the cluster is “owned” by the School of Science, you’ve managed to branch out into other departments. Where else on campus is ELSA making an impact?

Shawn: We do a lot of outreach. We have folks from Engineering and the Social Sciences using it, especially for large-scale statistics and data analysis in the R programming language. We’ve even had interest from the School of Nursing for health science data analysis. As long as it isn’t a massive burden on my time, we welcome them. They are often using the same tools—like Jupyter or R—that our science faculty already use, so it’s an easy transition.

We also look beyond our own campus. We collaborate with Rider University right down the street, and we’ve had their faculty and students sit in on our workshops. We also use the cluster as a recruiting tool for local community colleges. We invite their students to gain experience here for the summer, with the hope of recruiting them to finish their degree at TCNJ.

What is the hardest part of managing Research Computing and Data (RCD) at an institution of TCNJ’s size?

Shawn: Time and sustainability. Because I am the only person in this role, I have to be very careful about recurring costs. I try to avoid anything with a subscription or an annual institutional commitment, like maintenance contracts. When we buy equipment, I usually purchase five years of support up front so it’s “baked in” to the initial funding.

The biggest challenge, though, is that everything is on me. If I’m unavailable, the work waits. I am trying to document as much as possible—especially those tricky software installs—for whoever comes next. At a smaller institution, finding the budget for a formal succession plan is a slow process.

What advice would you give to someone trying to build RCD support from scratch at a similar institution?

Shawn: Get your stakeholders on board early: faculty, IT, and facilities. You need faculty champions who will push for the program and help the administration understand that this isn’t just an “IT cost,” but a research necessity. Find out what success looks like for your deans or provosts. Is it the number of students involved? Is it grant dollars? Bake those metrics into the project from day one.

What is your elevator pitch for the ELSA cluster?

Shawn: The ELSA HPC cluster is at the heart of TCNJ’s School of Science research computing and data. It facilitates the faculty-student undergraduate research program where computational needs exceed what is available in a standard lab. It also equalizes access to high-end computing resources in the classroom, regardless of the student’s personal device. Ultimately, it gives our students the hands-on training they need to be successful in their future endeavors, whether in the workforce or in graduate school.