Growing a Research Computing Powerhouse: How Northwestern’s Team Tripled in Size While Staying True to Its Values
Set on the shores of Lake Michigan in the northern Chicago suburb of Evanston, Illinois, Northwestern University is an undeniable research powerhouse, with a Carnegie Classification of R1 and funded projects totaling more than $1 billion in fiscal year 2024. Its medical and law schools are located on its downtown Chicago campus. This private institution is home to 90 school-based research centers and 35 university research institutes and centers. CaRCC sat down with Jackie Milhans, Director of Research Computing and Data at Northwestern, to learn how she and her team support such a dynamic research enterprise. The following Q&A has been edited for clarity and brevity.
Can you describe your team’s reporting and governance structure?
We are in Northwestern Information Technology and report up to the CIO. We have an advisory committee with representation from the research deans, the dean of Libraries, Office of the Vice President for Research, and Office of the Provost. We meet weekly with the CIO and about monthly with the Provost and VPR.
How did your program get started and how has it grown since you’ve been there?
It started in about 2008 as an HPC shop, providing high-performance computing primarily to a few key investigators who needed it for their research. It grew into a more mature service model starting around 2012 and 2013. I joined Northwestern in 2013, and when I first started, we had between 200 and 300 users. We now have over 6,000 users on the cluster, so it has grown significantly.
Northwestern doesn’t only have engineering. We also have a medical school, a business school, a law school, journalism, physical sciences, political science – these are all areas that now use high-performance computing at Northwestern. It can be a real challenge to support all these areas at the same time. However, it has given us the opportunity to branch into new services. In 2016 we started building out data science support under Christina Maimone, Associate Director for Research Data Services, helping people with research software development, data collection, data scraping, data wrangling, and data cleaning. Then, in 2018, we added data management services to guide people in storing and transferring their data. We were maturing and learning more about how to collaborate with people internally and externally.
What other kinds of things does your program offer and focus on now?
We are working on reducing excess stored data and improving research data organization. This helps researchers understand which data are important and how to organize them, so that they are prepared at publication time and for reproducibility. That’s extremely important given the federal mandates for public access to data from federally funded research. We’re also addressing data security concerns, because we have a medical school as well as funding from DOE, DoD, etc. We think about PII and are preparing for NIST SP 800-171. We’re trying to mature our security posture for research by creating a secure enclave for data. At this time, we are pursuing a set of cloud-based services and products, and more importantly – processes, that can be approved for certain types of data concerns.
On the data science side, we help people think about how to apply AI or machine learning methods to their research. The data scientists will often write software to automate tasks or apply AI or ML methods in support of the research.
We also have more tailored support for genomics researchers, including staff, compute, and storage, which is funded by our medical school. It is also available for folks who are doing non-human genomics, such as plant biology.
We have grown significantly – almost threefold over the past year – due to a large investment from the Office of the Provost at Northwestern. We’re very grateful for that opportunity because our team had not grown at the pace of researchers adopting our services. This expansion was really necessary to support the kind of researchers that Northwestern has and is recruiting to the university. Before the expansion, we were not able to go too deep into any research project, but we now have bandwidth to provide dedicated support for data science and statistics.
Does your program also do training and workshops?
Yes, we do a lot of training and workshops. The majority of them are in the data science area. We also offer overviews of our services for computing, data management, data storage, and data transfer – as well as recurring topics like version control with GitHub, and more advanced topics. Everything used to be in person until the pandemic led us to go all virtual. We now have a mix, and we’re trying to determine when students and faculty will come to in-person workshops and what works better in person.
We have several student workers in the program. Our data science student consultants often teach more advanced topics, since they often have specialized knowledge of tools, libraries, etc. for their research areas. We also find that it takes some experience to recognize the cues when people aren’t following along and to set the appropriate pace for beginner workshops. So we rotate those roles among staff and more senior students, which allows them to develop while giving them opportunities to teach new things.
Another program our data science team offers is called BYOD, or “bring your own data.” It’s a set of working groups that run each quarter, usually with about three cohorts. Researchers, usually grad students and a few faculty, submit proposals. A data scientist, statistician, or other expert meets with each cohort weekly. It provides accountability for researchers to make progress, and it also helps them bridge the gap between training and applying those skills to their research. That gap can be really difficult, because people learn these skills in a vacuum and then have to apply them to their research. It is a huge leap. The BYOD program is a vital bridge to employing data science skills for many researchers.
Are there particular populations of researchers that you’re especially targeting or trying to grow today?
We have all of the usual areas of research like engineering, physics, materials, and chemistry. Genomics is one that has emerged in a big way over the past 5 to 10 years. Those areas probably account for our largest numbers of compute users. When it comes to data science support, demand tends to come from the social sciences and medicine, but we also see engineering emerging in that area, especially when researchers are applying machine learning methods.
GPUs for AI, however, need liquid cooling, which in practice can only be provided centrally. These cooling and power demands make it difficult for groups to run their own hardware. We are actively partnering with researchers who used to have their own clusters but must now move to central resources. We also work with Northwestern IT’s networking team so that researchers have faster connections. This is often needed when researchers must compute on or react to instrument data immediately. Computer science is one area where we are currently partnering to offer liquid-cooled GPUs for AI research, providing state-of-the-art GPUs, systems management, and applications support. In the past, the department used a mix of its own cluster, faculty-owned clusters, and our central cluster, Quest.
We are also always working to be as approachable as possible to researchers at all levels of computing knowledge, including in engineering. We are also seeing more activity from law, journalism, and anthropology. Our social sciences have become a lot more active in this area, and the relationships with our data scientists, as well as their expertise, have really helped build understanding and trust, resulting in successful research project collaborations.
Additionally, a chargeback model funds a small portion of what we do. We seek to balance priorities at the University and lift up researchers in areas that don’t have access to the same funding opportunities as others.
It sounds like there’s a lot going on. How many people are involved in delivering the services?
We have about 38 people, including infrastructure staff. Networking adds about four more people who work on research projects, and data center operations includes another five or so. At any given time, we have somewhere between 15 and 25 students working on our team.
Could you tell us a little bit about how the student program is run? It’s pretty impressive.
The student program allows us to provide services and also provides a great working environment. It was built by Alper Kinaci, Manager of Research Computing Services, as a way for students to provide computational support, answer tickets, and learn about HPC. It is currently run by David Glass, Research Computing Support Lead. A number of undergrads and a couple of grad students support the compute services. They provide important support, as they answer tickets and give us insight into how students communicate and work. The grad students have been able to try out new tools and software, providing helpful perspectives since they are researchers themselves. For example, a graduate student was critical in integrating XDMoD into our operations. Grad students often explore specific tools, software, or projects.
Their work experience in Research Computing and Data Services can help them build decision-making, communication, teaching, and presentation skills. It has been a great resume builder for a lot of our students. The student program has provided mentorship opportunities for staff and opportunities to learn how to delegate. Being able to delegate has become a key skill for our staff, especially as our services have expanded with the size of our team.
Our team has three graduate student assistantships that are funded through The Graduate School. Other students are paid at an hourly rate, similar to our undergrad students. We have a pay scale, and our student data scientists are active consultants who lead or TA workshops. Many of our students have been with us for a number of years. They also support the BYOD program. Students add value in many ways, including bringing an area of expertise that is in demand or not already represented on our team. They also enable our team to better support Northwestern by broadening our knowledge and experience, because supporting all these different research areas would be nearly impossible with a handful of staff.
In addition to IT, how do you partner with other providers?
We meet regularly with the IT directors from the schools and colleges to share information and align efforts to support larger research projects, as well as service-related efforts. For large, cross-team efforts, we (central and distributed IT) often choose somebody to be the entry point or their “concierge” to make sure we are not confusing each other or the research group.
We work with Northwestern and Galter Health Sciences Libraries as partners, especially for data management, in the publication phase, and also at the very beginning, when people are developing their data management plans. The Libraries also provide GIS support. We have historically provided data science support, and they have recently built out support in the humanities. Galter Health Sciences Library has more of a focus on bioinformatics, data repositories, and publication. We often tackle gray areas together, as there is too much work to be territorial.
Additionally, we do not look to compete with existing services. Instead, we look to complement them. For example, now that we have launched a statistical consulting service, it could appear to overlap heavily with the biostatistics core at first glance. However, NIH-funded or medical school researchers should visit the biostats core first. In most cases, if our partners are not a fit, researchers can come to us. We support the entire University and will look to find existing solutions or resources if we cannot support researchers ourselves. When there is no existing solution, we cannot always provide bespoke or unsustainable solutions, because staff time does not scale.
Do you have any regional or national partners and resources?
We are an active partner with the CaRCC and CASC organizations. We are a member of the Big Ten Academic Alliance and the MidWest Research Computing Consortium. We utilize ACCESS and participate in Campus Champions. I’ve recently become active in a small Ivy+ group, which is helpful because institutional goals tend to be similar across those institutions. Our team looks to provide services that push the boundary, or often to be a fast follower. If it’s advantageous from an RCD community standpoint, our team tries to be involved and contribute as much as we can. I’m proud of so many of our team members who have volunteered their time and skills to these communities, and I’m grateful for the opportunities the organizations have provided us to build leadership, communication, and other professional skills.
How are your program’s salaries funded?
Most of the program is funded by the University through the IT organization and the Provost’s office. Our staff are in salaried positions, and we also have a student staff budget. As I mentioned, the three graduate assistantships are funded by The Graduate School, and our medical school funds a staff position to support genomics research. We have also created term positions and brought on interns when we needed to expand our team, which has allowed us to develop staff in an area where workforce demand exceeds the existing workforce. People can learn on the job and decide if this is the right career for them. When a role needs someone with more experience, many of us end up recruiting from a peer institution or a national lab. Some of our students have come to work on our team after graduating, either before their next job or permanently. This has been beneficial on both sides. And Northwestern benefits tremendously, because an experienced, skilled consultant is supporting researchers at the University.
For our dedicated effort on research projects, we are sometimes written into grants or paid an hourly fee. If we have committed to a project, that takes priority. However, we will also support research without funding available. Between training, the BYOD program, and answering routine requests, we only have so much time to dedicate to individual research projects. We have to balance across well-funded research areas and those without widely available funding. Our staff also need time to go on vacation, take sick time, answer email, and go to meetings.
How does funding work for non-salary costs such as hardware and cloud services?
Like many universities, we have a yearly budget exercise. We work with our IT partners and cost-share partners for planning. Software tends to be in our operating budget or shared across relevant schools and colleges, and infrastructure is requested through the capital budget. We develop a seven-year plan, which helps us avoid decisions made without sustainability in mind and allows the institution to anticipate bigger changes or potential factors in future years (such as a need for HIPAA-compliant computing). As a real-life example, liquid cooling and power requirements, as well as increased GPU costs, are affecting many RCD centers right now. I think many of us underestimated how expensive GPUs would become once generative AI changed the game. But planning for an increase in AI research three years ago helped mitigate what could have been a bigger challenge for us. Planning for future years allows us to anticipate when infrastructure may sunset, when major investments may be required due to increased research needs in an area, and so on.
How do you think about the impact your services are having? Are there metrics you track? Are there things you really want to be able to report?
We track metrics to understand the experience of our services, demand for our services, service adoption, and research impact. We gauge the experience of our services by tracking wait time on our cluster, repeat workshop attendees, and time to resolution of a ticket. We track demand through the number of consultations requested and provided, new and active cluster users, workshop attendees, etc. For research impact, we track the number of research projects we have supported, the number of PIs, and the research dollars associated with PIs supported by Research Computing and Data Services each fiscal year. What we find is that we support PIs associated with 85 to 95 percent of the funded research at Northwestern. We have tracked both repeat and unique attendees at our trainings, and have looked at attendance at more advanced levels of training, including which schools and colleges the advanced attendees come from. We want to see researchers building skills to the level of maturity required for their research, which may not always be an advanced level. We track which research areas, schools, and colleges are represented in our services. And we track affiliation – faculty, graduate student, staff, postdoc, etc. We seek to build researchers’ maturity in applying data science methods, working with complex or large data sets, and using HPC when their research requires it. Our IT Communications team tracks visits to our documentation and website to understand whether they are useful and whether users are aware of these web resources.
Cluster wait times help us understand whether we have enough compute resources and prioritize opportunities to help researchers optimize their job submissions. We look to help researchers use the cluster responsibly by requesting only the resources they need. In addition, Northwestern has made impactful investments to provide computing at no cost to its researchers to support junior faculty, researchers without access to funds, etc. We have a condo model: about half of our cluster is available at no cost to researchers, and the other half is dedicated access purchased by groups with large HPC research requirements.
Our metrics are critical to making future decisions and to helping others understand our impact, our work, and the demand for our services. They also help us understand why some months feel more slammed than others and schedule cluster maintenance during low-use periods.
Growing by a factor of three in a short period of time sounds genuinely hard. How have you handled the rapid growth of your team?
One of the things that I think has been extremely important is having a very strong foundation to start with. The growth has not been without hiccups, including my own. This has been a major transition for me, as much as it has for anyone on our team. Everyone who was on our team two years ago has had changes in their responsibilities, even within the same role. Of course, new team members have navigated adjustments, including moving to a new career path. Addressing ineffective (or worse, toxic) behavior quickly is critical. The tone needs to be set that we are kind, and it is safe to ask questions on our team. We aren’t going to ridicule anyone for asking questions. In higher ed institutions, there is no manual for how to work there, so it really comes down to relationships. Asking questions is how you’re going to learn information quickly.
Team members are expected to raise issues in a timely manner and not cover up mistakes. It gives us a chance to work together to resolve the issue, and more importantly, not punish the employee. This has been the tone of the team for a very long time, from my viewpoint. It really has been a culture of mentorship, working together, and learning new things. It isn’t always a hierarchical approach to all decisions. It’s working together on how to come up with the best decision or solution.
Team members work either independently or with one another. Because I am farther away from most of the technical items, I probably will not make a better decision than they would, so it is important to empower folks to build project plans and make decisions. Liz Summers, our team’s project manager, has been critical in ensuring we have project plans. This has helped when staff are unexpectedly out of the office, or when a dependency on a project needs more time. To empower good decision-making, priorities should be shared with team members. The full team revisits priorities about once a year, and they are revisited monthly in project meetings and meetings with team supervisors. It is important for managers and team leads to provide guidance for their teams. So, if we have to push or reframe a project, we consider our priorities and communicate what we can deliver.
Wherever possible with repeat requests, we operationalize and standardize. It cuts out repeated decision-making and the time exhausted in meetings. So standardization and trustworthiness across the team, being able to ask questions, and building the delegation skill have all been extremely important.
Colby Witherup Wood, Manager of Data Science Services, further matured the team’s interviewing processes to support both interviewees and interviewers. With so many positions to fill, standardizing saved time and increased equity in the interview process. A couple of team members created a document that outlines what we do in engaging with faculty, students, and the research community. It sets out the expectations of the team and what the interview process looks like. We ask almost entirely behavioral questions. To support an equitable process and to hire the best candidate, we use hiring rubrics. It’s not only about preparing the interviewees, but also the interviewers. We train team members on interviewing, including how to ask behavioral questions, before they are included as interviewers. We have a set of questions to ask every candidate for a role, and we do our best to run inclusive interviews and use equitable hiring processes.
Once people join the team, we have onboarding documents for practical things like: How do you get into our suite? How do you get access to systems? How do you order equipment? Those types of things. And we have a more living document that explains things like: What teams do we usually work with? Who do we ask about data transfers? If we see a bottleneck, who do we generally work with? It is like an unofficial org chart.
We also have team meetings. I hold a monthly team meeting, which used to be every other week when we were a smaller team. The teams – data science, data management, and compute – each hold their own weekly team meetings. Every supervisor meets one-on-one with their direct reports weekly. I hold quarterly skip-level meetings with supervisors who don’t report to me, and also with a group of senior staff together without their supervisor. We have monthly team leadership meetings as well.
It’s really about people and how to support them and onboard them. Onboarding new team members effectively has been key to enabling these experts to be productive, have wins, and build confidence so that they can engage with their work and teammates quickly.
Are there resources that you’ve found really useful for yourself and your team leads?
One of the things I started listening to when I would go on jogs in the morning was Manager Tools. Its podcasts were so helpful for me in my early days of management, especially because of the scripts they provide for almost any situation you could encounter. You can start tailoring the scripts to your own style, but they helped reduce my self-doubt when I was encountering common challenges for new managers. The podcast notes are extremely helpful and well worth the license, especially if you are more of a reader than a listener.
I set very high expectations for the team when it comes to practices around people, and I really don’t have tolerance for things like bullying. Venting is normal, but then you have to reach out. If you’re having problems with someone, the first thing I am likely to say is to go meet with them and listen to what their priorities are. One-on-one meetings can resolve a lot of issues. Relationship building has been extremely important to our success, especially being at a university where everybody works in a different building and many of us have different priorities.
So yes, I would say Manager Tools was the biggest thing for me. And Jonathan Dursi’s newsletter is great. I really am grateful that one of our own writes about management practices!
Do you do regular outreach to your campus about your service? If so, what approaches do you use?
We took a hiatus, but we are back to hosting an annual research symposium called the Computation and Data Exchange (CoDEx) that showcases computational or data-intensive research on campus. There are usually keynotes, research talks from faculty and grad students, a poster session, a data viz challenge, and lightning talks, which serve as an on-ramp for researchers new to presenting.
We have a listserv of about 8,000 people who have used our services or attended our trainings. We regularly send a newsletter with details about our workshops and ad-hoc opportunities. We have recently partnered with the IT Communications team to develop a format that is easier to read and more attractive. We have also resumed attending research group meetings for computational researchers and are more proactively building relationships with researchers using HPC resources. Finally, we work with IT Communications to advertise student job postings, workshops, etc. in newsletters from other units and departments.
Are there tools that you use for communicating within your team and keeping track of work status?
We use a combination of email, Teams, Excel, and Smartsheet for communicating and tracking work. To improve our productivity and outcomes for service improvement projects, a full-time project manager has recently joined the team. She tracks our project portfolio and manages our cross-team projects. We use project charters to ensure we are all on the same page for scope, budget, tasks, responsibilities, and timeline. Getting the charter and project plan done is easier said than done. We use Excel and Smartsheet to track our project progress. We have monthly project portfolio meetings to discuss overall progress. Individual project meetings are scheduled based on project needs. We also use our regular one-on-one meetings to communicate progress in between project meetings.
What are your near/medium-term priorities?
A big priority is to ensure our new staff are getting settled in, learning, and being productive – this is both for the employee and for our goals. We want to continue to provide appropriate stretch opportunities for staff who are ready to perform beyond their current role.
We’re also focused on our research symposium, CoDEx, to generate a sense of community among researchers who use computational methods or conduct data-intensive research, as well as buzz about our team.
We’re working to refresh our HPC storage and developing a long-term plan for data center facilities in future years. With an increase in AI research on the horizon, we need affordable, sustainable solutions to offer GPUs to our researchers.
We have begun collaborations on more research projects, where we provide expertise in data science, statistics, software development, or code optimization. We want to solidify our secure enclave service offering that allows us to provide a repeatable solution for the restricted data types we see on campus.
We are working with our school and college IT teams to provide sustainable data storage solutions for research data, both on-premise and cloud-based.
How do you set priorities?
We must always make sure our bread-and-butter services like HPC, data storage, and workshops or bootcamps are operating smoothly. These, along with documentation, take top priority. They serve the majority of research on campus. It is important to do your bread-and-butter services well before taking on undefined or new services. Our project portfolio check-ins catch special-interest projects that may not be a good use of time or that may create unsustainable offerings.
Security is now continuously top of mind and keeps rising to the top of our priorities. It takes an incredible amount of staff resources. The Regulated Research Community of Practice has been a helpful resource. In general, our priorities match the goals of the institution. For Northwestern, it is important to do the “run” work, but also the “grow” and “transform” work. For transformational priorities in research, Northwestern is prioritizing biosciences, renewable energy and sustainability, and data science and AI, among other published priorities. Our advisory committee has been helpful in providing feedback on our roadmap. I am grateful that they continue to help us understand pain points and have also been great supporters of us.
Is there a success story for your program that really stands out?
I mentioned that we’re building our team and growing quite a bit, especially over the past year. I am thrilled to say that we had several candidates wanting to work specifically on our team – a higher ed IT team. To me, that says that our managers have developed a team with a great reputation, whether that is due to a great working environment, the fact that we do work that matters, or that we care about people’s careers. And our team members shared with interviewers that they get feedback that has helped them grow on this team. I believe our team members know that they will have career growth and professional development here. That is, I think, the success story. When student employees want to stay and transition into a full-time position, that’s really rewarding. I think that’s the biggest success story of our team – providing new career opportunities in an area with high workforce demand.
What’s your elevator pitch for your team?
What we want to do is make research easier, and we also want to enable research that otherwise couldn’t be done without an expert – someone who has years of experience in computing or data science, working with complex data sets, writing machine learning or AI research software, automating data movement, or developing data processing scripts. Permanent staff also provide continuity of knowledge and research productivity that can complement a graduate student’s efforts.
There are so many things we can do quickly: provide answers or offer some guidance. It could take half an hour of your time, and it could open up new doors to future research. Whatever we can do to make your life easier as a researcher, we’re here to enable research. That’s what we love to do.