Summary
In the decades to come, our ability to advance discovery, create new knowledge, and provide insights that suggest solutions to the world’s most pressing problems will increasingly rely on our ability to learn from data.
Stanford Data Science (SDS) convenes a community of the world’s best data scientists with scholars and practitioners from diverse fields who rely on accurate, dependable, large data sets and modern data science techniques to advance their work.
At SDS, research, application, and education thrive in a mutually supportive culture by cross-pollinating ideas, questions, and solutions among engineering, business, the humanities, law, medicine, natural sciences, social sciences, and sustainability experts. Together we are developing new methods, revealing fresh insights, and educating the next generation of leaders and citizens who will harness data science and benefit from its responsible application.
About
Stanford Data Science Goals
The goal of Stanford Data Science is to enable data-driven discovery at scale and expand data science education — across Stanford and beyond.
To achieve our goals, we are investing in:
Source: Stanford Data Science
Community
Stanford Data Science is helping to weave data science research into the fabric of the university and connect with our peer universities, not-for-profit organizations, and corporate partners.
One way we see this happening is by bringing researchers at all levels together around shared thematic areas such as data types and data science methods. We also build community among students, faculty, and stakeholders outside the university, including non-profits and industrial partners, through sponsored research, social good programs, and exchanges.
Global community
- Women in Data Science (WiDS) – a global movement started at Stanford/ICME to “inspire and educate data scientists worldwide, regardless of gender, and to support women in the field”.
- Center on Open and Reproducible Science (CORES) – focused on developing and nurturing transparency and reproducibility in the collection, analysis, and dissemination of data across all domains of scientific activity.
- Stanford Causal Science Center (SC²) – focused on providing an interdisciplinary community for scholars interested in causality and causal inference.
- COVID-19 Data Forum – a series of multidisciplinary, online meetings for topic experts to focus on data-related aspects of the scientific response to the pandemic, including data access and sharing, essential data resources for analysis, and how we can best support decision-making.
Stanford Community
- NSF Frameworks grant 2019 – 2021: Data Science Collaboratory, weekly meeting in Wallenberg, Wednesdays at 3 PM.
- Data-Driven Wildland Fire Research Seminar Series: a weekly series of talks with discussions from Stanford faculty, researchers, and Ph.D. students on the intersection of wildland fire research and data science. Mondays at 12:30 PM.
Industrial Partnerships
Social Good and other not-for-profit stakeholders
- Data Science for Social Good
- Summer extension of the Stanford Big Earth Hackathon on Wildland Fires
- Planning underway for Data Science capstone courses at Stanford. Contact datascience@stanford.edu to help!
Promote your data science event
Data Science is practiced across campus, with workshops, research, and other events happening all the time. We are happy to help promote your data science event; please drop us a line at datascience@stanford.edu to share.
Source: Stanford Data Science
Research Areas
The world is being transformed by data, and data-driven analysis is rapidly becoming an integral part of science and society. Stanford Data Science is a collaborative effort across many departments in all seven schools. We strive to unite existing data science research initiatives and create interdisciplinary collaborations, connecting data science and related methodologists with disciplines that are being transformed by data science and computation.
Our work supports research in a variety of fields where incredible advances are being made through the facilitation of meaningful collaborations between domain researchers, with deep expertise in societal and fundamental research challenges, and methods researchers that are developing next-generation computational tools and techniques, including:
Data Science for Wildland Fire Research
In recent years, wildfire has gone from an infrequent and distant news item to a center-stage issue spanning many consecutive weeks for urban and suburban communities. Frequent wildfires are changing everyday life in California in numerous ways, from public safety power shutoffs to hazardous air quality, that seemed inconceivable as recently as 2015. Moreover, elevated wildfire risk in the western United States (and similar climates globally) is here to stay for the foreseeable future. There is a plethora of problems that need solutions in the wildland fire arena; many of them are well suited to a data-driven approach.
Data Science for Physics
Astrophysicists and particle physicists at Stanford and at the SLAC National Accelerator Laboratory are deeply engaged in studying the Universe at both the largest and smallest scales, with state-of-the-art instrumentation at telescopes and accelerator facilities.
Data Science for Economics
Many of the most pressing questions in empirical economics are causal, such as the impact, both short and long run, of educational choices on labor market outcomes, and of economic policies on distributions of outcomes. This makes them conceptually quite different from the predictive questions that many recently developed machine learning methods are primarily designed for.
Data Science for Education
Educational data spans K-12 school and district records, digital archives of instructional materials and gradebooks, as well as student responses on course surveys. Data science applied to actual classroom interaction is also of growing interest and is increasingly feasible.
Data Science for Human Health
It is clear that data science will be a driving force in transitioning the world’s healthcare systems from reactive “sick-based” care to proactive, preventive care.
Data Science for Humanity
Our modern era is characterized by massive amounts of data documenting the behaviors of individuals, groups, organizations, cultures, and indeed entire societies. This wealth of data on modern humanity is accompanied by massive digitization of historical data, both textual and numeric, in the form of historic newspapers, literary and linguistic corpora, economic data, censuses, and other government data, gathered and preserved over centuries, and newly digitized, acquired, and provisioned by libraries, scholars, and commercial entities.
Data Science for Linguistics
The impact of data science on linguistics has been profound. All areas of the field depend on having a rich picture of the true range of variation, within dialects, across dialects, and among different languages. The subfield of corpus linguistics is arguably as old as the field itself and, with the advent of computers, gave rise to many core techniques in data science.
Data Science for Nature and Sustainability
Many key sustainability issues translate into decision and optimization problems and could greatly benefit from data-driven decision making tools. In fact, the impact of modern information technology has been highly uneven, mainly benefiting large firms in profitable sectors, with little or no benefit in terms of the environment. Our vision is that data-driven methods can — and should — play a key role in increasing the efficiency and effectiveness of the way we manage and allocate our natural resources.
Ethics and Data Science
With the emergence of new techniques of machine learning, and the possibility of using algorithms to perform tasks previously done by human beings, as well as to generate new knowledge, we again face a set of new ethical questions.
The Science of Data Science
The practice of data analysis has changed enormously. Data science needs to find new inferential paradigms that allow data exploration prior to the formulation of hypotheses.
Source: Stanford Data Science
Careers
Stanford Data Science is hiring!
We are seeking Research Data Scientists! Research Data Scientists will play a critical role in a new strategic investment in “Marlowe”, a GPU-based computational instrument designed to enable large-scale, data-intensive research. They will leverage their expertise in data science, machine learning/computation, and data-intensive research to develop and optimize workflows and applications that unlock Marlowe’s capabilities for everyone on campus.
https://careersearch.stanford.edu/jobs/research-data-scientist-27446
___
We are seeking a Program Manager to help with the application and review process for “Marlowe”, a new GPU-based computational instrument, and assist in the coordination and administration of other Stanford Data Science programs and activities. This position will work closely with key stakeholders across a wide variety of roles including faculty, researchers, staff, and trainees.
https://careersearch.stanford.edu/jobs/program-manager-26870
Please join our email list for future announcements.
Source: Stanford Data Science
People
Faculty Director
- Guido Imbens, Economics
Associate Directors
- Emmanuel Candes, Statistics & Mathematics
- Ramesh Johari, Management Science and Engineering
- David Lobell, Earth System Science
- Russell Poldrack, Faculty Director, CORES; Albert Ray Lang Professor of Psychology
- Chiara Sabatti, Biomedical Data Science & Statistics
- Risa Wechsler, Physics; Particle Physics & Astrophysics
- James Zou, Biomedical Data Science
Staff Directors
- Craig Kapfer, Senior Director of Research Data Science
- Chris Mentzel, Executive Director, Stanford Data Science
- Elizabeth Wilsey, Director, Engagement and Partnership
Source: Stanford Data Science
Web Links
Programs
Stanford Data Science (SDS) believes that developing the next generation of early-career researchers is at the core of the University’s mission. We support PhDs and postdocs through the Stanford Data Science Scholars and Postdoctoral Fellows programs, bringing together a multidisciplinary cohort of scholars to learn, share, and collaborate on cutting-edge topics. Our early-career researchers advance data science, machine learning, and AI and apply these techniques to drive new scientific discoveries in disparate fields, including biology and medicine, astrophysics, sustainability, and a lot more!
Postdoc Fellows
Source: Stanford Data Science
Stanford Data Science seeks recent PhDs of exceptional promise for postdoctoral fellow positions in interdisciplinary research with expertise in both Data Science and its application in a domain of scholarship, like physical, earth, life, or social sciences, humanities, and the arts, business, law, medicine, education, or engineering.
View current Data Science Postdoc Fellows »
The Opportunity
Data Science Fellows work both within and at the boundaries between data science methods and the domains of scholarship that utilize data science to discover and create new knowledge. They will lead independent, original research programs with impact in one or more research domains and one or more methodological domains (e.g., computer science, statistics, applied mathematics).
Ideal candidates will have earned a PhD in either a methods or applied discipline with demonstrated skills and experience in one of the other complementary areas (as examples: a PhD in statistics with applications to physics, or a PhD in biology with extensive use of machine learning). Successful candidates will bring a research agenda that can take advantage of the unique intellectual opportunities afforded by Stanford University, and will have experience in working with researchers across different fields. Their research results will be published in technical reports, open-source software, and peer-reviewed journals, as well as presented at scientific conferences. Ideal candidates will have experience and interests in building community, teaching and training, and leadership with strong communication skills.
Applicants should expect to travel in order to coordinate research with internal and external collaborators and sponsors.
Term
Appointments are initially for one year, with an expectation of renewal for a second year on satisfactory performance. Fellowships have a competitive salary and benefits, with funds to support research and travel. The start date is flexible, though September 1st is expected.
Qualifications
- Recent PhD (graduation within the last five years) with experience in a complementary field(s).
- Excellent experience in their PhD discipline (or an area applying data science for new discoveries).
- Excellent knowledge of advanced software engineering, computer science and/or statistics.
- Demonstrated commitment to reproducibility and open research through existing public release of research data and software code.
- Excellent verbal and written communication and presentation skills necessary to author technical and scientific reports, publications, invited papers, and to deliver scientific presentations, seminars, meetings and/or teaching lectures.
- Experience collaborating effectively with a team of scientists of diverse backgrounds.
Desired Qualifications
- Experience in developing curriculum and teaching.
- Experience developing open-source research software used by a community beyond their lab.
- Experience building inclusive communities of practice around data science that are diverse and equitable for all.
Desired Start
September 1st of the following academic year.
Required Application Materials
- Applicants submit their (1) curriculum vitae, (2) a publication/software list, and (3) a two-page letter of intent detailing a proposed research plan. The proposed research plan should include information about both advancing data science and its application in a domain of scholarship. Please also include the names of potential faculty collaborators (ideally bridging a methods domain and an application domain, e.g. Stats+Bio, CS+politics, etc).
- Applicants are encouraged to discuss their proposed research plan with potential faculty collaborator(s)/mentor(s) in preparing their application. Applicants who don’t coordinate with Stanford faculty beforehand should indicate the reason. Stanford collaborators/mentors can be any full-time faculty on campus; prior affiliation with Stanford Data Science is not necessary.
- Applicants need to have two letters of reference submitted through our online form by the deadline: 22 Jan 2024, 9:00 AM Pacific.
Stanford University is an affirmative action and equal opportunity employer, committed to increasing the diversity of its workforce. It welcomes applications from women, members of minority groups, veterans, persons with disabilities, and others who would bring additional dimensions to the university’s research and teaching mission.
PhD Scholars
Source: Stanford Data Science
Stanford Data Science Scholars make up a diverse group of early-career researchers and trainees from all parts of the University who are using and developing data science methods in their research. They share a keen interest in solving problems while sharing and exchanging knowledge with others. A primary goal of the program is to create a community of data science researchers who are representative of the wide array of disciplines and who can share methods and applications while creating a stimulating, innovative, and supportive environment.
The Scholars Program has space at the Computing & Data Science Building (CoDa), room E409.
View current Stanford Data Science Scholars
More Information
For more information on the program, this year’s eligibility requirements, and how to apply, please see our program detail page.
Education
Source: Stanford Data Science
Stanford Data Science supports departments, schools, and programs from all areas of campus to provide diverse data science educational opportunities at the undergraduate, graduate, and professional levels.
New Data Science Majors
Stanford just launched two new majors in Data Science. Learn more here: https://datasciencemajor.stanford.edu/
- The B.S. in Data Science is the successor to the major in Mathematical and Computational Science (MCS). The goals of this program remain ambitious: we aim to provide a broad and deep understanding of the foundations of the discipline, training nimble and versatile data scientists. Increasing data size and availability, enhanced computational power, and progress in algorithms and software make this an ever exciting area.
- The B.A. degree in Data Science & Social Systems enables students to develop a triple fluency: expertise in statistical and computational methods, domain knowledge across the core social sciences, and a deep and interdisciplinary understanding of an important social problem.
- A new Data Science Minor is also available.
New Courses
SDS is helping to build new course offerings for the Stanford community.
If interested in learning more, or helping in the development of these offerings, please contact one of the cognizant faculty below:
- Inclusive Mentoring in Data Science
  Chiara Sabatti first offered this new course in Spring 2021; it is now offered most Winter quarters. Course page.
- Race, Data Algorithms, and Health
  James Zou and Chiara Sabatti created BIODS240 in Fall 2020.
- Data Science Capstone
  Chiara Sabatti, David Lobell, Stephen Boyd, Elaine Treharne, Balasubramanian (Naras) Narasimhan, Dan Jurafsky
- Data Literacy
  Stephen Boyd, Elaine Treharne, Balasubramanian (Naras) Narasimhan, Jef Karel Caers, David Lobell, Chiara Sabatti, Margot Gerritsen, Russ Poldrack, Dan Jurafsky
- Computing for Data Science (starting in 2022: BIODS/STATS352)
  John Chambers, Balasubramanian (Naras) Narasimhan
- Mind Reading with Movies and Neuroimaging
  Cameron Thomas Ellis will offer PSYCH236 again in Fall 2024 (includes Python code, using parallel computing and analyzing big data).
Stanford Continuing Studies
Stanford Continuing Studies offers a variety of data science and artificial intelligence classes:
TECH 105 — Statistics for AI, Machine Learning, and Data Science
Look under the Technology category for related classes.
Educational Offerings
Stanford offers a variety of data science educational opportunities for undergraduate and graduate students, including introductory workshops, undergraduate and master’s programs with strong data science components, PhD programs with data science emphases, and training programs. Below is an unordered and partial listing of educational offerings at Stanford. Please email datascience@stanford.edu to add to this list.
- ICME (Institute for Computational and Mathematical Engineering)
- Statistics: many data science-related courses, plus a master's program and a minor
- Computer Science: many data science-related courses, plus master's tracks in AI and in information management and analytics
- Department of Biomedical Data Science
- Biomedical Informatics Program
- Biomedical Informatics Research (BMIR) Research Colloquium
- Major in Mathematical and Computational Science (MCS)
- Management Science and Engineering (MS&E)
- School of Earth Data Science offerings
- SCPD (Stanford Center for Professional Development)
- Foundations for Data Science Certificate program
- Summer intensive on data science
Informal Training, “on-ramps” to data science
For workshops from the Data Science Institute, please see our Community page
- Library’s Center for Interdisciplinary Digital Research (CIDR) workshops
- Stanford Research Computing Center (SRCC) seminars
- Technology Training from University IT (UIT)
Data Science for Social Good Summer Program
Source: Stanford Data Science
The Data Science for Social Good summer program trains aspiring researchers to work on data science projects with social impact. Working closely with governments and nonprofits, participants take on real-world problems in education, health, energy, public safety, transportation, economic development, international development and more. Participants include a diverse and inclusive cohort of students who spend eight weeks of the summer working with Stanford researchers and technical mentors, learning insights from data that benefit society.
- Click here to learn more about becoming a student fellow
- Click here to learn more about becoming a technical mentor [Stanford only]
- Click here to learn how to have your data sets be part of the program
Women in Data Science
Source: Stanford Data Science
Known as “WiDS,” the Women in Data Science initiative was launched at Stanford University in 2015 with a one-day technical conference featuring outstanding women doing outstanding work in the field. Soon after, volunteer ambassadors took WiDS around the world by organizing similar conferences in their communities under the WiDS umbrella. As the WiDS community grew, we added additional year-round programming in education, innovation, and leadership to strengthen community connections and support, educate, and inspire women everywhere.
WiDS (Women in Data Science) envisions a future where women are fully integrated and represented in all areas of Data Science, and share equally in decision-making, economic prosperity, and opportunities.
WiDS Stanford integrates and builds on that mission at Stanford University, with Stanford Data Science as a platform for elevating women doing data-intensive science research.
WiDS Worldwide
Read more about the spin-out of an independent WiDS Worldwide in our press release.
The annual WiDS programming now includes 200+ conferences and events around the world, a global datathon, a podcast series, a WiDS Academy with upskilling workshops and the Next-Gen outreach program, and an upskilling platform for career and training opportunities. WiDS Worldwide programs reach over 150,000 participants each year in more than 160 countries. Since 2023, WiDS Worldwide has been a fiscally sponsored project under Community Initiatives, with continued strong collaboration and partnership with WiDS Stanford.
Visit the WiDS Worldwide Website!
Rising Stars in Data Science
Source: Stanford Data Science
Celebrating the potential of exceptional data scientists through an intensive research and career workshop.
The Rising Stars in Data Science workshop, hosted November 11-12 by Stanford University in collaboration with the University of California, San Diego, and the University of Chicago, focuses on celebrating and fast tracking the careers of exceptional data scientists at a critical inflection point in their career: the transition to postdoctoral scholar, research scientist, industry research position, or tenure track position. Over the past four years, the Rising Stars workshop has hosted over 130 Rising Stars from nearly 40 institutions.
This fall, the sixth annual Rising Stars workshop will showcase the exciting, innovative data science initiatives at Stanford University, UC San Diego, and UChicago. This event will provide PhD students and postdoctoral researchers the opportunity to connect with these networks, platforms, and opportunities. The workshop also aims to broaden access to data science by providing a platform and a supportive mentoring network to navigate academic careers in data science. All graduate students and postdocs, including those from a wide variety of lived experiences and communities, are encouraged to apply. Applications are encouraged from people of all racial, ethnic, geographic, and socioeconomic backgrounds, sexual orientations, and genders, and from persons with disabilities.
The two-day workshop will feature career and research panels, networking and mentoring opportunities, and research talks from the Rising Stars. Participants will gain insights from faculty panels on career development questions such as: how to start your academic career in data science; how to strategically sustain your career through research collaborations, publications, and skill development; and how to form meaningful interdisciplinary collaborations in data science with industry and government partners. Participants will also hear inspiring keynote talks from established, cutting-edge leaders in data science. Accepted participants will be reimbursed for qualified travel expenses.
Eligibility & Guidelines
If you have any questions about your eligibility, please email risingstars@stanford.edu.
- Applicants must be either a full-time graduate student within 1 year of obtaining a PhD or a current postdoctoral scholar, fellow, or researcher.
- We welcome applicants from a wide variety of fields and backgrounds: any eligible PhD or postdoc who is engaging in rigorous, data-driven inquiry is encouraged to apply.
- Applicants from all institutions, including but not limited to Stanford University, the University of California, San Diego, and the University of Chicago, are encouraged to apply.
- Applicants may only submit one application.
- Applicants may have nominations from a maximum of 2 faculty members or advisors.
Applications are now open! The deadline to apply is August 1, 2025. Applicants will be notified of their application status by September 9.
Workshop Format
- Rising Star research talks
- Panels (career development, data science research)
- Keynote address
- 1:1 meetings with faculty members
- Networking within the Stanford University, UC San Diego, and UChicago data science ecosystems
Virtual Info Session
Join us on July 17, 9:00 – 10:00 am PDT for an informational session on the 2025 Rising Stars in Data Science workshop. In this session, attendees will learn more about the program, hear from the Universities, and ask questions of past program participants.
Research Centers
Faculty-Led Research Centers
The Stanford Data Science Centers convene faculty, researchers, and students around themes of common interest. Center-led seminars, conferences, and events help build multidisciplinary communities across campus. Center affiliates collaborate on research, jointly work with postdocs and research assistants, and participate in multidisciplinary activities with both Stanford and other partners.
Stanford Data Science commits staff and financial resources to each center, often in partnership with other units on campus, and with support from corporate affiliates.
Stanford Causal Science Center
Source: Stanford Data Science
The Stanford Causal Science Center (SC²) aims to promote the study of causality / causal inference in applied fields across campus.
The SC² focuses on two core objectives. The first is to provide an interdisciplinary community for scholars interested in causality and causal inference at Stanford where they can collaborate on topics of mutual interest. We will do so through the organization of regular seminars and conferences focused on topics of interest to scholars of causality. This includes the online causal inference seminar (OCIS). The second is to encourage graduate students and post-docs to study and apply causal inference methods in a range of fields including statistics, social sciences, computer science, biomedical sciences, and law. The center aims to provide a place where students can learn about methods for causal inference in other disciplines and find opportunities to work together on such questions.
Check out our course lists and seminars, including the Bay Area Tech Economics Seminars series, other meetings, and past conferences in 2021, 2022, 2023, and 2024.
Join our mailing list to learn more about SDS and the Causal Science Center!
Center for Open and REproducible Science
Source: Stanford Data Science
The Stanford Data Science Center for Open and Reproducible Science (SDS-CORES) aims to develop and nurture transparency and reproducibility in the collection, analysis, and dissemination of data across all domains of scientific activity.
The Center focuses on two core objectives. The first is to develop resources and support activities that promote the adoption of open science practices at Stanford and beyond. The second is to foster methodological innovations that can enhance the adoption and effectiveness of open science practices.
Center for Sustainability Data Science
Source: Stanford Data Science
Sustainability is a grand challenge of our time and is a major focus of Stanford’s next decade of scholarship and investment. There are many exciting opportunities to advance sustainability science and practice with new data and data science methods, especially at Stanford where there are a large and growing number of faculty, students, and staff who are working on aspects of sustainability.
The mission of the Center for Sustainability Data Science (SuDS) is to foster community among the wide range of scholars on campus by building:
- Engaging interactions to inform data scientists of the big questions in sustainability and ensure sustainability scientists are aware of the latest capabilities of data science.
- Accessible mechanisms to engage in this kind of research.
Our goals:
- Catalyze new research that applies data science to topics in sustainability, with an emphasis on early-stage projects across a broad range of applications.
- Engage faculty from both within and outside the new school on climate change and sustainability, spanning all seven schools at Stanford.
- Provide an educational opportunity for data science students to have a meaningful experience that applies their skills to real-world research projects.
- Provide a venue for matching interests between faculty and students in data science and sustainability science.
Stanford Data Science for Health Center
Source: Stanford Data Science
Vision
The goal of Data Science for Health is to:
- Foster a community for data science + health researchers across all the Stanford schools.
- Create Stanford data assets and infrastructure to broadly enable data science + health research and translation.
- Catalyze new advances in data science methodology to address health challenges.
- Provide education, mentoring, and outreach opportunities in health data science.
Join our mailing list to learn more about SDS and the Data Science for Health Center!
Center for Decoding the Universe @ Stanford
Source: Stanford Data Science
Leveraging complex data to infer how the universe works.
The Center for Decoding the Universe aims to unlock the physics of the universe by pioneering innovative approaches to extracting insights from vast, multi-modal datasets. Our mission is to develop cutting-edge methodologies that harness the full potential of complex data to drive physical inference.
The Center serves as a nexus for interdisciplinary collaboration, convening researchers from diverse academic backgrounds to advance our understanding of the universe and simultaneously push the boundaries of data science and AI. Our structure and activities enable us to rapidly demonstrate the potential of our innovations across diverse scientific domains.
The Center also spearheads innovative educational initiatives that enhance scientists’ proficiency in data science tools and integrate rigorous scientific reasoning into data science practice.
Join our mailing list to learn more about SDS and the Center for Decoding the Universe!
Stanford Center for Computational Market Design
Source: Stanford Data Science
The Stanford Center for Computational Market Design focuses on the interdisciplinary study of modern, complex market platforms, leveraging expertise in algorithm design, economics, machine learning, and operations research.
Stanford Center for Neural Data Science
Source: Stanford Data Science
Neuroscience is bursting with data. With detailed maps of neural circuits, high-resolution behavioral videos, and large-scale recordings of brain activity, we have an unprecedented opportunity for data-driven discovery. Analyzing and deriving insight from these complex datasets requires expertise spanning neuroscience, data science, statistics, computer science, engineering, psychology, and more.
The Center for Neural Data Science will foster new interdisciplinary collaborations between our world-class departments here at Stanford, creating a synergistic environment for developing novel analytical methods and driving transformative discoveries in brain research.
Mission and Goals
Mission: To advance the understanding of the brain through the development and application of cutting-edge data science methods, fostering interdisciplinary collaboration and training the next generation of neural data scientists at Stanford.
Specific Goals:
- Foster Interdisciplinary Research: Facilitate new collaborations between faculty and students from diverse departments.
- Develop Novel Methodologies: Drive the creation of new computational and statistical tools for neural data analysis.
- Train the Next Generation: Provide comprehensive training opportunities for students at all levels.
- Promote Community Engagement: Disseminate knowledge and engage with the broader scientific community and public.
- Build Bridges across Stanford Data Science Centers: Build relationships with other centers, including the Open Science and Causal Science centers, through joint events and mixers.
Join our mailing list to learn more about SDS and the Center for Neural Data Science!
Marlowe
Marlowe – Stanford’s GPU-Based Computational Instrument
Stanford has long been a leader in the development, analysis, and use of data-intensive methods. Modern scientific breakthroughs and discoveries in almost every field require massive computational resources to explore novel ideas and paradigms at scales that have thus far been the sole purview of industry.
GPU-Based Computational Instrument
To empower faculty whose research depends on such high-powered computation—and to attract and retain the most talented students, scholars, and faculty—Stanford is making a substantial investment in a large, high-performance, GPU-based computational instrument called Marlowe. As envisioned, the infrastructure and the Research Data Science team will offer investigators the ability to build, analyze, and use large-scale models for new types of scientific discoveries.
Research Data Scientists
To maximize Marlowe’s utility, Stanford University is investing in research data scientists who can:
- Design and optimize tools to get the best system performance.
- Work with research groups and students to educate them about techniques and methods to optimize usage.
- Provide software tools and operating workflows so groups can easily follow the best open science practices.
The team will also help Stanford researchers match jobs to the appropriate resources based on job characteristics such as data-dominated vs. computation-dominated, heterogeneous vs. homogeneous node requirements, CPU-bound vs. GPU-bound, and scale (e.g., some jobs may benefit from resources available only at a National Computing Center).
The research data scientists will be skilled in efficiently performing large-scale simulations and machine-learning tasks, and will possess other specialized skills. Open science practices (organizing, preserving, and sharing data, metadata, results, and computational methods) as mandated by NIH, NSF, and OSTP are best integrated into the standard workflow right from the start.
Such methods become especially important should the job move from our system to other systems such as national infrastructure. We are investing in a team of people including systems engineers, research data scientists, and open science engineers, whose contributions will be worthy of publication authorship.
Overview
Marlowe is an NVIDIA DGX H100 Superpod, built using NVIDIA’s reference architecture, designed to deliver cutting-edge computational performance. It comprises 31 NVIDIA H100 nodes, collectively providing 248 NVIDIA H100 GPUs and 2.5PB of high-performance DDN Lustre storage.
Allocation Details
Stanford researchers will apply for both access and project-based allocations. See the Marlowe Access page for more information on how the application process works. More details on medium and large project allocations are coming soon!
Citing Marlowe
The Marlowe team is using Zenodo for its reference document. Please cite your use of Marlowe as:
Kapfer, C., Stine, K., Narasimhan, B., Mentzel, C., & Candes, E. (2025). Marlowe: Stanford’s GPU-based Computational Instrument (0.1). Zenodo.
Technical Details
More information on Marlowe’s technical infrastructure is located at Docs.Marlowe.Stanford.edu.
Data Risk Classification
Low and Moderate Risk data
Marlowe is approved for computing with Low and Moderate Risk data only.
High Risk data
Marlowe is NOT approved to store or process HIPAA, PHI, PII, or any other kind of High Risk data.
Users are responsible for ensuring the compliance of their own data.
For more information about data risk classifications, see the Information Security Risk Classification page.
Node Overview
Each NVIDIA DGX H100 node includes:
- GPU: 8x NVIDIA H100 80GB GPUs
- CPU: 2x Intel Xeon Platinum 8480C CPUs (112 cores/node)
- Memory: 2TB of RAM
- NVSwitch: 4x NVLink connections, providing up to 900 GB/s GPU-to-GPU bandwidth
- Node-local Storage: 30TB NVMe
- Networking: 8x 400Gbps NDR InfiniBand connections, providing up to 3.2Tbps bandwidth
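As a quick consistency check, the per-node figures listed for a DGX H100 node can be multiplied out to whole-system totals. The sketch below assumes the 31-node count quoted in the Overview and treats the results as theoretical sums, not measured or usable capacity. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures as listed for one NVIDIA DGX H100 node."""
    gpus: int = 8
    gpu_memory_gb: int = 80
    cpu_cores: int = 112
    ram_tb: int = 2
    nvme_tb: int = 30
    ib_links: int = 8
    ib_link_gbps: int = 400


def cluster_totals(node: NodeSpec, node_count: int) -> dict:
    """Multiply per-node figures out to theoretical system totals."""
    return {
        "gpus": node.gpus * node_count,
        "gpu_memory_gb": node.gpus * node.gpu_memory_gb * node_count,
        "cpu_cores": node.cpu_cores * node_count,
        "ram_tb": node.ram_tb * node_count,
        "nvme_tb": node.nvme_tb * node_count,
        # Aggregate InfiniBand bandwidth of a single node, in Tbps.
        "ib_tbps_per_node": node.ib_links * node.ib_link_gbps / 1000,
    }


# 31 nodes is the count quoted in the Overview.
totals = cluster_totals(NodeSpec(), node_count=31)
for key, value in totals.items():
    print(f"{key:>18}: {value}")
```

The GPU total reproduces the 248 GPUs quoted in the Overview, and the per-node InfiniBand figure reproduces the 3.2Tbps listed for networking.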
Marlowe Access
Source: Stanford Data Science
Basic Access
New to Marlowe? Start here.
Medium Project Access
Once you have Basic Access, you may apply for dedicated project access. Please see the prep guide for details.
Large Project Access
Once you have Basic Access, you may apply for dedicated project access. Please see the prep guide for details.
Project Application Guide
Source: Stanford Data Science
Marlowe GPU Project Application – Preparation Guide
Below is a brief guide detailing the information you will need to complete a Medium or Large project application for Marlowe. Please review the following, and take note of the job profile information, storage requirements, and PDF instructions. The application form has more detailed instructions.
Section 1: Project & PI Information
Prepare basic project details including project title, abstract, PI name, group member SUNet IDs, and any urgent timelines (e.g., upcoming grant or conference deadlines).
Section 2: Project PDFs
a) Computational Suitability Statement (CSS)
You will need to upload a 2–4 page PDF describing your computational needs, readiness, scalability, and justification for using Marlowe. For Medium projects, the PDF should focus on computational suitability rather than a full scientific narrative. Please include:
- A brief scientific overview for context
- Prior experience on Marlowe or similar GPU-based systems
- Any weak or strong scaling studies (completed or planned)
- Codes and toolchains used (include GitHub links, if available, with instructions on how to run your codes)
- Job profiles: wall time, GPUs/node, concurrency, memory, I/O, and checkpointing
- Computational readiness and tuning status, expected/demonstrated MFU, etc.
- Provide detail on why your workload requires more than standard lab or cloud resources.
- Large projects (>10,000 GPU-hours) should provide strong evidence of readiness and scalability.
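For the scaling studies mentioned in the list above, a strong scaling study typically times a fixed-size problem at several GPU counts and reports speedup and parallel efficiency relative to the smallest run. The following is a minimal sketch of that calculation. The timings are made-up placeholders, and the function name is hypothetical rather than part of any Marlowe tooling.

```python
def strong_scaling(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (gpus, speedup, efficiency) rows relative to the smallest GPU count.

    timings maps a GPU count to the wall time (seconds) for the same fixed-size problem.
    """
    base_gpus = min(timings)
    base_time = timings[base_gpus]
    rows = []
    for gpus in sorted(timings):
        speedup = base_time / timings[gpus]
        # Ideal speedup equals the ratio of GPU counts; efficiency is the fraction achieved.
        efficiency = speedup / (gpus / base_gpus)
        rows.append((gpus, speedup, efficiency))
    return rows


# Hypothetical wall times in seconds, for illustration only.
example_timings = {8: 1000.0, 16: 520.0, 32: 280.0}
rows = strong_scaling(example_timings)
for gpus, speedup, efficiency in rows:
    print(f"{gpus:>4} GPUs  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

A weak scaling study would instead grow the problem size with the GPU count and compare wall times directly.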
Section 3: Computational Profile
Summarize the typical and max job type, wall time, GPU and node usage, concurrency, and usage pattern for this project. This helps assess fit with system capabilities.
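One way to prepare this summary is to write each job type down as a small record and total the GPU-hours it implies. The sketch below is only illustrative: the field names and the example numbers are assumptions, and the only figure taken from the guide is the 10,000 GPU-hour mark it associates with Large projects.

```python
from dataclasses import dataclass

# The guide describes Large projects as those above 10,000 GPU-hours.
LARGE_PROJECT_GPU_HOURS = 10_000


@dataclass
class JobProfile:
    """One job type in a project's computational profile (field names are hypothetical)."""
    name: str
    wall_hours: float   # wall time of a single run
    gpus_per_node: int
    nodes: int
    runs: int           # how many times this job type is expected to run

    @property
    def gpu_hours(self) -> float:
        return self.wall_hours * self.gpus_per_node * self.nodes * self.runs


def total_gpu_hours(profiles: list[JobProfile]) -> float:
    """Sum the GPU-hours implied by every job type in the project."""
    return sum(p.gpu_hours for p in profiles)


# Placeholder numbers for illustration only.
profiles = [
    JobProfile("typical", wall_hours=12, gpus_per_node=8, nodes=1, runs=40),
    JobProfile("max", wall_hours=24, gpus_per_node=8, nodes=4, runs=5),
]
total = total_gpu_hours(profiles)
exceeds_large_mark = total > LARGE_PROJECT_GPU_HOURS
print(f"Estimated total: {total:,.0f} GPU-hours; above Large mark: {exceeds_large_mark}")
```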
Section 4: Technical Requirements & Feasibility
List software frameworks, container tools, and any special configuration needs.
Section 5: Storage & Data
Estimate your scratch storage capacity requirements. Indicate how data will be sourced, moved, and stored. Include details on checkpointing (if applicable).
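A rough scratch estimate can be built from the dataset size plus the checkpoints kept on disk at any one time. The sketch below is one possible back-of-the-envelope approach, not a formula from the application form. The headroom factor, parameter names, and example values are all assumptions.

```python
def scratch_estimate_tb(
    dataset_tb: float,
    checkpoint_gb: float,
    checkpoints_kept: int,
    concurrent_runs: int = 1,
    headroom: float = 0.2,
) -> float:
    """Estimate scratch capacity in TB (decimal units, 1 TB = 1000 GB).

    headroom is an assumed safety margin for logs and intermediate files.
    """
    checkpoints_tb = checkpoint_gb * checkpoints_kept * concurrent_runs / 1000
    return (dataset_tb + checkpoints_tb) * (1 + headroom)


# Placeholder values: 5 TB of input data, 150 GB checkpoints,
# 4 checkpoints retained per run, 2 runs in flight at once.
estimate = scratch_estimate_tb(
    dataset_tb=5, checkpoint_gb=150, checkpoints_kept=4, concurrent_runs=2
)
print(f"Estimated scratch requirement: {estimate:.1f} TB")
```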
Section 6: Impact & Acknowledgments
Briefly describe expected outcomes such as publications or software.
Review Process
Computational Suitability Review for Medium and Large Projects (Staff)
Staff will review the CSS for both Medium and Large projects, considering the following areas:
- Experience: Assessment of familiarity with GPU clusters, including past experience
- Software and Resource Requirements: Review of codes, toolchains, clear instructions for running applications, and job profiles for use of GPUs, CPUs, memory, I/O, and checkpointing
- Computational Readiness: Evaluation of scaling, optimization, and computational efficiency (e.g., MFU)
- Need for Marlowe: Justification for needing GPU resources beyond standard lab or cloud environments, Sherlock, etc.
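Assuming MFU here refers to model FLOPs utilization (useful model FLOPs per second divided by the hardware's theoretical peak), it can be estimated from measured training throughput. The sketch below uses the common approximation of about 6 FLOPs per parameter per token for dense transformer training, which will not fit every workload. The peak figure is passed in as a parameter and should be taken from the vendor datasheet for the precision in use; the value in the example is an assumption.

```python
def model_flops_utilization(
    tokens_per_second: float,
    n_params: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Estimate MFU for dense transformer training.

    Uses the ~6 * parameters FLOPs-per-token approximation (forward + backward pass).
    """
    achieved_flops_per_second = 6 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Placeholder measurement: a 7B-parameter model on one 8-GPU node.
# The peak value is an assumed dense BF16 figure; check the datasheet.
ASSUMED_PEAK_FLOPS_PER_GPU = 989e12
mfu = model_flops_utilization(
    tokens_per_second=80_000,
    n_params=7e9,
    n_gpus=8,
    peak_flops_per_gpu=ASSUMED_PEAK_FLOPS_PER_GPU,
)
print(f"Estimated MFU: {mfu:.1%}")
```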
Marlowe Recharge
Source: Stanford Data Science
Overview
The Marlowe GPU cluster operates under a subsidized recharge model, designed to sustain and improve our computing resources. Rates are set by VPDoR below market value and are subject to future reductions.
Recharge Rates (Effective July 9, 2025)
Resource Type | Non-preemptible Jobs (Medium/Large Projects) | Preemptible Jobs (Basic Access) |
---|---|---|
GPU Usage | $1.25 per GPU-Hour | $0.75 per GPU-Hour |
CPU Usage | $0.050 per CPU-Hour | $0.025 per CPU-Hour |
Project Storage
Storage Type | Monthly Rate |
---|---|
Intelliflash (/projects) | $20 per TB |
Billing Details
- Billing applies monthly to GPU-hours, CPU-hours, and project storage.
- No retroactive charges. Billing starts on July 9, 2025.
- Add or update Basic Access PTA here: https://forms.gle/uB8zQ5skEgcU7wrU8
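To budget for a month of usage, the published rates can be applied directly to expected GPU-hours, CPU-hours, and project storage. The sketch below only multiplies usage by the rates listed above. How CPU-hours are counted for GPU jobs and whether storage is prorated are not stated here, so treat the result as an estimate. The function and dictionary names are illustrative.

```python
# Rates effective July 9, 2025, as listed in the recharge tables.
RATES = {
    "non_preemptible": {"gpu_hour": 1.25, "cpu_hour": 0.050},  # Medium/Large Projects
    "preemptible": {"gpu_hour": 0.75, "cpu_hour": 0.025},      # Basic Access
}
STORAGE_PER_TB_MONTH = 20.0  # Intelliflash (/projects)


def monthly_charge(
    job_class: str,
    gpu_hours: float,
    cpu_hours: float = 0.0,
    project_storage_tb: float = 0.0,
) -> float:
    """Estimate one month's recharge in dollars for a single job class."""
    rate = RATES[job_class]
    return (
        gpu_hours * rate["gpu_hour"]
        + cpu_hours * rate["cpu_hour"]
        + project_storage_tb * STORAGE_PER_TB_MONTH
    )


# Placeholder usage figures for illustration only.
project_bill = monthly_charge(
    "non_preemptible", gpu_hours=2000, cpu_hours=500, project_storage_tb=10
)
basic_bill = monthly_charge("preemptible", gpu_hours=2000)
print(f"Project allocation: ${project_bill:,.2f}")
print(f"Basic Access:       ${basic_bill:,.2f}")
```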
Support
For assistance or questions:
- Slack: #marlowe-researchers
- Email: marlowe-info@stanford.edu