Resources
This page collects a variety of references, organizations, data repositories, links, and books that I find useful or motivating. In all cases, the lists are highly partial, in reflection of my own experience and limitations. If you have a suggestion for an addition, please let me know!
Feminism, Antiracism, and Anti-Colonialism More Broadly
- Entitled: How Male Privilege Hurts Women by Kate Manne.
- How to be an Antiracist by Ibram X. Kendi.
- Stamped From the Beginning: The Definitive History of Racist Ideas in America by Ibram X. Kendi.
- An Indigenous Peoples’ History of the United States by Roxanne Dunbar-Ortiz.
Network Theory
- Lectures on Network Systems, a free book by Francesco Bullo at UC Santa Barbara covering a range of important topics related to dynamical systems on networks. The development of the linear algebra toolbox for approaching network problems is clear and of high general utility.
- Aaron Clauset at CU Boulder maintains excellent lecture notes on Network Analysis and Modeling and Biological Networks.
- Economic Networks: Theory and Computation by John Stachurski and Thomas J. Sargent. I haven’t personally read this one, but I’d like to soon!
- The Atlas for the Aspiring Network Scientist by Michele Coscia contains brief discussions of a very large cross section of the math, models, and ideas of network science.
- A short course in network science at the University of Utrecht Summer School by Javier Garcia-Bernardo, Leto Peel, Mahdi Shafiee Kamalabad, Jiamin Ou, and Vincent Buskens.
Machine Learning
- Applied Machine Learning is a collection of well-annotated Jupyter notebooks from Cornell Tech’s course CS5785, Applied Machine Learning, as taught by Volodymyr Kuleshov.
- Patterns, Predictions, and Actions by Moritz Hardt and Benjamin Recht is a useful text for advanced undergraduates or early graduate students on several fundamental technical stories in machine learning.
- MLU-Explain includes highly visual articles on a number of core topics in machine learning.
- Pen and Paper Exercises in Machine Learning is a series of mathematical exercises in machine learning for those with some background in linear algebra and probability.
Useful Math
- Introduction to Probability for Data Science, a free book by Stanley Chan at Purdue covering some of the elementary theory of probability as it relates to statistics and machine learning.
- High-Dimensional Probability, a free book by Roman Vershynin at UC Irvine covering a range of topics on the probabilistic foundations of modern, high-dimensional statistics at an advanced level.
- Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning by Jean Gallier and Jocelyn Quaintance is a monumental, free online book covering a wide range of mathematical background useful in machine learning and data science.
- Need to brush up on your applied linear algebra? Stephen Boyd and Lieven Vandenberghe have an excellent introduction to the subject from an applied perspective.
- A cheatsheet of useful inequalities, especially for probability, statistics, and computer science. It was compiled by László Kozma.
- Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong covers a wide range of mathematical fundamentals for understanding machine learning algorithms. It’s a great resource for undergraduates looking to get started in the theory of ML.
Data Sets
- Tidy Tuesday is an initiative organized by the R For Data Science online learning community. Each week, they pose a different data analysis problem in which people can practice their programming and data science skills. The collection of data sets is particularly nice.
- The machine learning competition website Kaggle hosts a large variety of data sets suitable for various data science tasks.
- Data sets for network science:
- The Colorado Index of Complex Networks (ICON) hosts a large variety of network data sets spanning many research fields. ICON is curated by the group of Aaron Clauset at CU Boulder.
- The Stanford Large Network Dataset Collection (SNAP) hosts a wide range of network data sets. SNAP is curated by Jure Leskovec and Andrej Krevl at Stanford University.
- Austin Benson at Cornell hosts a collection of data sets for a range of problems related to graphs and hypergraphs.
- The Data Science For Good Lab, led by Michael Fire at Ben-Gurion University of the Negev, hosts a number of very interesting data sets. Many of these have network structure.
- UCLA students Christine Gu, Yu-Hsin Huang, and Shaodian Wang assembled a data set of Reddit submissions and comments related to the COVID-19 vaccine. You are welcome to access the data and use it in projects. Please acknowledge Christine, Yu-Hsin, and Shaodian in any published work that uses this data.
- Congress In Data collects a wide range of data sets, including many with network structure, on the US Senate and House of Representatives.
Organizations
- “The society of Women in Network Science (WiNS) connects women, trans and non-binary gender network scientists from different races, socioeconomic backgrounds, and nations. The society aims to recognize the work, perspectives and expertise of its members to create bridges between academia, government, and private industry related to network science.”
- I am a Partner at QSIDE, the Institute for the Quantitative Study of Inclusion, Diversity, and Equity. QSIDE has a number of ongoing projects and welcomes collaborators.
- QSIDE recently released their Data4Justice Curriculum, which contains sample lesson plans, code, readings, and data sets.
- The Just Mathematics Collective is an international collective of mathematicians whose goal is “to shift the global mathematics community towards justice, via genuine anti-racism, anti-militarism, and solidarity with the Global South.”
Programming
Python
- CS For All, a website and book developed for brand-new programming learners by the Department of Computer Science at Harvey Mudd College.
- Lecture notes and videos from PIC16A, my course on core skills in Python programming and data science.
- A Whirlwind Tour of Python by Jake VanderPlas is an excellent, rapid overview of fundamental Python skills. It is suitable for those who have experience in several other programming languages, or for those who previously learned Python and just need a brush-up.
- Lecture notes from PIC16B, my course on advanced computational and data science in Python.
- The Python Data Science Handbook by Jake VanderPlas is an excellent and freely available online resource for practical data science in Python.
R
- R for Data Science by Hadley Wickham and Garrett Grolemund is my favorite “0 to data analysis” text. Great chapters on data wrangling, visualization, modeling, and communication.
- Folks with a bit of prior programming experience might like reading Jenny Bryan’s STAT 545, which covers many of the same topics but also addresses workflow considerations like version control, automation, and interactivity.
- Advanced programmers who want to develop their own R packages should consult R Packages by Hadley Wickham and Jenny Bryan.
Julia
- ThinkJulia: How to Think Like a Computer Scientist by Ben Lauwens and Allen Downey is an introduction to computer science principles through the Julia programming language.
- A Deep Introduction to Julia for Data Science and Scientific Computing by Chris Rackauckas offers an advanced introduction to Julia that is most suitable for folks with prior programming experience. There are several interesting problems and case studies included.
- The book Julia Data Science by Storopoli, Huijzer, and Alonso covers Julia basics, data manipulation, and visualization.
Other
- The Missing Semester is an MIT course that aims to train you in fundamental tools for practical computer science that you may not have encountered in other classes. These include shell scripting, text editing, version control, profiling, and much more. Detailed lecture notes and high-quality lecture videos are available on their website.
Other Data Science Technical Resources
- Dirk Eddelbuettel (University of Illinois) hosts a website with a wide array of resources for his course Data Science Programming Methods.
- Sanjay Lall and Stephen Boyd are running an interesting course on machine learning with the Julia programming language.
- Programming for Data Science is a course in the nuts and bolts of writing code for data analysis using R. One thing I especially like about this course is that it introduces machine learning through the topic of algorithm evaluation and auditing. The course is taught by Dr. Sarah Brown at the University of Rhode Island.
Pedagogy
- Evan Peck maintains a website with activities for responsible computing in introductory computer science.
- When Twice as Good Isn’t Enough: The Case for Cultural Competence in Computing by Dr. Nicki Washington.
- Ungrading: A Bibliography compiled by Jesse Stommel.
- The (Un)grading Spectrum by Chris Sarkonak.
Humor
- Many mathematics memes collected by Wyatt Deimel and Sam Willoughby, with contributions from Julia Engholm and Bella Rieder.