by Markus Heinonen, Aalto University, 2025
Here’s what I learned doing a machine learning PhD and supervising many others.
Accompanying slideset
Listen to big-shots
Andrej Karpathy A Survival Guide to a PhD
Patrick Kidger Just know stuff
John Schulman Opinionated guide to ML research
Bill Freeman How to do research
Eamonn Keogh How to do good research
Stefano Albrecht PhD Handbook
The three pillars. Machine learning is (1) understanding, (2) communicating, and (3) implementing, in this order of importance. To minimize time spent coding, maximize time spent understanding the problems and the solutions.
Clarity is all you need. Research often boils down to simplifying the problems, solutions and ideas to their simplest version. It's a good idea to refactor and rework the math until it's so clear that an outsider could grasp the ideas with ease at first glance. This takes time, but leads to the best realisation of the contribution. Implementing, writing and publishing become much easier after the conceptual work is done.
Become a coder wiz. Learn to automate your workflows: code, experiments, runs, logging, analysis, plotting, results. Make sure you can reconfigure, restart and reanalyse your experiments in "one click" against a GPU cluster. Write clean code and refactor often. Learn the latest tools and frameworks. Ask your colleagues for their best practices. One can't survive a PhD anymore without being a proficient ML engineer. This will multiply your research output.
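A minimal sketch of what such a "one click" entry point can look like; the config fields, the directory layout and the placeholder `run_experiment` are illustrative assumptions, not a prescription:

```python
# Sketch: every run is fully specified by a config file and a seed,
# and writes its config and results into its own run directory.
import argparse, json, random, time
from pathlib import Path

def run_experiment(cfg: dict) -> dict:
    random.seed(cfg["seed"])              # reproducibility first
    # ... train your model here, return whatever you analyse later
    return {"val_loss": random.random()}  # placeholder result

def main():
    p = argparse.ArgumentParser()
    p.add_argument("--config", type=Path, required=True)
    p.add_argument("--seed", type=int, default=0)
    args = p.parse_args()

    cfg = json.loads(args.config.read_text())
    cfg["seed"] = args.seed

    out = Path("runs") / f"{args.config.stem}_seed{args.seed}_{int(time.time())}"
    out.mkdir(parents=True)
    (out / "config.json").write_text(json.dumps(cfg, indent=2))  # log what ran
    (out / "results.json").write_text(json.dumps(run_experiment(cfg), indent=2))

if __name__ == "__main__":
    main()
```

Restarting or re-analysing a run then really is one command, e.g. `python train.py --config base.json --seed 1` (where `train.py` and `base.json` are whatever you call your script and config).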
Exploit LLMs. Use LLMs to code faster, write better papers, do literature surveys, and solve math. LLMs are a game-changer in how we do science.
Benchmarks are not science. Papers should not be treated as benchmark competitions, but as opportunities to identify novel research problems and to understand and address their root causes. Benchmark tables are not scientifically interesting: every year new methods come up and errors go down (brrrr..), often with little gain in insight. Instead, aim at understanding the qualitative improvement behind the contributions, finding gaps in the literature, or exposing problems behind SOTA models. These often come from understanding the related work and your own model in depth. Solve true open problems and the SOTA results will follow.
Focus on problems. Instead of finding solutions, focus on finding problems that are true, novel and significant. Find open problems by looking at what state-of-the-art can’t do, does poorly, or ignores.
Make a point. Don’t just present a method with 2% better performance. So what? There are thousands of papers every year: why should one care about a 2% improvement? Why is this important and significant; what do we learn from this; why should everyone know this method?
Don't be pretentious. Math for math's sake is rarely useful in machine learning. Don't drown your audience in theory unless it matters. Keep things as simple as possible and tied to the real-world problem you are solving.
It's a marathon. A PhD is around 1,000 days of work. Plan long term and check your progress twice a year.
Keep learning. You need to become the world's top expert in your PhD topic, which means reading hundreds of papers along the way. Most scientists know one thing very well and apply it everywhere: differential geometry, Bayes, numerics, etc. This makes publishing much easier.
Do your homework. Follow good ML principles and keep the quality bar high. Don't take shortcuts; they will come back to cause trouble later. Understand your own code, data and literature throughout. Don't rush to implement a cool idea before you've gone through all the related papers. Prepare for any question a colleague could have. Don't submit unfinished manuscripts with sloppy presentation and incomplete results.
Debug to understand. When things are not working, visualise everything: the loss, the optimisation, the network, the activations, the weights, the data, the likelihood, the gradients, the layers, etc. A “that’s odd…” moment will come.
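In PyTorch, for instance, a few lines after a backward pass already surface dead layers or exploding gradients; the toy model and data below are placeholders, the inspection loop is the point:

```python
# Sketch: dump per-layer weight and gradient statistics after backward().
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

for name, p in model.named_parameters():
    print(f"{name:12s} weight mean {p.data.mean():+.3f} std {p.data.std():.3f} "
          f"grad norm {p.grad.norm():.3e}")
# Vanishing/exploding gradient norms or dead layers show up here long
# before they become visible in the loss curve.
```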
Backtrack to debug. When things don't work, backtrack until you find a solid foundation that you fully understand and that works exactly as expected in every possible way. Then start adding your stuff back in one piece at a time, verifying and checking each.
Do project reviews. Present your ideas, projects and code to other PhD students for honest feedback. You will learn a ton. If this doesn't exist in your lab, make it exist. Do this often, e.g. once a month.
How to present. Make slides for every meeting. Include a context slide. Distill your ideas to their simplest version: what are the main points you need to convey? Great slides usually have 1 picture and little text (closer to 10 words than 100 per slide). Distribute a meeting agenda beforehand and a summary afterward.
How to give a talk. Math-heavy talks are pointless: the experts in the audience already know your work, and the rest can't follow. Aim talks at a non-expert audience to inspire them about stuff they don't know about. Spend a third of your time giving the big picture of the domain: why is it important and cool? Spend the rest on the specific problem and your high-level ideas. Remember that an average listener has a mental budget of at most five equations, and will stop listening if you present more.
Slow down to speed up. Spend time understanding the problem you want to solve, and verify that it exists. Formulate hypotheses on how to improve. Start from trivial baselines (random forest, linear regression), then simple neural networks, then pre-trained SOTA baselines. Run a sequence of more and more complex models, ideally changing and quantifying only one thing at a time. You want to do coordinate descent: move along one design axis at a time. See Andrej Karpathy's neural network training recipe and Google's DL tuning playbook.
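A sketch of such a baseline ladder with scikit-learn; the synthetic dataset and the particular models stand in for your own problem:

```python
# Sketch: fit increasingly complex models on the same split,
# changing one thing at a time and quantifying each step.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ladder = [
    ("linear regression", LinearRegression()),
    ("random forest", RandomForestRegressor(random_state=0)),
    ("small MLP", MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)),
]
for name, model in ladder:  # one design axis at a time
    r2 = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name:20s} test R^2 = {r2:.3f}")
```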
Don't tell me "it doesn't work". Why doesn't it? What steps did you take to narrow down where the problem lies?
Break your model. Stress-test your model until it breaks. What are its limits? This gives you a direct avenue to a second paper. Look at the XAI question bank.
Make first-author papers. To earn a PhD and have a career afterwards, you need to focus on first-author papers where you were the driving force. A publication record of middle-author papers signals poor priorities and an inability to deliver.
Don’t hide from your supervisors. Supervisors love talking about science, being challenged, and hearing about your ideas. Actively ask for advice and feedback from your supervisor: meetings where only you talk benefit no one. Insist on regular update meetings.
Be honest. Tell your supervisor when you don’t understand something or when you are struggling. Don’t nod if you didn’t understand; ask for clarification. Implying otherwise makes it difficult to work with you. Don’t imply that you are doing fine when you aren’t. Never cancel meetings.
Go to conferences. Attend a top conference in your field (NeurIPS/ICML/ICLR/etc.) every year, even if you have no paper. Workshops are a great way to get your foot in the door, and to practise presenting. Prepare a 5-second and a 60-second pitch of what you do to introduce yourself.
Be visible. Have a website where colleagues and bigshots can find you. If you have no papers yet, a technical blog is a good way to show your expertise (great example).
Advertise your papers. Make a website for each paper you write (a good example). Release the code, and spend time polishing it and making demos, tutorials and notebooks. Write a friendly blog post for each of your papers. The more user-friendly your method is, the more citations and impact you will get. Often the most famous method in a field is not the best one, but the one that is easiest to use and has the best documentation.
Write simple papers. Write in a way that is accessible to a non-expert reader. Use illustrations, colors and short paragraphs. Use LLMs to polish the language. Shorten, distill and simplify as much as you can. If you can't write a simple paper, the idea is not yet ready. Be explicit and use precise math. Organise an internal mock peer review with other students, especially ones not from your field, since real reviewers are effectively chosen at random.
Spend time on figures. Make a good "abstract" figure for page 1 that illustrates the main innovation. Animations typically make even the most complex ideas understandable; work on them (see here or here).
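As one possible starting point, matplotlib's FuncAnimation produces a talk- or website-ready GIF in a few lines; the random-walk "optimisation path" below is a made-up placeholder:

```python
# Sketch: animate a 2D trajectory frame by frame and save it as a GIF.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

path = np.cumsum(np.random.default_rng(0).normal(size=(60, 2)) * 0.1, axis=0)

fig, ax = plt.subplots()
ax.set_xlim(path[:, 0].min() - 0.2, path[:, 0].max() + 0.2)
ax.set_ylim(path[:, 1].min() - 0.2, path[:, 1].max() + 0.2)
line, = ax.plot([], [], "o-", markersize=3)

def update(i):
    line.set_data(path[:i + 1, 0], path[:i + 1, 1])  # reveal path up to frame i
    return (line,)

anim = FuncAnimation(fig, update, frames=len(path), interval=50)
anim.save("optimisation_path.gif", writer="pillow")  # requires the pillow package
```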
Go to the point. Write what you want to say, and nothing more. As a reader I want to see a 5-line abstract and a 2-paragraph introduction: please do this. Concretely (several of these devices are sketched in the LaTeX snippet below):
Bullet your contributions.
Related work often works best at the end of the paper.
Use short paragraphs, and use \paragraph to title them.
Papers often flow well when whitespace is maximised.
Make a feature table w.r.t. related methods (example).
Add conclusions in boxes (example).
Color equations (example).
Use \underbrace copiously.
Put math on display lines of its own.
Make figures and tables self-contained, and annotate everything in them.
Captions should give the conclusion, not a description of the figure.
Figure fonts should not render smaller than the paper's body text: use a small 'figsize'.
Include standard deviations.
Put full math derivations in the appendix, no matter how simple or basic.
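A LaTeX sketch of several of these devices together; the package choices (amsmath, xcolor, tcolorbox) are one option among several:

```latex
\documentclass{article}
\usepackage{amsmath, xcolor, tcolorbox}
\begin{document}

\paragraph{Objective.} We minimise a regularised loss,
\begin{equation}
  \mathcal{L}(\theta)
    = \underbrace{\textstyle\sum_i \ell\big(y_i, f_\theta(x_i)\big)}_{\text{data fit}}
    + \textcolor{blue}{\underbrace{\lambda \, \|\theta\|^2}_{\text{regulariser}}}.
\end{equation}

% a boxed conclusion the reader cannot miss
\begin{tcolorbox}
The penalty term controls overfitting; $\lambda$ trades it off against the data fit.
\end{tcolorbox}

\end{document}
```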
Write transparent papers. Make everything about the work transparent. Visualise the data: show example datapoints, summary statistics (shapes, sizes, means, variances, etc.), and global visualisations (PCA, t-SNE, UMAP). Visualise the training: show optimisation traces of all methods over all seeds and repeats. Visualise the predictions: show examples, logit distributions, and statistics (bias, variance). Make sure the reader can mentally trace the entire method pipeline from data to results. Use figures to illustrate and concretise the ideas (see example).
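A minimal sketch of the data-transparency plots, assuming scikit-learn and matplotlib; swap the example digits dataset for your own data, and t-SNE or UMAP slot in the same way as PCA:

```python
# Sketch: print summary statistics and save a global 2D PCA view of the data.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)
print(f"shape {X.shape}, mean {X.mean():.2f}, std {X.std():.2f}")  # summary stats

Z = PCA(n_components=2).fit_transform(X)  # global low-dimensional view
plt.scatter(Z[:, 0], Z[:, 1], c=y, s=5, cmap="tab10")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("PCA of the data")
plt.savefig("data_pca.png", dpi=150)
```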
Follow the domain. Check all the orals, keynotes and tutorials of the top ML conferences, even if they don't relate to your research. This tells you where the field is moving, and you always pick up some useful ideas. Follow what your competitors are publishing. Use Google Scholar to follow seminal papers and their forward citations.
Attend a summer school. Go to one in your first year.
Do an internship. Company internships or research lab visits are very useful. Most labs are happy to receive students, while company processes are stochastic. Apply early and often. Typical visits are three months.
Enjoy what you do. Move towards projects that interest you, but finish your current project regardless. If your project is not progressing, take the initiative. This is your PhD thesis and your career; you need to drive it forward. Don't expect a supervisor to make the PhD happen: their career does not depend on your success.
If you are stuck. Slow down, rethink what you are doing, and discuss with your colleagues (you will notice that people love to give advice!): what problem are you solving, and is it the right problem? If the problem is right, is the solution? Keep reading the literature: all ML problems have already been solved by someone, in some paper, in some domain.
Organize your time. Spend at least 20% of your time reading; do not slip from this. Keep a research diary and a technical report on your project. Share these with your supervisors as a persistent, single-click URL.
Queue your work. Research is a sequence of small tasks. Treat it as a FIFO queue: have a single active task at a time, and conclude and close it before you move forward. Do not multitask. Do not leave tasks unfinished. If your backlog is growing, stop and resolve it first.
Calendar, not TODO lists. Don’t make TODO lists. They expand until they become unbearable, and you restart. Instead, block time for tasks on your calendar. Follow Devi Parikh’s advice.
Study math. You want to understand linear algebra, calculus, probability, statistics, measure theory, functional analysis, differential geometry, complex analysis, and optimisation.
The ideal story. Science builds incrementally on top of earlier results. The earlier state-of-the-art is the bedrock, and your contribution sits on top. A supervisor probably expects to see these steps:
Understand the research domain, and read all papers on it
Be able to reproduce earlier, state-of-the-art results
Demonstrate a significant shortcoming in the state-of-the-art
Find or adapt a known solution to this type of problem
Read books. Reading textbooks cover-to-cover is useful for obtaining holistic knowledge. You want to understand mathematical foundations (Deisenroth), statistical learning (Hastie), machine learning (Murphy) and deep learning (Bishop). If you only read one book, go for Bishop. Great books are:
On math: Deisenroth et al Mathematics for machine learning
On modern ML: Murphy Probabilistic Machine Learning series, Bishop's Deep Learning
On learning theory: Mohri et al Foundations of machine learning, Shalev-Shwartz et al Understanding Machine Learning
On statistical learning: Hastie et al Elements of Statistical Learning
On Bayesian modelling: Gelman et al Bayesian Data Analysis
On generative models: Tomczak Deep Generative Models
On information theory: MacKay Information Theory, Inference, and Learning Algorithms