Some ideas for practice and self-study:
- Follow this thread on PyTorch optimisation by [current Hugging Face ML Research Intern] Nouamane Tazi, which ends with links to PyTorch's performance tuning guide and NVIDIA's best practices guide, then take an example network and optimise it!
- Follow Andrej Karpathy's videocasts on micrograd and makemore, then dig into, reproduce, or make a variation on a particular part.
- Look at the differences between an earlier and a later version of an architecture, e.g. LayoutLM vs. LayoutLMv3 (which could help to indicate a path forward for other architectures like XDoc).
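
As a possible starting point for the micrograd exercise, here is a minimal sketch of a scalar reverse-mode autograd engine in plain Python. It is written in the spirit of micrograd but is not Karpathy's code: the attribute names (`_parents`, `_backward`) and the choice of supported operations (add, multiply, tanh) are assumptions made for this illustration.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # the Values this one was computed from
        self._backward = lambda: None    # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order so each node is processed only after
        # every node that depends on it.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(output)/d(output) = 1
        for node in reversed(order):
            node._backward()


# A single "neuron": y = tanh(x * w + b)
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()
print(y.data, x.grad, w.grad, b.grad)
```

One variation to try from here is adding further operations (subtraction, powers, ReLU) and checking each gradient against a finite-difference estimate.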