Modern DNA-based biodiversity surveys produce massive-scale data, comprising up to millions of species, most of which are rare. Making the most of such data for inference and prediction requires modelling approaches that can relate species occurrences to environmental and spatial predictors while incorporating information about the species' taxonomic or phylogenetic placement. Although the scalability of joint species distribution models to large communities has greatly advanced, incorporating hundreds of thousands of species has not been feasible to date, which has led to compromised analyses. Here we present a novel "common to rare transfer learning" approach (CORAL), which borrows information from common species to enable statistically and computationally efficient modelling of both common and rare species. We illustrate that CORAL leads to much improved prediction and inference on DNA metabarcoding data from Madagascar, comprising 255,188 arthropod species detected in 2,874 samples.