Recently, we talked to Dan Fu and Tri Dao – authors of “Hungry Hungry Hippos” (aka “H3”) – on our Deep Papers podcast. H3 is a proposed language modeling architecture that performs comparably to ...
But today, Nvidia sought to help solve this problem with the release of Nemotron 3 Super, a 120-billion-parameter hybrid model, with weights posted on Hugging Face. By merging disparate architectural ...