• 31337@sh.itjust.works
    2 months ago

    Likely transformers now (I think SD3 uses transformer-based text encoders, CLIP and T5, together with a diffusion transformer for the denoising itself, and ViTs are currently one of the best model architectures for image classification).