avivga / zerodim

Disentangled face manipulation using CLIP-based annotations

Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 53 seconds. The predict time for this model varies significantly based on the inputs.

Readme

ZeroDIM

An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild
Aviv Gabbay, Niv Cohen and Yedid Hoshen
Neural Information Processing Systems (NeurIPS), 2021.

Abstract: Unsupervised disentanglement has been shown to be theoretically impossible without inductive biases on the models and the data. As an alternative approach, recent methods rely on limited supervision to disentangle the factors of variation and allow their identifiability. While annotating the true generative factors is only required for a limited number of observations, we argue that it is infeasible to enumerate all the factors of variation that describe a real-world image distribution. To this end, we propose a method for disentangling a set of factors which are only partially labeled, as well as separating the complementary set of residual factors that are never explicitly specified. Our success in this challenging setting, demonstrated on synthetic benchmarks, gives rise to leveraging off-the-shelf image descriptors to partially annotate a subset of attributes in real image domains (e.g. of human faces) with minimal manual effort. Specifically, we use a recent language-image embedding model (CLIP) to annotate a set of attributes of interest in a zero-shot manner and demonstrate state-of-the-art disentangled image manipulation results.
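The zero-shot annotation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings below are hand-made toy vectors standing in for the outputs of a language-image model such as CLIP, and the prompt strings, the `annotate` helper, the softmax temperature, and the `min_confidence` threshold are all assumptions chosen for illustration. The sketch scores an image embedding against one text-prompt embedding per attribute value and keeps the label only when the match is confident, so that ambiguous images stay unlabeled, in line with the partially labeled setting the abstract describes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=0.01):
    """Numerically stable softmax; the temperature value is an assumption."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def annotate(image_emb, prompt_embs, min_confidence=0.9):
    """Return the index of the best-matching prompt, or None if not confident.

    Hypothetical helper: the confidence threshold is an illustrative choice,
    not a value taken from the paper.
    """
    sims = [cosine(image_emb, p) for p in prompt_embs]
    probs = softmax(sims)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= min_confidence else None


# Toy stand-ins for text embeddings of one attribute (hair color).
prompts = ["a person with blond hair", "a person with black hair"]
prompt_embs = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.2]]

# Toy stand-ins for image embeddings; img_c is deliberately ambiguous.
images = {
    "img_a": [0.9, 0.1, 0.2],
    "img_b": [0.1, 0.8, 0.3],
    "img_c": [0.5, 0.5, 0.1],
}

labels = {name: annotate(emb, prompt_embs) for name, emb in images.items()}
for name, label in labels.items():
    print(name, prompts[label] if label is not None else "unlabeled")
```

In a real pipeline the toy vectors would be replaced by embeddings from an actual language-image model, and the rule for which images to leave unlabeled would follow the procedure described in the paper.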

Citation

@inproceedings{gabbay2021zerodim,
  author    = {Aviv Gabbay and Niv Cohen and Yedid Hoshen},
  title     = {An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild},
  booktitle = {Neural Information Processing Systems (NeurIPS)},
  year      = {2021}
}