AI & Retail Technology Deep Dive (Vale)

The Garment Reconstruction Problem Try-On Left Behind

Virtual try-off reconstructs canonical garment images from photos of clothed individuals, addressing the returns, resale, and catalog automation bottleneck that a decade of virtual try-on research never touched.

Neritus Vale

Virtual try-on has absorbed a decade of research funding and retail pilot budgets to answer one question: what does this garment look like on me? The inverse question received its first formal definition only in November 2024, when researchers at Bielefeld University coined the term “virtual try-off” and published TryOffDiff: given a photo of a clothed person, reconstruct a clean, standardized product image of the garment itself. That second question is the one returns processing, resale platforms, and catalog automation have been waiting for.

The task sounds straightforward; the engineering is not. A garment on a body is occluded, wrinkled, lit from an arbitrary angle, and deformed by pose. TryOffDiff adapted Stable Diffusion with SigLIP-based visual conditioning, trained on the VITON-HD dataset of 11,552 paired images, and beat every baseline on perceptual fidelity metrics. By May 2025, TEMU-VTOFF introduced a dual Diffusion Transformer architecture handling multiple garment categories and was accepted at ICLR 2026. OmniDiT, published in March 2026, built a unified try-on and try-off framework on 380,000 image pairs. The awesome-virtual-try-off repository now tracks 41 papers, most of them published in the year following TryOffDiff’s November 2024 debut.
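The conditioning pathway described above replaces Stable Diffusion's usual text embeddings with image embeddings fed to the denoiser through cross-attention: the worn-garment photo's patch embeddings act as keys and values, the latent tokens as queries. A minimal NumPy sketch of that mechanism follows; the token counts, embedding widths, and projection sizes are illustrative stand-ins, not the paper's actual dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_tokens, cond_tokens, d_head=64, seed=0):
    """One cross-attention step: image-encoder embeddings of the
    worn-garment photo (keys/values) condition the denoiser's
    latent tokens (queries). Weights are random for illustration."""
    rng = np.random.default_rng(seed)
    d_lat = latent_tokens.shape[-1]
    d_cond = cond_tokens.shape[-1]
    Wq = rng.standard_normal((d_lat, d_head)) / np.sqrt(d_lat)
    Wk = rng.standard_normal((d_cond, d_head)) / np.sqrt(d_cond)
    Wv = rng.standard_normal((d_cond, d_head)) / np.sqrt(d_cond)
    q = latent_tokens @ Wq          # (n_latent, d_head)
    k = cond_tokens @ Wk            # (n_cond, d_head)
    v = cond_tokens @ Wv            # (n_cond, d_head)
    attn = softmax(q @ k.T / np.sqrt(d_head))  # (n_latent, n_cond)
    return attn @ v                 # (n_latent, d_head)

# Toy shapes: 16 latent tokens, 196 SigLIP-style patch embeddings.
latents = np.random.default_rng(1).standard_normal((16, 320))
garment_photo_embeddings = np.random.default_rng(2).standard_normal((196, 768))
out = cross_attention(latents, garment_photo_embeddings)
print(out.shape)  # (16, 64)
```

The design point is that swapping the conditioning source, not the backbone, is what turns a try-on architecture into a try-off one: the same attention machinery attends to a photo of the worn garment instead of a text prompt.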

Virtual try-on generated thousands of papers and billions in venture funding without ever producing a garment image that could be listed for resale.

The National Retail Federation estimates U.S. retail returns will reach $849.9 billion in 2025, with online purchases returned at a rate of 19.3 percent. Apparel sits at the top of the return curve: category rates routinely reach 20 to 30 percent, and some segments exceed 50 percent. Every returned garment that re-enters inventory needs a product image to be relisted, and traditional product photography runs $25 to $75 per shot for a basic catalog listing. Try-on promised to reduce returns by helping shoppers visualize fit before buying. That promise addresses the front of the funnel; it says nothing about what happens after the package comes back.

The secondhand market compounds the image problem at a different scale. ThredUp’s 2026 resale report projects the global secondhand apparel market will reach $393 billion by 2030, with U.S. online resale nearly doubling from $29.7 billion in 2025 to $48.3 billion by 2030. Each of those items enters the pipeline without a catalog-quality product image. Managed platforms like ThredUp photograph items centrally; peer-to-peer platforms like Poshmark and Vinted depend on sellers pointing a phone camera at a garment draped over a chair. Standardizing those images is a prerequisite for visual search, recommendation, and cross-platform inventory matching. Virtual try-off converts a single worn-on photo into the flat, standardized image that every downstream system requires.

The counter-argument is direct: if try-on reached full adoption and cut returns sharply, the downstream image problem would shrink on its own. For that to hold, try-on would need to eliminate the bracketing behavior practiced by 63 percent of online shoppers and address the 9 percent of returns the NRF classifies as fraudulent. Even the most optimistic try-on deployment touches only the purchase decision. Returns driven by gift mismatches, impulse reversals, and wardrobing exist independently of visualization quality. Try-off does not compete with try-on; it operates in the territory try-on was never designed to reach.

Catalog economics explain why the research accelerated so fast. A brand with 500 SKUs spends $125,000 to $250,000 annually on product photography. AI image generation has compressed that cost to pennies per image for some use cases. But generating a new image of an existing garment still requires a reference—and for returned, resold, or one-of-a-kind items, the only available reference is often a photo of someone wearing it. TEMU-VTOFF’s authors frame the task explicitly as addressing the cost of acquiring catalog-style garment images at scale. The research community did not converge on try-off by accident; it converged because the commercial signal was loud.
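The arithmetic behind those figures is worth making explicit. A back-of-envelope sketch using the numbers cited above; the per-image generation cost and the image count per SKU are hypothetical placeholders, not sourced values.

```python
# Catalog-photography cost comparison using the article's figures.
# GEN_COST and IMAGES_PER_SKU are hypothetical, for illustration only.
SKUS = 500
PER_SKU_COST = (250, 500)   # USD; implied by $125k-$250k across 500 SKUs
GEN_COST = 0.05             # hypothetical per-image generation cost, USD
IMAGES_PER_SKU = 5          # hypothetical generated images per SKU

low = SKUS * PER_SKU_COST[0]             # 125,000
high = SKUS * PER_SKU_COST[1]            # 250,000
generated = SKUS * IMAGES_PER_SKU * GEN_COST

print(f"traditional photography: ${low:,}-${high:,} per year")
print(f"generated imagery:       ${generated:,.2f} per year")
```

Even if the hypothetical per-image cost is off by an order of magnitude, the gap stays three orders wide, which is the commercial signal the research converged on.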

Try-on and try-off run on the same diffusion architectures. The gap was conceptual: the field spent years perfecting how garments look on bodies and never asked how to get them off. Retailers deciding where to allocate AI budgets now face a concrete choice. Try-on helps a shopper imagine a purchase; try-off makes a returned, resold, or newly cataloged garment visible to the next buyer. The harder commercial problem was always the second one.