Date: 2025-06-09

Dataset establishes new AI training benchmarks as detailed in accompanying research paper

Zedge, Inc. (NYSE American:ZDGE), $ZDGE, a leader in digital marketplaces and interactive games that provide content, enable creativity, empower self-expression and facilitate community, today announced the release of a new foundational image dataset – DataSeeds.AI Sample Dataset (DSD) – purpose-built for computer vision and generative AI model training. The dataset was created in partnership with Perle.ai and Émet Research and represents a major step forward in data-centric AI image development.

Jonathan Reich, CEO of Zedge, commented:

“The DSD release marks a significant milestone for DataSeeds.AI, whose goal is to become a major supplier to enterprises that create foundational models in need of rights-cleared, high-quality images. The DSD annotation delivers measurable improvements over legacy solutions like AWS Rekognition, setting a new benchmark for high-quality, human-aligned AI training data. DataSeeds.AI was able to assemble the DSD by leveraging GuruShots’ tightly knit photographer community and their wide-ranging portfolio of photographs for high-quality AI training data. This release not only underscores the commercial potential of DataSeeds.AI as a serious contender in the evolving B2B marketplace arena for AI datasets but also highlights the natural synergies that exist with our creators across both GuruShots and the Zedge Premium marketplace. It validates our ability to turn user-generated content into scalable, enterprise-grade datasets that can generate new revenue sources for Zedge.”

The DSD comprises more than 7,800 high-quality photos sourced from players of Zedge’s leading photography game, GuruShots. Every image in the dataset was ranked by players of the game, and each was subsequently annotated by expert reviewers who provided detailed descriptions of the image content. The DSD release marks a major step in building the kind of real-world, human-reviewed data that improves the veracity of modern AI models.

The introduction of the DSD highlights the inherent value in DataSeeds.AI’s capacity to meet custom image demand promptly by launching relevant GuruShots photo challenges and/or by accessing existing images from GuruShots’ massive catalog. Whether it is improving generative AI models, analyzing scenes or handling edge cases, the platform offers a scalable pipeline supported by tens of thousands of photographers that can provide diverse and rights-protected images.

Ahmed Rashad, CEO of Perle.ai, remarked, “The DataSeeds.AI partnership allowed us to apply our methodologies, which leverage domain expertise and AI, to high-quality data annotation while validating the results through comprehensive benchmarking research. We are thankful for Zedge’s partnership and the meaningful contribution that DataSeeds.AI is making to the AI community. DSD is a milestone for human-aligned dataset creation.”

Freeman Lewin, CEO of Émet Research, said, “We’re deeply grateful to Zedge, DataSeeds.AI and Perle.ai for enabling this release. Together, we’ve not only demonstrated the power of data-centric AI but also introduced a best-in-class model for data to be used for AI training. We’re excited to keep supporting important AI research efforts in conjunction with industry leaders like Zedge and Perle.ai.”

The release of the DSD is accompanied by an evaluative research paper titled “Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds’ Annotated Imagery,” which shows how training AI models with the DSD yields 70% better results when compared to using typical benchmark datasets. The dataset, model weights and paper are now available to the public.

The DSD was labeled through a multi-tiered process where human experts described scenes in natural language and even outlined certain objects down to the pixel. This helps AI learn in a way that’s closer to how people view and explain the world.

Technical Deep Dive: Research Findings and Differentiators

The DSD was designed to serve as a reproducible benchmark for training and fine-tuning multimodal vision-language models. It includes 7,843 high-resolution, rights-cleared photographs sourced from GuruShots, each selected through a unique in-game peer-ranking system that reflects aesthetic and compositional quality validated by a global photography community.

Each image was then enhanced with multi-tiered human annotation through Perle.ai’s expert-in-the-loop pipeline, including:

Pixel-level segmentation

Structured scene descriptions

Technical metadata (e.g., exposure, focal length, depth-of-field assessments)

Title and category-level labels aligned to visual content

This combination of peer review, expert annotation and visual diversity enables DSD to provide context-rich training data that improves model grounding and multimodal comprehension.
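To make the annotation layers listed above concrete, here is a minimal sketch of how a single DSD record might be represented in code. The class and field names (DSDRecord, TechnicalMetadata, peer_rank, segmentation_masks and so on) are illustrative assumptions, not the published schema; the dataset card is the authority on actual column names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TechnicalMetadata:
    # Hypothetical fields mirroring the "technical metadata" tier.
    exposure: Optional[str] = None
    focal_length_mm: Optional[float] = None
    depth_of_field: Optional[str] = None


@dataclass
class DSDRecord:
    # Hypothetical layout; names are assumptions, not the released schema.
    image_id: str
    title: str
    category: str
    scene_description: str
    peer_rank: Optional[int] = None
    segmentation_masks: list = field(default_factory=list)
    technical: TechnicalMetadata = field(default_factory=TechnicalMetadata)

    def annotation_tiers_present(self) -> list:
        """Return the names of the annotation tiers that are filled in."""
        tiers = []
        if self.segmentation_masks:
            tiers.append("segmentation")
        if self.scene_description.strip():
            tiers.append("scene_description")
        if any(v is not None for v in vars(self.technical).values()):
            tiers.append("technical_metadata")
        if self.title.strip() and self.category.strip():
            tiers.append("labels")
        return tiers
```

A helper such as annotation_tiers_present could be used to check that every record carries all four tiers before it is passed to a fine-tuning job.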

Key empirical findings from the research paper include:

Fine-tuning LLAVA-NEXT on DSD led to a 24.09% increase in BLEU-4, with corresponding gains in ROUGE-L, BERTScore and CLIPScore, validating stronger semantic precision and image-text alignment.

When benchmarked against the DSD annotations, AWS Rekognition achieved only a 0.19 F1 score, demonstrating the limitations of automated commercial tagging systems for high-quality dataset curation.

BLIP2 models also showed meaningful improvement when fine-tuned on the DSD, indicating that the dataset generalizes across different architectures and not just LLAVA-style models.
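For readers unfamiliar with the F1 metric cited above, the sketch below shows one common way to score an automated tagger against human reference labels for a single image. This release does not describe the exact matching protocol used in the paper, so the set-based exact-match comparison, the function name tag_f1 and the example tags are all assumptions made for illustration.

```python
def tag_f1(predicted, reference):
    """Return (precision, recall, f1) for two collections of tags.

    Tags are compared as case-insensitive exact matches. This is an
    assumed protocol, not necessarily the one used in the paper.
    """
    pred = {t.strip().lower() for t in predicted if t.strip()}
    ref = {t.strip().lower() for t in reference if t.strip()}

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0

    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up tags for one image: two of four predictions match the
# five human labels, giving precision 0.5 and recall 0.4.
auto_tags = ["Dog", "grass", "outdoor", "animal"]
human_tags = ["dog", "frisbee", "park", "grass", "jumping"]
precision, recall, f1 = tag_f1(auto_tags, human_tags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a tagger scores low whenever it either misses many human labels or emits many labels the annotators did not use.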

What makes the DSD and DataSeeds.AI uniquely valuable?

Data-centric development: The DSD supports the shift from model-centric to data-centric AI by prioritizing quality, context and diversity in training inputs.

Scalable generation: DataSeeds.AI can rapidly build domain-specific datasets by launching on-demand GuruShots photo challenges and/or by drawing from Zedge Premium’s and GuruShots’ growing catalog of more than 30 million rights-cleared images enriched with EXIF metadata, tags and geolocation diversity.

Human-aligned annotation: Unlike auto-tagged datasets, the DSD annotations provide interpretability, nuance and grounding that support vision-language understanding and generalization to real-world use cases.

Open availability: All data, models and benchmarking results are reproducible and available on HuggingFace, encouraging adoption, validation and further innovation.

This foundation positions Zedge’s DataSeeds.AI platform as a differentiated supplier of high-fidelity, human-reviewed datasets tailored to the evolving needs of the generative AI ecosystem.

Access the research paper here

Access the DSD here

