Caption Creation Guidelines
- Accuracy: Be descriptive and precise, using a tone that is neither overly formal nor overly casual.
- Focus: Describe what's visible; avoid assumptions or interpretations.
- Language: Use natural language that flows well when read aloud.
Specific Rules
- Use "the" for large, contextual elements (e.g., the ocean, the sky).
- Use "a/an" for distinct, countable objects (e.g., a beach ball, an umbrella).
- Include relevant details about the subject:
  - Approximate age
  - Ethnicity or race
  - Notable features
  - Body type
  - For male anatomy: circumcision status (cut/uncut)
  - Pose or position
  - Relevant background elements
- Be specific but avoid overly clinical terms or regional slang:
  - Preferred: "penis," "dick," or "cock" instead of "genitals" or local slang
  - Preferred: "butt" or "ass" instead of "gluteal region" or crude terms
- Describe the setting and any relevant actions or emotions.
Captioning Tools
- LLaVA Interrogate: Basic image description
- XComposer: More detailed but slower image description
- Enhanced Caption (Non-Vision): LLM-based caption refinement
Note: Always edit auto-generated captions before submitting.
Examples
Good: "A young Asian man in his 20s with a slim build and short black hair stands shirtless on a beach. He has an uncut penis and is smiling at the camera. The ocean and a cloudy sky are visible in the background."
Avoid: "The male is at the beach. The subject has genitals. There is water and sky."
Good: "A middle-aged Caucasian man with a muscular build and graying hair sits on a wooden chair. He has a cut penis and is looking thoughtfully to the side. The room appears to be a home office with bookshelves in the background."
Avoid: "Naked dude on a chair. Looks kinda old. Has man parts. Seems to be inside somewhere."