Tackling the Challenges of Creating an Urban Fantasy Short Film with AI

Crafting a compelling urban fantasy short film using AI is a fascinating yet daunting endeavor. The promise of leveraging AI to create stories and visuals is enticing, but the journey is riddled with challenges that test both creativity and technical precision. Here, I’ll share the obstacles encountered, the strategies devised to overcome them, and insights from my experience.


The Problems

  1. Logical Inconsistencies in AI-Generated Stories
    While AI can weave narratives, the logic often falters, even within the flexible bounds of urban fantasy. Stories risk becoming convoluted or failing to adhere to their internal rules.
  2. Simplistic Storylines
    Despite requests for complex plots with twists, AI tends to default to linear, predictable storylines, making it difficult to maintain intrigue.
  3. Ineffective Scene Segmentation
    Dividing a story into distinct scenes proved challenging, even with explicit instructions on scene count and structure. The result often lacked smooth transitions and clarity.
  4. Character Consistency in Visuals
    AI image generation tools like DALL·E struggle with maintaining consistent character appearances, especially with multiple characters.
  5. Overly Complex Character Prompts
    AI-generated prompts for character visuals are often verbose and filled with unnecessary details, complicating the process of translating them into images. This issue becomes even worse when these overly complex character prompts are integrated into scene prompts. The result is often a complete disaster: consistency control is entirely lost, with characters appearing mismatched, distorted, or even unrecognizable from one scene to another. This chaotic output disrupts the continuity of the story, undermining the immersive experience the visuals are meant to create.
  6. Choppy Scene Sequences
    Attempting to generate all scenes for a film at once often resulted in a disjointed narrative and inconsistent pacing.
  7. Impracticality of Existing Workflows
    The workflow detailed in resources like "Creating a Fantasy Suspense Short Film Using AI Tools" proved impractical in real-world application, requiring significant adjustments.

The Approach: Divide and Conquer

After much experimentation and refinement, the solution boiled down to a structured, modular approach using customized AI models tailored to specific tasks. Here's the strategy I developed:


1. Script and Scene Generation GPT

This AI focuses on crafting the story and dividing it into detailed, film-ready scenes.

  • Key Features:
    • First-person perspective for immersive storytelling.
    • Urban fantasy suspense with supernatural elements blending into real-world settings.
    • Complex plots with at least two twists and a suspenseful open ending.
    • Clear scene structure emphasizing single-character actions and dialogue.
  • Challenges Addressed:
    • Logical inconsistencies are mitigated through iterative feedback.
    • Simplistic narratives are replaced with layered, multi-twist plots.

Instruction used to create this GPT:

Objective:
Develop a film script and a series of immersive, detailed scene prompts for an urban fantasy suspense story. GPT will expand on a provided base concept to deliver a cohesive, film-like script outline and adapt it into a comprehensive series of scenes. The process will adhere to the Requirements and Process Workflow, producing the output in the Scene Design Format.

Requirements:
1. Narrative Perspective & Tone
  - Perspective: Write from the protagonist’s first-person perspective.
  - Tone: Use a conversational, immersive style, balancing eerie suspense with adventurous intrigue.
2. Story Type & Themes
  - Genre: Urban fantasy suspense.
  - Core Elements:
    - Include at least two major plot twists for heightened drama and surprise.
    - Gradually uncover the protagonist’s inner conflict throughout the narrative, leaving an open ending to maintain the suspense of the plot.
    - Seamlessly blend supernatural elements into real-world logic and settings.
3. Character Design
  - Introduce up to three distinct characters with detailed and diverse designs.
  - Each character prompt should include:
    - Name, Gender, Age, Race: Core identifying traits.
    - Physical Appearance: Height, build, distinguishing features.
    - Outfits: Style, colors, and unique details to set the mood.
4. Scene Structure
  - Total Scenes: Create at least 50 distinct scenes.
  - Focus:
    - Each scene should focus on one character performing one simple action.
    - For multi-character interactions and complex actions, split into sequential parts while maintaining narrative continuity.
  - Opening Scene:
    - Immerse the audience in the setting, establishing time, weather, atmosphere, and story background.
  - Execution:
    - Use dialogue, narration, and inner monologues to reveal motivations, relationships, and emotions.
    - Maintain clarity by avoiding overly complex actions within any single scene.

Process Workflow:
Step 1: Story Expansion
  - Analyze the provided base concept and craft a script outline, including major plot points, themes, twists, conflicts, and the open ending.
Step 2: Character Design
  - Develop detailed, prompt-ready descriptions for each character, emphasizing physical traits and outfits.
Step 3: Ask for Feedback
  - Submit the output of Step 1 and Step 2 and request feedback. Revise until approved, then continue with Step 4.
Step 4: Scene Design
  - Write each scene following the Scene Design Format (see below).

Scene Design Format:
1. Scene Title
  - A concise summary of the scene’s essence.
2. Narration (VO)
  - The protagonist’s voiceover sets the tone, context, and drives the story forward.
3. Inner Monologue
  - Introspective thoughts to reveal emotional depth, conflict, or insight.
4. Character’s Line
  - Dialogue that builds tension, reveals relationships, or advances the plot.
5. Scene Prompt (DALL·E)
  - Create a concise, policy-compliant prompt to depict the starting point of the current scene. Include:
    - Character positioning: Posture, facial expressions, body language.
    - Environment: Setting, time of day, weather and light.
    - Key elements: Objects or features critical to the scene.
  - Use character names for specific inclusion in the prompt.
6. Action
  - A simple, direct description of the scene’s event.
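
To make the Scene Design Format easier to check before sending prompts to the image model, the six fields can be held in a small data structure. The sketch below is only an illustration, not part of the GPT itself: the `Scene` class, the `check_scene` helper, and the 60-word prompt limit are all assumptions of mine, and the name matching is a plain substring test.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """One scene in the Scene Design Format (field names are hypothetical)."""
    title: str
    narration: str
    inner_monologue: str
    character_line: str
    scene_prompt: str
    action: str


def characters_in_prompt(scene, known_names):
    """Return the known character names that appear in the scene prompt."""
    return [name for name in known_names if name in scene.scene_prompt]


def check_scene(scene, known_names, max_prompt_words=60):
    """Flag scenes that break the one-character rule or have a long prompt.

    The word limit is an arbitrary assumption, not a DALL·E requirement.
    """
    problems = []
    if len(characters_in_prompt(scene, known_names)) > 1:
        problems.append("more than one character in scene prompt")
    if len(scene.scene_prompt.split()) > max_prompt_words:
        problems.append("scene prompt is too long")
    return problems


names = ["Clara Hale", "Evelyn", "Mr. Barlow"]
scene = Scene(
    title="The Alley",
    narration="I should have gone home.",
    inner_monologue="Something is watching me.",
    character_line="Hello?",
    scene_prompt="Clara Hale stands under a flickering streetlamp in heavy rain.",
    action="Clara looks up.",
)
print(check_scene(scene, names))
```

A scene that names two characters in its prompt would be reported so it can be split into sequential parts, as the Scene Structure requirements ask.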

2. Character Simplification GPT

This GPT extracts essential character features and distills them into concise prompts, avoiding unnecessary details.

  • Key Features:
    • Focus on core traits: age, gender, physical features, and clothing.
    • Exclusion of extraneous elements like makeup, background, or lighting.
  • Challenges Addressed:
    • Simplifies character generation for consistency across visuals.
    • Reduces complexity in character description, enhancing image generation efficiency.

Instruction used to create this GPT:

This GPT processes user input to extract and focus only on the character's essential visual elements, including physical features (age, gender, hair, eyes, lips, other facial features) and clothing (including shoes, socks, etc.). It excludes all references to makeup, facial expressions, background, environment, lighting, or any other extraneous details. The generated response is a clean and concise prompt ready for image generation, focusing solely on the key characteristics needed to visualize the character.

      Always Follow The Process Steps Below:

      Step 1: Extracting Key Features
      Name(optional):
      Gender:
      Age:
      Hair:
      Eyes:
      Ears(optional):
      Lips(optional):
      Other Facial Features(optional):
      Clothing:
      Accessories(optional):

      Step 2: If any non-optional item in Step 1 is not provided in the user's request, analyze the request and fill in the missing items.

      Step 3: Generate the character's prompt based only on the character features listed in Step 1.
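
For readers who prefer to see the three steps as code, here is a minimal sketch of the same idea. It assumes the features have already been extracted (the GPT does that part); the `CharacterFeatures` class, the `build_character_prompt` function, and the semicolon-separated output format are hypothetical choices of mine, not what the GPT actually emits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharacterFeatures:
    """The Step 1 checklist; optional items default to None."""
    gender: str
    age: str
    hair: str
    eyes: str
    clothing: str
    name: Optional[str] = None
    ears: Optional[str] = None
    lips: Optional[str] = None
    other_facial_features: Optional[str] = None
    accessories: Optional[str] = None


def build_character_prompt(c: CharacterFeatures) -> str:
    """Build a concise prompt from the listed features and nothing else."""
    parts = [f"{c.age} {c.gender}", f"hair: {c.hair}", f"eyes: {c.eyes}"]
    optional = [
        ("ears", c.ears),
        ("lips", c.lips),
        ("other facial features", c.other_facial_features),
    ]
    # Optional items are included only when they were provided.
    parts += [f"{label}: {value}" for label, value in optional if value]
    parts.append(f"clothing: {c.clothing}")
    if c.accessories:
        parts.append(f"accessories: {c.accessories}")
    body = "; ".join(parts)
    return f"{c.name}: {body}" if c.name else body


clara = CharacterFeatures(
    gender="woman",
    age="27-year-old",
    hair="short black bob",
    eyes="piercing gray",
    clothing="oversized white sweater, black skinny jeans, ankle boots",
    name="Clara Hale",
    accessories="vintage leather satchel",
)
print(build_character_prompt(clara))
```

Because the structure has no fields for makeup, expression, background, or lighting, those details cannot leak into the character prompt.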

3. Art Style and Consistency GPT

This tool manages the visual style, ensuring character and environmental consistency across all illustrations.

  • Key Features:
    • Predefined art style parameters (e.g., painterly realism, chiaroscuro lighting).
    • Consistent depiction of characters using locked descriptions.
    • Integration of supernatural and urban elements in visuals.
  • Challenges Addressed:
    • Maintains visual coherence throughout the film.
    • Handles dynamic compositions and suspenseful atmospheres.

Instruction used to create this GPT:

Purpose:
Develop a tool to generate consistent, film-quality illustrations for urban fantasy suspense stories. The tool focuses on visually portraying a protagonist's eerie, fantastical adventures in urban environments, ensuring character consistency and dynamic storytelling through art.

When generating each illustration, the GPT uses the predefined Art Style Parameters (Visual Style, Color Palette, Environmental Elements, Lighting, Atmosphere, and Illustration Vibe) to maintain a cohesive artistic identity. It fully integrates the base character prompts from Characters Design for consistent portrayal and seamlessly incorporates user-specific requests for tailored illustrations.

Art Style Parameters:
  1. Visual Style
      - Painterly realism with surreal highlights: Combine intricate, semi-realistic textures with stylized flourishes to heighten the sense of the extraordinary.
      - Dynamic framing: Employ dramatic angles (e.g., low-angle for intensity, high-angle for vulnerability) and cinematic composition to evoke suspense and drama.
      - Fluid, dreamlike visuals: Integrate motion blur, glowing effects, and soft transitions, creating an ethereal yet grounded feel.
      - Aspect Ratio: 16:9
  2. Color Palette
      - Dominant tones: Cool, dark hues—navy, charcoal gray, and black—set a shadowy urban atmosphere.
      - Accent colors: Neon blues, purples, and fiery reds emphasize supernatural or emotional moments.
      - Subtle gradients: Smooth shifts between light and dark areas create atmospheric tension, such as foggy transitions or light bleeding into shadows.
  3. Environmental Elements
      - Urban landscapes:
        - Rain-slick streets, dimly lit alleys, towering skyscrapers, and dilapidated warehouses.
      - Supernatural integration:
        - Floating glyphs, glowing fractures in architecture, shadowy specters hidden in reflective surfaces.
      - Weather effects:
        - Heavy rain, fog, or swirling mist enhance the sense of mystery and obscure key details for suspense.
  4. Lighting
      - High-contrast chiaroscuro: Dramatic contrasts between light and shadow to focus on key elements or characters.
      - Ambient lighting: Soft glows from neon signs, streetlamps, or magical objects provide eerie highlights.
      - Backlighting and silhouettes: Use intense backlights to create mystery and suspenseful outlines.
  5. Atmosphere
      - Tense and oppressive: Dim lighting, layered fog, and looming shadows create a palpable sense of unease.
      - Hints of danger:
        - Shadowy figures in the background, glowing eyes through windows, and mysterious, ominous objects subtly placed in the frame.
      - Supernatural unease: Distorted reflections, flickering lights, and floating particles subtly signal an otherworldly presence.
  6. Illustration Vibe
      - Inspirations:
        - A mix of the gothic mysticism of "Castlevania", the urban grit of "Blade Runner", and the supernatural suspense of "The Witcher".
      - Dynamic motion: Illustrations capture the sense of being suspended in a climactic moment, with tension conveyed through stillness punctuated by bursts of implied movement.

Characters Design:
- Clara Hale: A petite 27-year-old Caucasian woman with a lean frame and short black hair styled in a sharp bob cut. She has piercing gray eyes and is dressed in casual chic clothing: an oversized white sweater, black skinny jeans, and ankle boots. She carries a vintage leather satchel.
- Evelyn: An impossibly tall and elegant woman with statuesque features and alabaster skin. She has glowing amber eyes and long, flowing golden hair styled in perfect waves. She is dressed in an enchanted red gown, opulent and flowing, adorned with shimmering sequins, and paired with crimson heels.
- Mr. Barlow: A tall, broad-shouldered 55-year-old Black man with a neatly trimmed gray beard and sharp brown eyes. He is dressed in a bespoke three-piece gray suit and carries a pocket watch. He is always seen with a black leather notebook.
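
The "locked description" idea can be sketched outside the GPT as simple string composition: every scene request gets the same character text and the same style line appended, so the wording never drifts between scenes. This is an illustrative sketch only. The `compose_prompt` function is hypothetical, the descriptions are shortened from the Characters Design above, and the style line is my condensed summary of the Art Style Parameters.

```python
# Locked character descriptions, abbreviated from the Characters Design section.
CHARACTERS = {
    "Clara Hale": "petite 27-year-old Caucasian woman, short black bob, "
                  "gray eyes, oversized white sweater, black skinny jeans, "
                  "ankle boots, vintage leather satchel",
    "Evelyn": "impossibly tall, elegant woman, alabaster skin, glowing amber "
              "eyes, long wavy golden hair, red sequined gown, crimson heels",
    "Mr. Barlow": "tall, broad-shouldered 55-year-old Black man, trimmed gray "
                  "beard, brown eyes, three-piece gray suit, pocket watch, "
                  "black leather notebook",
}

# Condensed style line (an assumption; the GPT uses the full parameter list).
STYLE = ("painterly realism with surreal highlights, high-contrast "
         "chiaroscuro lighting, cool dark palette with neon accents, 16:9")


def compose_prompt(scene_request, characters=CHARACTERS, style=STYLE):
    """Append the locked description of each named character, then the style."""
    used = [name for name in characters if name in scene_request]
    lines = [scene_request.strip()]
    lines += [f"{name}: {characters[name]}" for name in used]
    lines.append(f"Style: {style}")
    return "\n".join(lines)


print(compose_prompt("Clara Hale pauses at the mouth of a dimly lit alley."))
```

Only characters named in the request are expanded, which keeps prompts short and matches the instruction to use character names for specific inclusion.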

Lessons Learned

  1. Iterative Improvement Works
    Refining each step with focused GPTs significantly improved outputs. Each model's narrow focus ensured quality and consistency.
  2. Scene-by-Scene Approach is Essential
    Breaking down the story into manageable scenes allowed for better narrative flow and visual alignment.
  3. Customization is Key
    Generic AI tools often fall short. Tailoring models to specific needs—whether scripting, character design, or visual art—ensures better outcomes.
  4. Feedback Loops Improve Outputs
    Repeated iterations, coupled with human feedback, bridged the gaps in logic, storytelling, and visual design.
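
The feedback loop in lesson 4, which is also Step 3 of the first GPT's workflow, can be sketched as a small loop. This is a hypothetical outline rather than my actual tooling: `generate` and `review` stand in for the GPT call and the human reviewer, and the `max_rounds` cap is an assumption I added so the loop always ends.

```python
from typing import Callable, Optional


def refine_until_approved(
    generate: Callable[[str, Optional[str]], str],
    review: Callable[[str], Optional[str]],
    concept: str,
    max_rounds: int = 5,
) -> str:
    """Regenerate a draft until the reviewer returns no feedback.

    generate(concept, feedback) returns a draft; review(draft) returns
    revision notes, or None when the draft is approved.
    """
    feedback = None
    for _ in range(max_rounds):
        draft = generate(concept, feedback)
        feedback = review(draft)
        if feedback is None:
            return draft
    raise RuntimeError("draft not approved within max_rounds")


# Stand-in functions so the sketch runs without any API access.
def fake_generate(concept, feedback):
    return concept + (" (with a twist)" if feedback else "")


def fake_review(draft):
    return None if "twist" in draft else "add a twist"


print(refine_until_approved(fake_generate, fake_review, "ghost in the subway"))
```

Passing the two steps in as functions keeps the loop independent of any particular model or review process.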

Next Steps

The first GPT for script and scene creation has proven promising but remains imperfect. I envision enhancements to further refine storytelling and scene segmentation. Similarly, while the third GPT for art consistency works well, minor consistency issues persist and will be addressed in future iterations.

In upcoming posts, I’ll dive into the details of these enhancements and the production process, offering a roadmap for those aiming to create their own AI-powered urban fantasy projects.


Conclusion

Creating an urban fantasy short film with AI is a journey of trial, error, and innovation. By dividing the process into specialized tasks and addressing each with focused tools, I’ve made significant strides toward producing cohesive, high-quality films. With continued refinement, AI filmmaking can become a powerful tool for storytellers everywhere.

What do you think of this workflow? Let me know your thoughts or suggestions for improvement!