Deep Video Analysis

Repsense doesn’t just detect that a video exists. Built in Europe for the realities of modern information operations, the platform reads the video itself — speech, visual text, brands, products, personas, narrative, and sentiment — transforming every TikTok, Instagram Reel, YouTube Short, broadcast segment, and advertisement into structured, searchable intelligence.

Powered by

Gemini 2.5 Flash LLM with case-specific prompting · Applied to TikTok · Instagram Reels · YouTube Shorts · TV · Radio · Podcasts · Advertisements

The Gap

What Your Current Tool Sees vs. What Repsense Sees

A TikTok video reviews your product. Your monitoring tool shows you: the post caption, a hashtag, and a mention count. If it’s a newer platform, maybe a transcript of the spoken words.

Repsense shows you: every word spoken with timestamps. Every on-screen text overlay with timestamps. Every brand and product identified, with cumulative screen time. The persona speaking. The dominant colour palette. The emotional register. Whether the narrative reinforces or contradicts your communication framework. All structured, searchable, and filterable alongside every other piece of content in your monitoring.

This is the difference between knowing a video exists and understanding what it means.

What You Get From Every Video

Structured Intelligence, Not Just Transcription

Every video processed by Repsense’s deep analysis engine returns a structured output with six components. The output is standardised across all video types (TikTok, Reels, Shorts, broadcast, advertisement), so video intelligence integrates seamlessly with text-based monitoring in the same dashboard, the same filters, and the same reports.

Output Field | What It Contains
Title | AI-generated concise title summarising the video’s core message. Searchable and sortable alongside text article titles.
Summary | Insight-driven summary (75–125 words): core message, target audience, strategic positioning, emotional appeal. Written for analyst consumption, not raw transcript.
Transcription | Every spoken word, timestamped to 0.1-second precision. Click any phrase to jump to that moment in the video. Searchable across all monitored content.
Visual Text | Every on-screen text element (overlays, captions, price tags, CTAs, disclaimers), timestamped. The discount code that flashed for two seconds. The competitor name on a comparison slide.
Metadata | Structured object detection across six categories: brands (with screen time), products (granular: "Corolla Hybrid 2025" not "car"), temptation/aspiration ("financial freedom," "social belonging"), offer type ("discount," "limited edition"), dominant colours, and named personas appearing.
Ratings | AI-scored evaluation on five dimensions (1–10 scale): originality of concept, visual production quality, demographic diversity, compositional quality, and message clarity. Enables filtering by content quality, not just content existence.
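As a rough illustration of the six-component output, the sketch below models one analysed video as a Python dataclass. The class names, field names, types, and validation rules are assumptions made for illustration only; they are not Repsense's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass

# The five rating dimensions named in the table (identifier spellings are assumed).
RATING_DIMENSIONS = (
    "originality",
    "visual_production_quality",
    "demographic_diversity",
    "compositional_quality",
    "message_clarity",
)


@dataclass
class TimedText:
    """A spoken phrase or on-screen text element with its start time in seconds."""
    start_s: float
    text: str


@dataclass
class VideoAnalysis:
    """Hypothetical container for the six output components."""
    title: str
    summary: str
    transcription: list[TimedText]
    visual_text: list[TimedText]
    metadata: dict[str, list]  # brands, products, aspirations, offers, colours, personas
    ratings: dict[str, int]    # one 1-10 score per rating dimension

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record looks well-formed."""
        problems = []
        words = len(self.summary.split())
        if not 75 <= words <= 125:
            problems.append(f"summary has {words} words, expected 75-125")
        for dim in RATING_DIMENSIONS:
            score = self.ratings.get(dim)
            if score is None or not 1 <= score <= 10:
                problems.append(f"rating '{dim}' missing or outside 1-10")
        return problems
```

Keeping every video type in one record shape like this is what would let video results share filters and reports with text-based mentions.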

All six components are indexed and searchable. Search "strawberry" and find every video that mentioned, showed, or featured strawberries, whether spoken, written on screen, or visible in frame. Search a competitor brand and find every video where that brand appeared, with cumulative screen time. Filter by narrative match to find only videos that impact your strategic positioning.
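The cross-field search described above can be sketched as follows. The record layout (plain dictionaries with `transcription`, `visual_text`, `brands`, and `products` keys), the sample data, and the function names are all hypothetical. A production system would presumably use a search index rather than a linear scan.

```python
def find_matches(videos, term):
    """Return {video_id: [where the term was found]} for a case-insensitive term."""
    needle = term.lower()
    hits = {}
    for video in videos:
        sources = []
        if any(needle in seg["text"].lower() for seg in video["transcription"]):
            sources.append("spoken")
        if any(needle in seg["text"].lower() for seg in video["visual_text"]):
            sources.append("on_screen_text")
        detected = [b["name"] for b in video["brands"]] + video["products"]
        if any(needle in name.lower() for name in detected):
            sources.append("visible_in_frame")
        if sources:
            hits[video["id"]] = sources
    return hits


def cumulative_screen_time(videos, brand):
    """Sum the screen time (seconds) of one brand across all videos."""
    return sum(
        b["screen_time_s"]
        for video in videos
        for b in video["brands"]
        if b["name"].lower() == brand.lower()
    )


# Invented sample records for illustration.
videos = [
    {
        "id": "vid-1",
        "transcription": [{"start_s": 3.4, "text": "This strawberry jam is so fresh"}],
        "visual_text": [{"start_s": 5.0, "text": "20% OFF"}],
        "brands": [{"name": "Acme", "screen_time_s": 4.5}],
        "products": ["Strawberry jam"],
    },
    {
        "id": "vid-2",
        "transcription": [{"start_s": 0.8, "text": "Morning routine"}],
        "visual_text": [{"start_s": 2.1, "text": "Strawberry smoothie recipe"}],
        "brands": [{"name": "Acme", "screen_time_s": 2.0}],
        "products": [],
    },
]

print(find_matches(videos, "strawberry"))
# {'vid-1': ['spoken', 'visible_in_frame'], 'vid-2': ['on_screen_text']}
print(cumulative_screen_time(videos, "Acme"))  # 6.5
```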

How It Works

Three Layers of Video Intelligence

Layer 1: Speech-to-Text

Audio content is transcribed using speech-to-text technology with multi-language support. This applies to all audio and video content: TV shows, radio, podcasts, TikTok, YouTube, Instagram. The transcription layer alone matches or exceeds what competitors offer as their complete video capability. For Repsense, it’s the first layer of three.

Layer 2: Deep LLM Analysis

Upon configurable activation, video content undergoes deep analysis by a large language model (Gemini 2.5 Flash) with case-specific prompting. The LLM processes the video’s textual, spoken, and visual components simultaneously, reading what is said, what is shown, and what is written on screen, and outputs the structured six-component analysis described above. The prompt is tailored to each monitoring use case: advertisement monitoring, brand tracking, political analysis, and threat detection each receive a prompt engineered for their specific analytical requirements.
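One way to picture case-specific prompting is a lookup from monitoring use case to an instruction block that is combined with a shared output specification. The sketch below assumes exactly that design. The prompt wording, the `build_prompt` helper, and the use-case keys are invented for illustration, and the call to the model itself is deliberately left out.

```python
# Shared output specification (wording is an assumption, based on the six components).
OUTPUT_SPEC = (
    "Return six components: title, summary (75-125 words), transcription, "
    "visual text, metadata, ratings."
)

# Hypothetical instruction blocks, one per monitoring use case named in the text.
USE_CASE_INSTRUCTIONS = {
    "advertisement_monitoring": "Focus on brands, products, offers, and production quality.",
    "brand_tracking": "Focus on how the tracked brand is portrayed and how long it is on screen.",
    "political_analysis": "Focus on speakers, claims made, and the framing of each claim.",
    "threat_detection": "Focus on the narrative being pushed and any signs of coordination.",
}


def build_prompt(use_case, tracked_terms):
    """Assemble the prompt text for one video; raises KeyError on an unknown use case."""
    instructions = USE_CASE_INSTRUCTIONS[use_case]
    terms = ", ".join(tracked_terms) if tracked_terms else "none"
    return f"{instructions}\nTracked terms: {terms}\n{OUTPUT_SPEC}"


print(build_prompt("brand_tracking", ["Acme", "Corolla Hybrid 2025"]))
```

Because the output specification is shared, every use case would still return the same six components; only the analytical emphasis changes.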

Layer 3: Voice Recognition & Speaker Identification

For broadcast monitoring, a third layer applies speaker diarisation: segmenting audio into individual speaker turns, classifying gender, and identifying named speakers. Up to 100 pre-registered voices are tracked by name; the top 1,000 most-frequent voices are automatically labelled. Face detection on video content separately tracks on-screen presence versus speaking time. This layer powers the broadcast compliance and political neutrality use case - who spoke, for how long, and in what context.
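The "who spoke, and for how long" question reduces to summing diarised speaker turns. The minimal sketch below assumes diarisation output is already available as (speaker, gender, start, end) tuples. That tuple layout, the sample data, and the function names are assumptions, not the platform's actual format.

```python
from collections import defaultdict


def speaking_time(turns):
    """Total seconds spoken per speaker from (speaker, gender, start_s, end_s) turns."""
    totals = defaultdict(float)
    for speaker, _gender, start_s, end_s in turns:
        totals[speaker] += end_s - start_s
    return dict(totals)


def gender_share(turns):
    """Fraction of total speaking time per gender label."""
    totals = defaultdict(float)
    for _speaker, gender, start_s, end_s in turns:
        totals[gender] += end_s - start_s
    overall = sum(totals.values())
    return {g: t / overall for g, t in totals.items()} if overall else {}


# Invented speaker turns for illustration.
turns = [
    ("Speaker A", "female", 0.0, 30.0),
    ("Speaker B", "male", 30.0, 45.0),
    ("Speaker A", "female", 45.0, 60.0),
]

print(speaking_time(turns))  # {'Speaker A': 45.0, 'Speaker B': 15.0}
print(gender_share(turns))   # {'female': 0.75, 'male': 0.25}
```

On-screen presence from face detection could be aggregated the same way from a separate list of intervals, which is why screen time and speaking time can be reported independently.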

All three layers feed into the same platform.

Video intelligence appears alongside text-based mentions in unified dashboards, narrative clustering, competitive benchmarking, and AI analyst reports. No separate video tab. No siloed analytics. One intelligence picture.

Where Deep Video is Applied

Use Cases Across the Platform

  • TikTok videos, Instagram Reels, and YouTube Shorts deep-analysed for brand mentions, product placement, influencer content, and narrative impact. The Social Media Manager sees what’s inside the video, not just that it was posted.

  • TV, radio, and online advertisements decomposed into structured objects: brands, products, offers, personas, colours, and production quality ratings. Track competitive advertising across formats with consistent metadata.

  • Voice recognition and speaker identification applied to TV and radio for political neutrality, gender balance, and representation monitoring. Screen time tracked separately from speaking time. This is the capability deployed for LRT, the Lithuanian National Broadcaster.

  • Video-native disinformation and coordinated narrative operations analysed at the content level. What story is being told inside the video, how it connects to text-based operations, whether coordination patterns span video and text simultaneously.

  • Earned media in video tracked alongside text coverage. Influencer endorsements, product reviews, and brand mentions inside video content scored for impact and integrated into campaign reporting.

See Inside the Videos Your Tools Only Count

Request a demo to see deep video analysis applied to your monitoring use case — social intelligence, ad monitoring, broadcast compliance, or narrative threat detection.

See how deep video intelligence appears in reports