Deep Video Analysis
Repsense doesn’t just detect that a video exists. Built in Europe for the realities of modern information operations, the platform reads the video itself — speech, visual text, brands, products, personas, narrative, and sentiment — transforming every TikTok, Instagram Reel, YouTube Short, broadcast segment, and advertisement into structured, searchable intelligence.
Powered by
Gemini 2.5 Flash LLM with case-specific prompting · Applied to TikTok · Instagram Reels · YouTube Shorts · TV · Radio · Podcasts · Advertisements
The Gap
What Your Current Tool Sees vs. What Repsense Sees
A TikTok video reviews your product. Your monitoring tool shows you: the post caption, a hashtag, and a mention count. If it’s a newer platform, maybe a transcript of the spoken words.
Repsense shows you: every word spoken with timestamps. Every on-screen text overlay with timestamps. Every brand and product identified, with cumulative screen time. The persona speaking. The dominant colour palette. The emotional register. Whether the narrative reinforces or contradicts your communication framework. All structured, searchable, and filterable alongside every other piece of content in your monitoring.
This is the difference between knowing a video exists and understanding what it means.
What You Get From Every Video
Structured Intelligence, Not Just Transcription
Every video processed by Repsense’s deep analysis engine returns a structured output with six components. The output is standardised across all video types (TikTok, Reels, Shorts, broadcast, advertisement), so video intelligence sits alongside text-based monitoring in the same dashboard, the same filters, and the same reports.
Output Field | What It Contains
Title | AI-generated concise title summarising the video’s core message. Searchable and sortable alongside text article titles.
Summary | Insight-driven summary (75–125 words): core message, target audience, strategic positioning, emotional appeal. Written for analyst consumption, not raw transcript.
Transcription | Every spoken word, timestamped to 0.1-second precision. Click any phrase to jump to that moment in the video. Searchable across all monitored content.
Visual Text | Every on-screen text element (overlays, captions, price tags, CTAs, disclaimers), timestamped. The discount code that flashed for two seconds. The competitor name on a comparison slide.
Metadata | Structured object detection across six categories: brands (with screen time), products (granular: "Corolla Hybrid 2025" not "car"), temptation/aspiration ("financial freedom," "social belonging"), offer type ("discount," "limited edition"), dominant colours, and named personas appearing.
Ratings | AI-scored evaluation on five dimensions (1–10 scale): originality of concept, visual production quality, demographic diversity, compositional quality, and message clarity. Enables filtering by content quality, not just content existence.
All six components are indexed and searchable. Search "strawberry" and find every video that mentioned, showed, or featured strawberries - spoken, written on screen, or visible in frame. Search a competitor brand and find every video where that brand appeared, with cumulative screen time. Filter by narrative match to find only videos that impact your strategic positioning.
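As a rough illustration of how the six components could be held in one record and searched together, the Python sketch below models a single analysed video and a keyword search across spoken, on-screen, and detected-object fields. The class names, field names, and the `search` helper are hypothetical, chosen only for this example; they are not Repsense’s actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class TimedText:
    """A piece of text anchored to a moment in the video (seconds)."""
    start: float
    text: str


@dataclass
class VideoAnalysis:
    """Hypothetical container mirroring the six output components."""
    title: str
    summary: str
    transcription: list[TimedText] = field(default_factory=list)
    visual_text: list[TimedText] = field(default_factory=list)
    metadata: dict[str, list[str]] = field(default_factory=dict)
    ratings: dict[str, int] = field(default_factory=dict)


def search(videos, term):
    """Return (title, where) pairs for each component that mentions the term."""
    needle = term.lower()
    hits = []
    for v in videos:
        if any(needle in t.text.lower() for t in v.transcription):
            hits.append((v.title, "spoken"))
        if any(needle in t.text.lower() for t in v.visual_text):
            hits.append((v.title, "on screen"))
        if any(needle in item.lower()
               for items in v.metadata.values() for item in items):
            hits.append((v.title, "in frame"))
    return hits


# Invented sample record, for illustration only.
demo = VideoAnalysis(
    title="Summer smoothie review",
    summary="Creator reviews a smoothie.",
    transcription=[TimedText(3.2, "I added a fresh strawberry")],
    visual_text=[TimedText(5.0, "20% OFF")],
    metadata={"products": ["strawberry smoothie"], "brands": []},
    ratings={"message_clarity": 8},
)

print(search([demo], "strawberry"))
```

In this sketch a plain substring match stands in for whatever indexing the platform actually uses; the point is only that one query can report where in the video a term surfaced.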
How It Works
Three Layers of Video Intelligence
Layer 1: Speech-to-Text
Audio content transcribed using speech-to-text technology with multi-language support. Applied to all audio/video content: TV shows, radio, podcasts, TikTok, YouTube, Instagram. The transcription layer alone matches or exceeds what competitors offer as their complete video capability. For Repsense, it’s the first layer of three.
Layer 2: Deep LLM Analysis
Upon configurable activation, video content undergoes deep analysis by a large language model (Gemini 2.5 Flash) with case-specific prompting. The LLM processes the video’s textual, spoken, and visual components simultaneously - reading what is said, what is shown, and what is written on screen - and outputs the structured six-component analysis described above. The prompt is tailored to each monitoring use case: advertisement monitoring, brand tracking, political analysis, and threat detection each receive a prompt engineered for their specific analytical requirements.
Layer 3: Voice Recognition & Speaker Identification
For broadcast monitoring, a third layer applies speaker diarisation: segmenting audio into individual speaker turns, classifying gender, and identifying named speakers. Up to 100 pre-registered voices are tracked by name; the top 1,000 most-frequent voices are automatically labelled. Face detection on video content separately tracks on-screen presence versus speaking time. This layer powers the broadcast compliance and political neutrality use case - who spoke, for how long, and in what context.
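To make the "who spoke, for how long" idea concrete, the Python sketch below totals speaking time per speaker from a list of diarised turns and converts the totals into shares. It assumes diarisation output is already available as `(speaker, start, end)` tuples in seconds; that tuple format, the function names, and the sample turns are assumptions made for illustration, not the platform’s actual data model.

```python
from collections import defaultdict


def speaking_time(segments):
    """Sum seconds spoken per speaker from (speaker, start, end) turns."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)


def share(totals):
    """Convert per-speaker seconds into fractions of total speaking time."""
    whole = sum(totals.values())
    if not whole:
        return {}
    return {name: secs / whole for name, secs in totals.items()}


# Invented sample turns, for illustration only.
turns = [
    ("Speaker A", 0.0, 30.0),
    ("Speaker B", 30.0, 40.0),
    ("Speaker A", 40.0, 60.0),
]

totals = speaking_time(turns)
print(totals)
print(share(totals))
```

The same aggregation could be run over face-detection intervals to compare on-screen presence with speaking time, or grouped by classified gender rather than by speaker name.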
All three layers feed into the same platform.
Video intelligence appears alongside text-based mentions in unified dashboards, narrative clustering, competitive benchmarking, and AI analyst reports. No separate video tab. No siloed analytics. One intelligence picture.
Where Deep Video is Applied
Use Cases Across the Platform
- TikTok videos, Instagram Reels, and YouTube Shorts deep-analysed for brand mentions, product placement, influencer content, and narrative impact. The Social Media Manager sees what’s inside the video, not just that it was posted.
- TV, radio, and online advertisements decomposed into structured objects: brands, products, offers, personas, colours, and production quality ratings. Track competitive advertising across formats with consistent metadata.
- Voice recognition and speaker identification applied to TV and radio for political neutrality, gender balance, and representation monitoring. Screen time tracked separately from speaking time. This is the capability deployed for LRT, the Lithuanian national broadcaster.
- Video-native disinformation and coordinated narrative operations analysed at the content level. The analysis covers what story is being told inside the video, how it connects to text-based operations, and whether coordination patterns span video and text simultaneously.
- Earned media in video tracked alongside text coverage. Influencer endorsements, product reviews, and brand mentions inside video content scored for impact and integrated into campaign reporting.
See Inside the Videos Your Tools Only Count
Request a demo to see deep video analysis applied to your monitoring use case — social intelligence, ad monitoring, broadcast compliance, or narrative threat detection.
See how deep video intelligence appears in reports

