cross-posted from: https://lemmy.ca/post/37011397
The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages.
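(The article doesn’t spell out which models VLC integrates or how; as a rough illustration of the general idea — a local speech-to-text model emitting subtitle cues, no cloud involved — here’s a sketch in Python using the faster-whisper package. That package, the model size, and the file names are my assumptions for the example, not VLC’s actual pipeline, which is native code inside the player.)

```python
# Rough sketch of local, offline subtitle generation -- NOT VLC's actual code.
# Assumes a Whisper-style speech-to-text model via the faster-whisper package
# (pip install faster-whisper); model size and file names are placeholders.
from faster_whisper import WhisperModel

def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def transcribe_to_srt(audio_path: str, srt_path: str, translate: bool = False) -> None:
    # "small" runs comfortably on CPU; bigger models trade speed for accuracy.
    model = WhisperModel("small", device="cpu", compute_type="int8")
    task = "translate" if translate else "transcribe"  # translate = into English
    segments, _info = model.transcribe(audio_path, task=task)

    with open(srt_path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(segments, start=1):
            f.write(f"{i}\n")
            f.write(f"{to_srt_timestamp(seg.start)} --> {to_srt_timestamp(seg.end)}\n")
            f.write(seg.text.strip() + "\n\n")

if __name__ == "__main__":
    transcribe_to_srt("episode_audio.wav", "episode.srt", translate=True)
```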
The nice thing is that this can now be used with live TV from other countries and in other languages.
Say you want to watch Japanese or Korean channels without bothering with searching for, downloading, and syncing subtitles.
I prefer watching Mexican football announcers, and it would be nice to know what they’re saying. Though that might actually detract from the experience.
GOOOOOOAAAAAAAAALLLLLLLLLL
This might be one of the few times I’ve seen AI being useful and not just slapped on something for marketing purposes.
And not to do evil shit
When are we getting AMD’s FSR upscaling and frame generation? Also, wouldn’t it make more sense for subtitles to use the Jellyfin approach?
Will it be possible to export these AI subs?
Imagine the possibilities!
The technology is nowhere near being good though. On synthetic tests, on the data it was trained and tweaked on, maybe, I don’t know.
I co-run an event where we invite speakers from all over the world, and we have tried every way to generate subtitles; all of them perform at the level of YouTube’s auto-generated ones. It’s better than nothing, but you can’t really rely on it.
No, but I think it would be super helpful for synchronizing subtitles that are not aligned with the video.
You haven’t even been able to test it yet and you’re calling it nowhere near good 🤦🏻
Like how should you know?!
Relax, they didn’t invent a new kind of magic; they integrated an existing solution from the market.
I don’t know what the new BMW car they introduce this year is capable of, but I know for a fact it can’t fly.
Hes maaaa, hes what? Hes maaaaaa!
This sounds like a great thing for deaf people and just in general, but I don’t think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.
That still happens? Maybe wanna share your groups? ;)
Bless those subbers. I love those walls of text.
It’s unlikely to even replace good subtitles, fan or not. It’s just a nice thing to have for a lot of content though.
I have family members who can’t really understand spoken English because it’s a bit fast, and can’t read English subtitles either because, again, they’re too fast for them.
Sometimes you download a movie and all the Estonian subtitles are for an older release and they desynchronize. Sometimes you can barely even find synchronized English subtitles, so even that doesn’t work.
This seems like a godsend, honestly.
Funnily enough, of all the streaming services, I’m again going to have to commend Apple TV+ here. Their shit has Estonian subtitles. Netflix, Prime, etc, do not. Meaning if I’m watching with a family member who doesn’t understand English well, I’ll watch Apple TV+ with a subscription, and everything else is going to be pirated for subtitles. So I don’t bother subscribing anymore. We’re a tiny country, but for some reason Apple of all companies has chosen to acknowledge us. Meanwhile, I was setting up an Xbox for someone a few years ago, and Estonia just… straight up doesn’t exist. I’m not talking about language support - you literally couldn’t pick it as your LOCATION.
For all their faults, Apple knows accessibility. Good job Timmy.
They’re like the * footnotes in any Terry Pratchett (GNU) novel: sometimes a funny joke can have a little more spice added to make it even funnier.
Translator’s note: keikaku means plan
No such comment yet? I’ll be the first then.
Oh no, AI bad, next thing they add is cryptocurrency mining!
AI for accessibility is nowhere near the same thing as crypto mining.
Lol, that is his opinion as well; this was sarcasm, I’m 99% sure. Yours does not seem so sarcastic…
But it’s burning amazon forests for capitalist greed!
/s
I will be impressed only when it can get through a single episode of Still Game without making a dozen mistakes
What’s important is that this is running on your machine locally, offline, without any cloud services. It runs directly inside the executable
YES, thank you JB
Justin Bieber?
Jack Black?
James Brown?
Finally, some good fucking AI
Finally some good AI fucking 🤭
I was just thinking, this is exactly what AI should be used for. Pattern recognition, full stop.
Yup, and if it isn’t perfect that is ok as long as it is close enough.
Like getting name spellings wrong or mixing homophones is fine because it isn’t trying to be factually accurate.
I’d like to see this fix the most annoying part about subtitles: timing. Find a transcript or any subs on the Internet and have the AI align them with the audio properly.
YES! I can’t stand when subtitles are misaligned to the video. If this AI tool could help with that, it would be super useful.
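None of this is VLC’s feature, but just to show the kind of fix being asked for, here is a minimal stdlib-only sketch that shifts every cue in an .srt file by a constant offset. The offset itself could come from a speech-recognition pass (say, matching the first recognized line against the subtitle text); the file names and the 2.3-second value are purely illustrative.

```python
# Minimal sketch of the "fix the timing" idea: shift every cue in an SRT file
# by a constant offset. The offset could be estimated automatically, e.g. by
# matching the first line recognized by a local speech-to-text pass against
# the subtitle text. File names below are placeholders.
import re
from datetime import timedelta

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_timestamp(match: re.Match, offset: timedelta) -> str:
    # Parse HH:MM:SS,mmm, apply the offset, and re-format it.
    h, m, s, ms = map(int, match.groups())
    t = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
    total_ms = max(0, round(t.total_seconds() * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def shift_srt(in_path: str, out_path: str, offset_seconds: float) -> None:
    offset = timedelta(seconds=offset_seconds)
    with open(in_path, encoding="utf-8-sig") as f:
        text = f.read()
    shifted = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset), text)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(shifted)

if __name__ == "__main__":
    # Subtitles start 2.3 seconds too early? Push everything later by 2.3 s.
    shift_srt("movie.est.srt", "movie.est.synced.srt", 2.3)
```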
Problem is that now people will say they don’t have to create accurate subtitles because VLC is doing the job for them.
Accessibility might suffer from that, because all subtitles are now just “good enough”
Regular old live broadcast closed captioning is pretty much ‘good enough’ and that is the standard I’m comparing to.
Actual subtitles created ahead of time should be perfect because they have the time to double check.
Or they can get OK ones with this tool, and fix the errors. Might save a lot of time
Honestly though? If your audio is even half decent you’ll get like 95% accuracy. Considering a lot of media just wouldn’t have anything, that is a pretty fair trade off to me
From experience, AI translation is still garbage, especially for languages like Chinese, Japanese, and Korean, but if it only subtitles within the same language, such as creating English subtitles for English audio, then it is probably fine.
That’s probably more due to lack of training than anything else. Existing models are mostly made by American companies and trained on English-language material. Naturally, the further you get from what the model was trained on, the worse the result.
It is not the lack of training material that is the issue; it doesn’t understand context and cultural references. Someone commented here that Crunchyroll’s AI subtitles translated Asura Hall, a name, as “asshole”.
For English it’s been great for me, yes.
I imagine it would be not-exactly-simple-but-not-complicated to add a “threshold” feature: if the AI is less than X% certain, it can request human clarification.
Edit: Derp. I forgot about the “real time” part. Still, as others have said, even a single botched word would still work well enough with context.
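For an offline pass though (generating a subtitle file ahead of time rather than during live playback), the threshold idea could look roughly like this sketch. It assumes the faster-whisper package and its per-segment avg_logprob confidence score; the cutoff value is arbitrary and purely illustrative, not anything VLC actually exposes.

```python
# Sketch of the "threshold" idea for an offline pass (not real-time playback):
# flag low-confidence segments for a human to review instead of trusting them
# blindly. Assumes faster-whisper segments, which carry an avg_logprob score;
# the -0.8 cutoff is an arbitrary illustrative value.
from faster_whisper import WhisperModel

def transcribe_with_review_queue(audio_path: str, logprob_cutoff: float = -0.8):
    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)

    accepted, needs_review = [], []
    for seg in segments:
        target = accepted if seg.avg_logprob >= logprob_cutoff else needs_review
        target.append((seg.start, seg.end, seg.text.strip()))
    return accepted, needs_review

if __name__ == "__main__":
    ok, review = transcribe_with_review_queue("talk.wav")
    print(f"{len(ok)} segments accepted, {len(review)} flagged for human review")
```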
That defeats the purpose of doing it in real time as it would introduce a delay.
Derp. You’re right, I’ve added an edit to my comment.
Yeah, it’s pretty wonderful to see how far auto-generated transcription/captioning has come over the last couple of years. A wonderful victory for many communities with various disabilities.
And yet they turned down having thumbnails for seeking because it would be too resource intensive. 😐
I mean, it would. For example Jellyfin implements it, but it does so by extracting the pictures ahead of time and saving them. It takes days to do this for my library.
Video decoding is resource-intensive. We’re used to it, and we have hardware acceleration for some of it, but spewing out something like 52 million pixels every second from a highly compressed data source is not cheap. I’m not sure how the two compare, but small LLM models are not that costly to run if you don’t factor in their creation.
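(That ~52-million figure checks out if you assume 1080p at 25 fps; the resolution and frame rate here are my assumption for the arithmetic.)

```python
# Where the ~52 million pixels/second figure comes from, assuming 1080p at 25 fps.
width, height, fps = 1920, 1080, 25
print(width * height * fps)  # 51,840,000 pixels decoded every second
```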
It is useful for internet streams though, not really for local or LAN video.
Now I want some AR glasses that display subtitles above someone’s head when they talk à la Cyberpunk that also auto-translates. Of course, it has to be done entirely locally.
I guess we have most of the ingredients to make this happen. Software-wise we’re there; hardware-wise I’m still waiting for AR glasses I can replace my normal glasses with (which I wear 24/7 except for sleep). I’d accept having to carry a spare in a charging case so I swap them out once a day or something, but other than that I want them to be close enough in weight and comfort to my regular glasses and just give me AR: overlaid GPS, notifications, etc. And instant translation with subtitles would be a function that I could see having a massive impact on civilization, tbh.
soon
Breaking news: “WW3 starts over an insult due to a mistranslated phrase at the G7 summit. We will be nuked in 37 seconds. Fuck like rabbits, it’s all we can do. Now over to Robert with traffic.”
It’d be incredible for deaf people being able to read captions for spoken conversations and to have the other person’s glasses translate from ASL to English.
Honestly, I’d be a bit shocked if AI ASL → English doesn’t exist already; there’s so much training data available, and the Deaf community loves video for obvious reasons.
I think we’re closer with hardware than software. The Xreal/Rokid category of HMDs is comfortable enough to wear all day, and I don’t mind a cable running from behind my ear under a clothing layer to a phone or mini PC in my pocket. Unfortunately you still need to bring your own cameras to get the overlays appearing at the correct points in space, but cameras are cheap; I suspect these glasses will grow some cameras in the next couple of iterations.
I believe you can put prescription lenses in most AR glasses out there, but I suppose the battery is a concern…
I’m in the same boat, I gotta wear my glasses 24/7.
I hope Mozilla can benefit from a good local translation engine that could come out of this as well.
They technically already do with Project Bergamot.
I know they do, but it’s lacking so many languages.
Haven’t watched the video yet, but it makes a lot of sense that you could train an AI using already subtitled movies and their audio. There are times when official subtitles paraphrase the speech to make it easier to read quickly, so I wonder how that would work. There’s also just a lot of voice recognition everywhere nowadays, so maybe that’s all they need?