whisper & Gemini

Gemini

import google.generativeai as genai

# Replace the placeholder with your own API key.
genai.configure(api_key="YOUR_API_KEY", transport="rest")

# Models you can switch between:
#   gemini-pro
#   gemini-pro-vision
model = genai.GenerativeModel(model_name="gemini-pro")


def gemini_chat(model):
    """Run an interactive chat loop in the terminal."""
    chat = model.start_chat(history=[])
    print('prompt: press "n" to start a new conversation, and "q" to exit')
    while True:
        input_content = input("you: ")

        if input_content == "n":
            # Start over with an empty history instead of recursing.
            chat = model.start_chat(history=[])
            print("prompt: new conversation started")
        elif input_content == "q":
            break
        else:
            try:
                response = chat.send_message(input_content)
                print("--------------------------------------------")
                print(f"gemini: {response.text}")
                print("--------------------------------------------")
            except Exception as e:
                # Network failures and blocked responses also end up here.
                print(f"prompt: request failed: {e}")


if __name__ == "__main__":
    gemini_chat(model)
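
The loop above waits for the whole reply before printing anything. As a rough sketch, the reply could also be printed while it is being generated. This assumes that `send_message` accepts `stream=True` and then yields chunks with a `.text` attribute, which is how I understand the google-generativeai SDK to behave; `stream_reply` is a helper name I made up for this example.

```python
def stream_reply(chat, text):
    """Send `text` and print the reply piece by piece; return the full reply."""
    parts = []
    # Assumption: with stream=True the SDK returns an iterable of chunks,
    # each exposing its part of the generated text as `.text`.
    for chunk in chat.send_message(text, stream=True):
        print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    print()  # finish the line once the reply is complete
    return "".join(parts)
```

Inside the chat loop, `stream_reply(chat, input_content)` could stand in for the `chat.send_message(input_content)` call and the print that follows it.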

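OpenAI's speech model

I recently came across Whisper, a transformer-based speech recognition model (the ggml files used below are its weights in ggml format). It is a sibling of ChatGPT, yet far less well known.

It is very capable: it can generate subtitles from audio, and the accuracy is very high.

Model link

It not only supports many languages but can also translate while it transcribes.

whisper is the official command-line tool, and an open-source project has built a GUI for it.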

Usage

Before installing the software, you first need to download a trained speech model: model download

Choose ggml-medium.bin and download it.

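GUI

Download whisper_desktop

Open it, select the model you downloaded, adjust the GPU settings, and it is ready to use.

Note that the input file has to be in MPEG format.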
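
If the source is a video or some other format, one way to get a suitable file is to extract the audio track first. The sketch below only builds an ffmpeg command. It assumes that ffmpeg is installed and that MP3 (MPEG audio) counts as an acceptable input; `build_ffmpeg_command` is a hypothetical helper, not part of whisper_desktop.

```python
from pathlib import Path
from typing import List


def build_ffmpeg_command(source: str) -> List[str]:
    """Return an ffmpeg command that extracts the audio of `source` to MP3."""
    target = Path(source).with_suffix(".mp3")  # same name, .mp3 extension
    return [
        "ffmpeg",
        "-i", source,               # input file
        "-vn",                      # drop the video stream
        "-codec:a", "libmp3lame",   # encode the audio as MP3
        str(target),
    ]
```

The command can then be run with `subprocess.run(build_ffmpeg_command("movie.mkv"), check=True)`, where `movie.mkv` is just a placeholder file name.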

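Real-world test

Using Little Forest: Summer/Autumn (小森林 夏秋篇) as the example.

Once processing finished, I got subtitles with timestamps, already translated.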

[00:01:02.000 --> 00:01:08.000]  Komori is a small village in a village in the Tohoku region.
[00:01:08.000 --> 00:01:14.000] There are no shops, and if you go shopping, you can get to the center of the village with a place to stay.
[00:01:14.000 --> 00:01:18.000] There are several small supermarkets and shops in the village.
[00:01:18.000 --> 00:01:24.000] The train is almost down, so it takes about 30 minutes by bicycle.
[00:01:24.000 --> 00:01:26.000] How long will it take to get home?
[00:01:26.000 --> 00:01:30.000] Winter is a time for snow.
[00:01:30.000 --> 00:01:34.000] It takes about an hour and a half to get there.
[00:01:34.000 --> 00:01:40.000] Most people go shopping to the big supermarkets in the neighborhood.
[00:01:40.000 --> 00:01:46.000] If I go there, it will take almost a day.
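
Most video players expect subtitles as an .srt file rather than this bracketed layout. Here is a minimal sketch of a converter, assuming every line looks exactly like the output above; `to_srt` is a name I picked for the example, and lines that do not match the pattern are simply skipped.

```python
import re

# Matches lines such as "[00:01:02.000 --> 00:01:08.000]  some text".
LINE_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})\]\s*(.*)"
)


def to_srt(transcript: str) -> str:
    """Convert bracketed timestamp lines into SRT-formatted text."""
    blocks = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        start, start_ms, end, end_ms, text = match.groups()
        index = len(blocks) + 1  # SRT cues are numbered from 1
        # SRT uses a comma before the milliseconds instead of a dot.
        blocks.append(
            f"{index}\n{start},{start_ms} --> {end},{end_ms}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

Passing the saved transcript text to `to_srt` and writing the result to a file with an .srt extension should be enough for a player to pick it up.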

Next I tried a song, bliss by milet, this time without translation.

[00:00:00.000 --> 00:00:06.500]  (♪~ 音楽)
[00:00:06.500 --> 00:00:13.340] 私を守るこの名前を
[00:00:13.340 --> 00:00:20.680] 優しく呼んだその声が
[00:00:20.680 --> 00:00:27.920] 一人寂しく目を閉じるよ
[00:00:27.920 --> 00:00:35.100] あなたを包んで
[00:00:35.100 --> 00:00:42.300] 柔らかなその手
[00:00:42.300 --> 00:00:51.880] 忘れるよ 無数の夜を越えて
[00:00:51.880 --> 00:01:02.680] 離れに散って あなたのもとへ帰るでしょう
[00:01:02.680 --> 00:01:10.320] 夢に押さえて 闇を解いて
[00:01:10.320 --> 00:01:18.500] 夢の中は 色立ち
[00:01:18.500 --> 00:01:25.480] 青く続く道を
[00:01:27.480 --> 00:01:32.480] (♪~ 音楽)

The results are quite good.

Both tests above used the medium-sized model; in my testing the small model also works well.


https://silenzio111.github.io/2023/12/13/ai/
Author
silenzio
Published on
December 13, 2023
License