Closed

Fine-tuning the XLSR-Wav2Vec2 pre-trained model for the Turkish and Hungarian languages

[login to view URL]%97_Transformers.ipynb#scrollTo=LBSYoWbi-45k

This script can be used for Turkish, but a few changes and additional visualizations would improve it, and the model output and the script should be uploaded to my Drive.

facebook/wav2vec2-large-xlsr-53 will be the pre-trained model.

• The Mozilla Common Voice dataset should be used to train the models

• The models must be trained using the wav2vec2 architecture [login to view URL]

Fine-tuning these 2 pre-trained models is enough:

o wav2vec2-xlsr-53

([login to view URL])

o wav2vec2-xls-r-300m ([login to view URL])

3) Please pay extra attention to this subsection:

You should follow this script:

[login to view URL]

Inside this script, dataset installation and model training are described in detail.

Inside the script, the dataset is installed in this part:

3.1) Here, instead of the “common_voice” dataset, you should write “mozilla-foundation/common_voice_9_0” or another version (7 or 8).

All other cleaning and pre-processing steps should be the same as in the script.
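The dataset change in 3.1 can be sketched as below. This is a minimal sketch, assuming the Hugging Face `datasets` library; the gated Common Voice releases on the Hub additionally require logging in with a Hugging Face access token, which is not shown here.

```python
# Sketch of the dataset-loading change from subsection 3.1 (assumption:
# the Hugging Face `datasets` library provides load_dataset, and the
# gated mozilla-foundation releases require an auth token).

def dataset_id(version: int) -> str:
    """Build the Hub id for a Common Voice release (version 7, 8 or 9)."""
    return f"mozilla-foundation/common_voice_{version}_0"

def load_common_voice(version: int = 9, language: str = "tr"):
    """Replaces load_dataset("common_voice", ...) in the original notebook.
    Use language="hu" for Hungarian."""
    from datasets import load_dataset  # deferred: pip install datasets
    train = load_dataset(dataset_id(version), language,
                         split="train+validation", use_auth_token=True)
    test = load_dataset(dataset_id(version), language,
                        split="test", use_auth_token=True)
    return train, test
```

After this swap, the cleaning and pre-processing cells of the notebook should run unchanged.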

3.2) And here in this script you can define the pre-trained model that you want to fine-tune.

In the script, the “facebook/wav2vec2-large-xlsr-53” pre-trained model is given.
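Choosing between the two required checkpoints in 3.2 might look like the sketch below. It assumes the Hugging Face `transformers` library; the dropout and CTC settings are illustrative defaults taken in the spirit of the referenced notebook, not fixed requirements.

```python
# Sketch of subsection 3.2: selecting which pre-trained checkpoint to
# fine-tune (assumption: Hugging Face `transformers` is installed).

CHECKPOINTS = {
    "xlsr-53": "facebook/wav2vec2-large-xlsr-53",
    "xls-r-300m": "facebook/wav2vec2-xls-r-300m",
}

def build_model(name: str, vocab_size: int, pad_token_id: int):
    """Load one of the two required checkpoints with a CTC head sized
    for the Turkish (or Hungarian) character vocabulary."""
    from transformers import Wav2Vec2ForCTC  # deferred: pip install transformers
    return Wav2Vec2ForCTC.from_pretrained(
        CHECKPOINTS[name],
        attention_dropout=0.1,   # illustrative hyperparameters; tune as
        hidden_dropout=0.1,      # in the referenced notebook
        mask_time_prob=0.05,
        ctc_loss_reduction="mean",
        pad_token_id=pad_token_id,
        vocab_size=vocab_size,
    )
```

Training both entries of `CHECKPOINTS` for each language yields the models to be compared in section 4.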

3.3) After you finish the training, the last thing you need to do is boost the final models with an n-gram language model (either 4-gram or 5-gram). Here is the script for it:

[login to view URL]

This script is intended for the Swedish language. For Turkish you can use a Turkish Wikipedia dump; you can find the link below:

[login to view URL]

You will follow the given script, but you need to use the Turkish data given above. This is the part you need to change.

Or you can generate the .arpa file by using this extractor directly:

[login to view URL]
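Preparing the Wikipedia text for the n-gram step can be sketched as follows. This assumes KenLM's `lmplz` is used to build the .arpa file (as in typical n-gram boosting scripts for wav2vec2); the Python helper below only normalizes the corpus so the LM tokens match the ASR model's character vocabulary, and the exact punctuation set to strip should mirror the one used during fine-tuning.

```python
# Sketch: normalize a Wikipedia dump into a one-sentence-per-line corpus
# for KenLM (assumption: punctuation is stripped the same way as in the
# fine-tuning pre-processing, so LM and acoustic vocabularies agree).
import re

CHARS_TO_REMOVE = re.compile(r"[\,\?\.\!\-\;\:\"\“\%\‘\”\�\']")

def normalize_line(line: str) -> str:
    """Lower-case, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", CHARS_TO_REMOVE.sub("", line.lower())).strip()

def write_corpus(lines, path):
    """Write the cleaned corpus, skipping lines that become empty."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            cleaned = normalize_line(line)
            if cleaned:
                f.write(cleaned + "\n")

# Then, with KenLM compiled:
#   bin/lmplz -o 5 < corpus.txt > 5gram.arpa   # use -o 4 for a 4-gram
```

The same helper works for the Hungarian corpus; only the input dump changes.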

To sum up, you need to run the given Colab script and boost the final models with an n-gram language model.

This is all about experiments.

4) At the end, you need to report the results of the trained models and compare them against each other using charts, graphs, or tables.

The models should be evaluated on 4 metrics:

• word error rate (WER)

• character error rate (CER)

• real-time factor: RTF = time needed to recognize the full test set / total length of the full test set

• memory requirement = peak GPU memory load (during testing)
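The first three metrics above can be sketched in plain Python. This is a minimal reference implementation: WER and CER are Levenshtein distances over words and characters respectively, and RTF is the ratio from the formula above. In the real evaluation you would likely use a library metric (e.g. `evaluate.load("wer")`) and read peak GPU memory from `torch.cuda.max_memory_allocated()`; both of those are assumptions, not shown here.

```python
# Sketch of the evaluation metrics: WER, CER, and RTF.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (rolling row)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

def rtf(recognition_seconds: float, audio_seconds: float) -> float:
    """Real-time factor; RTF < 1 means faster than real time."""
    return recognition_seconds / audio_seconds
```

For example, `wer("a b c d", "a x c")` is 0.5 (one substitution plus one deletion over four reference words).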

Additionally, compare the final Turkish language models with the Hungarian models (minimum 2 comparative graphs). You do not need to train the models for Hungarian; I provide already trained ones below:

[login to view URL]

[login to view URL]

Check this for getting the dataset for the Hungarian n-gram (it also contains a helpful script):

Skills: Python, Deep Learning, Machine Learning (ML), Data Visualization, Data Processing

About the client:
( 1 review ) Budapest, Hungary

Project ID: #33751920
