I have an AI assistant that will be deployed in a public area, and I want to build a wake-phrase detection model (like "Hey Siri").
I have three wake phrases ("hi eva", "hello eva" and "salam eva"). These should be treated as a single class, which I'll call class 0 (wake phrases). The other class, which I'll call *not wake phrases*, covers people just talking randomly.
I used an AI voice generator to create audio clips of people saying the three phrases in different accents, and I collected a dataset of people talking randomly for the second class.
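Since the wake-phrase clips are generated and the assistant will run in a public area, one option is to mix background noise into them before training. The following is only a minimal sketch under assumptions, not the existing project code: clips are assumed to be plain lists of float samples at the same sample rate, and `mix_at_snr` and `rms` are hypothetical helper names.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Assumption: both inputs are lists of floats at the same sample rate.
    """
    speech = list(speech)
    noise = list(noise)
    if not speech or not noise:
        return speech
    # Repeat the noise if it is shorter than the speech, then trim to length.
    reps = len(speech) // len(noise) + 1
    noise = (noise * reps)[:len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        return speech
    # SNR(dB) = 20 * log10(speech_rms / noise_rms), solved for the noise level.
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    random.seed(0)
    # Stand-ins for real data: a 440 Hz tone as "speech", Gaussian noise as "crowd".
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    crowd = [random.gauss(0, 1) for _ in range(4000)]
    noisy = mix_at_snr(tone, crowd, snr_db=10)
    print(len(noisy))
```

Generating several copies of each wake-phrase clip at different SNR values, using noise taken from the second-class recordings, might make the synthetic clips sound closer to what the microphone will actually pick up.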
I trained the model with balanced classes and it reached 100% accuracy, yet it cannot reliably classify new audio as wake phrase or not wake phrase. I think this is due to overfitting. I increased the number of samples in the random-talking class, but the overfitting problem remains.
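One thing that may be worth checking is whether clips from the same generated voice or the same recording end up in both the training and test sets, since that can make accuracy look perfect while the model fails on new audio. Below is a minimal sketch of a grouped split with precision and recall for the wake-phrase class. It is an assumption-laden illustration, not the current code: each clip is assumed to be a `(clip_id, group, label)` tuple where `group` is a voice or recording identifier, and `grouped_split` and `precision_recall` are hypothetical names.

```python
import random


def grouped_split(clips, test_fraction=0.2, seed=0):
    """Split clips so that no group appears in both train and test.

    clips: list of (clip_id, group, label) tuples (assumed layout).
    """
    groups = sorted({group for _, group, _ in clips})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [c for c in clips if c[1] not in test_groups]
    test = [c for c in clips if c[1] in test_groups]
    return train, test


def precision_recall(y_true, y_pred, positive=0):
    """Precision and recall for the positive class (class 0 = wake phrases)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: 5 synthetic voices (label 0) and 5 recordings (label 1), 4 clips each.
    clips = [(f"wake_{v}_{i}", f"voice_{v}", 0) for v in range(5) for i in range(4)]
    clips += [(f"talk_{r}_{i}", f"rec_{r}", 1) for r in range(5) for i in range(4)]
    train, test = grouped_split(clips, test_fraction=0.2, seed=1)
    print(len(train), len(test))
```

This simple version does not guarantee that both classes appear in the test set, so the split should be inspected (or done per class) before trusting the numbers. Reporting precision and recall on held-out voices and recordings is likely to be more informative than a single accuracy figure.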
I have now collected more audio for the second class (not wake phrases). I need someone to modify the code so that it works correctly without overfitting,
and to deliver fully functional code within the specified duration.

