In progress

384438 DSP work (note detection)

This is work for an iPhone app that's already on the market, and we are now looking to improve its note detection accuracy. Here are the project specs as they stand right now.

1. general specs

- the voice data being analyzed is read from a prerecorded file (i.e. it doesn't need to be analyzed in real time)

- the file format is Apple AIFF (it will always be a single singer / single channel recording. Currently being recorded at [url removed, login to view], though this can be changed)

- there are three functions being used (all three are quoted in full at the end of this email). They start with the AIFF file and return an array of frequencies (one for each FFT window):

a. getData() - reads the AIFF file and writes amplitudes, scaled to lie between -1.0 and 1.0, to the inData buffer. This buffer contains raw amplitudes with no processing applied to them at all

b. getFrequencies() - takes a pointer to the amplitude buffer as one of its arguments (the "inData" argument) and writes processed frequency values to the array pointed to by "outData". It currently just picks the partial with the highest amplitude

c. fft() - called by getFrequencies() within its "main processing loop"

- some variables are hard-coded where they shouldn't be, just for convenience
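For reference, here is a minimal sketch of the kind of byte-to-float conversion getData() relies on. The function name mirrors the normFloatFromByteArray() helper the code calls, but this version is an assumption on our part, based on AIFF storing big-endian signed 16-bit PCM:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the normFloatFromByteArray() helper used by
// getData(): AIFF PCM samples are big-endian signed 16-bit integers,
// rescaled here to the [-1.0, 1.0) range getData() promises.
static float normFloatFromBytes(const uint8_t b[2]) {
    int16_t s = (int16_t)((b[0] << 8) | b[1]); // big-endian to host order
    return (float)s / 32768.0f;
}
```

The divisor 32768 maps the most negative sample (0x8000) exactly to -1.0; positive samples top out just below 1.0.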

2. voice fluctuation issue with users with poor singing technique

- to deal with this problem we are currently detecting at [url removed, login to view] with a window of 2048, and then using several relatively straightforward algorithms to obtain the most valid note from among groups of 4 or more consecutive frequencies returned by the three functions described above

- so one question we have is whether you know of a better approach to this problem
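One common way to stabilize wobbly per-window frequencies is to take the median of each group of consecutive voiced frames rather than the first or loudest one; the median discards octave glitches and scoops without averaging them in. This is a sketch of that idea, not the app's current grouping code, and the function name is ours:

```cpp
#include <algorithm>
#include <vector>

// Illustrative median-of-group note picker: given the per-window
// frequencies for one group of consecutive voiced frames, return the
// median frequency as the group's representative pitch.
static double medianFrequency(std::vector<double> group) {
    std::sort(group.begin(), group.end());
    size_t n = group.size();
    return (n % 2) ? group[n / 2]
                   : 0.5 * (group[n / 2 - 1] + group[n / 2]);
}
```

For example, a group {438, 440, 441, 880} containing one octave error still yields ~440 Hz, where a mean would be pulled far sharp.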

3. setup and context

- we would send you the full Xcode project ready to compile, so you would just need a provisioning profile from Apple. If you don't have a developer account with Apple, we can provide you with a membership so you can obtain the necessary profile for testing.

It won't take any time at all to get you set up with that.

4. performance

- starting with a 30-second recording in the AIFF file, it currently takes approx. 6 seconds to read and analyze that data using the three functions mentioned [i.e. getData(), getFrequencies(), and fft()] and the filtering code on an iPhone 3GS

- we could tolerate an analysis time of 15 secs for an equivalent recording

- for performance testing you would simply be comparing the efficiency of your code to that of the current App Store version of the app. Anything matching or exceeding that speed is fine

5. accuracy, time resolution, and range

- basically we would like to get as much accuracy and time resolution as is reasonably possible to achieve, within the performance boundaries mentioned above

- of the two, accuracy is more important than time resolution

- the range should cover the entire human concert voice (bass to soprano). We can skip a couple of the lowest notes if that's too taxing on the other parameters. If possible, we would like the upper range to go considerably higher than that (say to approx. 2100 Hz)
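For concreteness, once a frequency is detected it can be quantized to the nearest concert-pitch note with the standard equal-temperament formula (A4 = 440 Hz = MIDI note 69). This helper is illustrative, not part of the existing project:

```cpp
#include <cmath>

// Map a detected frequency in Hz to the nearest MIDI note number,
// using 12-tone equal temperament with A4 = 440 Hz = MIDI 69.
static int nearestMidiNote(double freqHz) {
    return (int)lround(69.0 + 12.0 * log2(freqHz / 440.0));
}
```

With this mapping, the bass low E2 (~82.4 Hz) lands on MIDI 40 and middle C (~261.6 Hz) on MIDI 60, so "accuracy" can be judged as landing on the correct note number rather than an exact Hz value.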

6. APIs

- the only API requirement is that you don't use any private APIs or closed binary libraries (we need all the code for this in the open, including the DFT/FFT code if used). Note that the FFT approach is not a requirement. If you know of a better approach in terms of accuracy and efficiency for voice, then you're free to use that
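As one example of a non-FFT approach that is open and well suited to monophonic voice, here is a minimal time-domain autocorrelation pitch estimator. The lag range, threshold, and names are our assumptions; a production version would likely use a refined variant such as YIN or normalized autocorrelation:

```cpp
#include <cmath>
#include <vector>

// Sketch of time-domain autocorrelation pitch detection for one window:
// correlate the signal with delayed copies of itself and report the
// frequency corresponding to the best-matching lag within [fMin, fMax].
static double autocorrPitch(const std::vector<float>& x, double sampleRate,
                            double fMin = 60.0, double fMax = 2100.0) {
    int minLag = (int)(sampleRate / fMax); // shortest period considered
    int maxLag = (int)(sampleRate / fMin); // longest period considered
    int bestLag = 0;
    double best = 0.0;
    for (int lag = minLag; lag <= maxLag && lag < (int)x.size(); ++lag) {
        double r = 0.0;
        for (size_t i = 0; i + lag < x.size(); ++i)
            r += x[i] * x[i + lag];
        if (r > best) { best = r; bestLag = lag; }
    }
    return bestLag ? sampleRate / bestLag : 0.0;
}
```

Resolution is limited to integer lags (about ±2 Hz near 440 Hz at 44.1 kHz), which parabolic interpolation around the peak would refine.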

7. the final product

- you can change the three base functions in any way you wish, or write new ones if that would be faster. The only thing we would like is that the data flow remain clearly the same (i.e. start with the AIFF file and return an array of processed frequencies for the time resolution being used)

Let me know when your DSP engineers could get started, and their quote for this.

Thanks so much,

Kevin Falk

/*--------------------------- functions -------------------------------*/

void SoundProcessor::getData(){
    FILE *handle = fopen([[NSHomeDirectory() stringByAppendingPathComponent:@"/Library/Caches/[url removed, login to view]"] UTF8String], "rb");
    if (handle != NULL){
        UInt8 bytes[4];

        // move file pointer to SSND chunk
        fseek(handle, 4080, SEEK_SET);

        // get size of the audio data in bytes
        fseek(handle, 4, SEEK_CUR);
        fread(bytes, sizeof(UInt8), 4, handle);
        UInt32 nBytesData = uInt32FromByteArray(bytes) - 8;

        // initialize data arrays (two bytes per 16-bit sample)
        dataSize = (UInt32)round(nBytesData/2);
        if (inData) // inData and outData are both instance variables
            delete [] inData;
        inData = new float[dataSize];

        // size of output array should be equal to the number of windows
        // (note: dataSize/FFT_FrameSize is integer division, so ceil() has no
        // effect here; with overlapping windows the count can be larger)
        if (outData)
            delete [] outData;
        outData = new double[(int)ceil(dataSize/FFT_FrameSize)];

        // get data offset from SSND ID
        fread(bytes, sizeof(UInt8), 4, handle);
        UInt32 dataOffset = uInt32FromByteArray(bytes);

        // read data to memory
        fseek(handle, 4096 + dataOffset, SEEK_SET); // total offset: 4080 + 16 + dataOffset
        UInt8 sBytes[2];
        for (int i=0; i < dataSize; i++){
            fread(sBytes, sizeof(UInt8), 2, handle);
            inData[i] = normFloatFromByteArray(sBytes);
        }
        fclose(handle);
    }
}

int SoundProcessor::getFrequencies(long numSampsToProcess, long fftFrameSize,
                                   long osamp, float sampleRate,
                                   float *indata, double *outData){
    static float gInFIFO[FFT_FrameSize];
    static float gFFTworksp[2*FFT_FrameSize];
    static float gLastPhase[FFT_FrameSize/2+1];
    static float gAnaFreq[FFT_FrameSize];
    static float gAnaMagn[FFT_FrameSize];
    static long gRover = false, gInit = false;
    double magn, phase, tmp, window, real, imag;
    double freqPerBin, expct;
    long i, k, qpd, inFifoLatency, stepSize, fftFrameSize2;
    int count = 0; // window counter
    double maxAmplitude = 0.; // tracks the max amplitude for the samples (used externally)

    // convenience vars
    fftFrameSize2 = fftFrameSize/2;
    stepSize = fftFrameSize/osamp;
    freqPerBin = sampleRate/(double)fftFrameSize;
    expct = 2.*M_PI*(double)stepSize/(double)fftFrameSize;
    inFifoLatency = fftFrameSize-stepSize;
    if (gRover == false) gRover = inFifoLatency;

    // initialize static arrays
    if (gInit == false) {
        memset(gInFIFO, 0, FFT_FrameSize*sizeof(float));
        memset(gFFTworksp, 0, 2*FFT_FrameSize*sizeof(float));
        memset(gLastPhase, 0, (FFT_FrameSize/2+1)*sizeof(float));
        memset(gAnaFreq, 0, FFT_FrameSize*sizeof(float));
        memset(gAnaMagn, 0, FFT_FrameSize*sizeof(float));
        gInit = true;
    }

    // main processing loop
    for (i = 0; i < numSampsToProcess; i++){

        // check if buffered data is sufficient
        gInFIFO[gRover] = indata[i];
        gRover++;

        if (gRover >= fftFrameSize){ // buffer is ready
            gRover = inFifoLatency;

            // do windowing and real/imaginary interleave
            for (k = 0; k < fftFrameSize; k++) {
                window = -.5*cos(2.*M_PI*(double)k/(double)fftFrameSize)+.5;
                gFFTworksp[2*k] = gInFIFO[k] * window;
                gFFTworksp[2*k+1] = 0.;
            }

            // transform
            fft(gFFTworksp, fftFrameSize, -1);

            for (k = 0; k <= 128; k++) { // cuts off at frequencies over ~2700 Hz

                // de-interlace FFT buffer
                real = gFFTworksp[2*k];
                imag = gFFTworksp[2*k+1];

                // compute magnitude and phase
                magn = 2.*sqrt(real*real + imag*imag);
                phase = atan2(imag, real);

                // compute phase difference
                tmp = phase - gLastPhase[k];
                gLastPhase[k] = phase;

                // subtract expected phase difference
                tmp -= (double)k*expct;

                // map delta phase into +/- Pi interval
                qpd = tmp/M_PI;
                if (qpd >= 0) qpd += qpd&1;
                else qpd -= qpd&1;
                tmp -= M_PI*(double)qpd;

                // get deviation from bin frequency from the +/- Pi interval
                tmp = osamp*tmp/(2.*M_PI);

                // compute the k-th partial's true frequency
                tmp = (double)k*freqPerBin + tmp*freqPerBin;

                // store magnitude and frequency in analysis arrays
                gAnaMagn[k] = magn;
                gAnaFreq[k] = tmp;
            }

            // get the fundamental frequency for the current frame/window
            // (for now just get the frequency with max amplitude)
            int j, maxIndex = 0;
            double magnitude = 0.; // initialize (was read uninitialized before)
            for (j = 0; j < 128; j++){
                // ignore frequencies above 2100 Hz
                if (gAnaMagn[j] > magnitude && gAnaFreq[j] < 2100.0){
                    magnitude = gAnaMagn[j];
                    maxIndex = j;
                }
            }
            if (magnitude > 0.3) // treat windows with max amplitude < 0.3 as silence
                outData[count] = gAnaFreq[maxIndex];
            else
                outData[count] = 0.;
            count++; // next window

            // update the max amplitude recorded so far (convenience var for external code)
            double exMax = 0.;
            for (int m = 0; m < 128; m++){
                if (gAnaMagn[m] > exMax)
                    exMax = gAnaMagn[m];
            }
            if (exMax > maxAmplitude)
                maxAmplitude = exMax;
        }
    }

    if (maxAmplitude > 3.0)
        return 3;
    else
        return 1;
}

void SoundProcessor::fft(float *fftBuffer, long fftFrameSize, long sign){
    float wr, wi, arg, *p1, *p2, temp;
    float tr, ti, ur, ui, *p1r, *p1i, *p2r, *p2i;
    long i, bitm, j, le, le2, k;
    long test1 = (long)(2*fftFrameSize-2),
         test2 = (long)(log(fftFrameSize)/log(2.)+.5),
         test3 = (long)(fftFrameSize/4);

    // bit-reversal reordering of the interleaved complex buffer
    for (i = 2; i < test1; i += 2) {
        for (bitm = 2, j = 0; bitm < 2*fftFrameSize; bitm <<= 1) {
            if (i & bitm) j++;
            j <<= 1;
        }
        if (i < j) {
            p1 = fftBuffer+i; p2 = fftBuffer+j;
            temp = *p1; *(p1++) = *p2;
            *(p2++) = temp; temp = *p1;
            *p1 = *p2; *p2 = temp;
        }
    }

    // butterfly passes
    for (k = 0, le = 2; k < test2; k++) {
        le <<= 1;
        le2 = le>>1;
        ur = 1.0;
        ui = 0.0;
        arg = M_PI / (le2>>1);
        wr = cos(arg);
        wi = sign*sin(arg);
        if (le2 <= 512){
            for (j = 0; j < le2; j += 2) {
                p1r = fftBuffer+j; p1i = p1r+1;
                p2r = p1r+le2; p2i = p2r+1;
                for (i = j; i < test3; i += le) {
                    tr = *p2r * ur - *p2i * ui;
                    ti = *p2r * ui + *p2i * ur;
                    *p2r = *p1r - tr; *p2i = *p1i - ti;
                    *p1r += tr; *p1i += ti;
                    p1r += le; p1i += le;
                    p2r += le; p2i += le;
                }
                tr = ur*wr - ui*wi;
                ui = ur*wi + ui*wr;
                ur = tr;
            }
        }
    }
}

Skills required: Miscellaneous, C Programming, PHP


About the employer:
( 2 reviews ) San Diego

Project ID: #2130287