On a shared system, various independent ad-hoc and scheduled processes, measurement data, and user projects consume system resources. The resource usage of the individual producers cannot be measured, only the collective result of all operations, such as the total disk space used. This yields a noisy time series of historical measurements (x: time, y: measured value) with underlying rising, stationary, or falling patterns, which sometimes fluctuate quickly or show auto-correlation.
Your task is to implement an algorithm that detects local minima and maxima as well as sudden changes such as quick rises or falls, and converts the noisy time series into a human-readable list of trends and durations (e.g. stationary for 200 x from x1 to x2 with a variance of -10 to +20 y; then rising for 40 x from x3 to x4 with a slope of 33% and a variance of -15 to +25 y relative to the trend line; then falling for 20 x from x5 to x6 with a slope of 20% and a variance of -10 to +5 y relative to the trend line; etc.). Obviously, there are many different correct solutions for such a classification, and any algorithm that comes close to what a qualified human would produce is acceptable.
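To make the expected output concrete, here is a minimal sketch of one possible approach (not the required solution): classify fixed-length windows by their least-squares slope and merge adjacent windows with the same label. The function name, window length, and slope threshold are illustrative placeholders, not part of the specification.

```python
import numpy as np

def describe_segments(y, window=50, slope_thresh=0.1):
    """Label each fixed-length window as rising/falling/stationary by its
    least-squares slope, then merge consecutive windows with the same label.
    Thresholds here are illustrative placeholders, not tuned values."""
    x = np.arange(window)
    labeled = []
    for start in range(0, len(y) - window + 1, window):
        slope = np.polyfit(x, y[start:start + window], 1)[0]
        if slope >= slope_thresh:
            label = "rising"
        elif slope <= -slope_thresh:
            label = "falling"
        else:
            label = "stationary"
        labeled.append([label, start, start + window, slope])
    # merge consecutive windows that carry the same trend label
    merged = []
    for label, s, e, slope in labeled:
        if merged and merged[-1][0] == label:
            merged[-1][2] = e                      # extend previous segment
        else:
            merged.append([label, s, e, slope])
    return merged

# synthetic demo: 100 stationary points followed by 100 rising points, plus noise
rng = np.random.default_rng(0)
demo = np.concatenate([np.full(100, 50.0), 50.0 + 0.5 * np.arange(100)])
demo += rng.normal(0.0, 1.0, size=demo.size)
segments = describe_segments(demo)
```

A production solution would of course need variable-length windows, the variance bands described above, and the parameterised thresholds defined below; this only illustrates the kind of segment list the posting asks for.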
A local minimum / maximum is defined as having no lower / higher y within a time window of +/- t on x. The window length t is not fixed but determined by a minimum change of <= -c% or >= +c% on y during one up or down trend of length >= d on x. A trend is defined as having a linear-regression slope of <= -l% or >= +l% within the window +/- t with a fitting error <= e. A sudden change is defined as the beginning of a trend with a slope of <= -s% or >= +s% with an error <= r. The parameters c, l, e, s, and r must be chosen so that they produce meaningful results on a test data set which will be provided.
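The extrema definition above can be sketched directly: an index i is a local minimum (maximum) if no lower (higher) y exists in the window of +/- t samples around it. This is a naive O(n*t) illustration; the function name and the uniqueness tie-break are my assumptions, and a real solution would use a rolling min/max for 500k points.

```python
import numpy as np

def local_extrema(y, t):
    """Return indices i where y[i] is the unique minimum (or maximum)
    of the window y[i-t : i+t+1], per the +/- t definition above."""
    minima, maxima = [], []
    for i in range(t, len(y) - t):
        window = y[i - t:i + t + 1]
        # require a strict extremum: the value occurs exactly once in the window
        if y[i] == window.min() and np.count_nonzero(window == y[i]) == 1:
            minima.append(i)
        if y[i] == window.max() and np.count_nonzero(window == y[i]) == 1:
            maxima.append(i)
    return minima, maxima

# demo: a clean V shape has exactly one local minimum, at index 5
v = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
mins, maxs = local_extrema(v, t=3)
```

The adaptive choice of t (driven by the c% change and the d-length constraint) and the error bounds e and r are deliberately left out here, since tuning them is the core of the task.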
The deliverable is the source code of a solution that is free of third-party rights except for suitable open-source licenses and capable of running on a Windows 10 operating system. Programming languages to choose from for the implementation are Python, R, and C# / .NET (wrapping C++ if needed). The solution has to work for up to 500,000 measurements (x axis), using at most 11 GB of RAM and either 16 cores of a 3.4 GHz CPU or 3584 CUDA cores of a 1.5 GHz GPU, and deliver the results in less than a day. The success criterion is the output on a validation data set (unknown to you), which has to come close to what a qualified human would produce manually by looking at a plot of the time series.
On average, 10 freelancers are willing to do this job for $1369
I believe both segmentation and extrema identification can be done. I see two possible issues here: when the noise level is too high, or when real-time identification is required.
We are a group of enthusiastic Data Scientists and Machine Learning Engineers with deep business skills. We have also implemented a similar project on early trend detection from time-series data.