Function for Calculating Z-scores for data in a Numpy Array
As a preliminary step in data analysis, certain types of data are transformed to ensure “good behavior” and “compatibility with other data.” One such transformation is the Z-score. Two series that has been transformed using the Z-score are more easily compared: [login to view URL]
[login to view URL]
In this project, you calculate Z-scores. You will be using your knowledge of Numpy. As well, you learn about the nuances of constructing a function. Also, if you check the links above, you will learn a useful concept in statistics.
The basic requirement is that you produce a function that takes a numpy array as input and outputs an array of the same shape in which the data is transformed through Z-scores. This means that if the array is (5,2), then for each of the two columns, 5 values are used to calculate the mean and sigma and for each of the values in the column, you calculate: (value – mean)/sigma.
A google search “z-score numpy array” will give you plenty of ideas about implementation and you could achieve an efficient implementation with just one line of code! So let me throw in an enhanced requirement: the user also inputs a scalar value (the second input) to indicate the desired column for transformation. Thus, only the specified column is transformed and the other(s) are left alone. For example, if the user inputs a (10,4) array and inputs a column-indicating scalar with a value of 3, this means that only the third column requires transformation (the three remaining columns are left untouched). If the user does not input this scalar, the default is that all columns are transformed. (Hint: a low-level familiarity with Linear Algebra can quickly guide you to an elegant answer here; you can use 1D arrays of ones for sigma and zeros for means and modify them appropriately to complete your calculation. To elaborate, if you use mean = 0 and sigma = 1, the transformation does not change value. Alternatively, use loops and brute force! Whatever works!)
Finally, I want you to implement checks on whether the inputs make sense. For example, is the first input a numpy array (print error message if it is something else) and is the second input appropriate (print error message if scalar does not correspond to a column number)?
You are required to code in an ipython notebook and create a pdf. Show your function in one cell – call the function myz. In another cell, create inputs using the following code:
x1 = [login to view URL]([[4,3,12],[1,5,20],[1,2,3],[10,20,40],[7,2,44]])
x2 = 3
x3 = 6
Run the function four times – myz(x1), myz(x1,x2), myz(x1,x3), and myz(x2,x3) – and generate results. In the next cell, please bullet-list things you learned by doing this project. Print to pdf and submit pdf.