Data Acquisition: Find or collect your data set of interest. There are many sources of data sets on the web. I would prefer the data to be reasonably large (this is a data mining class, after all), but very large data sets can bog down computers. R, for example, can easily handle data sets with tens or even hundreds of thousands of rows, depending on your computer. The lower limit for data size is n = 1000 observations, although I am willing to accept exceptions. See the links below for data sets that might interest you.
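You can check the n = 1000 lower limit without loading a large file into memory by streaming through it one row at a time. The Python sketch below assumes the data set is a CSV file with a single header row. `count_records` and `meets_size_requirement` are hypothetical helper names used only for illustration:

```python
import csv
import io
from typing import TextIO

MIN_ROWS = 1000  # lower limit on data size from the assignment


def count_records(handle: TextIO, has_header: bool = True) -> int:
    """Count data rows by streaming the CSV one record at a time."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row
    # csv.reader yields [] for blank lines, so those are not counted
    return sum(1 for row in reader if row)


def meets_size_requirement(handle: TextIO, minimum: int = MIN_ROWS) -> bool:
    """True if the stream holds at least `minimum` data rows."""
    return count_records(handle) >= minimum


if __name__ == "__main__":
    # Stand-in for open("your_data.csv", newline=""): 1200 synthetic rows
    sample = io.StringIO("x,y\n" + "".join(f"{i},{2 * i}\n" for i in range(1200)))
    print(meets_size_requirement(sample))  # True
```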
Data Analysis: Consider your data carefully. Even if you downloaded it, you should look for information about it, and this information should also be included in your proposal: i. How was it collected? ii. What are the data quality issues? iii. Are there biases inherent in who collected the data or how it was collected? iv. Is any data preparation needed? v. If so, what operations are required, and how might they affect the subsequent conclusions?
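A first pass at questions ii and iv can be automated. This minimal Python sketch counts missing values per column and exact duplicate rows in a CSV file. It assumes a header row, and the set of tokens treated as "missing" is an assumption you should adapt to your own data set. `profile_csv` is a hypothetical name:

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Which tokens mean "missing" varies by data set; this set is an assumption.
MISSING_TOKENS = {"", "NA", "N/A", "?", "null"}


def profile_csv(handle: TextIO) -> dict:
    """Count missing values per column and exact duplicate rows in a CSV stream."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing = Counter()
    seen = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        # Short rows get None for absent fields (DictReader's default restval).
        values = tuple(row.get(c) for c in columns)
        for column, value in zip(columns, values):
            if value is None or value.strip() in MISSING_TOKENS:
                missing[column] += 1
        seen[values] += 1
    return {
        "n_rows": n_rows,
        "missing_per_column": {c: missing[c] for c in columns},
        # every copy of a record beyond the first counts as a duplicate
        "duplicate_rows": sum(count - 1 for count in seen.values()),
    }


if __name__ == "__main__":
    sample = io.StringIO("age,income\n30,5000\n,4000\n30,5000\nNA,?\n")
    print(profile_csv(sample))
    # {'n_rows': 4, 'missing_per_column': {'age': 2, 'income': 1}, 'duplicate_rows': 1}
```

These counts only flag problems. Whether to drop, impute, or keep such records is a preparation decision, and you should report it under question v.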
Formulate questions that you would like to answer about this data set. You can follow the way the lecture notes framed the questions (What is the dependent variable or variables? What are the predictors?). Implement your analysis using data mining tools, and make sure they relate to what we have learned in class! Are you doing a classification or clustering task? Can the data be expressed as a network of some kind? Are there interesting visualizations to make? How will you evaluate the performance of your model, or choose between competing models?
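For the last question, k-fold cross-validation is one common way to compare competing models on an equal footing. The Python sketch below compares a majority-class baseline with a 1-nearest-neighbor classifier on purely numeric features. The choice of models, the helper names, and k = 5 are illustrative assumptions, not requirements of the assignment:

```python
import math
import random
from collections import Counter

Point = tuple[float, ...]


def majority_class(train_x: list[Point], train_y: list[str], query: Point) -> str:
    """Baseline: always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def nearest_neighbor(train_x: list[Point], train_y: list[str], query: Point) -> str:
    """1-NN: predict the label of the closest training point (Euclidean distance)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def cross_val_accuracy(model, xs: list[Point], ys: list[str], k: int = 5, seed: int = 0) -> float:
    """Mean held-out accuracy of `model` over k folds."""
    if len(xs) < k:
        raise ValueError("need at least k examples")
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)  # fixed seed so every model sees the same folds
    folds = [order[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(model(train_x, train_y, xs[i]) == ys[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


if __name__ == "__main__":
    # Two well-separated synthetic clusters
    xs = [(float(i), 0.0) for i in range(20)] + [(100.0 + i, 0.0) for i in range(20)]
    ys = ["a"] * 20 + ["b"] * 20
    for name, model in [("majority", majority_class), ("1-NN", nearest_neighbor)]:
        print(name, cross_val_accuracy(model, xs, ys))
```

Using the same seed for every model means each one is scored on identical folds, so any difference in accuracy comes from the models rather than from the split. A trivial baseline also shows whether a more complex model is actually learning anything.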
Results Analysis: Gather the results from all individual steps or projects and run your analysis on them. This should include fidelity criteria (performance evaluation) for the method.
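The assignment leaves the choice of fidelity criteria open. For a binary classification task, common choices are a confusion matrix and the precision, recall, F1 score, and accuracy derived from it. The following Python sketch computes these from true and predicted labels. `binary_metrics` is a hypothetical helper name, and reporting a zero denominator as 0.0 is a convention assumed here:

```python
def binary_metrics(y_true: list, y_pred: list, positive) -> dict:
    """Confusion-matrix counts plus precision, recall, F1, and accuracy for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1  # correctly flagged positive
            else:
                fp += 1  # false alarm
        elif truth == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # correctly rejected negative
    # A zero denominator is reported as 0.0 (convention assumed here).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0], positive=1))
```

On imbalanced data, accuracy alone can look good even when the positive class is mostly missed. Report precision and recall alongside it.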
Report Format: Your paper should follow the IEEE/ACM standard format (a .doc Word template is also provided). The total length should not exceed 6 pages, including references. Use Times New Roman, 10 pt font, and single spacing.
Subjects: You may work on any data set in the field you choose: i. Databases and SQL ii. Page Ranking and Web Mining iii. Text Mining and NLP iv. Image Mining and the Web v. Any other data set, although I'd prefer you discuss it with me in advance.
Software: Use whatever software you are comfortable with: STATISTICA Data Miner, KNIME, RapidMiner, Weka, SAS Enterprise Miner, Oracle Data Mining, or IBM SPSS Modeler. Any programming language, such as C#, C++, Java, Python, or R, is also fine.
Method: You can use the method of your choice, conventional or non-conventional (neural nets, genetic algorithms, fuzzy logic, decision trees, frequent patterns, ...).