First I want to collect 150 Text Documents related to the single topic .ie., Education.
when a query is given by the user it should return the documents which contains the query [url removed, login to view] returned documents should be in the order of the Relevance.(Relevance can be calculated by using tf-idf values).Most relevant Documents should appear first and till the least relevant [url removed, login to view] must be implemented in python 3.4
I was thinking to use the concepts of Tokenization by using Beautiful soup(to remove the stop words , to remove the HTML tags and to tokenize the words) , Indexing to know the term Frequency of words,posting list,.tf-idf to know the relevance of the Documents.
I want to run my program on python 3.4 console.
Hello. I already have a simple implementation of search engine using tf-idf ranking. It is written in python 2, so it will not be a big deal to convert the code to python 3.