Topic Modelling, Website Classification ($200)

I'm struggling to classify random websites properly.

In fact, I posted almost the same project previously. You can view it below.

[login to view URL]

My purpose is the same as in the previous project.

To explain it further: once the tool is built, it will be applied to thousands or millions of websites.

After posting the previous project, I learned that one common method is to develop an NLP model to classify websites.

For that, we need manually classified/processed data for the target type of website.

However, to my mind that is more like picking website categories one by one.

What I really want is something that classifies automatically, by judging for itself what kind of website each one is.

So, something like this: automatically calculate a similarity index between websites (probably with NLP techniques),

and then separate the websites from each other according to that similarity index.

If certain websites are judged similar to each other, we would be able to bind them all automatically into one category,

without a human supplying manually processed examples.
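
One possible reading of the "similarity index" above, as a minimal sketch: treat each website's visible text as a bag of words and take the cosine similarity of the word counts. The posting does not name a method, so the choice of cosine similarity, the function names, and the sample page texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased word tokens in a page's text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors, from 0.0 to 1.0."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_index(text_a, text_b):
    """Similarity index between two page texts (assumed definition)."""
    return cosine_similarity(word_counts(text_a), word_counts(text_b))


# Hypothetical page texts, invented for this example.
shop_a = "Buy shoes and clothes online. Free shipping on all orders."
shop_b = "Shop clothes and shoes online with free shipping today."
forum = "Join the discussion board, post a thread and reply to members."

print("shop_a vs shop_b:", round(similarity_index(shop_a, shop_b), 3))
print("shop_a vs forum: ", round(similarity_index(shop_a, forum), 3))
```

The two shop texts share most of their vocabulary and score higher than the shop/forum pair. No labelled examples are needed, which is the point of the request. A real tool would probably weight rare words more heavily (for example with TF-IDF), which this sketch leaves out.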

So this is what the project should eventually look like.

If you really wonder about the original purpose of this project, it is to get a fresh view of what kinds of websites exist.

As for this 'website classification', I do wonder whether some work has been done on it before. I'm sure there is some.

You see, when you look at e-commerce products, there are always categories for what kind of product each one is: clothes, shoes, computers, USB drives, or furniture.

I really wonder whether some website has pre-judged and pre-classified such categories for 'websites',

so that we could see rough categories of websites -


community website

porn website


In this project, there are some specifics you should consider. Please read below.

Condition 1.

It should be able to operate on a global scale. When you search around websites, you can expect mostly Anglosphere websites written in English, but the purpose of the project is to also classify websites from other markets and other countries, for example websites written in Russian, Hindi, or Chinese.

This is why manual data input could be meaningless, and measuring a similarity index to derive the website category would be the only way.
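
A hedged sketch of one way to prepare text so the same similarity measure works on Russian, Hindi, or Chinese pages: normalise the Unicode text and count character n-grams instead of words, since Chinese has no spaces between words. The function names and the n-gram size are assumptions. Note that this only makes pages within the same language comparable; matching a Russian shop to an English shop would need something more (not shown here).

```python
import unicodedata
from collections import Counter


def normalize(text):
    """NFKC-normalise, case-fold, and keep only letters, marks, and digits."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Unicode categories L* (letters), M* (combining marks, needed for
    # scripts such as Devanagari), and N* (numbers) are kept; everything
    # else becomes a space.
    kept = [ch if unicodedata.category(ch)[0] in "LMN" else " " for ch in text]
    return " ".join("".join(kept).split())


def char_ngrams(text, n=2):
    """Count overlapping character n-grams of the normalised text."""
    cleaned = normalize(text)
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


# Hypothetical snippets in different scripts.
print(normalize("Привет,  МИР!"))
print(char_ngrams("网上购物"))
```

The resulting counts can be fed to any count-vector similarity such as cosine similarity, with no per-language tokenizer or dictionary.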

Condition 2.

Please show me how it works by picking 10 examples.

Condition 3.

After you show me condition 2, I can cross-check it by bringing images from my own background.

Condition 4.

When the cross-check in condition 3 is done, I will release the milestone.

Condition 5.

Once the main script is finished, it will need a multi-threaded implementation to make up for its speed. (The tool will be applied to thousands or millions of websites, so speed itself is an important matter.)
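
For the multi-threading condition, a minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor` to download many pages at once. The names `fetch_html` and `fetch_all` and the worker count are assumptions for illustration, and the demo at the bottom uses a stand-in fetcher so it runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch=fetch_html, max_workers=16):
    """Fetch many URLs concurrently; failed URLs map to None."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One dead or slow site should not stop the whole batch.
                results[url] = None
    return results


def fake_fetch(url):
    """Stand-in fetcher so the example runs offline (hypothetical)."""
    return "<html>" + url + "</html>"


pages = fetch_all(["site-a.example", "site-b.example"], fetch=fake_fetch)
print(sorted(pages))
```

Threads help mainly with the download step, which spends most of its time waiting on the network. The similarity calculation itself is CPU-bound, so in CPython it would more likely benefit from multiple processes than from threads.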

Condition 6.

The tool should have a similarity index variable inside the script, so I can adjust how narrow or wide the similarity degree will be.
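
A sketch of how such an adjustable variable could drive the grouping: any two sites whose similarity is at or above the threshold are put in the same category, and categories merge transitively (single-link grouping with a union-find structure). The variable name `SIMILARITY_THRESHOLD`, the site names, and the pairwise scores are all hypothetical.

```python
SIMILARITY_THRESHOLD = 0.5  # hypothetical default: higher = narrower groups


def group_sites(sites, similarity, threshold=SIMILARITY_THRESHOLD):
    """Group sites so that pairs scoring >= threshold share a category."""
    parent = {site: site for site in sites}

    def find(site):
        # Follow parent links to the group's representative.
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for site in sites:
        groups.setdefault(find(site), []).append(site)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sites and precomputed pairwise similarity scores.
sites = ["shop-a.example", "shop-b.example", "forum-a.example", "forum-b.example"]
similarity = {
    ("shop-a.example", "shop-b.example"): 0.82,
    ("forum-a.example", "forum-b.example"): 0.64,
    ("shop-a.example", "forum-a.example"): 0.31,
    ("shop-b.example", "forum-b.example"): 0.12,
}

for threshold in (0.3, 0.5, 0.9):
    print(threshold, group_sites(sites, similarity, threshold))
```

With these made-up scores, a low threshold merges everything into one wide category, a middle value separates shops from forums, and a high value leaves every site on its own. One known weakness of single-link grouping is chaining: A joins B and B joins C even when A and C are not similar, so a stricter rule may be needed in practice.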

Essential Note 1.

If you know of a service/website that can satisfy the project purpose, and that service provides an API that clients can use from a script or the command line, I'm also open to using such a service. You would need to help me use the script. (But in the case of using a 3rd-party software/service API, since it is not something made by you, and since it will mean paying that 3rd-party service regularly, the offer price would be much lower than $200. I would release $60 for setting up the script using the API. Please keep that in mind.)

Reference keywords/links.

[login to view URL]

[login to view URL]

Before placing a bid: please explain briefly how the work would be done. Or, please explain what other steps need to be done before going deep into the main work, so we can get this job done together.

Skills: C Programming, C++ Programming, Data Scraping, Natural Language, Web Crawling


About the employer:
( 13 reviews ) Incheon, Korea, Republic of

Project ID: #19717141

3 freelancers are bidding an average of $157 for this job


A Data Scientist with experience in Python, R programming, R Shiny, R studio and anything related to data science and python Master in Engineering, Electrical and Electronic Engineer, who is dynamic, reliable, resou...

$30 USD in 3 days
(4 reviews)

Hi there Good day My years of experience in building a complete website, web and mobile app solutions for both large and small business have given us a unique insight in creating the right website for your business or...

$300 USD in 15 days
(1 review)

Hi, This needs application of authentic NLP. Takes good deal of effort, would cost around USD 10000. I propose a Java solution built on top of my own OpenMana NLP Tool (patent pending technology). I can offer the Op...

$140 USD in 7 days
(0 reviews)