Deep RL expert needed to guide me in my project


I have the following personal ML project:

1. Modeling a sort of single-player game with a big board of locations, pawns of different colors and quantities to place on it, and a set of rules for placing each kind (color) of pawn on the board. There are also time considerations, as moving a pawn from one location to another takes some time related to the distance...

2. Having a Deep RL algorithm learn to play the game and find the best solution (highest score in the smallest number of moves).
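One common way to express "highest score in the smallest number of moves" as a per-step reward is the score gained by the move minus a small constant cost per move. This is only a sketch under assumptions: the function name `step_reward`, the `move_penalty` value and the example scores are hypothetical and not part of the environment described here.

```python
def step_reward(prev_score, new_score, move_penalty=0.01):
    """Reward for one move: score gained minus a constant per-move cost.

    move_penalty is a hypothetical tuning knob, not a value from the project.
    """
    return (new_score - prev_score) - move_penalty


# Hypothetical score after each of three moves, starting from 0.
scores = [0.0, 2.0, 2.0, 5.0]
rewards = [step_reward(a, b) for a, b in zip(scores, scores[1:])]

# The score differences telescope, so the undiscounted return equals
# final_score - move_penalty * number_of_moves.
episode_return = sum(rewards)
```

With this shaping, two episodes that reach the same final score differ only by the number of moves used, so the shorter one gets the higher return.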

I have only been into machine learning for a few weeks, starting with OpenAI Gym. What I have done so far is program the gym environment, which works with a random agent (I mean the rules work correctly). I tried to train PPO on it, but I am not sure whether my strategy is the right one regarding action/observation spaces and rewards. I did not really do any normalization, as I am not sure how to handle it.
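On the normalization point, a frequently used option is to keep running statistics and rescale values to roughly zero mean and unit variance. The sketch below does this for a scalar quantity such as the reward, using Welford's online algorithm. The class name `RunningNormalizer` is hypothetical, and whether this kind of normalization helps on this particular environment is an assumption to be checked experimentally.

```python
import math


class RunningNormalizer:
    """Online mean/std tracker (Welford's algorithm) for a scalar signal."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; fall back to 1.0 until two samples exist.
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, x):
        return (x - self.mean) / (self.std + self.eps)


norm = RunningNormalizer()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    norm.update(value)
```

The same idea applies element-wise to observation vectors, where an array library would normally replace the scalar arithmetic.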

I am stuck at this stage with a ton of questions and doubts, and I need an expert to coach me and point me in the right direction, as I don't have much time for trial and error but want to learn how to tackle my specific problem... I might also need some help with coding when necessary (I code in Python).



Brief summary of what my gym env looks like:

I coded a gym environment for a single-player game that consists of:

- a board of n locations (n is 1122 in this example but, in the future, I would like to be able to handle boards of 30000 locations for instance), represented by a simple list of 1122 indexes

- 6 different kinds (colors) of pawns that you can place according to a set of rules handled by the gym environment (some pawns can pile up on the same location, etc.). At the beginning, the player has a fixed number of available pawns per color (the stock). I represent the colors by the codes 1 to 6.

- 3 possible pawn actions: NEW (putting a new pawn on the board from the stock), MOVE (moving a pawn already on the board to a new location), and REMOVE (removing a pawn from the board to put it back in the stock).

As an action_space I used a MultiDiscrete([pawn_actions_nr, total_pawns_nr, locations_nr]), where:

- pawn_actions_nr = 0 (NEW), 1 (MOVE) or 2 (REMOVE)

- total_pawns_nr = int from 0 to 59, with 0 to 6 being the 7 black pawns, 7 to 10 the 4 red pawns, and so on

- locations_nr = 0 to 1121, representing each of the 1122 possible locations
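With a MultiDiscrete action of the form (pawn action, pawn index, location index), many combinations are invalid by construction: NEW only makes sense for a pawn still in stock, while MOVE and REMOVE only make sense for a pawn on the board. The sketch below checks just these structural constraints. The helper names are hypothetical, the color-specific placement rules are not modeled, and treating a MOVE to the pawn's current location as invalid is an assumption.

```python
NEW, MOVE, REMOVE = 0, 1, 2
IN_STOCK = -1  # same convention as the observation: -1 means "not on the board"


def valid_action_types(pawn_location):
    """Return [NEW ok, MOVE ok, REMOVE ok] for one pawn."""
    on_board = pawn_location != IN_STOCK
    return [not on_board, on_board, on_board]


def is_valid(action, pawn_locations, n_locations):
    """Structural validity of (action_type, pawn, location); game rules not included."""
    action_type, pawn, location = action
    if action_type not in (NEW, MOVE, REMOVE):
        return False
    if not (0 <= pawn < len(pawn_locations) and 0 <= location < n_locations):
        return False
    if not valid_action_types(pawn_locations[pawn])[action_type]:
        return False
    # Assumption: moving a pawn onto its current location is not a real move.
    if action_type == MOVE and location == pawn_locations[pawn]:
        return False
    return True


# Tiny example: 3 pawns, pawn 1 is on location 4, the others are in stock.
pawn_locations = [IN_STOCK, 4, IN_STOCK]
checks = [
    is_valid((NEW, 0, 2), pawn_locations, n_locations=10),     # pawn 0 is in stock
    is_valid((NEW, 1, 2), pawn_locations, n_locations=10),     # pawn 1 already placed
    is_valid((MOVE, 1, 7), pawn_locations, n_locations=10),    # pawn 1 can move
    is_valid((REMOVE, 2, 0), pawn_locations, n_locations=10),  # pawn 2 not on board
]
```

A check like this can be used either to penalize invalid actions or to mask them out before sampling, which is a design choice left open here.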

Every time a pawn stops at a location, the location takes its color (e.g., I can place a red pawn on a given location and then move it to different locations; all these locations will turn red).
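The coloring rule above can be sketched as a small state update. This assumes that a location keeps its color after the pawn leaves or is removed, which the description does not state, and it uses hypothetical color codes (1 for black, 2 for red) and a hypothetical function name `apply_action`.

```python
NEW, MOVE, REMOVE = 0, 1, 2
IN_STOCK = -1


def apply_action(location_colors, pawn_locations, pawn_colors, action):
    """Update the board in place for an already validated action."""
    action_type, pawn, location = action
    if action_type == REMOVE:
        # Assumption: the location keeps the color it already has.
        pawn_locations[pawn] = IN_STOCK
    else:
        # NEW and MOVE both end with the pawn stopping at `location`,
        # so that location takes the pawn's color.
        pawn_locations[pawn] = location
        location_colors[location] = pawn_colors[pawn]


# 5 locations, all in the initial state (0); one black pawn and one red pawn.
location_colors = [0, 0, 0, 0, 0]
pawn_colors = [1, 2]              # hypothetical codes: 1 = black, 2 = red
pawn_locations = [IN_STOCK, IN_STOCK]

apply_action(location_colors, pawn_locations, pawn_colors, (NEW, 1, 3))
apply_action(location_colors, pawn_locations, pawn_colors, (MOVE, 1, 1))
# Both location 3 and location 1 are now red; the red pawn sits on location 1.
```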

Observation space: a Box of length 1122 + 60, with integer values from -1 to 1121. The first 1122 entries are indexed by location and hold the location's color (0 to 6, 0 being the initial state and 1 to 6 the color of the location). The last 60 entries represent all the pawns from the initial stock, each with a value from -1 to 1121 giving the location where that pawn currently sits, -1 meaning that it is not on the board but still in stock.
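Because the two halves of this observation live on very different scales (colors up to 6, location indexes up to more than a thousand), one simple option is to rescale both into [0, 1] before feeding them to the network. The sketch below shows that rescaling only. The function name `build_observation` is hypothetical, and since color codes and location indexes are categorical rather than ordinal, a one-hot or embedding encoding may well work better; that is not shown here.

```python
def build_observation(location_colors, pawn_locations, n_colors=6):
    """Flatten the board into one list of floats in [0, 1].

    location_colors: one int per location, 0 (initial) to n_colors.
    pawn_locations:  one int per pawn, -1 (in stock) or a location index.
    """
    n_locations = len(location_colors)
    colors = [c / n_colors for c in location_colors]
    # Shift by one so that -1 (in stock) maps to 0.0 and the last index to 1.0.
    pawns = [(p + 1) / n_locations for p in pawn_locations]
    return colors + pawns


# 3 locations and 2 pawns: pawn 0 in stock, pawn 1 on the last location.
obs = build_observation([0, 6, 3], [-1, 2])
```

The output keeps the same layout as the original Box (locations first, pawns last), so only the declared bounds of the space would change.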

The environment does not manage time so far, as I am not sure how to handle the moving delay (do I have to set a fixed time per step and manage past, present and future somehow? Is there a way to handle that as discrete-event simulations do?...)
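One way to handle the delay in the style of a discrete-event simulation is to keep a clock and a priority queue of pending arrivals: starting a move schedules its arrival time, and the clock jumps directly to the next arrival instead of ticking in fixed steps. This is a sketch under assumptions: the class name `MoveClock`, the constant `speed`, and the rule that travel time equals distance divided by speed are all hypothetical.

```python
import heapq


class MoveClock:
    """Event clock: moves in progress are kept in a heap ordered by arrival time."""

    def __init__(self, speed=1.0):
        self.now = 0.0
        self.speed = speed      # hypothetical: distance units per time unit
        self._arrivals = []     # heap of (arrival_time, pawn, destination)

    def start_move(self, pawn, destination, distance):
        """Schedule the arrival of `pawn` at `destination` and return its time."""
        arrival = self.now + distance / self.speed
        heapq.heappush(self._arrivals, (arrival, pawn, destination))
        return arrival

    def advance(self):
        """Jump to the next arrival; return (pawn, destination) or None if idle."""
        if not self._arrivals:
            return None
        self.now, pawn, destination = heapq.heappop(self._arrivals)
        return pawn, destination


clock = MoveClock(speed=2.0)
clock.start_move(pawn=0, destination=5, distance=10.0)  # arrives at t = 5.0
clock.start_move(pawn=1, destination=3, distance=4.0)   # arrives at t = 2.0
first = clock.advance()    # pawn 1 arrives first
time_after_first = clock.now
second = clock.advance()   # then pawn 0
```

In a gym environment, each `step` could either start a move or call `advance`, with the elapsed time exposed in the observation or charged in the reward; which of these fits the game best is an open design question.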

Skills: Machine Learning (ML), Deep Learning, Python

About the employer:
(0 reviews) Annecy, France

Project ID: #28938998

10 freelancers are bidding on this job, at an average of €55/hour


Hi, I am Ibrahim and I am a data scientist, I can help you with RL, please share what is the desired variation. Regards, Ibrahim Anjum

€36 EUR / hour
(45 reviews)

Hello jomo78, We have 20 years of strong experience in Python, Machine Learning (ML), Deep Learning, as a result, we can successfully complete this project. Please, review our profile here: https://www.freelancer.c...

€79 EUR / hour
(11 reviews)

Hi, there. I read your description and I am interested in your project. I am a ✪Deep Learning/Machine Learning/Python✪ Expert who you are looking for and have +7 years experience I am familiar in a lot of Python module...

€36 EUR / hour
(19 reviews)

Hi, I am expert in machine learning/Deep learning/AI/OCR/data mining/NLP. I implemented algorithms for data classification, text classification, Trading, automation, data mining, speech recognition, time serious analys...

€180 EUR / hour
(8 reviews)

Hi, I hope you are doing fine. I have almost 10 years of experience in machine learning algorithms. I can implement various types of artificial intelligence algorithms including yours with Matlab, Python and etc. I hav...

€36 EUR / hour
(4 reviews)

Hi I am Python expert and Control Engineer. I work on RL methods such as QL, DQL, PG, DPG and so on. Also I work on game theory methods. I can help you. We can discuss more in chat inbox Thank you

€36 EUR / hour
(5 reviews)

[login to view URL] modeling, web scraping, Time Series, Topic Modeling, Spam detection Python web Development(Flask,FastAPI) Python Programming Data Extraction Data Visualization Statistical Methods Car Prediction app: htt...

€36 EUR / hour
(0 reviews)

I am a high school student and also a Machine Learning Engineer with an incredibly curious urge to apply AI to day-to-day modern problems. Skills: Python, C++ and Golang developer, TensorFlow Developer, PyTorch, Compu...

€36 EUR / hour
(0 reviews)

Hi, We are a team of data science and ML/AI experts who excel across multiple areas with more than twenty five years of combined experience. We hold expertise in Python, Backend Architecture (micro-services, Kubernete...

€40 EUR / hour
(0 reviews)