I need to create an interpretability model for a reinforcement learning agent or policy (to be discussed). My main goal is to make reinforcement learning techniques interpretable or transparent to the user (the end user, not the developer). For instance, we could visualize or present the future behaviours and actions of the trained algorithm. I want to begin with one model, but I need 25 models by the end of July.
That would be simple and a good starting point for something bigger and more substantial in later gigs, as I have lots of work. Language: Python; you can use TensorFlow, OpenAI Gym, etc. if you want.
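To illustrate the kind of output I have in mind, here is a minimal sketch (not the final deliverable): roll a trained policy forward from the current state and plot the sequence of actions it would take, so a non-developer user can see the agent's planned behaviour. The `policy` function, the CartPole environment, and the matplotlib plot are all placeholder assumptions; the real project would use the trained agent we agree on.

```python
# Minimal sketch: visualize a policy's predicted future actions.
# `policy` is a hypothetical stand-in heuristic, NOT a trained agent.
import gym
import matplotlib.pyplot as plt

def policy(observation):
    # Placeholder heuristic for CartPole: push in the direction
    # the pole is leaning. A trained model would replace this.
    return 0 if observation[2] < 0 else 1

env = gym.make("CartPole-v1")
obs = env.reset()
if isinstance(obs, tuple):  # gym >= 0.26 returns (obs, info)
    obs = obs[0]

actions, done, t = [], False, 0
while not done and t < 200:
    action = policy(obs)
    actions.append(action)
    step = env.step(action)
    obs, done = step[0], step[2]  # handles 4- and 5-tuple step APIs
    t += 1

# Present the rollout as a user-facing timeline of future actions.
plt.step(range(len(actions)), actions, where="post")
plt.yticks([0, 1], ["push left", "push right"])
plt.xlabel("future timestep")
plt.ylabel("action")
plt.title("Predicted future actions of the trained policy")
plt.show()
```

This only shows the action timeline for one rollout; the actual interpretability models would go further (e.g., explaining *why* an action is chosen), which we can discuss.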
A few papers came up in my search. This one takes a basic approach, but it might be good inspiration: [login to view URL]
This might help you understand what I mean by an interpretability model: [login to view URL]