
Reinforcing Reinforcement Learning Terms, Policies, Models and Top 40 Libraries 📚

Reinforcement learning (RL) is a type of machine learning that lets an agent interact with an environment, receive feedback, and make better decisions over time.

—————————

📝 Terms Used in RL:

  • ⌘ Environment: the system or situation that the agent interacts with.

  • ⌘ Agent: the autonomous entity that observes the environment and takes actions in it.

  • ⌘ Feedback: the information the environment returns to the agent after an action, in the form of rewards or penalties.

  • ⌘ State (S): the current situation returned by the environment.

  • ⌘ Policy (π): the strategy the agent uses to choose its next action.

  • ⌘ Value (V): the expected long-term return of a state under the current policy.

  • ⌘ Q-Value (Q): the expected long-term return of taking a given action in the current state.

  • ⌘ Model: the agent's internal simulation of the environment, used for planning. (The short interaction-loop sketch after this list ties these terms together.)
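
To make the terms concrete, here is a minimal interaction-loop sketch. It assumes the Gymnasium package (the maintained fork of the Gym library listed further down) is installed; the random policy and the CartPole-v1 environment are only illustrative choices.

```python
# pip install gymnasium
import gymnasium as gym

env = gym.make("CartPole-v1")        # Environment
state, info = env.reset(seed=0)      # State (S): current situation returned by the environment

def policy(state):
    """Policy (pi): here a trivial random strategy for picking the next action."""
    return env.action_space.sample()

episode_return = 0.0                 # accumulated feedback (rewards)
for t in range(500):
    action = policy(state)                                         # Agent acts
    state, reward, terminated, truncated, info = env.step(action)  # Feedback and next State
    episode_return += reward
    if terminated or truncated:
        break

print(f"Return collected by the random policy: {episode_return}")
env.close()
```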

—————————

📖 Model/Policy of RL:

Model-Free vs Model-Based:

  • เน Model-based works with state space and action space grows
  • เน Model-free algorithms rely on trial-and-error to update its knowledge.

On-Policy vs Off-Policy:

  • เน On-policy agent learns based on its current action a derived from the current policy,
  • เน Off-policy counter part learns it based on another policy.

—————————

🤖 Well-known RL Models:

  • ➊ Q-Learning:

Model-free, off-policy algorithm that uses a Q-table to store the value of each state-action pair and picks the best action for a state from it. (A tabular training sketch follows this list.)

  • ➋ State-Action-Reward-State-Action (SARSA):

Model-free, on-policy algorithm that updates the state-action value based on the reward and the next state-action pair actually taken.

  • ➌ Deep Q Network (DQN):

Model-free algorithm that uses a deep neural network instead of a table to approximate the Q-function.

  • ➍ Deep Deterministic Policy Gradient (DDPG):

Model-free, off-policy actor-critic algorithm for continuous action spaces; by using deep neural networks, DDPG handles more complex environments and larger state spaces than tabular methods.
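
As a concrete example of the Q-table idea, here is a minimal tabular Q-Learning sketch. It assumes Gymnasium's FrozenLake-v1 environment (small discrete state and action spaces) and uses epsilon-greedy exploration; the hyperparameters are illustrative, not tuned.

```python
# pip install gymnasium
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))  # the Q-table
alpha, gamma, epsilon = 0.1, 0.99, 0.1                       # illustrative hyperparameters

for episode in range(2000):
    state, info = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise act greedily from the Q-table.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, info = env.step(action)
        # Off-policy Q-Learning update (SARSA would instead use the next action it actually takes).
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
        done = terminated or truncated

print("Greedy action per state:", np.argmax(Q, axis=1))
env.close()
```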

—————————

🛠️ Here are some applications of reinforcement learning (RL):

  • » Robotics
  • » Autonomous Vehicles
  • » Healthcare
  • » Finance
  • » Gaming
  • » Energy Management
  • » Marketing and Advertising
  • » Natural Language Processing
  • » Manufacturing
  • » Smart Grids
  • » Supply Chain Optimization
  • » Recommendation Systems
  • » Personalization Systems
  • » Traffic Signal Control
  • » Education and Training
  • » Agriculture
  • » Industrial Automation
  • » Space Exploration
  • » Cybersecurity
  • » Virtual Assistants

—————————

Here are Python libraries I found for Reinforcement Learning (a minimal usage sketch follows the list):

  • 📚 Gym
  • 📚 Baselines
  • 📚 Dopamine
  • 📚 TensorLayer
  • 📚 FinRL
  • 📚 Stable-Baselines
  • 📚 ReAgent
  • 📚 Acme
  • 📚 PARL
  • 📚 TF-Agents
  • 📚 TensorFlow
  • 📚 PyTorchRL
  • 📚 Keras-RL
  • 📚 Garage
  • 📚 TensorForce
  • 📚 RLax
  • 📚 Coach
  • 📚 RFRL
  • 📚 Rliable
  • 📚 ViZDoom
  • 📚 Ray RLlib
  • 📚 ChainerRL
  • 📚 MushroomRL
  • 📚 TRFL
  • 📚 CleanRL
  • 📚 Tianshou
  • 📚 MAgent
  • 📚 rl-baselines3-zoo
  • 📚 PettingZoo
  • 📚 RoboRL
  • 📚 H-baselines
  • 📚 DI-engine
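
Most of these libraries wrap the interaction loop shown earlier behind a train-an-agent API. As a hedged example, here is a minimal sketch using Stable-Baselines3 (the maintained successor of the Stable-Baselines entry above); it assumes `stable-baselines3` and `gymnasium` are installed, and PPO on CartPole-v1 is only an illustrative choice.

```python
# pip install stable-baselines3 gymnasium
from stable_baselines3 import PPO

# The library builds the policy network, the environment interaction loop,
# and the update rule; we only pick the algorithm and the environment.
model = PPO("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(total_timesteps=10_000)
model.save("ppo_cartpole")
```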

Source in the comments ↓

———————————————

⭆ Register for Free Online Hands-on Data Science Tutorial (End to End Project): https://www.maryammiradi.com/sonar

This post is licensed under CC BY 4.0 by the author.