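[Five Dynasties] Xu Changtu

The farewell drinking over, I leave the parting pavilion and head west; I have always resented this floating life, adrift like tumbleweed. Looking back, the misty willows thicken layer upon layer. Pale clouds, a lone wild goose far off; a cold sun, the evening sky red. Where will the painted boat moor tonight? The tide is level, and the moon over the Huai is hazy. Sobering up in the stillness, what can I do against so heavy a sorrow? A guttering lamp, a dream on a lonely pillow; light waves, the wind of the fifth watch.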

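[Song] Wang Yisun

Gradually a new trace hangs on the willows, its pale glow threading through the flowers, faintly breaking the early dusk. Already it holds a promise of fullness; bowing deep in worship, whom is there to meet on the fragrant path? The painted eyebrow is not yet finished; I suppose the Moon Goddess still bears the sorrow of parting. Most lovable of all is a single small silver hook, a jeweled curtain hung in the autumn chill. Do not ask about its waxing and waning through the ages. Alas, the jade axe is ground in vain and cannot mend the golden mirror. The Taiye Pool is still there, but in its desolation who will again write of the clear scene? The night is long in my old hills. I will wait until it looks in at my door, full and round, and then see the mountains and rivers beyond the clouds, and the cassia shadow grown utterly old.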

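[Song] Jin Deshu

Waking from spring sleep, I find snow piled over Mount Yan. The Great Wall stretches ten thousand li like a sash of white silk; the lamps of the six avenues have already dimmed; a figure stands amid the jade towers.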

Source:

  • Random Vectors from the textbook Introduction to Probability, Statistics, and Random Processes by Hossein Pishro-Nik.
  • Random Vectors and the Variance–Covariance Matrix

Actor-critic methods are still policy gradient methods. Compared to REINFORCE, actor-critic methods use TD learning to approximate the action value \(q_\pi\left(s_t, a_t\right)\).
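
As a rough sketch of the comparison, both methods can be written as the same stochastic gradient-ascent step on the policy parameters, differing only in how the estimate of \(q_\pi\left(s_t, a_t\right)\) is obtained. The symbols \(\theta_t\) (policy parameters), \(\alpha_\theta\) and \(\alpha_q\) (step sizes), \(\gamma\) (discount rate), and \(q_t\) (the current estimate) are notation assumed here, not taken from the text above.

\[
\theta_{t+1} = \theta_t + \alpha_\theta \nabla_\theta \ln \pi\left(a_t \mid s_t, \theta_t\right) q_t\left(s_t, a_t\right)
\]

In REINFORCE, \(q_t\left(s_t, a_t\right)\) is a Monte Carlo estimate, the discounted return observed after taking \(a_t\) in \(s_t\). In an actor-critic method it is maintained by a TD update, for example the Sarsa-style rule

\[
q_{t+1}\left(s_t, a_t\right) = q_t\left(s_t, a_t\right) + \alpha_q \left[ r_{t+1} + \gamma q_t\left(s_{t+1}, a_{t+1}\right) - q_t\left(s_t, a_t\right) \right]
\]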

What are "actor" and "critic"?

  • Here, "actor" refers to policy update. It is called the actor because the policy is what is applied to take actions.
  • Here, "critic" refers to policy evaluation or value estimation. It is called the critic because it criticizes the policy by evaluating it.
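
The division of labour between the two roles can be sketched in a few lines of Python. This is a minimal tabular illustration under stated assumptions, not the algorithm as given in the cited chapter: the softmax parameterization, the Sarsa-style critic, the function names `softmax_policy` and `actor_critic_step`, and the default step sizes are all hypothetical choices made for the example.

```python
import math


def softmax_policy(theta, s):
    """Return the action probabilities pi(.|s) from tabular preferences theta[s]."""
    m = max(theta[s])  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in theta[s]]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, q, s, a, r, s_next, a_next,
                      alpha_theta=0.1, alpha_q=0.1, gamma=0.9):
    """Apply one actor update and one critic update in place.

    theta[s][b] are the policy preferences (assumed softmax policy);
    q[s][b] is the critic's tabular estimate of the action value.
    """
    # Actor (policy update): for a softmax policy, the gradient of
    # ln pi(a|s) with respect to theta[s][b] is 1{b == a} - pi(b|s).
    pi = softmax_policy(theta, s)
    q_sa = q[s][a]  # the critic's current evaluation of (s, a)
    for b in range(len(theta[s])):
        grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += alpha_theta * grad_log_pi * q_sa

    # Critic (value update): Sarsa-style TD step toward r + gamma * q(s', a').
    td_target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha_q * (td_target - q_sa)


if __name__ == "__main__":
    # One state, two actions; the transition below is made up for illustration.
    theta = [[0.0, 0.0]]
    q = [[1.0, 0.0]]
    actor_critic_step(theta, q, s=0, a=0, r=2.0, s_next=0, a_next=1)
    print("policy:", softmax_policy(theta, 0))
    print("q:", q)
```

The actor step only reads the critic's estimate and the critic step only reads the sampled transition, which mirrors the split described above: one component changes the policy, the other evaluates it.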

Sources:

  1. Shiyu Zhao. Chapter 10: Actor-Critic Methods. Mathematical Foundations of Reinforcement Learning.