Object Detection for Reinforcement Learning Agents

Authors

  • Benjamin van Oostendorp, DigiPen Institute of Technology

DOI:

https://doi.org/10.52846/stccj.2023.3.2.51

Keywords:

object detection, reinforcement learning, deep Q-learning

Abstract

In traditional reinforcement learning applications with images as input, the observation the agent learns from is an image. In these models, a Convolutional Neural Network (CNN) is typically used to extract features before the learning process, in order to maximize the cumulative reward. In this paper, a different approach to pre-processing the input for reinforcement learning agents is considered. The proposed approach uses object detectors instead of CNNs, converting each input image into bounding boxes and object locations for the agent to learn from.
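The abstract does not specify how detections are encoded for the agent, so the sketch below is only one plausible scheme: YOLO-style tuples of (class id, center x, center y, width, height, confidence) with normalized coordinates, packed into a fixed-size vector with a hypothetical object budget, so that a Q-network can consume a variable number of detections. All names and the slot layout here are assumptions, not the paper's method.

```python
def detections_to_observation(detections, max_objects=10, num_features=6):
    """Pack a variable-length list of detections into a fixed-size
    observation vector for a reinforcement learning agent.

    Each detection is (class_id, x_center, y_center, width, height,
    confidence), with coordinates normalized to [0, 1]. Unused slots are
    zero-padded; surplus detections are dropped after sorting by
    confidence, so the most certain objects are kept.
    """
    obs = [0.0] * (max_objects * num_features)
    # Highest-confidence detections occupy the first slots.
    detections = sorted(detections, key=lambda d: d[5], reverse=True)[:max_objects]
    for i, det in enumerate(detections):
        obs[i * num_features:(i + 1) * num_features] = [float(v) for v in det]
    return obs  # length: max_objects * num_features

# Example: two detected objects (hypothetical classes for a platformer).
dets = [
    (0, 0.25, 0.60, 0.05, 0.10, 0.95),  # class 0: player
    (1, 0.70, 0.55, 0.04, 0.08, 0.88),  # class 1: enemy
]
obs = detections_to_observation(dets)
```

A fixed-length vector like this can replace the raw image as the agent's observation, trading pixel detail for a compact, object-centric state.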

References

D. Malik, Y. Li, and P. Ravikumar, “When is generalizable reinforcement learning tractable?” CoRR, vol. abs/2101.00300, 2021. [Online]. Available: https://arxiv.org/abs/2101.00300

R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” CoRR, vol. abs/1311.2524, 2013. [Online]. Available: http://arxiv.org/abs/1311.2524

E. Hernandez, S. Schwettmann, D. Bau, T. Bagashvili, A. Torralba, and J. Andreas, “Natural language descriptions of deep visual features,” CoRR, vol. abs/2201.11114, 2022. [Online]. Available: https://arxiv.org/abs/2201.11114

“Shovel Knight: Shovel of Hope.” [Online]. Available: https://www.yachtclubgames.com/games/shovel-knight-shovel-of-hope

“Super Mario Bros.: The Lost Levels,” 1986.

G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” 2016.

C. Kauten, “Super Mario Bros for OpenAI Gym,” GitHub, 2018. [Online]. Available: https://github.com/Kautenja/gym-super-mario-bros

P. Skalski, “Make Sense,” https://github.com/SkalskiP/make-sense/, 2019.

K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” CoRR, vol. abs/1406.4729, 2014. [Online]. Available: http://arxiv.org/abs/1406.4729

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg, “SSD: single shot multibox detector,” CoRR, vol. abs/1512.02325, 2015. [Online]. Available: http://arxiv.org/abs/1512.02325

J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” CoRR, vol. abs/1506.02640, 2015. [Online]. Available: http://arxiv.org/abs/1506.02640

G. Jocher, “YOLOv5 by Ultralytics,” 5 2020. [Online]. Available: https://github.com/ultralytics/yolov5

C. Wang, I. Yeh, and H. M. Liao, “You only learn one representation: Unified network for multiple tasks,” CoRR, vol. abs/2105.04206, 2021. [Online]. Available: https://arxiv.org/abs/2105.04206

R. Girshick, “Fast r-cnn,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

A. Bochkovskiy, C. Wang, and H. M. Liao, “Yolov4: Optimal speed and accuracy of object detection,” CoRR, vol. abs/2004.10934, 2020. [Online]. Available: https://arxiv.org/abs/2004.10934

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 8024–8035. [Online]. Available: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, “Rainbow: Combining improvements in deep reinforcement learning,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/11796

H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q-learning,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1, Mar. 2016. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/10295

T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” in International Conference on Learning Representations, 2016. [Online]. Available: http://arxiv.org/abs/1511.05952

Z. Wang, N. de Freitas, and M. Lanctot, “Dueling network architectures for deep reinforcement learning,” CoRR, vol. abs/1511.06581, 2015. [Online]. Available: http://arxiv.org/abs/1511.06581

K. D. Asis, J. F. Hernandez-Garcia, G. Z. Holland, and R. S. Sutton, “Multi-step reinforcement learning: A unifying algorithm,” CoRR, vol. abs/1703.01327, 2017. [Online]. Available: http://arxiv.org/abs/1703.01327

M. G. Bellemare, W. Dabney, and R. Munos, “A distributional perspective on reinforcement learning,” CoRR, vol. abs/1707.06887, 2017. [Online]. Available: http://arxiv.org/abs/1707.06887

M. Fortunato, M. G. Azar, B. Piot, J. Menick, M. Hessel, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, “Noisy networks for exploration,” in International Conference on Learning Representations, 2018. [Online]. Available: https://openreview.net/forum?id=rywHCPkAW

D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference on Learning Representations, Dec. 2014.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb 2015. [Online]. Available: https://doi.org/10.1038/nature14236

Yannbouteiller, “vgamepad: Virtual xbox360 and dualshock4 gamepads in python.” [Online]. Available: https://github.com/yannbouteiller/vgamepad

B. Höglinger-Stelzer, “ViGEm: Virtual gamepad emulation framework.” [Online]. Available: https://vigem.org/

T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014, pp. 740–755.

“Mss: An ultra fast cross-platform multiple screenshots module in pure python using ctypes.” [Online]. Available: https://pypi.org/project/mss/

G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.

J. Hosang, R. Benenson, and B. Schiele, “Learning non-maximum suppression,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6469–6477.

Y. Hu, W. Wang, H. Jia, Y. Wang, Y. Chen, J. Hao, F. Wu, and C. Fan, “Learning to utilize shaping rewards: A new approach of reward shaping,” CoRR, vol. abs/

Published

2023-12-31

How to Cite

[1]
B. van Oostendorp, “Object Detection for Reinforcement Learning Agents”, Syst. Theor. Control Comput. J., vol. 3, no. 2, pp. 9–14, Dec. 2023, doi: 10.52846/stccj.2023.3.2.51.
Received 2023-10-09
Accepted 2023-12-27