Autonomous navigation of robots in unknown environments, from their current position to a desired target without colliding with obstacles, is an important problem in the field of mobile robotics. Traditional methods exist in the literature for the case where complete knowledge of the environment is available. This is rarely the case in real-life applications, however, where such knowledge of a stochastic environment can hardly be obtained and the positions of the obstacles are unknown. Our main goal is to navigate a skid-steering mobile robot (SSMR) with non-holonomic constraints through an unknown environment to its desired target in minimum time while avoiding collisions with obstacles.

In the context of autonomous cart navigation, the navigation of mobile robots in unstructured environments can be formulated as a reinforcement learning (RL) problem. Reinforcement learning is a paradigm in which the agent (the robot) learns its optimal path by interacting with the environment and receiving a reward based on its actions. Based on this action-reward scenario, the optimal action for each state can be discovered by maximizing a predefined accumulated reward that reflects the quality of the trajectory taken by the robot. Reinforcement learning methods fall into two main categories (Kober et al., 2013): value-based methods (Q-learning, deep Q-learning) and policy-based methods (REINFORCE with policy gradient). (i) Value-based methods learn a value function that maps each state-action pair to a value; the best action for each state is then the one with the highest value. This works well when the set of actions is finite. (ii) Policy-based methods, instead of learning a value function that indicates the expected total reward for each state-action pair, directly optimize the policy function that maps states to actions (Arulkumaran et al., 2017). Policy search methods are more effective in high-dimensional or continuous action spaces, as in mobile robot navigation (Deisenroth et al., 2013). Since both families have their own drawbacks, a hybrid method called actor-critic employs both value functions and policy search (Arulkumaran et al., 2017). The actor adjusts the policy parameters by policy gradient ascent, whereas the critic estimates the action-value function using a policy evaluation algorithm such as temporal difference (TD) learning. This can be realized with two neural networks running in parallel.

Herein, we introduce a reinforcement-learning-based navigation system that enhances the navigation capabilities of a skid-steering mobile robot. This goal is achieved by using the asynchronous deep deterministic policy gradient (ADDPG) through an actor-critic algorithm (Mnih et al., 2016). Two deep neural networks are introduced for this purpose. The first (the actor network) maps the input vector representing the states of the robot to the linear and angular velocity commands that represent the actions, whereas the second (the critic network) is responsible for predicting the Q-value of state-action pairs.
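To make the actor-critic structure concrete, the sketch below outlines the two networks in Python (PyTorch). The layer sizes, the sigmoid/tanh output activations, and the assumption that the state vector consists of laser range readings plus the relative target position are illustrative choices, not the exact architecture used in our experiments.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the robot state vector to (linear, angular) velocity commands."""
    def __init__(self, state_dim, hidden_dim=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.linear_head = nn.Linear(hidden_dim, 1)   # forward speed
        self.angular_head = nn.Linear(hidden_dim, 1)  # turning rate

    def forward(self, state):
        h = self.body(state)
        v = torch.sigmoid(self.linear_head(h))   # non-negative linear velocity, scaled to the robot's limits later
        w = torch.tanh(self.angular_head(h))     # angular velocity in [-1, 1], scaled later
        return torch.cat([v, w], dim=-1)

class Critic(nn.Module):
    """Predicts the Q-value of a (state, action) pair."""
    def __init__(self, state_dim, action_dim=2, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

In the ADDPG setting, several copies of these networks would be trained asynchronously, with the critic trained on TD targets and the actor updated by ascending the critic's gradient with respect to the action.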
Deterministic policies are utilized since they have been shown to outperform their stochastic counterparts in high-dimensional action spaces (Silver et al., 2014). In addition, simultaneous localization and mapping (SLAM) techniques are integrated with reinforcement learning in an attempt to improve the learning rate by providing more accurate estimates of the robot's states. SLAM has been an active research topic for many years (Cadena et al., 2016) because it provides two fundamental components to robotics: where am I, and what do I see.

Herein, we show how the navigation problem of non-holonomic mobile robots can be formulated as a reinforcement learning problem and solved with the ADDPG actor-critic algorithm. We also share experimental results on whether SLAM helps reinforcement learning at all, comparing the learning rate when SLAM provides a partial map of the environment against the case where no map is available. For the parallel-running neural networks, we perform our initial experiments on the GPU cluster of the University of Twente. When it comes to real-life implementations and demonstrations on real robots, however, the hardware becomes the key component that determines whether the goal can be achieved. We therefore implement the trained networks on an NVIDIA Jetson TX2 to show real-life implementation possibilities when the algorithms need to run on a navigating robot (Jetson TX2 Module, online documentation, n.d.). The Jetson TX2 makes it possible to run even multiple complex neural networks in real time on a credit-card-sized processing unit that can easily be carried by a small robot (less than 20 cm in length). We present our experiments on sensor selection and on the OpenAI package in the ROS-Gazebo platform as a possible simulation environment. Last but not least, we discuss the real-life implementation scenarios using our NVIDIA Jetson TX2 and how realistic the simulation results can be expected to remain when the algorithms are implemented on the real robot. Our early experiments indicate that, for a SLAM and reinforcement learning based system to truly work in real-world applications and reach a goal in an unknown environment, the hardware setup and the algorithms must be designed collaboratively.
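As an illustration of how the trained actor could be deployed on the Jetson TX2 under ROS, the hypothetical node below subscribes to the laser scan, builds a state vector, and publishes velocity commands. The topic names, the TorchScript model file, the omission of the target pose from the state, and the velocity scaling factors are assumptions made for this sketch, not the exact configuration of our setup.

```python
import rospy
import torch
import numpy as np
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist

class ActorNode:
    def __init__(self, model_path):
        # Load the trained actor network (assumed to be exported as TorchScript)
        self.actor = torch.jit.load(model_path)
        self.actor.eval()
        self.cmd_pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
        rospy.Subscriber('/scan', LaserScan, self.scan_callback, queue_size=1)

    def scan_callback(self, scan):
        # Build the state vector from the laser ranges (relative target position omitted in this sketch)
        ranges = np.nan_to_num(np.asarray(scan.ranges, dtype=np.float32), posinf=scan.range_max)
        state = torch.from_numpy(ranges).unsqueeze(0)
        with torch.no_grad():
            v, w = self.actor(state).squeeze(0).tolist()
        cmd = Twist()
        cmd.linear.x = 0.5 * v    # scale to the robot's maximum linear velocity (assumed 0.5 m/s)
        cmd.angular.z = 1.0 * w   # scale to the robot's maximum angular velocity (assumed 1.0 rad/s)
        self.cmd_pub.publish(cmd)

if __name__ == '__main__':
    rospy.init_node('addpg_actor')
    ActorNode('actor.pt')
    rospy.spin()
```

The same node structure can be exercised against the ROS-Gazebo simulation before moving to the physical robot, which is how we compare simulated and real-life behaviour.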