A photovoltaic (PV) system is an environmentally friendly electric generator that converts solar energy into electricity. The energy it produces depends on the solar irradiance and the surface temperature, which makes the PV output characteristic curve non-linear. Maximum power point tracking (MPPT) is one of the key factors in optimizing photovoltaic output. Several MPPT algorithms are available; the most frequently used is the perturb and observe (P&O) algorithm. The P&O algorithm is easy to compute and implement, but it often oscillates around the maximum power point, and the resulting power loss makes the tracked power inaccurate. The deep Q-network (DQN) algorithm, a reinforcement learning method, is used to improve P&O performance by using the output value of the P&O algorithm. DQN records the environmental changes together with the possible actions and their values for each identified condition, and selects an action according to the policy; thus, the selected action corresponds to the best value for the identified condition. The improvement in P&O performance with DQN is assessed by three parameters: the tracking speed, the level of oscillation, and the power tracking. These parameters were tested under three schemes: varying irradiance, varying temperature, and both varying together. The proposed method increased the tracking speed by 33.3%–66.67% when the same input value recurred at a later time, and reduced the oscillation rate by 60.5%–84.81%. The increase in tracking speed and the decrease in oscillation rate were accompanied by a power efficiency above 94% for each test scheme.
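To illustrate the oscillation that motivates this work, the following is a minimal sketch of a fixed-step P&O loop. It is not the implementation used in this study: the function names, the step size, and the quadratic power-voltage curve `pv_power` are assumptions chosen only for illustration and do not model a real PV panel.

```python
def pv_power(v, v_mpp=30.0, p_max=200.0):
    """Hypothetical concave P-V curve with a single peak at v_mpp.

    This is an illustrative stand-in, not a physical PV model.
    """
    return max(0.0, p_max - 0.5 * (v - v_mpp) ** 2)


def perturb_and_observe(power_fn, v_start, step, iterations):
    """Fixed-step P&O: perturb the voltage, observe the power, and
    reverse the perturbation direction whenever the power drops.

    Returns the list of operating voltages visited.
    """
    v = v_start
    p_prev = power_fn(v)
    direction = 1.0
    history = [v]
    for _ in range(iterations):
        v += direction * step          # perturb
        p = power_fn(v)                # observe
        if p < p_prev:                 # power fell: reverse direction
            direction = -direction
        p_prev = p
        history.append(v)
    return history


history = perturb_and_observe(pv_power, v_start=20.0, step=1.0, iterations=30)
tail = history[-8:]  # operating points after the peak has been reached
```

In this toy setting the operating voltage climbs to the peak and then keeps stepping back and forth around it instead of settling, because the fixed perturbation never stops. A smaller step reduces the oscillation but slows the initial tracking, which is the trade-off the abstract describes.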