Network anomaly detection is critical to strengthening modern cybersecurity systems, particularly as cyber threats such as zero-day attacks grow more sophisticated. This study investigates the use of Reinforcement Learning (RL), specifically Q-Learning and Deep Q-Learning (DQN), to build an adaptive intrusion detection system capable of detecting both known and previously unseen attack patterns. The study uses the UNSW-NB15 dataset and simulates a zero-day scenario by withholding the "Fuzzers" and "Reconnaissance" attack categories from the training data. Data preprocessing includes feature selection, encoding, normalization, and class balancing with SMOTE-Tomek to address the imbalanced class distribution. The two algorithms are compared under different hyperparameter tuning strategies. DQN outperforms Q-Learning on every evaluation metric, achieving a test accuracy of 99.09% and an F1-score of 0.9918 on the standard test data, and it remains robust under zero-day attacks, with accuracy dropping only marginally (0.07%). These findings indicate that neural-based RL models are better suited to learning abstract representations of attacks.
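To make the zero-day setup and balancing step concrete, the following is a minimal sketch of how the training split and SMOTE-Tomek resampling could be prepared, assuming the standard UNSW-NB15 training CSV with its `attack_cat` and `label` columns and the scikit-learn / imbalanced-learn APIs; the file name and column handling are illustrative and not necessarily the authors' exact pipeline.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from imblearn.combine import SMOTETomek

# Illustrative file name; the UNSW-NB15 training split is commonly distributed as this CSV.
df = pd.read_csv("UNSW_NB15_training-set.csv")

# Zero-day simulation: withhold Fuzzers and Reconnaissance from the training data.
zero_day = {"Fuzzers", "Reconnaissance"}
train_df = df[~df["attack_cat"].isin(zero_day)]

# Encode categorical features and normalize numeric values to [0, 1].
X = pd.get_dummies(train_df.drop(columns=["attack_cat", "label"]))
y = train_df["label"]
X = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)

# SMOTE-Tomek: oversample the minority class, then remove Tomek links near the class boundary.
X_bal, y_bal = SMOTETomek(random_state=42).fit_resample(X, y)
```

The held-out "Fuzzers" and "Reconnaissance" records would then appear only at evaluation time, so the agent never observes those attack categories during training.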