This research presents a deep reinforcement learning (DRL)-based flocking control strategy for quadcopter swarms operating in densely obstructed environments with varying swarm sizes. Unlike conventional rule-based methods, which struggle to adapt to complex and dynamic scenarios, the proposed approach formulates the flocking task as a Partially Observable Markov Decision Process (POMDP), accounting for the local perception and communication constraints of each agent. The control policy is trained with the Proximal Policy Optimization (PPO) algorithm under a centralized-training, decentralized-execution multi-agent scheme. Classic boid-inspired rules (separation, cohesion, alignment), explicit collision avoidance, and a virtual corridor constraint are integrated into the reward function. This enables the swarm not only to maintain a stable formation and avoid obstacles, but also to operate safely within predefined spatial boundaries, addressing a critical gap in existing flocking research. Extensive simulations were conducted across multiple scenarios, including obstacle-laden environments and swarm sizes ranging from 5 to 50 quadcopters. The quantitative results show that the proposed method achieves a target arrival rate of at least 98% across all scenarios, keeps collision and deconfliction rates low (below 2.5% for the largest swarms), and preserves effective spatial separation and stable flocking formations.
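
As an illustrative sketch of how such reward terms might be combined (the symbols and weights below are assumptions for exposition, not the exact formulation used in this work), the per-agent reward could take a weighted-sum form:

\[
r_i \;=\; w_{\mathrm{sep}}\, r_i^{\mathrm{sep}} \;+\; w_{\mathrm{coh}}\, r_i^{\mathrm{coh}} \;+\; w_{\mathrm{align}}\, r_i^{\mathrm{align}} \;+\; w_{\mathrm{col}}\, r_i^{\mathrm{col}} \;+\; w_{\mathrm{cor}}\, r_i^{\mathrm{cor}},
\]

where the five terms score separation, cohesion, alignment, collision avoidance, and adherence to the virtual corridor for agent \(i\), and the weights \(w_{(\cdot)}\) balance their relative importance during PPO training.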