We introduce Chirpy, a hardware module for swarm robots that enables them to locate each other and communicate through audio. Using its deep learning module (AudioLocNet), Chirpy can perform localization in challenging environments, such as non-line-of-sight and reverberant settings. To support concurrent transmission, Chirpy uses orthogonal audio chirps and an audio message frame design that balances localization accuracy and communication speed. As a result, a swarm of robots equipped with Chirpy modules can construct a path (or a potential field) to a location of interest on the fly, without the need for a map, which makes them well suited to tasks such as search and rescue missions. Our experiments show that Chirpy can decode messages from four concurrent transmissions with a Bit Error Rate (BER) of at a distance of 250 cm, and that it can communicate at Signal-to-Noise Ratios (SNRs) as low as -32 dB while maintaining a BER of ≈ 0. Furthermore, AudioLocNet classifies the location of a transmitter with high accuracy, even in adverse conditions such as non-line-of-sight and reverberant environments.
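To illustrate why orthogonal chirps allow concurrent transmission, the following is a minimal sketch, not Chirpy's actual signal design. It assumes one common construction, an up-chirp and a down-chirp over the same band, and checks that the two correlate only weakly with each other while each remains detectable in their sum. The band (2 to 6 kHz), duration (50 ms), sample rate, and the helper names `chirp` and `normalized_correlation` are all hypothetical choices made for this example.

```python
import math


def chirp(f_start, f_end, duration, sample_rate):
    """Linear chirp sweeping f_start -> f_end (Hz) over `duration` seconds."""
    n = int(duration * sample_rate)
    rate = (f_end - f_start) / duration  # sweep rate in Hz per second
    return [
        math.sin(2 * math.pi * (f_start * t + 0.5 * rate * t * t))
        for t in (i / sample_rate for i in range(n))
    ]


def normalized_correlation(a, b):
    """Zero-lag correlation of two equal-length signals, scaled to [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm


# Hypothetical parameters, chosen only for illustration.
FS = 48_000
up = chirp(2_000, 6_000, 0.05, FS)    # transmitter A: rising sweep
down = chirp(6_000, 2_000, 0.05, FS)  # transmitter B: falling sweep

# Two transmitters active at once: the microphone hears the sum.
mixed = [u + d for u, d in zip(up, down)]

self_corr = normalized_correlation(up, up)      # template vs. itself
cross_corr = normalized_correlation(up, down)   # template vs. the other chirp
up_in_mix = normalized_correlation(mixed, up)   # template vs. the mixture

print(f"self={self_corr:.3f} cross={cross_corr:.3f} up_in_mix={up_in_mix:.3f}")
```

Under these assumptions, the cross-correlation stays close to zero while the up-chirp template still correlates strongly with the mixture, which is the property a receiver needs to separate overlapping transmissions. A practical receiver would correlate over all time lags (a matched filter) rather than only at zero lag.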