Final Project - Karaoke/Sound Collection

For my Final Project, I’ll be combining my PCOMP project with ICM. I want to integrate the microphone with songs collected from the class so everyone can listen to new music from around the world while also learning new languages.

Data Collection:

Not everyone filled out the form lol

So, I guess a lot of the songs will be Chinese for ICM Idol. How will we do this?

Aram suggested in class that, to make this easier, we should include romanized lyrics so everyone can read along regardless of the song's origin. I thought this was a great idea and began by looking up romanized lyrics online and using Gemini to help analyze and timestamp each line to match the song's timing.

The visual design was inspired by the American Idol neon aesthetic: layered glow effects on the text and UI elements, achieved by drawing each element multiple times with decreasing alpha values. Audio-reactive visualizations were implemented using p5.sound's Amplitude and FFT analyzers to drive the frequency bars and pulsing background elements in response to the music.
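
To make the glow-and-bars idea concrete, here's a minimal p5.js sketch of the technique, not the actual project code: the file name `song.mp3`, the colors, sizes, and bin count are placeholders I picked for illustration.

```js
// Minimal sketch: neon-glow text via overlapping draws with decreasing alpha,
// plus FFT-driven bars and an Amplitude-driven pulsing background.
// Assumes p5.js + p5.sound; "song.mp3" is a placeholder file name.
let song, fft, amp;

function preload() {
  song = loadSound("song.mp3");
}

function setup() {
  createCanvas(800, 450);
  fft = new p5.FFT(0.8, 64);   // smoothed analyzer with 64 frequency bins
  amp = new p5.Amplitude();    // overall loudness of whatever is playing
  textAlign(CENTER, CENTER);
}

function glowText(str, x, y, size) {
  // Big faint layers first, then a crisp bright core on top.
  noStroke();
  for (let i = 4; i >= 1; i--) {
    fill(255, 0, 150, 255 / (i * 3)); // alpha shrinks as the layer grows
    textSize(size + i * 6);
    text(str, x, y);
  }
  fill(255);
  textSize(size);
  text(str, x, y);
}

function draw() {
  // Background pulses with the overall loudness (0..1).
  const level = amp.getLevel();
  background(10 + level * 60, 0, 30);

  // Frequency bars along the bottom of the canvas.
  const spectrum = fft.analyze(); // array of 0..255 energies per bin
  for (let i = 0; i < spectrum.length; i++) {
    const x = map(i, 0, spectrum.length, 0, width);
    const h = map(spectrum[i], 0, 255, 0, height / 3);
    fill(0, 200, 255, 120);
    rect(x, height - h, width / spectrum.length - 2, h);
  }

  glowText("ICM IDOL", width / 2, height / 2, 48);
}

function mousePressed() {
  if (!song.isPlaying()) song.play(); // browsers need a gesture to start audio
}
```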

However, I quickly realized that Gemini is terrible at audio recognition and the timestamps were mostly wrong. How could we solve this??????? Also, the songs everyone submitted had a median length of around four minutes or more (one of them was eight minutes long). To be honest, I tried my best to match the lyrics manually, but it was too hard and irritating. So I asked AI (Cursor) to help with this problem. It suggested using a structured data format where each lyric line is stored as an object with a time value (in seconds) and a line value (the romanized text):

```js
lyrics: [
  { time: 0.0,  line: "♪ ♪ ♪" },
  { time: 4.0,  line: "jag soona soona lage" },
  { time: 8.0,  line: "jag soona soona lage" },
  { time: 12.0, line: "bina tere bina tere" },
]
```

The code then compares the current audio playback time against these timestamps and displays whichever line is most recent. This means I only needed to get the start time of each line right—the system handles the rest automatically.
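
As a rough sketch of that lookup (assuming `song` is a p5.SoundFile and `lyrics` is an array like the one above, sorted by time), each frame just keeps the last line whose timestamp has already passed; `currentLine` is a hypothetical helper name, while `song.currentTime()` is the real p5.sound call for the playback position in seconds:

```js
// Hypothetical helper: return the most recent lyric line for playback time t.
function currentLine(lyrics, t) {
  let line = lyrics[0].line;
  for (const entry of lyrics) {
    if (entry.time <= t) line = entry.line; // keep the latest line we've reached
    else break;                             // entries are sorted by time
  }
  return line;
}

// In draw(), assuming `song` is the playing p5.SoundFile:
//   const t = song.currentTime();          // seconds into the track
//   text(currentLine(lyrics, t), width / 2, height - 40);
```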

This was still hard and time-consuming, especially since, to be honest, I had no clue what each lyric was saying.

So, I decided to switch things up and have fun with it.

```cpp
/*
  Unified IO → Multi-Channel USB-MIDI (Nano 33 IoT / SAMD21)
  Simple edition (A6 + A7 touch sensors, no advanced denoise)

  Mapping:
    CH 1: VL53L0X distance    -> CC#1  (Mod Wheel)
    CH 2: Buttons D3, D5..D10 -> Notes (C5 + C4..A4)
    CH 3: Touch/FSR A6        -> CC#11 (Expression)
    CH 4: Pot A3              -> CC#74 (Brightness)
    CH 5: Pot A2              -> CC#71 (Resonance)
    CH 6: Gyro X/Y (IMU)      -> CC#16 / CC#17
    CH 7: Touch/FSR A7        -> CC#12
*/

#include <Wire.h>
#include <VL53L0X.h>
#include <Arduino_LSM6DS3.h>
#include "MIDIUSB.h"
#include <math.h>

// ====== LED ======
#ifndef LED_BUILTIN
#define LED_BUILTIN 13
#endif

// ====== VL53L0X ======
VL53L0X sensor;
const int PIN_XSHUT = 2;
bool sensorOn = false;

// Distance mapping window
int D_NEAR = 50;
int D_FAR  = 600;

// ====== Buttons D5..D10 ======
const uint8_t BTN_PINS[]  = {5, 6, 7, 8, 9, 10};
const uint8_t BTN_COUNT   = sizeof(BTN_PINS) / sizeof(BTN_PINS[0]);
uint8_t btnPrev[6] = {HIGH, HIGH, HIGH, HIGH, HIGH, HIGH};
const uint8_t BTN_NOTES[] = {60, 62, 64, 65, 67, 69}; // C4..A4
```