Similar to what he suggested, I tried least squares with polynomial features of the input: (r, g, b) -> (r^3, g^3, b^3, ..., r, g, b, 1). You fit the color transformation matrix with least squares on a single frame, then apply it to all the other frames. It seemed to work extremely well on the example you provided. Just don't include the background (transparent pixels) or other duplicate colors in the data you fit on.
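The fit/apply steps could be sketched roughly like this. It assumes NumPy, colors given as float `(N, 3)` arrays with transparent pixels already filtered out, and only per-channel powers with no cross terms (the "..." above may cover more). The function names `poly_features`, `fit_color_transform` and `apply_color_transform` are just illustrative, not from any library.

```python
import numpy as np

def poly_features(rgb, degree=3):
    """Map (N, 3) colors to (N, 3*degree + 1) features.

    Columns are r^d, g^d, b^d for d = degree..1, then a constant 1.
    Assumption: no cross terms such as r*g.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    powers = [rgb ** d for d in range(degree, 0, -1)]
    ones = np.ones((rgb.shape[0], 1))
    return np.hstack(powers + [ones])

def fit_color_transform(src, dst, degree=3):
    """Least-squares fit of a matrix M so that poly_features(src) @ M ~ dst."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    # Drop duplicate (source, target) pairs so repeated colors
    # do not dominate the fit.
    pairs = np.unique(np.hstack([src, dst]), axis=0)
    src_u, dst_u = pairs[:, :3], pairs[:, 3:]
    M, *_ = np.linalg.lstsq(poly_features(src_u, degree), dst_u, rcond=None)
    return M  # shape (3*degree + 1, 3)

def apply_color_transform(rgb, M, degree=3):
    """Apply a fitted matrix to new colors (e.g. every other frame)."""
    return poly_features(rgb, degree) @ M
```

You would call `fit_color_transform` once on the matching pixels of one frame pair, then `apply_color_transform` on the pixels of each remaining frame, using the same `degree` in both.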
For example, using some random colors similar to yours (since you didn't upload yours):