Awesome!! But I am thinking about the reason. I think that when we focus on the cross rather than on the images themselves, some trail of the previous image or images stays in our visual system. The next image gets superimposed on that trail, and the result is the grotesque final image.
If we focus on the image itself, the next image appears only after the trail of the previous image has been wiped away completely, so we do not see the effect.
The main factor here, I think, is the speed at which the images are sequenced. At high speed, the trail has no time to be erased. If the sequencing speed were reduced, I think the effect would no longer be there.