Cross-modal Language Processing in Real Time