reaction time for individual words in a paragraph

I apologize if this question’s been asked before; I searched and searched but could not find exactly what I was looking for.

I would like participants to read a series of sentences aloud, but record articulation onset only for specific target words. I'd like to use a text that mimics a naturalistic narrative (i.e., a story), which is why the target words (the ones for which I want articulation onset) are not at the end of each sentence.

Based on my knowledge of programming for eye tracking, I wonder if there is a way to tell the system “when the participant gets to word n, prepare the voice key for word n+1”. Or should we collect articulation onset for 100% of the words in the text and keep only the data we need? Crucially, most of the text must be visible during reading; it cannot be presented one word at a time. Is either of these possible, or is there an alternative approach?
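
To make the second option concrete, here is a rough sketch of the filtering step I have in mind. It assumes we already have an onset time for every word (for example, from aligning a full recording to the text afterwards); the data layout, the values, and the names `word_onsets`, `target_positions`, and `select_targets` are all made up for illustration and do not come from any particular package.

```python
# Hypothetical per-word onsets: (position in text, word, onset in ms from
# the start of the paragraph). The values are invented for illustration.
word_onsets = [
    (0, "The", 120),
    (1, "old", 410),
    (2, "fox", 690),
    (3, "slept", 1050),
    (4, "soundly", 1480),
]

# Positions of the target words in the text (assumed to be known in advance).
target_positions = {2, 4}


def select_targets(onsets, targets):
    """Keep only the rows whose text position is one of the targets."""
    return [row for row in onsets if row[0] in targets]


target_onsets = select_targets(word_onsets, target_positions)
for position, word, onset_ms in target_onsets:
    print(position, word, onset_ms)
```

The part I don't know how to do is getting reliable per-word onsets in the first place while the whole paragraph stays on screen.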

Thank you for your time!