JAWS is an application that blind people use to have the contents of web pages spoken aloud in a synthesized voice. In fact, JAWS will read any part of the screen, including application menus and alert boxes. The vocalization generally follows focus.
The latest version of JAWS has a trial package that you can use for forty minutes after each restart of the computer, which is generous enough to make getting a feel for the app practical. I've recently been exploring the site I work on using JAWS, and it's a pretty strange experience.
I had never thought carefully enough about it to distinguish between an audio browser and a screen reader, but using JAWS on a page whose structure you know intimately brings the distinction home right away.
The way a page is laid out is (ideally) optimized for the way the eye absorbs information. Reading the page left to right, top to bottom doesn't necessarily present it in the way that's most intuitive to a listener. Reading the page in DOM order may or may not be an improvement. Adding elements that get read but not seen may also help the audible experience.
Ideally, the page could be structured for visual interpretation when displayed on the screen and structured for audible interpretation when spoken. CSS stylesheet includes can specify a media type (screen or aural, for example) for just this reason.
This is not at all the model that JAWS uses, and in fact there's no way to put this model in place for JAWS users. When you use JAWS with a web browser, the browser renders the screen and then JAWS reads it to you. It's strictly limited to creating a visual experience, then trying to read you that visual experience.
For that job JAWS works very well, and it's necessary for interoperability with a wide range of applications. But it falls well short of the usability we could be getting by tailoring the spoken experience differently from the seen experience, even in very minor ways.
JAWS
Fangs is a Firefox plug-in that lets you read the page in the order it would be read to you by JAWS.
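
To get a rough feel for how reading order can differ from visual order, here is a minimal sketch that lists a page's text in DOM order using only Python's standard library. It is not how Fangs or JAWS actually work. The sample markup, the `DomOrderText` class, and the `reading_order` function are all made up for illustration, and the sketch ignores CSS entirely.

```python
from html.parser import HTMLParser

# Tags whose contents are code or styling rather than page text.
SKIPPED_TAGS = {"script", "style"}


class DomOrderText(HTMLParser):
    """Collect text nodes in the order they appear in the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and drop whitespace-only nodes.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def reading_order(html):
    """Return the page's text chunks in DOM (source) order."""
    parser = DomOrderText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page: CSS floats the sidebar to the right, so the eye may
# land on the article first, but the sidebar comes first in the markup.
SAMPLE = """
<div id="sidebar" style="float: right">Related links</div>
<div id="content"><h1>Headline</h1><p>Article body.</p></div>
"""

print(reading_order(SAMPLE))
```

In this made-up page the sidebar text comes out before the headline, even though a sighted reader would probably start with the article.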