On this episode of “How’s That Made?” … Audiobooks!

In recent chats on FB, I had a massive number of listeners asking questions about the audiobook process and what exactly goes into the whole thing… and being the inherently lazy bastard that I am, I realized this meant I didn’t have to come up with a new topic for a blog post… I could totally cheat off my FB followers!

Uh.  I mean…  I could… engage with people! and answer interesting questions!!!  Yeah!

(also, forgive me… my current production schedule meant that making a video of this wasn’t gonna happen… I did put a slideshow at the end tho! )



SO.  How is an audiobook born?!  

Well… when a CD player and a book love each other VEEERRRRYY much… they give each other a “special hug” …


No.  In all seriousness.  There are a lot of subtly different ways in which the process happens depending on companies and their size, but the key points and tricky bits are all about the same.  So here it goes!


It does all start with a book.

In the new world of audio, some authors are writing audio-only releases and pushing them out, but most audiobooks are still a book property first.  Authors can retain the audio rights to their work or sell them to a publisher, usually for a set, relatively short period of time, and it is the “rights holder’s” decision when to make a book into an audio version. (Other rights can include screen rights, derivative works, adaptations and more.)

Audiobooks are sold through a number of different outlets, with the largest single sales point being Amazon’s Audible.  CD purchases, Library lending through Overdrive, iTunes and others fill in the rest of the space.

When a book is chosen to be made into audio, the rights holder begins the process by defining a project, choosing their distribution options, and selecting a production team.  The production team can be as simple as a narrator who has the technical know-how to deliver the whole book… or a full production company with audio engineers, directors, talent they hire in, mastering crew, marketing and more.


Picking a Voice

In many respects, the person (or people) that becomes the voice of the book is the most important part in creating the proper experience.  With independent authors and small publishers, the author is usually involved in selecting the voice talent they feel best represents their work… with the larger publishers it is often the case that the author has little to no input on the process.  One common method includes collecting or selecting a pool of auditions, and asking the author to choose among them.

Narration styles fall into 2 very general categories: Eloquent Read, or Dramatic Read.

An eloquent read is a simple, well-paced presentation of the book text, without voices or emotive elements.  It is designed to be as pure an interpretation of the text into spoken word as possible, without coloring the read with the actor’s interpretation.

Dramatic Reads (not to be confused with a melodramatic read, which is almost universally bad) employ an actor who will impart a mild dramatic flair to the work, usually with voice characterizations and accents, and little hints as to the emotion of a scene.  Even non-fiction is done both ways, and different works lend themselves to different styles.

(Anyone who has heard my work will recognize me as the dramatic read style of voice quickly!)

As a narrator, I pick the titles I audition for: ones I think have sales potential, that interest me, and that I feel are written to showcase my talents.  Some titles I am offered directly, some I submit auditions for.  Like most actors, the vast majority of auditions I submit are not awarded to me… it’s the nature of the life.  If I am made an offer, we negotiate delivery dates and compensation.  I work on some titles for a share of the royalties, and others “per finished hour,” meaning I am paid based on the total length of the final product.


Preparing the Script

Once the narrator is chosen, we get a copy of the full text of the book and we begin the prep process.  (I read mine from PDF, either on iPad or from computer screen… paper makes noise.)  When I sit down with a new book, I read it to myself in a somewhat fast-paced way, looking for important clues like:

  • What is the general plot?
  • Who are the characters (I make a list) and what are the notes about their voice, accent, personality, etc?
  • Are there any big surprises?
  • Are there pronunciation issues, or words I am not familiar with?
  • Is the text complete, or are there missing pieces?
  • Are there editing and proofing errors in the text?


Some narrators prefer to mark up the text extensively, underlining dialogue and marking timing notes and pauses.  I have found that excessive, and it takes time I could otherwise spend getting a feel for the book, and recording it.  It takes me less time to fix an error like reading the dialogue in the wrong voice than it does to underline every person.  (Also, there’s a finite number of highlighter colors out there… and I’ve read some books with a LOOOOT of characters.  72 is my personal record)

Sometimes we will be asked to submit a “proof” sample of the book for approval.  This is to ensure that a remote narrator is making directorial choices that meet with the project manager’s approval.  Then principal recording begins.


Saying the Words

Recording is done in a sound-treated space, with professional microphones that record through an A/D Converter (Analog-to-Digital Converter) to digital computer sound files.  We use programs called “DAWs” or “Digital Audio Workstations” to visually edit audio files and turn the recordings into WAV files for editing and mastering.  My booth is a small one-person space with a remote monitor and computer control console, which allows me to control the behavior of the recording software.  (I use a program called Reaper, but ProTools, Adobe Audition and a free program called Audacity are all used by different production teams.)

Recording is done in 2 general styles: “Pickup” and “Punch and Roll” recording.

In “Pickup” recording, a narrator goes along reading until she or he makes a mistake then and they …. until she or he makes a mistake, and then they back up and repeat the line, saying it properly. (see what I did there?  Check out the video in the slideshow)  Often we will use something like a finger snap, or the “CLICK” of a dog training clicker to mark the error in the audio file, because on the visual graph of sound it makes a very distinctive | mark that is easy to see.
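For the tinkerers: here is a rough sketch of why that click is so easy to find later.  It is not what any DAW actually does under the hood, just a toy that assumes the audio is a list of mono samples between -1.0 and 1.0.  The function name find_click_markers, the 0.9 threshold and the half-second gap are all my own made-up values for illustration.

```python
def find_click_markers(samples, sample_rate, threshold=0.9, min_gap_s=0.5):
    """Return the times (in seconds) where the signal spikes past the threshold.

    samples: mono audio as floats in the range -1.0..1.0
    threshold and min_gap_s are illustrative guesses, not studio standards.
    """
    markers = []
    last_spike = None
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            t = i / sample_rate
            # Spikes that sit close together belong to the same click.
            if last_spike is None or t - last_spike >= min_gap_s:
                markers.append(t)
            last_spike = t
    return markers


# Three seconds of quiet "speech" at a toy sample rate, with two clicks dropped in.
rate = 1000
audio = [0.1] * (3 * rate)
audio[500] = 1.0      # first click...
audio[501] = -0.95    # ...which rings for one more sample
audio[2000] = 0.99    # second click

print(find_click_markers(audio, rate))  # [0.5, 2.0]
```

Normal speech rarely slams the meter the way a sharp click does, which is the same reason the mark jumps out at a human editor looking at the waveform.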

In “Punch and Roll” recording, when an error occurs the narrator stops the recording, backs up to a logical place (like that comma you just read) and begins the recording again.  This results in an audibly seamless recording with no mistakes when the final raw result comes out.  In fact, my software is kind enough to play back the 2 seconds before where I place my cursor, allowing me to catch up with my speech as I begin again.  Punch and roll is more difficult to see without a live demonstration… sorry.

I record almost exclusively in Punch and Roll, because it greatly reduces the chance of a re-take slipping into the final audio, and it speeds up editing.

Raw recording takes most narrators anywhere from 1.5 to 3 times the length of the final finished work… depending on the nature of the material. On average, a 10-hour book takes me 16 hours just to record the raw, unedited audio.


Polishing the Gems

Once the raw recording is complete, editing can begin.  Generally we work with chapter- or section-length chunks, as that makes for a logical break point for the listener.  Editing can be done in-house for studios that have the capability, or sent to an outside editor.  (I do both depending on the title) The editor listens to the entire file, while reading along with the script, and marks any place where the narrator deviated from the script.  These are noted, and a list is sent back to the narrator to record them as change requests (often called CRX) to be spliced in.  The narrator will record a brief section of the text before and after the error, listening to the original recording to duplicate energy and sound, and the editor will use their audio workstation to slice and splice in the audio over the original error. In this way, extra loud breath sounds, background noise that seeped through, strange mouth sounds or any sound that is unusual and impedes the listener’s experience can be removed, or eliminated through re-recording.

Sometimes we as narrators make minor intentional changes to the text of a work to optimize it for audio, and to eliminate any minor proofing errors that made their way into the final print version. Examples of the changes can include leaving out dialogue tags (he said, she said) in choppy conversations – if voice characterizations are used, dealing with tables or lists, and altering notes to the reader to be more sensible to a listener instead.

Narrators control many of the sounds that need to be mitigated through things like proper hydration, brushing our teeth before recording, and eating and drinking with caution to avoid stomach rumbles, which will be heard in the microphone.  Dehydration leads to sticky mouth sounds, food in teeth can cause sloppy clacking sounds, and any noise within the gastric system is audible on the recording.

(I have recorded tight deadline sections with a large pillow belted over my stomach to muffle a rumbly tummy from drinking water too fast… it looked like Santa.  Also, the nature of speaking for long periods coupled with drinking upwards of a liter of water each hour, causes increased burps… which almost always come up during a tender love scene.  Murphy’s law)

The editing process takes roughly 2 to 2.5 times the final length of the finished audio.


Making the best sound possible

Once editing is complete, and all errors and trouble noises are fixed, mastering can begin.  Mastering seeks to create a file which is loud enough to be easy to hear in any environment, relatively uniform in volume so a listener doesn’t need to turn the volume up and down to compensate, and has a proper tone and feel.  Processing techniques including compression, limiting and normalizing seek to make the soft bits louder, the loud bits softer, and make sure no one part goes so high as to cause a distorted sound.
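If you like seeing the idea in numbers, here is a bare-bones sketch of those three steps.  Big caveat: real mastering tools work on smoothed loudness with attack and release times, and the targets are set in decibels by the retailer.  The function names, the 0.5 threshold, the 4:1 ratio and the 0.7 ceiling below are all invented for illustration, and the audio is assumed to be a list of samples between -1.0 and 1.0.

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Turn down whatever pokes above the threshold, by the given ratio."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level = threshold + (level - threshold) / ratio
        out.append(level if s >= 0 else -level)
    return out


def normalize(samples, target_peak=0.7):
    """Scale the whole file so its loudest sample lands on target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence, nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def limit(samples, ceiling=0.7):
    """Safety net: clamp anything that still goes past the ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def master(samples):
    """Loud bits softer, then everything louder, then a hard ceiling."""
    return limit(normalize(compress(samples)))


raw = [0.05, -0.1, 0.9, -0.2]   # a whisper, some speech, one shout
mastered = master(raw)
print([round(s, 3) for s in mastered])
```

In this toy run the shout comes down, the whisper comes up, and nothing ends above the ceiling, which is the whole point of the exercise.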

Mastering technicians also will often accentuate or reduce the higher or lower frequencies of the audio to complement the underlying recording, using tools called equalizers.  The file is then converted into a smaller compressed format which many of us are familiar with… MP3 for most books.

Mastering is often the fastest portion for spoken word, and often only takes about 0.5 times the final length or less.


… and off to market

If a project manager for a rights holder has hired all this work out, they typically put the file through an internal quality control check.  This can be as simple as spot-checking the files for quality, or as complete as having a proofer listen to the entire work to verify consistency and accuracy to the text.  If the book is to the rights holder’s liking, it is approved and goes to retail production.

Packaging for production involves encoding the MP3 files with metadata that indicate the book title, chapter number, author, artist, publisher and any other information the retail coordinator feels is important.  Then, for digital download, the file is wrapped in digital rights management protection to prevent piracy.



From start to finish, the whole process usually involves roughly 10-12 hours of production time per finished hour of audio, including the time of the proofers, project managers, voice talent, editors and retail production staff.
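To put the multipliers from this post side by side, here is a quick back-of-the-envelope calculator.  The numbers are the rough ranges quoted above, not industry standards, and the names (STAGE_MULTIPLIERS, booth_to_master_total, all_in_total) are just ones I made up for the sketch.

```python
# Hours of work per finished hour of audio, using the rough figures in this post.
STAGE_MULTIPLIERS = {
    "recording": (1.5, 3.0),
    "editing": (2.0, 2.5),
    "mastering": (0.5, 0.5),   # quoted as "about 0.5 or less"
}

# The all-in figure, which also covers proofers, project managers and retail staff.
ALL_IN_MULTIPLIER = (10.0, 12.0)


def booth_to_master_total(finished_hours):
    """Low and high estimate for recording + editing + mastering only."""
    low = sum(finished_hours * lo for lo, _ in STAGE_MULTIPLIERS.values())
    high = sum(finished_hours * hi for _, hi in STAGE_MULTIPLIERS.values())
    return low, high


def all_in_total(finished_hours):
    """Low and high estimate for the whole production, everyone included."""
    lo, hi = ALL_IN_MULTIPLIER
    return finished_hours * lo, finished_hours * hi


book_length = 10  # finished hours
print("Record, edit, master:", booth_to_master_total(book_length))
print("Everyone, start to finish:", all_in_total(book_length))
```

For a 10-hour book, recording, editing and mastering come to somewhere between 40 and 60 hours; the rest of the 100 to 120 hours is everyone else who touches the project.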


Ask any follow-up questions in the comments, and I’ll give you the best rundown I can manage!


[slideshow_deploy id=’32256′]

4 Responses

  1. ptkat

    Love the in-depth look. Thanks Greg! All from a little ok’ question on Rhys’ page. (wink)

  2. Melissa

    Thank you for this info! I listen to a ton of audiobooks and I spend a good amount of time wondering how it is done. This blog post was perfect for me!

  3. Nihcki

    Wow. I hadn’t realized the time and work involved in audiobooks. Thanks for the eye-opener!

  4. kanundra

    Great read, thanks 🙂

