Audio Understanding Experiment

DOI: 10.57967/hf/8154

Voice Sample

Recorded 26 March 2026 by Daniel Rosehill. Unscripted freeform voice note, OnePlus Nord 3.5G, HQ mode.

Duration: 20m 54s
Format: FLAC mono 24 kHz 16-bit
Size: 30.9 MB
Device: OnePlus Nord 3.5G
Environment: Untreated room

Speaker Profile

Gender: Male
Age: 37 (late 30s)
Accent: Irish (Cork), softened
Voice type: Bass / Low Baritone
Speaking rate: ~169 WPM
Location: Jerusalem, Israel
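
The speaking rate is consistent with the word count in the transcript header divided by the recording duration. The page does not say how the figure was computed, so the following is only a minimal sketch of one plausible derivation; the function name is illustrative.

```python
# Values copied from this page. The formula is the usual words-per-minute
# definition and is an assumption about how the figure was obtained.
WORD_COUNT = 3524              # from the transcript header
DURATION_S = 20 * 60 + 54      # 20m 54s

def words_per_minute(words: int, seconds: float) -> float:
    """Average speaking rate over the whole recording, pauses included."""
    return words / (seconds / 60.0)

wpm = words_per_minute(WORD_COUNT, DURATION_S)
print(f"{wpm:.1f} WPM")
```

Because pauses are included in the duration, this is an overall rate rather than an articulation rate.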

Acoustic Profile

Median F0: 109.6 Hz
F0 range: 74.9 – 499.9 Hz
HNR: 9.6 dB (fatigued)
Peak level: −1.02 dB
RMS level: −22.21 dB
Dynamic range: ~65.8 dB
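
The peak and RMS levels imply a crest factor (the linear peak-to-RMS amplitude ratio). As a rough consistency check, and assuming both levels are measured against the same full-scale reference, the gap in dB can be converted as follows; the result agrees with the crest factor reported under Voice Quality.

```python
# Levels copied from the Acoustic Profile. Treating both as dB relative to
# the same full-scale reference is an assumption.
PEAK_DB = -1.02
RMS_DB = -22.21

def crest_factor(peak_db: float, rms_db: float) -> float:
    """Convert a peak-to-RMS gap in dB into a linear amplitude ratio."""
    return 10 ** ((peak_db - rms_db) / 20)

cf = crest_factor(PEAK_DB, RMS_DB)
print(f"gap: {PEAK_DB - RMS_DB:.2f} dB, crest factor: {cf:.2f}")
```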

Formant Analysis

F1 (jaw openness): 669 Hz mean
F2 (tongue position): 1,896 Hz mean
F3 (lip rounding): 2,873 Hz mean
Voiced frames: 50.8%
Pitch variability: 28.3% CV

Voice Quality

Jitter (local): 2.713% (elevated)
Shimmer (local): 13.089% (elevated)
Crest factor: 11.47
Bit rate: 197 kbps
Condition: Fatigued, dehydrated
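
The bit rate can be cross-checked against the file size and duration listed at the top of the page. This sketch assumes that "MB" means 10^6 bytes and that the figure is an average over the whole file (FLAC is variable bit rate); neither is stated in the source.

```python
# Values copied from this page. The decimal-megabyte interpretation is an
# assumption; a binary (MiB) reading would give a higher figure.
SIZE_MB = 30.9
DURATION_S = 20 * 60 + 54      # 20m 54s

def average_bitrate_kbps(size_mb: float, seconds: float) -> float:
    """Average bit rate in kilobits per second for a file of the given size."""
    bits = size_mb * 1_000_000 * 8
    return bits / seconds / 1000

kbps = average_bitrate_kbps(SIZE_MB, DURATION_S)
print(f"{kbps:.0f} kbps")
```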

Transcript

AssemblyAI · 97.4% confidence · 3,524 words

00:00 So I thought I would record a voice note because today is one of those days where I'm having an immensely difficult time in actually getting out of bed. I am in bed at 4:08 in the afternoon. This is not something that typically happens. I am in bed because I live in Jerusalem and there is the Iranian war going on and we had just a crazy, crazy night.

00:28 I was up late last night, which I knew was kind of risky. In this war you kind of learn we've been at war for almost a month. It's going to be a month. Today's I'm recording this on the 26th of March. I should probably start it with that. And on the 28th is going to be a month, so a long time.

00:57 Trying to finally get back into some kind of a groove with everything that's being disrupted and. But then this morning woke up to the first rocket siren. Like I'm gonna say seven in the morning, approx. And then we had like just one of those.

01:25 So it's very much. There is attacks going on all over all the time. It's a bit unnerving to actually have it up on a screen like this. It is a vibe coded app that I created called Redlert Geodash and it's cool how many open source projects are coming out there at the moment.

01:51 No, they've all got. This one has its own unique features to it, but the fact that these can be created by bunches of people in a few hours is revolutionary. Anyway, so coming back to the rockets. Yeah, so we went out to the shelter and then it was just like three or four more rounds of it.

02:18 Another attack. I don't know, it's something about that like going back to sleep for 20 minutes thing that just when you do finally just give up on trying to get back to sleep, you're just exhausted. So hence I'm in an energy deficit waiting for some coffee to kick in.

02:49 But most significantly, I think is my AI generated podcast. It's called My Word Prompts mywordprompts.com and for voice cloning. So the podcast is basically these two characters. It's Herman and Corn. Corn is a sloth, Herman is a donkey.

03:15 So they're both. It's using Chatterbox, which is from Resemble AI. And what's really crazy about it is it's like a, I think 30 second sample and that's it. So each character is me doing a voice.

03:44 And I'm recording this and putting it out on GitHub publicly because I realize from all the podcasts and YouTube videos I've done, if anyone does want to make a deepfake voice clone of me, they already have all the information they need.

04:17 Actually, I think, I'm not sure if he's. I'm not sure if he still thinks I'm a bot or if I've convinced him of my humanity. But I am a human and it's kind of. I guess there's something, there's something kind of funny about that.

04:46 And so I'm sure from Synth, that Synth is so just like to add to my, to add to the mystery, mystery I now have. Like, I can see why I might seem bot like but on my to do list to get a professional headshot.

05:12 Very corporate. So I leaned into the AI for my lit, for my little avatar pic. But, but my original one. There's plenty of photos of me on the Internet or a few at least that are not in any way AI tampered and it's just me.

05:45 And I mean, I guess that's obvious, right? But even in a few years you can hear these small differences. So this is how we speak today. And let me talk about the acoustic environment within which I find myself.

06:11 One use for having a voice sample that I found is speech to text benchmarking. So if you want to get a benchmark for the accuracy of a model, if I can summon up the motivation to do so, I'll create a ground truth.

06:36 And then you listen back to. There's a lot of apps that just let you scrub through the audio and just fix up any things that got wrong and that is your like 100% accuracy benchmark.
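
The ground-truth benchmarking described here is commonly scored as word error rate (WER): the word-level edit distance between the corrected transcript and a model's output, divided by the number of reference words. The speaker does not name a metric or a tool, so the following is only a minimal sketch under that assumption; the function name and example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Normalisation here is just lowercasing and whitespace splitting;
    real benchmarks usually also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical strings: one substituted word out of six.
wer = word_error_rate("this is my ground truth transcript",
                      "this is my round truth transcript")
print(f"WER: {wer:.3f}")
```

A WER of 0 means the model output matches the hand-corrected transcript word for word.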

07:07 So you can do it. It's actually pretty easy, but very, very worthwhile. Extremely worthwhile in fact. Like if you're going to be spending. I've mentioned in my podcast and my, I guess anything I've written here, my blog or elsewhere that I have a very long term view of voice tech.

07:35 No, the accuracy is very good. The last thing I'm looking for is something that I can type with on my computer in real time like a streaming response one on an Ubuntu.

08:06 And so we're trying to just kind of hold it all together and do our, you know, work on stuff and take care of him. So sometimes I'm holding him and I just. If I had the real time text input, I could just quickly, you know, jot something down into the computer.

08:32 And I have to say, the microphone here is pretty decent. And I am recording this voice note today on the HQ setting. Let's see what the HQ setting actually entails. It is. How do I find that out? Ah, yes. WAV stereo. 44.1 kilohertz.

09:00 Ooh. So I have a setting in there that's maybe doing noise cancellation. Well, this is. It's going to be a one shot, one shot data set. So it is what it is.

09:27 And I think from the one thing I've learned about TTS, the 30 second. If you're trying to do voice cloning, so 30 seconds, it's really. I've tried. I played around with my voices for the characters in this podcast, Herman and Corn.

09:54 But if you say like, this is Daniel and I'm walking around the living room in Jerusalem and I'm having a quite pleasant day today, like, if you read something like a robot, then your voice tone will sound robotic.

10:20 Right. Those things. If you're training on a small set of voice audios, what I actually ended up doing for those voice clones, for anyone who's ever listened to this podcast, is try to find something I could say in 30 seconds that I could have a bit of enthusiasm and a bit of the other opposite.

10:55 Now what other delightful things do I have? Because I'm going to try to stretch this out to 15 minutes and LFS storage in GitHub. GitHub, say I have filled up my LFS storage.

11:21 I'm already paying for GitHub and how did I fill up so much LFS storage? I don't know, but I'm sure Claude knows. So I'll probably ask Claude, hey, what's going on here?

11:47 But you know, some things never change. I am a backup worries person. And the more, the more that you have one project where you've got stuff, oh, this is in a object store, this is in a repo, it becomes harder to actually get a decent backup.

12:13 Oh gosh, that sounds very old. Yeah, late 30s. There's no escaping late 30s or 37. Like 36, it's kind of an edge case, like you know, your late 30s, but it could be argued your late mid-30s where 37 is just. No, you're, you're practically 40.

12:48 We did live in other countries, just for a year. Nothing too glamorous. We lived in The Hague and Aberdeen when I was really little. So little that I don't remember any of it. But we moved back to Cork and I moved to Israel because I'm Jewish.

13:16 I do believe Israel is the place for Jewish people to live. But I also want to be a peaceful part of the world and the war with Iran is just, and all the countries here, it's just a massive drain.

13:50 And I just kind of at one day said wait, I don't need to do this. Like, I don't know from whatever YouTube revenue I was making, it was like maybe $50 a month or something. I was like, I, I can just step back.

14:15 Oh yeah, the videos YouTube channel that was, that was fun, important. I do actually now aspire to return but it's going to be so different.

14:41 I would say that's the main issue with the pressures of jobs and fatherhood. Like there's a lot of things I'm trying to be a bit more strategic about what I spend time on.

15:16 To create a voice clone of myself. And of course I will absolutely say I've tried a couple of times just for fun. I, I, it's actually I've never got good results. In fact I got terrible results.

15:42 Probably to be honest, prank my wife and my friends, like use a, use a robobot calling service and see if I could trick, you know, that's just the kind of person I am. I'm. I am a prankster.

16:16 Wait, no, actually, I have an Irish accent. This is how I speak. And this is my theory. Anyway, I don't know if it stands up to scrutiny, but it just doesn't shift the center point far enough.

16:41 And I actually found, to my surprise with Chatterbox, that as I went up towards, like, I remember for the first while in the podcast, I was actually really completely stopped, now that I think about it.

17:07 And it was problematic. And I was like, trying to figure out what was going wrong. And I think the. Through trial and error, I actually overshot the training for Chatterbox.

17:34 I guess there was conflicts in the training data basically create a lot of hallucinations. So I think that's enough use cases for this file. Licensing open source.

18:01 I want to narrate something that is, like, in the public good. But do ask me, please receive my consent.

18:34 You have to speak lots of short sentences and do the ground truth for each. I already have that data set. I much prefer just trying it out this way.

19:00 I wanted to create a mix like an EQ mix because I was doing voiceovers on the podcast. This is, this is, as I said, pretty much just like minus the noise cancellation. I forgot to turn off. This is just raw me speaking.

19:28 And it did that really well. And I can run this through Claude and say, okay, this is me speaking for 20 minutes. Let's run it through Whisper. Like, what pace do I speak at? What's my WPM?

19:59 It's a microphone specific. So this might be my EQ for my OnePlus. It might not hold work as well on a different computer, different microphone, but you might learn some useful things about your own speech.

20:25 Great guy and like, he. He walked me through all the settings and it was, it was amazing, but I've forgotten already what it was. So for people getting into this, I think I will have to go now because I badly need to drink some water.

20:51 Recorded today. Over and out.