eXtream Software Development Forum

Full Version: Large file concatenation
Greetings from Florida!

If a recording is long enough, USB Audio Recorder PRO splits it into multiple 2GB wav files, apparently because Android's largest file size is 4GB.  Each successive file's name has one more letter "a" appended; for example,
Record_2017_Jun_27_221713.832_2.wav
Record_2017_Jun_27_221713.832_2a.wav
Record_2017_Jun_27_221713.832_2aa.wav
Record_2017_Jun_27_221713.832_2aaa.wav

What desktop tools will assemble these files into a single file for editing by Audacity?


Tom
Cape Coral
Audacity itself can probably do it.
Yes, it can be done within Audacity by opening each file, successively or concurrently, as a new track, then selecting each segment, cutting and pasting the segments together into one track, and exporting the large wav.  I am doing that [later edit: unsuccessfully] in Linux now and I suspect Audacity in Windows (with NTFS or exFAT drives) should [edit: but doesn't] work, too.

Ideally, I'd like a CLI one-liner or GUI equivalent.  A search suggests that, in Linux, sox can do it at the command line with simply
Code:
sox filein.wav filein2.wav filein3.wav fileout.wav
but, in my attempts on Mint Debian, four files that should become one 7.3GB file instead become ~750MB or, if I use some switches to explicitly specify 192kHz 24-bit output like the input files, 10.3GB - neither correct.  [edit: that sox command will work if both input and output files are <4GB.]

My Windows command line skills have become a little rusty (how ironic; that's what kept me away from Linux until I made the commitment) so I haven't tried it there yet.  Searching again suggests the same sox command line works in Windows, along with some DirectShow headache.

Has anyone found another solution, preferably in Linux?  Thanks.
Ah Ha!

It appears that the underlying difficulty is that wav files, like files on Android and on FAT32 drives, cannot exceed 4GB.  Audacity can assemble <4GB wavs into a single track of virtually unlimited length, and it appears to export a file of the correct size with the .wav extension, but that file will not subsequently import at the correct length; it is crippled.
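
As a rough check on that limit: a standard wav header stores its chunk sizes in 32-bit fields, which caps a file at just under 4GiB. The sketch below estimates how much audio fits under that cap at the 192kHz 24-bit stereo format used in this thread. The variable names are mine, and the small header is ignored.

```shell
#!/bin/sh
# Assumed format: 192kHz, 24-bit (3 bytes per sample), stereo, as in this thread.
rate=192000
bytes_per_sample=3
channels=2
max_wav_bytes=4294967295   # 2^32 - 1, the largest value a 32-bit size field holds

bytes_per_sec=$((rate * bytes_per_sample * channels))
max_seconds=$((max_wav_bytes / bytes_per_sec))

echo "bytes per second: $bytes_per_sec"
echo "longest standard wav: $max_seconds s (about $((max_seconds / 60)) min)"
```

At this format that is only about an hour of audio per wav, so a multi-hour recording cannot fit in one standard wav file.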

A solution is to export the assembled track(s) as a .flac.  FLAC files are compressed but nevertheless lossless and don't appear to have any practical length limit.  They are also surprisingly small, in my short experience with them, anyway.  The 2GB, 2GB, 2GB and 1.3GB files I assembled in the previous message export from Audacity as a faulty 7.3GB wav file; exported as flac, though, they became a 3.4GB file that was both bit-accurate and much faster to load.

So, I am now looking for a desktop tool that will import a series of <4GB wavs and write a flac of those assembled wavs.
Well, sox can do it after all, at the command line of both Linux and Windows:
Code:
sox filein.wav filein2.wav filein3.wav -C5 fileout.flac
... will concatenate the three input wav files and produce a flac at compression level 5 (0 is the least compression, 8 the highest).  My faulty 7.3GB wav was replaced by a 3.4GB flac (compression level 8) or a 3.9GB one (level 0).  Level 8 takes more time to produce, and the size difference in my test is not very significant, so I've opted for 5, the default of the reference flac encoder; YMMV.  Regardless of the compression level used, the decoded audio data will be identical to the original.

In both Linux and Windows, too, Audacity can export a selected assembled track in flac; compression level is set in the export options.

I'm working on a Linux bash script that will handle the trailing "a"s that USB Audio Recorder PRO produces in successive filenames so correctly sequencing a collection of 2GB files in a folder should be easier.  I suppose a Windows batch file could do the same.
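
One possible shape for such a script, as a sketch only: since the segments of one recording differ only in the number of trailing "a"s, sorting the names by length puts them in recording order. The function name order_segments is mine, and the approach assumes all the names passed in belong to one recording and contain no spaces or newlines.

```shell
#!/bin/sh
# order_segments FILE...: print the given names one per line, shortest first.
# Assumption: the names share a prefix and differ only in the number of
# trailing "a"s, so a shorter name means an earlier segment.
order_segments() {
    for f in "$@"; do
        # Emit "<name length><TAB><name>" so sort can order by length.
        printf '%d\t%s\n' "${#f}" "$f"
    done | sort -n -k1,1 | cut -f2-
}

# Possible use (not run here; requires sox):
#   sox $(order_segments Record_2017_Jun_27_221713.832_2*.wav) -C5 outfile.flac
```

This does not depend on file timestamps, so it should still work after the files have been copied or moved around.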
Rather than try to concatenate a series of .wavs by "a"-suffixed filenames, e.g.
Record_2017_Jun_27_221713.832_2.wav
Record_2017_Jun_27_221713.832_2a.wav
Record_2017_Jun_27_221713.832_2aa.wav
Record_2017_Jun_27_221713.832_2aaa.wav
... etc.,

Linux ls command switches permit listing files by time of last modification (in this case, the time of creation), newest first; reversing that produces the correct sequence to concatenate, so this CLI command will yield a proper .flac of the wavs in the current folder (filenames must not contain spaces):
Code:
sox $(ls -rt *.wav) -C5 outfile.flac

I tested this with a 7.5-hour 192kHz 24-bit stereo recording:

-rwxr----- 1 gtbecker gtbecker  2097395348 Jul  4 22:58 Record_2017_Jul_04_142431.014_1.wav
-rwxr----- 1 gtbecker gtbecker  2097394448 Jul  4 23:00 Record_2017_Jul_04_142431.014_1a.wav
-rwxr----- 1 gtbecker gtbecker  2097394334 Jul  4 23:01 Record_2017_Jul_04_142431.014_1aa.wav
-rwxr----- 1 gtbecker gtbecker  2097394298 Jul  4 23:02 Record_2017_Jul_04_142431.014_1aaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394292 Jul  4 23:04 Record_2017_Jul_04_142431.014_1aaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394400 Jul  4 23:05 Record_2017_Jul_04_142431.014_1aaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394520 Jul  4 23:06 Record_2017_Jul_04_142431.014_1aaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394532 Jul  4 23:08 Record_2017_Jul_04_142431.014_1aaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394538 Jul  4 23:09 Record_2017_Jul_04_142431.014_1aaaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394532 Jul  4 23:11 Record_2017_Jul_04_142431.014_1aaaaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394532 Jul  4 23:12 Record_2017_Jul_04_142431.014_1aaaaaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394514 Jul  4 23:13 Record_2017_Jul_04_142431.014_1aaaaaaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394538 Jul  4 23:15 Record_2017_Jul_04_142431.014_1aaaaaaaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394550 Jul  4 23:16 Record_2017_Jul_04_142431.014_1aaaaaaaaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker  2097394592 Jul  4 23:17 Record_2017_Jul_04_142431.014_1aaaaaaaaaaaaaa.wav
-rwxr----- 1 gtbecker gtbecker   456523776 Jul  4 23:18 Record_2017_Jul_04_142431.014_1aaaaaaaaaaaaaaa.wav
-rw-r--r-- 1 gtbecker gtbecker 20604433119 Jul  5 15:49 outfile1.flac

The 20.6GB flac (of 32GB of wavs) can then be loaded in Audacity for editing.
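
A quick sanity check that no segment was dropped is to estimate the playing time from the total size of the wavs and compare it with the length Audacity reports for the flac. This is a sketch under the assumption of 192kHz 24-bit stereo input; the function name expected_seconds is mine, and the wav headers are ignored.

```shell
#!/bin/sh
# expected_seconds FILE...: estimate total playing time, in whole seconds,
# from the combined size of the given files.
# Assumption: 192kHz, 24-bit (3 bytes per sample), stereo; headers ignored.
expected_seconds() {
    bytes_per_sec=$((192000 * 3 * 2))
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f")        # file size in bytes
        total=$((total + size))
    done
    echo $((total / bytes_per_sec))
}

# Possible use:
#   expected_seconds *.wav
```

By my arithmetic, the sixteen wavs listed above total about 31.9GB, which comes to roughly 27,700 seconds, or about 7.7 hours, in line with the stated recording length.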

One complication arises for multitrack recordings such as 4-channel, which USB Audio Recorder PRO creates as two stereo pairs.  The filenames of the first stereo pair (tracks 1 and 2) contain a "1" before the "a"s; those of the second pair contain a "2"; corresponding files of both pairs have the same timestamp.  The sox command above will require separating them into two folders, lest the four tracks become one stereo flac of interspersed files.  I'll work on that.
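
An alternative to separate folders, offered only as a sketch: select the files of one stereo pair by the digit before the "a"s, and order them by the number of "a"s. The function name list_pair is mine, and it assumes a single recording per folder and filenames without spaces.

```shell
#!/bin/sh
# list_pair PAIR FILE...: print, in recording order, the files whose names
# end in "_<PAIR>" followed by zero or more "a"s and ".wav".
list_pair() {
    pair=$1
    shift
    for f in "$@"; do
        case $f in
            *_"$pair".wav|*_"$pair"a*.wav)
                stem=${f%.wav}             # drop the extension
                rest=${stem##*_"$pair"}    # text after the pair digit: "", "a", "aa", ...
                case $rest in
                    *[!a]*) continue ;;    # skip anything that is not only "a"s
                esac
                # Emit "<number of a's><TAB><name>" so sort can order the segments.
                printf '%d\t%s\n' "${#rest}" "$f"
                ;;
        esac
    done | sort -n -k1,1 | cut -f2-
}

# Possible use (not run here; requires sox):
#   sox $(list_pair 1 *.wav) -C5 outfile_1.flac
#   sox $(list_pair 2 *.wav) -C5 outfile_2.flac
```

Because the pattern is anchored to the end of the name, the digits in the date and time part of the filename are not mistaken for the pair number.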
Short of a nice one-liner, here is a process that works for me - in Linux.

The Android drive that contains the USB Audio Recorder PRO recordings is, on my machine, /dev/sdh1; I move an SD card, but an MTP USB connection might also work.  I mount it at /mnt/test and copy the files (retaining timestamps) to two directories, 1 and 2, in ~/Desktop/RecordingWork; each holds one stereo pair of the four recorded tracks.  sox then concatenates the files in each directory and produces a flac for each; both flacs can then be imported into Audacity, which can save a project of four tracks.  Here, the recording consists of three pairs of 2GB files plus smaller fragments (~14GB in all), resulting in a pair of ~4.5GB flacs:
Code:
$ mkdir -p ~/Desktop/RecordingWork/1  # no sudo, so the directories stay writable by the user
$ mkdir -p ~/Desktop/RecordingWork/2
$ sudo mkdir -p /mnt/test

$ sudo mount /dev/sdh1 /mnt/test
$ cp -p /mnt/test/USBAudioRecorderPRO/Recordings/*_*_*_*_*_1*.wav ~/Desktop/RecordingWork/1  # -p retains timestamps
$ cp -p /mnt/test/USBAudioRecorderPRO/Recordings/*_*_*_*_*_2*.wav ~/Desktop/RecordingWork/2
$ sudo umount /mnt/test
$ cd ~/Desktop/RecordingWork
$ sox $(ls -rt ./1/*.wav) -C0 ./1/outfile_1.flac  # -rt lists oldest first by modification time
$ sox $(ls -rt ./2/*.wav) -C0 ./2/outfile_2.flac

$ audacity ./1/outfile_1.flac ./2/outfile_2.flac

This listing is the same but shows intermediate and resulting files:
Code:
$ mkdir -p ~/Desktop/RecordingWork/1  # no sudo, so the directories stay writable by the user
$ mkdir -p ~/Desktop/RecordingWork/2
$ sudo mkdir -p /mnt/test

$ sudo mount /dev/sdh1 /mnt/test
 $ ls -l /mnt/test
      131072 Jun 26 12:39 Android
      131072 Jun 29 20:55 DCIM
      131072 Jun 26 12:52 System Volume Information
      131072 Jun 26 12:49 USBAudioRecorderPRO

 $ ls -l /mnt/test/USBAudioRecorderPRO
      131072 Jun 26 12:49 Firmware
      131072 Jul  7 15:39 Recordings

 $ ls -l /mnt/test/USBAudioRecorderPRO/Recordings
   836466230 Jul  7 15:52 Record_2017_Jul_07_130851.459_1aaa.wav
  2097393950 Jul  7 15:39 Record_2017_Jul_07_130851.459_1aa.wav
  2097393986 Jul  7 15:09 Record_2017_Jul_07_130851.459_1a.wav
  2097393440 Jul  7 14:39 Record_2017_Jul_07_130851.459_1.wav
   836466230 Jul  7 15:52 Record_2017_Jul_07_130851.459_2aaa.wav
  2097393950 Jul  7 15:39 Record_2017_Jul_07_130851.459_2aa.wav
  2097393986 Jul  7 15:09 Record_2017_Jul_07_130851.459_2a.wav
  2097393440 Jul  7 14:39 Record_2017_Jul_07_130851.459_2.wav

$ cp -p /mnt/test/USBAudioRecorderPRO/Recordings/*_*_*_*_*_1*.wav ~/Desktop/RecordingWork/1  # -p retains timestamps
 $ ls -lrt ~/Desktop/RecordingWork/1  # -rt sorts by modification time, oldest first
  2097393440 Jul  7 14:39 Record_2017_Jul_07_130851.459_1.wav
  2097393986 Jul  7 15:09 Record_2017_Jul_07_130851.459_1a.wav
  2097393950 Jul  7 15:39 Record_2017_Jul_07_130851.459_1aa.wav
   836466230 Jul  7 15:52 Record_2017_Jul_07_130851.459_1aaa.wav

$ cp -p /mnt/test/USBAudioRecorderPRO/Recordings/*_*_*_*_*_2*.wav ~/Desktop/RecordingWork/2
 $ ls -lrt ~/Desktop/RecordingWork/2
  2097393440 Jul  7 14:39 Record_2017_Jul_07_130851.459_2.wav
  2097393986 Jul  7 15:09 Record_2017_Jul_07_130851.459_2a.wav
  2097393950 Jul  7 15:39 Record_2017_Jul_07_130851.459_2aa.wav
   836466230 Jul  7 15:52 Record_2017_Jul_07_130851.459_2aaa.wav

$ sudo umount /mnt/test
$ cd ~/Desktop/RecordingWork
$ sox $(ls -rt ./1/*.wav) -C0 ./1/outfile_1.flac
 $ ls -lcrt
  4446653264 Jul  8 14:55 outfile_1.flac
  2097393440 Jul  7 14:39 Record_2017_Jul_07_130851.459_1.wav
  2097393986 Jul  7 15:09 Record_2017_Jul_07_130851.459_1a.wav
  2097393950 Jul  7 15:39 Record_2017_Jul_07_130851.459_1aa.wav
   836466230 Jul  7 15:52 Record_2017_Jul_07_130851.459_1aaa.wav

$ sox $(ls -rt ./2/*.wav) -C0 ./2/outfile_2.flac
 $ ls -lcrt
  4573594185 Jul  8 14:57 outfile_2.flac
  2097393440 Jul  7 14:39 Record_2017_Jul_07_130851.459_2.wav
  2097393986 Jul  7 15:09 Record_2017_Jul_07_130851.459_2a.wav
  2097393950 Jul  7 15:39 Record_2017_Jul_07_130851.459_2aa.wav
   836466230 Jul  7 15:52 Record_2017_Jul_07_130851.459_2aaa.wav

$ audacity ./1/outfile_1.flac ./2/outfile_2.flac
 $ ls -l
        4096 Jul  8 15:20 1
        4096 Jul  8 14:55 2
     4966234 Jul  8 15:19 outfile.aup
        4096 Jul  8 15:19 outfile_data
I hope this is useful to someone.