In order to test part of some new functionality I’m working on, I needed to combine a bunch of somewhat arbitrary documents into one. At first I thought about just re-scanning the documents, but the scanner attached to my PC is a single-sheet flatbed. I also thought about running them through our sheet-feed scanner, but it’s attached to my counterpart’s PC, not shared, and I didn’t want to bother him with it. I’ve been working a bit with the PdfSharp library lately, so I decided to do the programmer thing and write a script to concatenate the documents. Of course, this was also a perfect chance to flex my developing F# muscles, and I came up with what I think is a pretty decent solution.
Before taking a look at the script there are a few things to note:
- The script is intended to be run from within a folder that contains the documents that you want to concatenate. This is mainly because I had a bunch of documents I wanted to combine and including the full path to each one quickly became unwieldy.
- I don’t check that files exist or that what’s supplied is actually a PDF. PdfSharp will throw an exception in those cases.
- I needed something quick so I hard-coded the destination file name as Result.pdf.
- fsi.CommandLineArgs includes the name of the script as the first array item. The easiest thing I could think of for getting a list that didn’t include the script name was to create a new list with Array.toList and grab the Tail.
- The PdfPages class is built around IEnumerable rather than IEnumerable<T> so I needed to use a comprehension to build the pages list.
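The last two notes can be tried out without PdfSharp at all. What follows is only a minimal sketch under a couple of assumptions: the args array is a made-up stand-in for fsi.CommandLineArgs, and an ArrayList stands in for any collection that exposes only the non-generic IEnumerable. It also shows Array.tail (available in F# 4.0 and later) and Seq.cast as possible alternatives to the Tail property and the comprehension; these are not what the script itself uses.

```fsharp
open System.Collections

// Stand-in for fsi.CommandLineArgs: the script name comes first,
// followed by the actual arguments. The values are made up.
let args = [| "PdfConcat.fsx"; "a.pdf"; "b.pdf" |]

// Drop the script name, then convert what is left to a list.
// Array.tail throws if the array is empty.
let docNames = args |> Array.tail |> List.ofArray

// Stand-in for a collection that only implements the non-generic IEnumerable.
let untyped = ArrayList()
untyped.Add(box "page 1") |> ignore
untyped.Add(box "page 2") |> ignore

// Option 1: a list comprehension with an explicit downcast.
let viaComprehension = [ for p in untyped -> p :?> string ]

// Option 2: Seq.cast wraps the untyped sequence as seq<string>.
let viaCast = untyped |> Seq.cast<string> |> List.ofSeq

printfn "%A" docNames
printfn "%A" viaCast
```

Both options yield the same typed list. The comprehension keeps everything in one expression, while Seq.cast reads a little better at the front of a longer pipeline.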
    // PdfConcat.fsx
    #r "PdfSharp.dll"

    open System
    open System.IO
    open PdfSharp.Pdf
    open PdfSharp.Pdf.IO

    // Open a source document in import mode and collect its pages in a list
    let readPages (sourceFileName : string) =
        use source = PdfReader.Open(sourceFileName, PdfDocumentOpenMode.Import)
        [ for p in source.Pages -> p ]

    // Build a new document containing the supplied pages, in order
    let createDocFromPages pages =
        let targetDoc = new PdfDocument()
        pages |> List.iter (fun p -> targetDoc.Pages.Add p |> ignore)
        targetDoc

    // The first argument is the script name, so skip it
    let docNames = (Array.toList fsi.CommandLineArgs).Tail
    let workingDirectory = Environment.CurrentDirectory
    let targetFileName = Path.Combine(workingDirectory, "Result.pdf")

    // Read every document and flatten the per-document page lists into one
    let allPages =
        [ for n in docNames -> Path.Combine(workingDirectory, n) ]
        |> List.map readPages
        |> List.concat

    let doc = createDocFromPages allPages
    doc.Save targetFileName
    doc.Dispose()
Running the script from the folder that holds the documents looks like this:

    D:\ScannedDocuments>fsi D:\Dev\FSharp\PdfConcat\PdfConcat.fsx "07-09-2012 05;29;44PM.PDF" "07-11-2012 08;47;13PM.PDF" "07-11-2012 08;48;24PM.PDF" "07-11-2012 08;50;27PM.PDF"
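Since the script leaves input checking to PdfSharp, a small pre-check could catch the obvious mistakes before any document is opened. This is just a sketch of one way to do that, not part of the script above: partitionInputs is a hypothetical helper, and it only looks at whether the file exists and has a .pdf extension, so a file that is not really a PDF would still get through and fail later.

```fsharp
open System
open System.IO

// Hypothetical helper: splits the supplied names into those that exist in
// the directory with a .pdf extension and those that do not.
// It does not inspect file contents.
let partitionInputs (directory : string) (names : string list) =
    names
    |> List.partition (fun n ->
        let fullPath = Path.Combine(directory, n)
        File.Exists(fullPath)
        && String.Equals(Path.GetExtension(fullPath), ".pdf",
                         StringComparison.OrdinalIgnoreCase))
```

In the script this could be called as `let usable, rejected = partitionInputs workingDirectory docNames`, printing the rejected names and concatenating only the usable ones.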