Concatenating PDFs with F# and PdfSharp

In order to test part some new functionality I’m working on I needed to combine a bunch of somewhat arbitrary documents into one. At first I thought about just re-scanning the documents but the scanner attached to my PC is a single-sheet flatbed. I also thought about running them through our sheet-feed scanner but it’s attached to my counterpart’s PC, not shared, and I didn’t want to bother him with it. I’ve been working a bit with the PdfSharp library lately so I decided to do the programmer thing and write a script to concatenate the documents. Of course, this would be a perfect chance to flex my developing F# muscles and came up with what I think is a pretty decent solution.

Before taking a look at the script there are a few things to note:

  • The script is intended to be run from within a folder that contains the documents that you want to concatenate. This is mainly because I had a bunch of documents I wanted to combine and including the full path to each one quickly became unwieldy.
  • I don’t check that files exist or that what’s supplied is actually a PDF. PdfSharp will throw an exception in those cases
  • I needed something quick so I hard-coded the destination file name as Result.pdf.
  • fsi.CommandLineArgs includes the name of the script as the first array item. The easiest thing I could think of for getting a list that didn’t include the script name was to create a new list with Array.toList and grab the Tail.
  • The PdfPages class is built around IEnumerable rather than IEnumerable<T> so I needed to use a comprehension to build the pages list.

Script:

// PdfConcat.fsx

#r "PdfSharp.dll"

open System;
open System.IO
open PdfSharp.Pdf
open PdfSharp.Pdf.IO

let readPages (sourceFileName : string) =
  use source = PdfReader.Open(sourceFileName, PdfDocumentOpenMode.Import)
  [ for p in source.Pages -> p ]

let createDocFromPages pages =
  let targetDoc = new PdfDocument()
  pages |> List.iter (fun p -> targetDoc.Pages.Add p |> ignore)
  targetDoc

let docNames = (Array.toList fsi.CommandLineArgs).Tail
let workingDirectory = Environment.CurrentDirectory;
let targetFileName = Path.Combine(workingDirectory, "Result.pdf")

let allPages =
  [ for n in docNames -> Path.Combine(workingDirectory, n) ]
  |> List.map readPages
  |> List.concat

let doc = createDocFromPages allPages
doc.Save targetFileName
doc.Dispose()

Usage:

D:\ScannedDocuments>fsi D:\Dev\FSharp\PdfConcat\PdfConcat.fsx "07-09-2012 05;29;44PM.PDF" "07-11-2012 08;47;13PM.PDF" "07-11-2012 08;48;24PM.PDF" "07-11-2012 08;50;27PM.PDF"
Advertisements