mika @ schillinger's lab

Removing watermarks and other clutter from manuscripts

June 22, 2022, 12:51 PM

Consider reading a manuscript that carries automatically generated watermarks/metadata on every page, some of which cover the text. On top of that, the document is split into several PDFs. By the third page of the second PDF, which references something from the first, you might wonder whether you can remove that clutter and merge the cleaned PDFs.

#!/usr/bin/env bash
## file: script.sh

## directory structure
# $ tree
#   .
#   ├── 1.pdf
#   ├── 2.pdf
#   ├── 3.pdf
#   ├── 4.pdf
#   ├── orig_merged.pdf
#   ├── script.sh
#   └── workdir
#       ├── 1.pdf
#       ├── 2.pdf
#       ├── 3.pdf
#       ├── 4.pdf
#       ├── clean_compressed.pdf
#       └── clean.pdf
#   
#   1 directory, 12 files

mkdir -p workdir

for ...

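Since the script above is cut off at the loop, here is a minimal, independent sketch of how such a clean-and-merge pass could look. It is not the original script.sh. It assumes that qpdf (with its fix-qdf helper) and Ghostscript are installed, and that the watermark is stored as literal text inside the page content streams, which is often not the case (subsetted fonts, image watermarks). The file name sketch.sh, the function names, and the marker string passed on the command line are all hypothetical.

```shell
#!/usr/bin/env bash
## file: sketch.sh (hypothetical; NOT the original script.sh)
set -euo pipefail

# Drop every line that contains the fixed string $1.
# -a treats binary data as text, -F disables regex interpretation.
drop_marked_lines() {
  LC_ALL=C grep -a -v -F -- "$1"
}

# clean_pdf MARKER IN OUT
# Assumption: the watermark text appears literally in the content stream.
clean_pdf() {
  local marker="$1" in="$2" out="$3" tmp
  tmp="$(mktemp -d)"
  # QDF mode writes uncompressed, normalized content streams
  # that can be edited with ordinary text tools.
  qpdf --qdf --object-streams=disable "$in" "$tmp/qdf.pdf"
  drop_marked_lines "$marker" < "$tmp/qdf.pdf" > "$tmp/edited.pdf"
  # fix-qdf recomputes stream lengths and the xref table after the edit.
  fix-qdf "$tmp/edited.pdf" > "$out"
  rm -rf "$tmp"
}

# main MARKER FILE...
main() {
  local marker="$1"; shift
  local cleaned=() f
  mkdir -p workdir
  for f in "$@"; do
    clean_pdf "$marker" "$f" "workdir/$(basename "$f")"
    cleaned+=("workdir/$(basename "$f")")
  done
  # Merge all cleaned files, in the order given, into one document.
  qpdf --empty --pages "${cleaned[@]}" -- workdir/clean.pdf
  # Recompress with Ghostscript; /ebook is one of its quality presets.
  gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dQUIET \
     -sOutputFile=workdir/clean_compressed.pdf workdir/clean.pdf
}

# Run only when a marker and at least one PDF are given.
if (( $# >= 2 )); then main "$@"; fi
```

A call might look like `./sketch.sh 'Downloaded by' 1.pdf 2.pdf 3.pdf 4.pdf`, where the marker is whatever string occurs only in the watermark lines. Inspecting the QDF output in a text editor first is a good way to find out whether such a string exists at all.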