Awk Mail Merge
By funcsec
- 6 minutes read - 1271 wordsI had a project where I needed to reach out to a number companies to solicit bids for a catering contract. I found that sometimes the companies would expose their email addresses on their websites or have contact us forms. Email addresses directly on website, especially when they are readable by bots are a good way to get a large amount of spam, but it was quite helpful for this project.
I wanted to write a small program that would email the companies or generate text to copy paste into their contact us forms on their respective websites.
There are a number of great resources that already cover mail merges
with awk
1, 2. I wanted to play around with awk
to add another
general office tool into my script library.
If you don’t already know, awk
and its cousins like gawk
are used
for pattern scanning and processing. awk
is a programming language
typically used in bash scripts.
You can see all the code for this on Github.
Mail merge template
A template file was generated. The fields were separated with curly
brackets, {email}
and {name}
. At the top of the file the email
header material was added, but was stripped away from the normal email
From: me <me@example.com>
To: <{email}>
Subject: Catering for event - {name}
Hello {name},
Hope you’re all having a great day. We’re exploring our
options for catering. We are looking for catering for breakfast and lunch.
What options do you provide and what is your pricing?
Thanks,
me
me@example.com
company
This file was saved as template-example.txt
.
Contacts file
The second part of the mail merge was the contacts file; the list of
people who were to receive the template. The document was created as a
CSV
file, and had the two fields that were present in the mail merge
template, name and email. The 3rd field was a link to the website if
they only had a contact form and no email address on their website,
good for reducing bot spam but bad for this mail merge.
name,email,url
Awesome Vendor,vendor@vendoremail.com,https://vendoremail.com
Best Vendor,vendor@bestvendor.com,https://bestvendor.com
This file was saved as example-contacts.csv
With these 3 pieces, all of the data was in place. What was needed next was the application configurations and scripting to glue it together.
Msmtp configuration
msmtp
was used as the mail transfer agent(MTA), a program that can
talk to a remote SMTP server. It can be found in most linux distro
repositories. To install it under Debian/Ubuntu, the following apt
command can be used.
sudo apt install msmtp
The MTA needed to be configured to use my SMTP email provider. The
following configuration was used to allow msmtp
to authenticate to
the email provider to send email. The file below was used to connect
to gmail. Typically, this file is stored at ./msmtprc
, however for
this example the file was saved at ./msmtprc-example
.
account default
host smtp.gmail.com
port 587
protocol smtp
auth on
from me@example.com
user me@example.com
password password
tls on
tls_starttls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
If you’re going to try this for yourself it’s best to test msmtp
beforehand to make sure it is correctly authenticating and able to
send email through your SMTP/email provider.
Awk script
awk
was used for the find and replace action needed to generate the
customized email. It did this by taking the content from a single line
pulled out of the CSV, used it to replace the name and email field data
in the template, then output the new data by containing the field
substitutions which are then sent over to the MTA msmtp
.
This awk script technically would also work for parsing the entire CSV at once, but then it would not allow for individually sending the resulting customized emails unless they were output to a file and then the file read back into the script with another loop that would send them. This solution of having one loop seemed better.
The variable template
in the following script was defined as a command line
variable later in the bash script.
This file was saved as ./awk-merge.awk
.
#!/usr/bin/awk -f
BEGIN {
FS=",";
}
{
name=$1;
email=$2;
while ( (getline ln < template) > 0 )
{
sub(/{name}/,name,ln);
sub(/{email}/,email,ln);
print ln;
}
close(template);
exit;
}
FS=",";
told awk
that commas were used as field separators; the
default is to use spaces. The script takes 3 inputs, the name, the
email, and the location. The location of the template is defined as a
variable later in the bash script.
Bash script
To glued together this whole thing, a bash script was created to
iterate through the CSV document line by line, outputting a single CSV
line to awk
which would process and replace the fields. The
output would be piped to msmtp
to be sent.
The first section of the bash script simply allows for other variables to be set for the contacts CSV, the template, and the msmtp configuration. This was added to make it easier to reuse the code in the future.
It can be reused by indicating other contact, template, and msmtp configuration files.
./awk-merge.sh [contact] [template] [msmtp]
If I ever use this script again I will likely make these individual flags rather than variables so that I can set them independently. This was a quick and dirty solution.
The file created was called ./awk-merge.sh
.
#!/usr/bin/env bash
# awk merge
# mass mailer using GNU/Linux tools
#set -euo pipefail
[[ -z $3 ]] && MSMTPCONF="./msmtprc-example" || MSMTPCONF="$3"
[[ -z $2 ]] && TEMPLATE="./example-template.txt" || TEMPLATE="$2"
[[ -z $1 ]] && CONTACTS="./example-contacts.csv" || CONTACTS="$1"
CONTACTSTMP="$(mktemp)"
ACCOUNT="default"
With that the default or custom variables were set.
The example contacts document above had a header indicating what each
of the CSV columns contained. Obviously the email address “email” is
invalid, so the 1st line of the CSV contacts document needed to be
ignored. This was done using the tail
command to output to a
temporary file that was then used later in the script.
tail -n +2 "$CONTACTS" > "$CONTACTSTMP"
The real meat of the script was a while loop that iterated over the lines of the new temporary contacts document which lacks the header row. The script has two branches. If the CSV input does not have anything for the email field, then it sends the URL field to filefox and outputs the generated text to STDOUT. If the input does have something in the email field, then it generates the email and sends it, skipping opening firefox and outputting the text.
while read -r i; do
EMAIL="$(echo $i | awk -F, '{ print $2 }')"
if [ -x $EMAIL ]; then
echo "$i" | awk -f ./awk-merge.awk -v template="$TEMPLATE"
firefox "$(echo $i | awk -F, '{ print $3 }')"
echo "==========================================="
else
echo "$i" | awk -f ./awk-merge.awk -v template="$TEMPLATE" \
| msmtp -a "$ACCOUNT" "$EMAIL"
fi
sleep 5
done < "$CONTACTSTMP"
The sleep command at the end is not actually necessary.
After the loop was done, the temporary contacts file needed to be removed.
rm "$CONTACTSTMP"
Conclusion
A mail merge script is pretty handy to have. It can save a time and money when used to replace a mail merge program like Mail Chimp or Constant Contact. It should be noted that email/SMTP providers often have rate limits to prevent spammers from decreasing the reputation of their service. The sleep command at the end of the loop can be used slow down the sending of email so that it is under the threshold of the email provider.
You can see all the code for this on Github.
-
Advance your awk skills with two easy tutorials: https://opensource.com/article/19/10/advanced-awk ↩︎
-
Mail merge: https://pmitev.github.io/to-awk-or-not/Exercises/mail-merge/ ↩︎