AttachmentDownloader - Keep Your Documents Safe

Download and organize locally all the important documents sent to you.

Introduction

It's common sense that we should keep our documents organized. Moreover, documents can prove you paid an important bill, like your rent. When I moved to Australia, the estate agency sent me the payment receipt by email, which is fine. However, the receipt was a link in the email's body.

This approach adopted by the estate agency has two main problems:

  1. Since the document isn't attached to the email (it's just a link), the estate agency can delete it. If I ever need to prove that I paid, I won't be able to: it's my word against theirs.

  2. Downloading the document is an active process. When a document is attached to an email, you trust it will stay there (perhaps you shouldn't) and you don't need to save it locally. With a link, you have to remember to download it yourself.

This "pet project" aims to automate the task of downloading the document from a link in the email's body. It was also a good opportunity to consolidate some multithreading patterns in Go that I was studying and to use the Gmail API.

Installation

If you want to use it with your Gmail account, you will need to deal with some Google bureaucracy to let this app access your emails. For now, let's focus on the app's configuration.

  1. Clone the repository on your machine.

     git clone https://github.com/abaruchi/AttachDownloader.git
    
  2. Create a configuration file (JSON) with the following fields:

     • Accounts: A JSON array where each entry is an account to fetch emails from. A user can define several different Gmail accounts, for example. Each account has the following fields:

       • ClientType: The account type. So far, we have implemented only a connection for Gmail.

       • ClientConf: Settings specific to each ClientType. For the Gmail client, two fields should be supplied:

         • SecretFilePath: The credential file you download from Google Console (check the next section for more details).

         • TokenFile: The token file used for later authentication.

       • ID: A unique ID for this email account.

       • EmailAddress: The email address associated with this account.

       • LocalRootPath: The root directory used to download and store the attached documents.

       • Filters: An array of filters to apply to the emails. Each filter has the following fields:

         • ID: The filter's ID.

         • SearchString: It depends on the email client type you are using. For the Gmail client, you can use any Gmail search string to select the emails.

         • FollowURL: A boolean field. When set to false, we download the attachments of the filtered emails. When set to true, we parse the email body and download a document from a link.

         • BaseURL: Used when FollowURL is set to true. Set it to the root URL of the links to look for in the email body. Any URL that doesn't match won't be downloaded.

     {
       "Accounts": [{
         "ClientType": "gmail",
         "ClientConf": {
           "SecretFilePath": "/var/tmp/gmail_config.json",
           "TokenFile": "token.json"
         },
         "ID": "MyGmailAccount",
         "EmailAddress": "youremail@gmail.com",
         "LocalRootPath": "/tmp/MyGmailAccount",
         "Filters": [
           {
             "ID": "SomeUniqueID",
             "SearchString": "from: someone@somedomain.com",
             "FollowURL": true,
             "BaseURL": "https://baseurlToDownload.com"
           }
         ]
       }]
     }
    
  3. Compile it:

     cd AttachDownloader
     go build
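
To make the configuration format more concrete, here is a minimal sketch of how a file like the one in step 2 could be decoded in Go. The type names (Config, Account, Filter, ClientConf) and the ParseConfig and ShouldFollow helpers are hypothetical and only mirror the JSON keys; they are not taken from the project's source. The BaseURL check is also an assumption: I treat "match" as same scheme, same host, and a path that starts with the BaseURL path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/url"
	"strings"
)

// ClientConf mirrors the Gmail-specific keys of the configuration file.
type ClientConf struct {
	SecretFilePath string
	TokenFile      string
}

// Filter mirrors one entry of the "Filters" array.
type Filter struct {
	ID           string
	SearchString string
	FollowURL    bool
	BaseURL      string
}

// Account mirrors one entry of the "Accounts" array.
type Account struct {
	ClientType    string
	ClientConf    ClientConf
	ID            string
	EmailAddress  string
	LocalRootPath string
	Filters       []Filter
}

// Config is the root of the configuration file.
type Config struct {
	Accounts []Account
}

// ParseConfig decodes the JSON document and rejects accounts without an ID.
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parsing config: %w", err)
	}
	for i, acc := range cfg.Accounts {
		if acc.ID == "" {
			return Config{}, fmt.Errorf("account %d has no ID", i)
		}
	}
	return cfg, nil
}

// ShouldFollow reports whether a link found in an email body would be
// downloaded. Assumption: a link matches when it shares the scheme and
// host of BaseURL and its path starts with the BaseURL path.
func (f Filter) ShouldFollow(link string) bool {
	if !f.FollowURL {
		return false
	}
	base, err := url.Parse(f.BaseURL)
	if err != nil || base.Host == "" {
		return false
	}
	u, err := url.Parse(link)
	if err != nil {
		return false
	}
	return u.Scheme == base.Scheme &&
		strings.EqualFold(u.Host, base.Host) &&
		strings.HasPrefix(u.Path, base.Path)
}

func main() {
	raw := []byte(`{"Accounts":[{"ClientType":"gmail","ID":"MyGmailAccount",
		"Filters":[{"ID":"SomeUniqueID","FollowURL":true,
		"BaseURL":"https://baseurlToDownload.com"}]}]}`)
	cfg, err := ParseConfig(raw)
	if err != nil {
		log.Fatal(err)
	}
	f := cfg.Accounts[0].Filters[0]
	fmt.Println(f.ShouldFollow("https://baseurlToDownload.com/receipts/1.pdf")) // prints: true
	fmt.Println(f.ShouldFollow("https://elsewhere.example/1.pdf"))              // prints: false
}
```

Comparing the parsed host instead of doing a plain string prefix check avoids accepting a link such as https://baseurlToDownload.com.elsewhere.example, which a naive prefix comparison would let through.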
    

Gmail Accounts

Before using this project, you need to set up your account to allow the Downloader to access your inbox and download the attachments. An overview of how it works and its details can be found in Google's documentation.

The first step is to set up your environment. Note that you must create a project and enable access to the Gmail API. When everything is properly set, download the credential file (put it in a safe location and take care not to commit it to any version control system). Then, specify the file's full path in our configuration file (SecretFilePath).

Now, run it as follows:

# Assuming your configuration file is in the current directory.
./AttachDownloader download --configfile ./config.json --httpPortFlag :8080 --verbose

It will open your browser and ask you to confirm that the Downloader may access your inbox. Once you accept, you will be redirected to localhost:8080.

Then, paste it into the terminal where you ran the command.

After that, your attachments will be downloaded into the directory you set as LocalRootPath in your configuration file. The emails will be saved in this directory following the rule:

sender_domain_epochTS (for example, an email from someone@somedomain.com will be saved as someone_somedomain.com_timestamp).

Design

As stated at the beginning of this post, this is a side project that I developed to practice Go. In the project's README file, you will find some diagrams that explain how it works in more detail.

Next Steps

There are some optimizations that we could implement. For instance:

  • Apply a date limit to the search filter: The downloader keeps track of the last time you ran it to avoid downloading the same email twice. To that end, we store a file locally; after searching and processing the emails, we discard those older than the stored date and record the new ones. Instead, we could read the date of the last run and add Gmail's after: operator to the search string, which would reduce the Gmail API usage.

  • Save files with the correct extension: In the current implementation, we download the files and then check each file's type to add the correct extension. However, this information can be obtained during the download phase, so we could save each file with the correct extension right away (no need to run a separate task for that).

  • Command-line parameters: This app has several parameters, and most of them are set in the configuration file. We could also accept the same parameters on the command line, so that they override what is in the config file.