Email Bot

I would like to set up a bot to retrieve email addresses from journal articles of interest. The process would be:

  1. Query for all articles from the last 7 days that mention my subject of interest (soe), e.g. genome-wide association studies.
  2. Pull out all authors and their email addresses, if provided.
  3. For each author with a missing address, query for N additional articles by that author and find those where the author is first or last author, the positions most likely to provide an email address. Update the database with the address.
  4. Send a custom email to each author advertising my product.
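
As a minimal sketch of step 1, assuming the articles are queried through the NCBI E-utilities ESearch endpoint (the NCBI link at the end of this post suggests E-utilities, but the exact query is not shown here), the request URL could be built like this. The helper name `esearch-url` and the choice of `retmode=json` are my own assumptions for illustration.

```scheme
(use-modules (web uri))

;; Hypothetical helper: build an ESearch URL for PubMed articles
;; published in the last DAYS days that match TERM.
(define (esearch-url term days)
  (string-append
   "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
   "?db=pubmed"
   "&term=" (uri-encode term)              ; percent-encode spaces etc.
   "&reldate=" (number->string days)       ; look back this many days
   "&datetype=pdat"                        ; filter on publication date
   "&retmode=json"))

;; Example: (esearch-url "genome wide association studies" 7)
```

The resulting string can be passed to `http-get` from `(web client)`; building the URL separately keeps the query easy to inspect before any request is sent.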

For this project I will be using Guile and its http-client method to screen scrape data. Set up an environment with:

mbc@xps:~/projects/conman$ guix environment --network --expose=/etc/ssl/certs/  --manifest=manifest.scm

The manifest looks like:

manifest.scm
(specifications->manifest
'("guile" "guile-lib" "coreutils" "gawk" "sed" "findutils" "glibc" "grep" "openssl" "gnutls" "curl" "emacs" "libcanberra" "guile-dbi"))

In one method I hit an error that I had great difficulty debugging, because the relevant part of the error is presented as a bytevector rather than readable text. Eventually I decided to convert the bytevector to a string. Error message:

scheme@(guile-user)> 
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure string-length: Wrong type argument in position 1 (expecting string): #vu8(123 34 101 114 114 111 114 34 58 34 65 80 73 32 114 97 116 101 32 108 105 109 105 116 32 101 120 99 101 101 100 101 100 34 44 34 97 112 105 45 107 101 121 34 58 34 49 48 48 46 48 46 49 57 54 46 50 48 55 34 44 34 99 111 117 110 116 34 58 34 52 34 44 34 108 105 109 105 116 34 58 34 51 34 125 10)

To decode:

(use-modules (ice-9 iconv))
(bytevector->string #vu8(123 34 101 114 114 111 114 34 58 34 65 80 73 32 114 97 116 101 32 108 105 109 105 116 32 101 120 99 101 101 100 101 100 34 44 34 97 112 105 45 107 101 121 34 58 34 49 48 48 46 48 46 49 57 54 46 50 48 55 34 44 34 99 111 117 110 116 34 58 34 52 34 44 34 108 105 109 105 116 34 58 34 51 34 125 10) "utf8")

=> "{\"error\":\"API rate limit exceeded\",\"api-key\":\"100.0.196.207\",\"count\":\"4\",\"limit\":\"3\"}\n"
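
The `string-length` failure happens because a bytevector was passed where a string was expected. Guile's `http-get` returns the body as a string only for textual content types and as a bytevector otherwise, so a JSON error response can arrive as a bytevector. A small guard, sketched here under the assumption that the body is UTF-8 (the name `body->string` is hypothetical), would have made the message readable right away:

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: normalise an HTTP response body to a string.
;; Assumes any bytevector body is UTF-8 encoded.
(define (body->string body)
  (cond ((string? body) body)                      ; already decoded
        ((bytevector? body) (utf8->string body))   ; decode raw bytes
        (else "")))                                ; e.g. #f when there is no body

;; Example: (body->string #vu8(123 125)) evaluates to "{}"
```

Calling this on every response body before any string processing means a rate-limit error shows up as readable JSON instead of a type error.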

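The decoded response shows that the NCBI rate limit was exceeded: 4 requests were counted against a limit of 3. See https://www.ncbi.nlm.nih.gov/books/NBK25497/ for a discussion of API keys, which raise this limit.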
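
One way to stay under the limit without an API key is to pause between requests. The following is only a sketch: the name `map-throttled` and the 350 ms pause are my own choices, picked so that no more than 3 calls start in any one second.

```scheme
;; Pause between calls, in microseconds (assumed value, roughly 1/3 second).
(define pause-usec 350000)

;; Hypothetical helper: apply PROC to each item in order,
;; sleeping between consecutive calls to throttle the request rate.
(define (map-throttled proc items)
  (let loop ((rest items) (acc '()))
    (if (null? rest)
        (reverse acc)
        (let ((result (proc (car rest))))
          ;; Sleep only if another call follows.
          (when (pair? (cdr rest))
            (usleep pause-usec))
          (loop (cdr rest) (cons result acc))))))

;; Example: (map-throttled fetch-one list-of-urls)
;; where fetch-one is whatever procedure performs a single request.
```

An explicit loop is used instead of `map` so that the calls are guaranteed to run one after another in list order.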
