
Parse Emails in PHP


Problem statement:

1. Extracting information from emails in PHP (e.g. attachments).

2. Simulating a daemon using PHP.

Extracting information from emails in PHP
Email Protocols
Popular protocols for retrieving mail include POP3 and IMAP4; sending mail is usually done with the SMTP protocol. Another important standard supported by most email clients is MIME, which is used to send binary files as email attachments. Attachments are files that are not part of the email proper but are sent along with it.
In computing, the Post Office Protocol (POP) is an application-layer Internet standard protocol used by local e-mail clients to retrieve e-mail from a remote server over a TCP/IP connection. POP has been developed through several versions, with version 3 (POP3) being the current standard.
Internet Message Access Protocol (IMAP) is a communications protocol for email retrieval. IMAP uses port 143, and IMAP over SSL (IMAPS) uses port 993. Unlike POP, IMAP explicitly allows multiple clients to be connected to the same mailbox simultaneously, and through flags stored on the server, different clients accessing the same mailbox at the same or different times can detect state changes made by other clients. IMAP supports many such flags; the \Seen flag, for example, lets a client read only unread messages.
The PHP IMAP functions imap_fetchstructure() and imap_fetchbody() are used to work out the structure of an email and to get the message body and attachments, but they can be fiddly to use because message parts can be nested. It is therefore useful to flatten the message parts into a new array, indexed by part number, so that each number can be passed directly to imap_fetchbody(). The recursion into sub-parts is especially fiddly for messages sent with Apple's Mail program (and therefore probably from iOS devices such as iPhones and iPads), whose parts are nested differently.
Fetching mail
First, connect to the mail server with imap_open():
  $connection = imap_open($server, $login, $password);
$server is a mailbox name, which consists of a server part in braces followed by a mailbox path on that server, for example:
  POP3 server:          {localhost:110/pop3}INBOX
  IMAP server over SSL: {localhost:993/imap/ssl}INBOX

$login and $password are the credentials of a valid account on that server.
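The mailbox string always has the shape `{host:port/flags}mailbox`. The hypothetical helper below is a small sketch that assembles it from its pieces; the host, port and flags used with it are the placeholder values from the text above, not real servers. Calling imap_open() itself assumes the PHP IMAP extension, which is bundled with PHP up to 8.3 and available from PECL since 8.4.

```php
<?php
// Hypothetical helper: build the "{host:port/flags}mailbox" string imap_open() expects.
// $flags are the slash-separated connection options, e.g. ['pop3'] or ['imap', 'ssl'].
function mailbox_string(string $host, int $port, array $flags = [], string $mailbox = 'INBOX'): string
{
    $options = $flags ? '/' . implode('/', $flags) : '';
    return '{' . $host . ':' . $port . $options . '}' . $mailbox;
}
```

A possible use is `$connection = imap_open(mailbox_string('localhost', 993, ['imap', 'ssl']), $login, $password);`. imap_open() returns false on failure, and imap_last_error() then describes what went wrong.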
Then fetch the structure of the message to get the information embedded in it (here, message number 1):
$structure = imap_fetchstructure($connection, 1);
Individual parts of the message can then be fetched with imap_fetchbody():
imap_fetchbody($connection, $emailNumber, $partNumber);
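imap_fetchbody() returns a part still in its transfer encoding, so attachments usually arrive as base64 text. A minimal sketch of decoding, assuming the encoding code comes from the matching part object returned by imap_fetchstructure() (its `encoding` property, where 3 is ENCBASE64 and 4 is ENCQUOTEDPRINTABLE); decode_part_body() is a hypothetical name:

```php
<?php
// Hypothetical helper: decode a body section fetched with imap_fetchbody()
// using the transfer-encoding code from its structure object ($part->encoding).
function decode_part_body(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3: // ENCBASE64
            return (string) base64_decode($raw);
        case 4: // ENCQUOTEDPRINTABLE
            return quoted_printable_decode($raw);
        default: // 7bit, 8bit, binary and other encodings are returned unchanged
            return $raw;
    }
}
```

For example, `decode_part_body(imap_fetchbody($connection, $emailNumber, $partNumber), $part->encoding)`, where `$part` is the structure object for that part number.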
The part number to use depends on how the message is structured, as described below.
Message structure
The structure of an email message is generally something like this, with part numbers:
1 – Multipart/alternative headers
1.1 – Plain text message
1.2 – HTML version of message
2 – Inline attachment, etc
In Apple Mail it’s like this instead:
1 – Plain text message
2 – Multipart/alternative headers
2.1 – HTML version of the message
2.2 – Inline attachment, etc
If the message has been forwarded, then it will look like this:
1 – Multipart/alternative headers
1.1 – Plain text message
1.2 – HTML version of message
2 – Message/RFC822
2.0 – Attached message header
2.1 – Plain text message
2.2 – HTML version of message
2.3 – Inline attachment, etc
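The flattening described earlier can be sketched as a recursive walk that reproduces the numbering in the layouts above. This is one possible approach, not a library function: flatten_parts() and flatten_structure() are hypothetical names, type 2 is TYPEMESSAGE (message/rfc822), and the code assumes, as in the forwarded example, that an attached message is itself multipart.

```php
<?php
// Hypothetical sketch: flatten the nested part tree from imap_fetchstructure()
// into an array keyed by the part numbers imap_fetchbody() expects, e.g. "1.2".
function flatten_parts(array $parts, string $prefix = '', int $index = 1, bool $fullPrefix = true, array $flat = []): array
{
    foreach ($parts as $part) {
        $number = $prefix . $index;
        $copy = clone $part;   // leave the caller's structure untouched
        unset($copy->parts);
        $flat[$number] = $copy;

        if (!empty($part->parts)) {
            if ($part->type == 2) {
                // Attached message (message/rfc822): its container becomes "N.0" ...
                $flat = flatten_parts($part->parts, $number . '.', 0, false, $flat);
            } elseif ($fullPrefix) {
                $flat = flatten_parts($part->parts, $number . '.', 1, true, $flat);
            } else {
                // ... and the container's children become "N.1", "N.2", ...
                $flat = flatten_parts($part->parts, $prefix, 1, true, $flat);
            }
        }
        $index++;
    }
    return $flat;
}

// A non-multipart message has no ->parts; its whole body is part "1".
function flatten_structure(object $structure): array
{
    return empty($structure->parts) ? ['1' => $structure] : flatten_parts($structure->parts);
}
```

A possible use: `foreach (flatten_structure(imap_fetchstructure($connection, $emailNumber)) as $partNumber => $part) { $body = imap_fetchbody($connection, $emailNumber, (string) $partNumber); }`. PHP turns keys such as "1" into integers, hence the cast.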
We can also use the ClamAV daemon (clamd) to scan attachments once they have been extracted from the messages and saved to disk.
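A minimal sketch of saving an attachment and scanning it, assuming the ClamAV `clamdscan` client is installed, clamd is running, and clamd can read the saved file. save_attachment(), scan_attachment() and clamd_verdict() are hypothetical names; the verdict mapping follows clamdscan's documented exit codes (0 no virus found, 1 virus found, 2 error).

```php
<?php
// Hypothetical helper: write decoded attachment data into $dir and return the path.
// basename() drops any directory components a sender put into the filename.
function save_attachment(string $dir, string $filename, string $data): string
{
    $name = basename($filename);
    if ($name === '' || $name === '.' || $name === '..') {
        $name = 'attachment'; // fallback for unusable names
    }
    $path = rtrim($dir, '/') . '/' . $name;
    if (file_put_contents($path, $data) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return $path;
}

// Map clamdscan's exit code to a verdict.
function clamd_verdict(int $exitCode): string
{
    switch ($exitCode) {
        case 0:
            return 'clean';
        case 1:
            return 'infected';
        default:
            return 'error';
    }
}

// Ask the running clamd (through clamdscan) to scan one saved file.
function scan_attachment(string $path): string
{
    exec('clamdscan --no-summary ' . escapeshellarg($path), $output, $exitCode);
    return clamd_verdict($exitCode);
}
```

Note that save_attachment() overwrites an existing file with the same name; a real setup would probably use a unique directory per message.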
We need to use the IMAP protocol for fetching mail: IMAP supports message flags such as \Seen, which let us read only unread messages. POP3 has no such flags, so this is not possible with POP3.
Mark messages as seen
imap_setflag_full($connection, $email_number, "\\Seen \\Flagged");
Here $connection is the IMAP stream returned by imap_open() and $email_number is the number of the message to flag.
Now search for unseen messages only; imap_search() returns an array of matching message numbers, or false if none match:
imap_search($connection, "UNSEEN");
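The two calls above can be combined into a loop that handles each unread message once. A sketch, assuming ext-imap and a `$connection` from imap_open(); process_unseen(), to_sequence() and the `$handler` callback are hypothetical names. One detail worth knowing: imap_fetchbody() sets \Seen as a side effect unless it is given the FT_PEEK option, so the handler should peek and let the loop mark messages afterwards.

```php
<?php
// Join message numbers into the comma-separated sequence imap_setflag_full() accepts.
function to_sequence(array $numbers): string
{
    return implode(',', array_map('intval', $numbers));
}

// Hand each unread message to $handler, then mark all handled messages \Seen in one call.
// $handler should fetch parts with imap_fetchbody(..., FT_PEEK) so that reading a
// body does not set \Seen before the handler has finished.
function process_unseen($connection, callable $handler): void
{
    $numbers = imap_search($connection, 'UNSEEN');
    if ($numbers === false) {
        return; // no unread messages (or the search criteria were rejected)
    }
    $done = [];
    foreach ($numbers as $number) {
        $handler($connection, $number);
        $done[] = $number;
    }
    if ($done) {
        imap_setflag_full($connection, to_sequence($done), '\\Seen');
    }
}
```

If the handler throws, none of the batch is marked, so every message in it is retried on the next run; marking each message right after it is handled avoids that at the cost of more server round trips.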
Simulating a daemon using PHP
To run a PHP script (or any other file) at a fixed interval, you can schedule it with cron.
However, cron can run a job at most once per minute.
If a script must run more often than that, run it in a loop from a bash script started with nohup. First create the bash script; the one below runs parse.php, waits 1 second, and repeats:
vi parse.sh
#!/bin/bash
while true
do
    php -q parse.php
    sleep 1
done
Then make the script executable and start it with nohup, so that it keeps running after you log out:
chmod +x parse.sh
nohup ./parse.sh > parse.log 2>&1 &
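As an alternative to the bash loop, the loop can live inside a single PHP process, which avoids starting a new interpreter every second. This is a sketch of that swapped-in technique, not the approach above: run_every() is a hypothetical name, `$task` stands for whatever parse.php does, and `$maxRuns = 0` means run forever.

```php
<?php
// Call $task, wait $intervalSeconds, and repeat; stop after $maxRuns calls (0 = never stop).
function run_every(callable $task, int $intervalSeconds, int $maxRuns = 0): int
{
    $runs = 0;
    while ($maxRuns === 0 || $runs < $maxRuns) {
        $task();
        $runs++;
        if ($maxRuns === 0 || $runs < $maxRuns) {
            sleep($intervalSeconds); // no pause after the final run
        }
    }
    return $runs;
}
```

The trade-off is that a long-running PHP process keeps any memory it leaks, whereas the bash loop gets a fresh process for every run.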
We can also write a PHP monitor that watches the parser process and restarts it whenever it stops:
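  // Start $command in the background with nohup and return its PID.
  // $command must redirect its own stdout (as below); otherwise shell_exec()
  // waits for the background process to finish.
  function run_in_background(string $command, int $priority = 0): int
  {
      if ($priority) {
          $pid = shell_exec("nohup nice -n $priority $command 2> /dev/null & echo $!");
      } else {
          $pid = shell_exec("nohup $command 2> /dev/null & echo $!");
      }
      return (int) trim((string) $pid);
  }

  // The process is running if `ps -p` prints a row for it below the header line.
  function is_process_running(int $pid): bool
  {
      exec('ps -p ' . $pid, $processState);
      return count($processState) >= 2;
  }

  echo "Running parser. . .";
  while (true) {
      // (Re)start the parser, then poll once per second until it exits.
      $ps = run_in_background('php -q parse.php > outfile');
      while (is_process_running($ps)) {
          echo ' . ';
          if (ob_get_level() > 0) {
              ob_flush();
          }
          flush();
          sleep(1);
      }
  }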
One problem remains: the monitoring process itself may be killed, for example when the machine runs short of resources. To guard against this, run the monitor from cron every minute, so that it is restarted if it dies.
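Starting the monitor from cron every minute only works if a second copy exits when one is already running. One common way to do that is an exclusive, non-blocking flock() on a lock file; the sketch below assumes that approach, and acquire_single_instance_lock() and the lock-file path are hypothetical.

```php
<?php
// Try to take an exclusive lock on $lockFile without waiting.
// Returns the open handle on success (keep it open for the life of the process),
// or null if another process already holds the lock.
function acquire_single_instance_lock(string $lockFile)
{
    $fp = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($fp === false) {
        return null;
    }
    if (!flock($fp, LOCK_EX | LOCK_NB)) {
        fclose($fp);
        return null; // another monitor is already running
    }
    return $fp; // the lock is released when the handle is closed or the process exits
}
```

The monitor would then begin with something like `if (acquire_single_instance_lock('/tmp/parse-monitor.lock') === null) { exit(0); }`, with a crontab entry such as `* * * * * php /path/to/monitor.php`.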
Still, some shortcomings remain
We can't guarantee that this approach works the same way for every email service provider.
For example, we can't expect a particular piece of information to be in a specific part of the message. Different mail clients structure messages differently, so content that is in part 1 of a message from Gmail may be in some other part of a message from Yahoo.
So we have to iterate over every part of the message. Parts can also be nested, which is why the message parts need to be flattened first.
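Once the parts are flattened, iterating over every part can be as simple as checking each part object for a filename. A sketch, assuming the input is an array of part number => structure object, with the object properties imap_fetchstructure() provides (`ifdparameters`/`dparameters` for Content-Disposition parameters, `ifparameters`/`parameters` for Content-Type parameters); part_filename() and find_attachments() are hypothetical names.

```php
<?php
// Return the attachment filename of one part, or null if it has none.
// Checks Content-Disposition "filename" first, then Content-Type "name".
function part_filename(object $part): ?string
{
    $sources = [
        ['ifdparameters', 'dparameters', 'filename'],
        ['ifparameters', 'parameters', 'name'],
    ];
    foreach ($sources as [$flag, $list, $attribute]) {
        if (empty($part->$flag) || !is_array($part->$list)) {
            continue;
        }
        foreach ($part->$list as $param) {
            if (strcasecmp($param->attribute, $attribute) === 0) {
                return $param->value;
            }
        }
    }
    return null;
}

// Walk every flattened part and list the ones that carry a filename.
function find_attachments(array $flatParts): array
{
    $found = [];
    foreach ($flatParts as $number => $part) {
        $name = part_filename($part);
        if ($name !== null) {
            $found[] = ['part' => (string) $number, 'filename' => $name];
        }
    }
    return $found;
}
```

Filenames may arrive MIME-encoded (for example `=?UTF-8?B?...?=`); iconv_mime_decode() or imap_utf8() can decode those, while other encoded parameter forms are not handled in this sketch.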
