How to parse an ISO Date string


One of the questions most often asked is how to parse a date string. Since there are so many different formats in common use, we won't try to cover them all. Instead we will concentrate on the ISO 8601 standard representation.

ISO 8601 specifies the string format...

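YYYY-MM-DDTHH:MM:SS±hh:mm

Where YYYY is the four-digit year, MM the month (01 to 12), DD the day of the month, HH the hour (00 to 23), MM the minutes, SS the seconds, and ±hh:mm the signed offset of the time zone from UTC, in hours and minutes.

The characters '-', 'T', and ':' are separators, supposedly meant to enhance human readability of the string, while the sign ('+' or '-') introduces the time zone offset. In practice, the 'T' is often replaced with a space, which actually does enhance human readability.

The full string for the moment of this writing is 2015-09-05T06:32:28-04:00, which looks better, at least to me, as 2015-09-05 06:32:28-04:00.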

The standard allows any lower-order elements to be left out. The time zone is often omitted, as in 2015-09-05 06:32:28. In this case, the standard says we assume the string represents local time. We could shorten it further: leaving out the seconds, 2015-09-05 06:32 means we assume 0 seconds. We could also leave out the time entirely, as in 2015-09-05, in which case we assume midnight.
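The defaulting rules above can be sketched with sscanf, which conveniently reports how many fields it managed to read. This is only an illustration, not the approach taken in the rest of this article; the name parse_partial is made up for the example, and no time zone handling is attempted.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: start from the defaults (midnight, 0 seconds),
   then overwrite whatever fields the string actually supplies.
   Returns the number of fields read, 0 to 6. */
int parse_partial(const char *s, struct tm *out)
{
    int y = 1900, mo = 1, d = 1, h = 0, mi = 0, sec = 0;

    /* "%*[T ]" skips the 'T' or space between date and time */
    int n = sscanf(s, "%d-%d-%d%*[T ]%d:%d:%d", &y, &mo, &d, &h, &mi, &sec);
    if (n < 0) n = 0;              /* sscanf returns EOF on empty input */

    out->tm_year = y - 1900;       /* struct tm counts years from 1900 */
    out->tm_mon = mo - 1;          /* and months from 0 */
    out->tm_mday = d;
    out->tm_hour = h;
    out->tm_min = mi;
    out->tm_sec = sec;
    out->tm_isdst = -1;            /* let the library decide about DST */
    return n;
}
```

Given "2015-09-05 06:32" this reads five fields and leaves the seconds at 0; given "2015-09-05" it reads three and leaves the time at midnight.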

An item of interest is that the lowest-order time element present can be extended in precision by writing it with a decimal fraction. For example, half past six on September 5 could be written as 2015-09-05T06.5. This is very rarely done, and really complicates things, so we will not implement it. Another item is that the ':' separator in the time zone may be left out, which does nothing for readability, for man or machine. Speaking of time zones, UTC can be specified by simply writing 'Z' in place of the offset. While all this sounds complicated enough, there remains the fact that humans are sloppy and forgetful. It is all too easy to write 'U' or 'UTC' or 'GMT' instead of 'Z', for example.
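To make the zone variations concrete, here is a minimal sketch that turns a zone designator into a number of seconds east of UTC. It accepts 'Z' and the sloppy spellings mentioned above, plus offsets written as hh, hhmm, or hh:mm. The function name zone_to_seconds and its return convention are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a zone designator to seconds east of UTC.
   Returns 1 on success, 0 if the text is not a recognizable zone. */
int zone_to_seconds(const char *z, long *seconds)
{
    /* 'Z' and the common human substitutes all mean UTC */
    if (strcmp(z, "Z") == 0 || strcmp(z, "U") == 0 ||
        strcmp(z, "UTC") == 0 || strcmp(z, "GMT") == 0) {
        *seconds = 0;
        return 1;
    }
    if (*z != '+' && *z != '-') return 0;

    long sign = (*z == '-') ? -1 : 1;
    const char *d = z + 1;
    size_t n = strspn(d, "0123456789");    /* digits before any colon */
    long h, m = 0;

    if (n == 4) {                          /* hhmm, colon left out */
        h = (d[0] - '0') * 10 + (d[1] - '0');
        m = (d[2] - '0') * 10 + (d[3] - '0');
    } else if (n == 2) {                   /* hh or hh:mm */
        h = (d[0] - '0') * 10 + (d[1] - '0');
        if (d[2] == ':') m = atol(d + 3);
    } else {
        return 0;
    }

    /* the sign applies to hours and minutes alike, so -00:30 works too */
    *seconds = sign * (h * 3600L + m * 60L);
    return 1;
}
```

With this, "-04:00" yields -14400 and "+0530" yields 19800, while "Z", "UTC", and "GMT" all yield 0.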

We can accommodate most of these things easily, simply by replacing a few of the non-numeric characters and treating the rest as delimiters, which gives us a series of 'tokens'. We can then parse the string token by token, stuffing the numbers into the right places as we go.
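The token idea can be illustrated in isolation. The sketch below is a variation that uses strtol rather than strtok_r, so it does not modify the string: every non-digit is treated as a delimiter and the numbers are collected in order. The name split_numbers is made up for the example.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: collects up to max numeric tokens from s,
   treating every non-digit character as a delimiter.
   Returns the number of tokens stored in out. */
int split_numbers(const char *s, int *out, int max)
{
    int n = 0;

    while (*s && n < max) {
        if (isdigit((unsigned char)*s)) {
            char *end;
            out[n++] = (int)strtol(s, &end, 10);  /* base 10, so "09" is 9 */
            s = end;                              /* resume after the digits */
        } else {
            s++;                                  /* '-', 'T', ' ', ':' ... */
        }
    }
    return n;
}
```

For "2015-09-05T06:32:28" this stores 2015, 9, 5, 6, 32, 28. Note that the sign of a time zone offset would be swallowed as a delimiter here, so the zone still has to be handled separately.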

So let's get to writing some code. Our function prototype will look like this...

char parseIsoDate(char * isostring, struct tm * result);

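This function takes as arguments the zero-terminated string to be parsed, and a pointer to the struct tm into which we shall stuff the numbers. The string is modified in place during parsing, so it must be writable (not a string literal). The function returns the number of elements parsed.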

#include <time.h>
#include <string.h>
#include <stdlib.h>  /* atoi */
char parseIsoDate(char * isostring, struct tm * result) {

  time_t temp;
  int i = 0;
  char c;

  /* replace certain characters in isostring */
  do {
    c = isostring[i];
    if ( c == 'T') c = ' ';
    if (  c == 'U'  ||  c == 'G' ) c = 'Z';
    isostring[i++] = c;
  } while (c);

  /*  prepare the result */
  result->tm_year = 0;
  result->tm_mon = 0;
  result->tm_mday = 1;
  result->tm_hour = 0;
  result->tm_min = 0;
  result->tm_sec = 0;
  result->tm_isdst = -1;

  /* get the first token (year) */
  char count = 0;
  char * p;
  char * dash = "-";
  char * space = " ";
  char * colon = ":";
  char * token = strtok_r(isostring, dash, &p);

  do {


    if (token == NULL) break;
    result->tm_year = atoi(token) - 1900;
    count++;

    /* continue with month, day, yada... */
    token = strtok_r(NULL, dash, &p);
    if (token == NULL) break;
    result->tm_mon = atoi(token) - 1;
    count++;

    token = strtok_r(NULL, space, &p);
    if (token == NULL) break;
    result->tm_mday = atoi(token);
    count++;

    token = strtok_r(NULL, colon, &p);
    if (token == NULL) break;
    result->tm_hour = atoi(token);
    count++;

    token = strtok_r(NULL, colon, &p);
    if (token == NULL) break;
    result->tm_min = atoi(token);
    count++;

    // at this point we don't know if the seconds are followed by a zone designator or nothing
    // in any event, it will be non-numeric, so we can safely take the seconds
    result->tm_sec = atoi(p);
    if(*p) count++;

    // look for a zone designator after the seconds: 'Z', or the sign of an offset
    token = strpbrk(p, "Z+-");
    if(token) c = *token;

    break;
  } while (1);


  // if c is non-zero, we have a timezone to parse
  if(c){
    int h = 0;
    int m = 0;

    if(c != 'Z'){
      char * z = token + 1;                 // first digit after the sign
      size_t n = strspn(z, "0123456789");   // digits before any colon

      if(n > 2){ // 4 digit zone, no delimiter (hhmm)
        m = atoi(z + 2);
        z[2] = 0;
        h = atoi(z);
      }
      else{ // hh, optionally followed by :mm
        h = atoi(z);
        if(z[n] == ':') m = atoi(z + n + 1);
      }

      // a negative offset applies to both hours and minutes
      if(c == '-'){
        h = -h;
        m = -m;
      }
    }

    // treat the fields as UTC, then subtract the offset to get the true UTC time
    temp = mk_gmtime(result);
    long tz = h * 3600L + 60L * m;
    temp -= tz;

  }




  // if there was no timezone specified , we assume isotime represents local time
  else {
    temp = mktime(result);
  }

  // normalize result
  localtime_r(&temp, result);
  return count;
}

This function, while not completely ISO compliant, will properly parse the most common use cases.
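The time zone step in the function relies on mk_gmtime(), which as far as I know is specific to avr-libc. Where it is not available, the same arithmetic can be sketched by hand: count the days to the date, add the time of day, and subtract the offset. The sketch below assumes the Unix epoch (1970-01-01) and the proleptic Gregorian calendar; the names days_from_civil and utc_seconds are made up for the example, and the day count uses a well-known civil-calendar formula rather than anything from the function above.

```c
/* Days between 1970-01-01 and the given date (negative for earlier dates).
   Month is 1..12, day is 1..31. */
long long days_from_civil(int y, int m, int d)
{
    y -= (m <= 2);                       /* count the year from March */
    long long era = (y >= 0 ? y : y - 399) / 400;
    long long yoe = y - era * 400;       /* year of era, 0..399 */
    long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: seconds since 1970-01-01T00:00:00Z for a wall-clock
   time written with the given offset (seconds east of UTC). */
long long utc_seconds(int y, int mo, int d, int h, int mi, int s, long offset)
{
    long long t = days_from_civil(y, mo, d) * 86400LL
                + h * 3600LL + mi * 60LL + s;
    return t - offset;   /* same step as "temp -= tz" in parseIsoDate */
}
```

For the example string 2015-09-05T06:32:28-04:00, utc_seconds(2015, 9, 5, 6, 32, 28, -4 * 3600L) gives 1441449148, which is 2015-09-05T10:32:28Z.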
Questions and comments can be sent to