About SHA1

Last days I’ve been coding a C implementation of SHA-1 algorithm, I found this very interesting so let’s write about it. This algorithm is used to generate a hash from an input, that can be useful not to store passwords in clear, or to check the integrity of a file (the probability to have the same hash for two different files is weak). I followed the RFC3174, you can find more information in it. I will detail here a C implementation that generates hashes from a string.

Message padding

SHA-1 works by computing blocks of 512 bits using a specific syntax for the last block to treat. If you work with strings, that means you need to cut the string into blocks of 64 chars (as one char is height bits), this is called message padding. Because the world is not perfect, a string is not always a multiple of 64 chars, that’s why the last block needs to store additionnal information about the string.

The last block is composed of 4 parts:

the last part of the string
one single bit set to 1
a variable part filled with zeros
64 bits where the length of the input string will be stored, that allows sha1 to hash at maximum 2^64 bits

To generate this block, the last part of the string must be less than 447 bits (512 - (64 + 1)) to have enough place for the single bit and the length, if this is not the case, the end must be filled with zero blanks and another block must be generated.

Here is a piece of C code that can be used to generate these blocks:

#include <stdio.h>
#include <stdlib.h>

typedef uint unsigned int;

static void
sha1_set_len(char *word, int len)
{
  uint                  ulen = len & 0xFFFFFFFF;

  word[60] = ulen >> 24;
  word[61] = ulen >> 16;
  word[62] = ulen >> 8;
  word[63] = ulen;
}

void
sha1_string(char *input)
{
  char                  word[64];
  int                   i, len, computed, flag;

  len = strlen(input);
  flag = 0;
  for (i = 0; i <= len; i += 64)
    {
      computed = i + 64 > len ? len % 64 : 64;
      strncpy(word, &input[i], computed);
      if (computed != 64)
        {
          memset(&word[computed], 0, 64 - computed);
          word[computed] = (char) 0x80;
          if (computed < 56)
            sha1_set_len(word, len * 8);
          else
            ++flag;
        }
      /* Compute the block here */
    }
  if (flag > 0)
    {
      memset(word, 0, 64);
      sha1_set_len(word, len * 8);
      /* Compute the block here */
    }
}

Some notes about this piece of code, as we are dealing with bytes, if we have the place for a single bit in the block, we also have the place for 8 bits, that’s why we set the last bit to one by assigning one char (8 bits) to 0x80 (1000 0000 in binary).

To set the len, we use bitwise operations: we could perform a memcpy of something like this, but this would’nt be compatible everywhere for endianness issues. Note that this implementation only stores 32 bits, that limits the size of the hashed input to 2^32 bits, which is enough for a string.

Generation of the hash

Structure and initialization

The hash is made by performing several operations on each block, using a temporary buffer of 2048 bits: W, and two little buffers of 160 bits: A and H.

As bitwise operations will be performed on these buffers, we need to work with unsigned int arrays. It’s also a good idea to gather it in a structure because it will be used in several functions:

typedef struct          sha1_s   /* sha1 structure */
{
  uint                  w[80];   /* 16 + 64, used to compute */
  uint                  h[5];    /* contains H0 H1 H2 H3 H4 H5 */
  uint                  a[5];    /* contains A B C D E */
}                       sha1_t;

#define A               sha1->a[0]
#define B               sha1->a[1]
#define C               sha1->a[2]
#define D               sha1->a[3]
#define E               sha1->a[4]

#define H(x)           sha1->h[x]

W and A do not require any initialization: both will be cleared before computing each block. H is initialized with some constants defined in the RFC, so all we need to do at creation of sha1_t structure is:

static void
sha1_initialize(sha1_t *sha1)
{
  sha1->h[0] = 0x67452301;
  sha1->h[1] = 0xEFCDAB89;
  sha1->h[2] = 0x98BADCFE;
  sha1->h[3] = 0x10325476;
  sha1->h[4] = 0xC3D2E1F0;
}

We’ll use three functions to get a clean code:

The left circular shift: S^n(X) = (X << n) OR (X >> 32-n)

static uint
sha1_op_left_shift(uint n, uint x)
{
  return (x << n) | (x >> (32 - n));
}

A function that performs different kind of bitwise operations on buffer A according to the current rank of W:

static uint
sha1_f(int t, sha1_t *sha1)
{
  if (t < 20)
    return (B & C) | ((~B) & D);
  if (t < 40)
    return B ^ C ^ D;
  if (t < 60)
    return (B & C) | (B & D) | (C & D);
  return B ^ C ^ D;
}

Another function that depends on the current rank in W, that will returns a different constant K:

static uint
sha1_k(int t)
{
  if (t < 20)
    return 0x5A827999;
  if (t < 40)
    return 0x6ED9EBA1;
  if (t < 60)
    return 0x8F1BBCDC;
  return 0xCA62C1D6;
}

For more information about these three functions, I invite you to have a look at the RFC (where there are used as defines, not functions).

Now we are ready, let’s create the hash

There are two available methods to generater the hash, we will use the first one defined in the RFC.

At beginning, a first pass is made on W: the block is copied at the beginning of W, then a left circularshift is made to the end of W. This way, we get all ranks of W initialized in a manner that is strongly linked to the block (and so the input string).

#define W(t)    sha1->w[t]

static void
sha1_compute_block_first_pass(uchar *block, sha1_t *sha1)
{
  for (t = 0; t < 16; t++)
    {
      W(t) = block[t * 4] << 24;
      W(t) |= block[t * 4 + 1] << 16;
      W(t) |= block[t * 4 + 2] << 8;
      W(t) |= block[t * 4 + 3];
    }
  for (t = 16; t < 80; t++)
    {
      W(t) = sha1_op_left_shift(1, W(t - 3) ^ W(t - 8) ^ W(t - 14) ^ W(t - 16));
    }
}

We perform a second pass that aims at generating 5 values (A,B,C,D,E) according to the content stored in W.

We first initialize A,B,C,D,E to H0,H1,H2,H3,H4

Then, for each rank of W, we make different kind of bitwise operationson the rank we are computing. Finally, we add to each rank of H the freshly generated corresponding value in A :

void
sha1_compute_block_second_pass(uchar *block, sha1_t *sha1)
{
  for (i = 0; i < 5; i++)
    {
      sha1->a[i] = sha1->h[i];
    }
  for (t = 0; t < 80; t++)
    {
      tmp = sha1_op_left_shift(5, A) + sha1_f(t, sha1) + E + W(t) + sha1_k(t);
      E = D;
      D = C;
      C = sha1_op_left_shift(30, B);
      B = A;
      A = tmp;
    }
  for (i = 0; i < 5; i ++)
    {
      sha1->h[i] += sha1->a[i];
    }
}

Fetching the result

That’s it, most of the work is done. After processing each blocks of the string, the hash is the hexadecimal representation of bits stored in H, we can easily get it like this:

static char *
sha1_get_hash(sha1_t *sha1)
{
  char                  result[40 + 1];

  snprintf(result, 41, "%8x%8x%8x%8x%8x", H(0), H(1), H(2), H(3), H(4));
  return (strdup(result));
}