Problem Set 5: Speller

Evidence of a 5-point Submission

  • check function calculates the hash index once and uses a loop with short-circuiting to inspect that index, uses strcasecmp to compare words
  • Implements a hash function that is more complex than sorting by the first n letters of a word or summing the ASCII values of a word, calculates an appropriate value of N
  • load function error handles after malloc and fopen calls, uses fscanf to store the new word directly in the table (without using a buffer)
  • size function returns size in O(1) (handled within the load function)
  • unload function uses recursion or a loop to free the entire table, only declares one or two new variables or functions

Evidence of a 4-point Submission

  • check function calculates the hash index once and uses a loop with short-circuiting to inspect that index, uses strcasecmp to compare words
  • Student attempts to implement a hash function that is more complex than sorting by the first n letters of a word
  • load function function error handles after malloc and fopen calls, may use a buffer to transfer a word into the table
  • size function returns size in O(1) (handled within the load function)
  • unload function uses recursion or a loop to free the entire table, may declare three or four new variables or functions

Evidence of a 3-point Submission

  • check function calculates the hash index once and uses a loop with short-circuiting to inspect that index
  • Student attempts to implement a hash function that is more complex than sorting by first letter
  • load function uses a buffer to transfer a word into the table
  • size function returns size in O(1) (handled within the load function)
  • unload function iterates through each linked list in each element of the table at most once

Evidence of a 2-point Submission

  • Defines N to be a number much larger than required by the given hash function
  • check function calculates the hash index more than once, does not short-circuit, uses strcmp to compare strings
  • hash function is less complex (sorts into fewer indices) than sorting by first letter
  • load function does not check that fopen or malloc have executed successfully, may not fscanf directly into the node, may unnecessarily initialize the table to NULL, may use other extraneous variables and conditionals within the loop
  • size function returns size in O(n)
  • unload function iterates through the table using extraneous variables or computation for each element

Evidence of a 1-point Submission

  • Defines N to be a number much smaller than required by the given hash function
  • check function does not use strcmp or strcasecmp to compare words, may malloc or do other unnecessary computation for each word in the list
  • hash function remains unchanged or may produce an index that is outside of the range of [0, N-1].
  • load function iterates through the dictionary in a way that is confusing or needlessly complicated
  • size function returns size in O(n)
  • unload function uses a confusing method to iterate through the table, may visit elements in the table more than once or skip over elements of the table

Example Implementations (Worse vs. Better)

check Function

Worse Implementation

The example below uses an unnecessary conditional statement

unsigned int hcode = hash(word);
node *trav = table[hcode];

while (trav != NULL)
{
    if (strcasecmp(trav->word, word) == 0)
    {
        return true;
    }
    else
    {
        trav = trav->next;
    }
}

return false;

Better Implementation

The example below uses no extraneous variables or conditionals

node *trav = table[hash(word)];

while (trav != NULL)
{
    if (strcasecmp(trav->word, word) == 0)
    {
        return true;
    }

    trav = trav->next;
}

return false;

Hash Function

Worse Implementation

The example below uses a value of N that will leave empty slots at both the beginning and end of the hash table. It also uses an unnecessary conditional statement

const unsigned int N = 1000000;

unsigned int hash(const char *word)
{
    int sum = 0;

    for (int i = 0; i < 3; i++)
    {
        if (strlen(word) > i)
        {
            sum += toupper(word[i]);
        }
        else
        {
            sum += 0;
        }
    }

    return (sum % N);
}

Better Implementation

Although the hash function itself is not better, the example below picks a more appropriate value for N and uses no unnecessary conditionals or variables. Note that it still wouldn’t be most efficient to call strlen twice.

const unsigned int N = 72;

unsigned int hash(const char *word)
{
    int sum = 0;
    int iterations = strlen(word) < 3 ? strlen(word) : 3;

    for (int i = 0; i < iterations; i++)
    {
        sum += toupper(word[i]) - 'A';
    }

    return sum % N;
}

load Function

Worse Implementation

The example below uses a buffer and an unnecessary conditional statement (there is no need to check if table[bucket] != NULL)

char new_word[LENGTH + 1];

while (fscanf(dict, "%s", new_word) != EOF)
{
    n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }

    strcpy(n->word, new_word);
    n->next = NULL;

    int bucket = hash(n->word);
    if (table[bucket] != NULL)
    {
        n->next = table[bucket];
    }

    table[bucket] = n;
    size_of_dict += 1;
}

// clean up...

Better Implementation

The example below avoids using a buffer by freading directly into the node

// load dictionary...

while (true)
{
    node *newnode = malloc(sizeof(node));
    if (newnode == NULL)
    {
        return false;
    }

    if (fscanf(source, "%s", newnode->word) == EOF)
    {
        free(newnode);
        break;
    }

    unsigned int hcode = hash(newnode->word);

    newnode->next = table[hcode];
    table[hcode] = newnode;

    dsize++;
}

// clean up...

size Function

Worse Implementation

The example below includes an unnecessary conditional

if (counter > 0)
{
    return counter;
}

return 0;

Better Implementation

The example below includes no unnecessary conditionals

// calculated in load
return dsize;

unload Function

Worse Implementation

The example below includes an unnecessary conditional and initializes many variables

for (int x = 0; x < N; x++)
{
    node *checker = table[x];
    while (checker != NULL)
    {
        node *tmp = checker;
        checker = checker->next;
        free(tmp);
    }

    if (x == N - 1 && checker == NULL)
    {
        return true;
    }
}
return false;

Better Implementation

The example below has no unnecessary conditionals and uses recursion to avoid the need for extraneous variables

void vacate_list(node *head)
{
    if (head == NULL)
    {
        return;
    }

    vacate_list(head->next);
    free(head);
}

bool unload(void)
{
    for (int i = 0; i < N; i++)
    {
        vacate_list(table[i]);
    }

    return true;
}