Problem Set 6: DNA

Evidence of a 5-point Submission

  • Stores the subsequences in a variable using the .fieldnames attribute and a [1:] slice
  • Opens files using the with context manager, and handles all necessary variables within the file context
  • Stores the return of the longest_match function in a dictionary
  • Checks for matches using a method that short-circuits, moving on from a person as soon as one STR does not match

Evidence of a 4-point Submission

  • Opens files using the with context manager, but may use a few unnecessary variables while reading in data or may declare necessary variables outside of the file context
  • Stores the return of the longest_match function in a dictionary
  • Checks for matches using a method that short-circuits, moving on from a person as soon as one STR does not match

Evidence of a 3-point Submission

  • Opens files and closes after use
  • May store the return of the longest_match function in a list
  • May use an unnecessarily complex data structure (nested dictionary)

Evidence of a 2-point Submission

  • Does not close a file after reading in data (only applicable if the with statement is not used)
  • Uses a CSV reader to read the DNA file, which is a plaintext file
  • Iterates over indices, perhaps using the range function, when iterating over the list elements themselves (for element in list) would suffice
  • Wraps the rest of the main function in an unnecessary else after error handling
  • Wraps the rest of the main function inside of a with statement, keeping a file open for an unnecessarily long time
  • Does not short-circuit when checking for matches

Evidence of a 1-point Submission

  • Opens files in a mode other than "r" (read-only mode)
  • Hard-codes subsequences
  • Does not use the longest_match function
  • Checks for matches in a way that is hard to understand, and may loop through each person or sequence more than once

Example Implementations (Worse vs. Better)

Opening Files

Worse Implementation

The example below opens a file but does not close it after use. It could also be improved by storing the subsequences in a variable.

reader = csv.DictReader(open(sys.argv[1]))

Better Implementation

The example below uses the with keyword to open the file, handles data processing within the indented block beneath it, and uses the people and strs lists to store data for later access

people = []
with open(sys.argv[1]) as db_file:
    reader = csv.DictReader(db_file)
    strs = reader.fieldnames[1:]

    for row in reader:
        people.append(row)
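
The DNA sequence itself is a plaintext file, so it can be read without a CSV reader. The following is a minimal sketch of one way to do that; the helper name read_sequence is hypothetical and is not part of the distribution code.

```python
def read_sequence(path):
    """Return the contents of a plaintext DNA file as a single string."""
    # "r" (read-only) is the default mode; the with block closes the file on exit
    with open(path) as dna_file:
        # strip() drops any trailing newline so it cannot interfere with matching
        return dna_file.read().strip()
```

A caller might then write sequence = read_sequence(sys.argv[2]), assuming the sequence file is the second command-line argument.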

Find Longest Match

Worse Implementation

The example below uses a list to store subsequence matches (as opposed to a dictionary) and unnecessarily iterates over a range of indices (as opposed to over the elements themselves)

longest_STRS = []
for i in range(len(reader.fieldnames) - 1):
    longest_STR_i = longest_match(DNA, reader.fieldnames[i+1])
    longest_STRS.append(longest_STR_i)

Better Implementation

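The example below uses a dictionary to store subsequence matches (as key-value pairs) and iterates over the elements of strs

with open(sys.argv[1]) as db_file:
    reader = csv.DictReader(db_file)
    # data processing: every column after "name" is an STR
    strs = reader.fieldnames[1:]

# ... rest of the function

# map each STR to its longest run in the DNA sequence
matches = {}
for subsequence in strs:
    matches[subsequence] = longest_match(sequence, subsequence)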
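
To see the dictionary approach run end to end, the sketch below pairs it with a simplified stand-in for longest_match. The stand-in is an assumption made for illustration only and is not the function supplied with the problem set; count_strs is likewise a hypothetical helper name.

```python
def longest_match(sequence, subsequence):
    """Stand-in helper: longest run of back-to-back copies of subsequence.

    Assumes subsequence is a non-empty string.
    """
    longest = 0
    for start in range(len(sequence)):
        run = 0
        # Count consecutive copies of subsequence beginning at this position
        while sequence.startswith(subsequence, start + run * len(subsequence)):
            run += 1
        longest = max(longest, run)
    return longest


def count_strs(sequence, strs):
    """Map each STR to its longest run in sequence."""
    return {subsequence: longest_match(sequence, subsequence) for subsequence in strs}
```

For example, count_strs("AGATCAGATCAGATCTTTTAATG", ["AGATC", "AATG"]) returns {"AGATC": 3, "AATG": 1}. Keying the results by STR lets later code look up a count by name instead of relying on list positions.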

Check For Matches

Worse Implementation

The example below will not short-circuit: it compares every STR for a person even after one has already failed to match

for person in people:
    count = 0
    for str in strs:
        if int(person[str]) == matches[str]:
            count += 1
    if count == len(strs):
        print(person["name"])
        sys.exit(0)
print("No match")
sys.exit(0)

Better Implementation

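Using strs as defined above, the example below will short-circuit: break stops checking a person at the first STR that does not match

for person in people:
    match = True
    for subsequence in strs:
        # CSV values are strings, so convert before comparing
        if int(person[subsequence]) != matches[subsequence]:
            match = False
            # no need to check this person's remaining STRs
            break
    if match:
        print(person["name"])
        sys.exit(0)
print("No match")
sys.exit(0)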
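
Another way to get the same short-circuit behavior is Python's built-in all, which stops evaluating a generator expression at the first False value. The sketch below is one possible variant, not a required approach; the function name find_match is hypothetical, and it assumes people, strs, and matches have the shapes used in the examples above.

```python
def find_match(people, strs, matches):
    """Return the name of the first person whose STR counts all match, else None."""
    for person in people:
        # all() stops consuming the generator at the first failed comparison
        if all(int(person[subsequence]) == matches[subsequence] for subsequence in strs):
            return person["name"]
    return None
```

The caller can then print the returned name, or "No match" when the result is None.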