Problem Set 6: DNA
Evidence of a 5-point Submission
- Stores the subsequences in a variable using the `.fieldnames` attribute and the `[1:]` slice
- Opens files using the `with` context manager, and handles all necessary variables within the file context
- Stores the return value of the `longest_match` function in a dictionary
- Checks for matches using a method that short-circuits, moving on from a person as soon as one STR count does not match
Evidence of a 4-point Submission
- Opens files using the `with` context manager; may use a few unnecessary variables while reading in data, or may declare necessary variables outside of the file context
- Stores the return value of the `longest_match` function in a dictionary
- Checks for matches using a method that short-circuits, moving on from a person as soon as one STR count does not match
Evidence of a 3-point Submission
- Opens files and closes them after use
- May store the return value of the `longest_match` function in a list (array)
- May use an unnecessarily complex data structure (e.g., a nested dictionary)
Evidence of a 2-point Submission
- Does not close a file after reading in data (only applicable if the `with` statement is not used)
- Uses a CSV reader to read the DNA file, which is a plain text file
- Iterates over indices, perhaps using the `range` function, when the list elements themselves would suffice, as in `for item in items`
- Wraps the rest of the main function in an unnecessary `else` block after error handling
- Wraps the rest of the main function inside a `with` statement, keeping a file open for an unnecessarily long time
- Does not short-circuit when checking for matches
Evidence of a 1-point Submission
- Opens files in a mode other than `"r"` (read-only mode)
- Hard-codes subsequences
- Does not use the `longest_match` function
- Checks for matches in a way that is hard to understand; may loop through each person or sequence more than once
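Two input-handling points from the 2-point list, reading the DNA file as plain text rather than through a CSV reader and exiting early on bad usage rather than wrapping the rest of `main` in an `else`, can be sketched as follows. This is a minimal sketch under assumptions: the helper `read_sequence`, the `main(argv)` signature, and the usage message are hypothetical, and the program is assumed to take the database CSV and the DNA text file as its two command-line arguments.

```python
import sys


def read_sequence(path):
    """Return the contents of a DNA text file as a single string (hypothetical helper)."""
    # Plain text, so a simple read() suffices; the with block closes the file
    # as soon as the read is done. open() defaults to read-only mode ("r").
    with open(path) as dna_file:
        return dna_file.read().strip()


def main(argv):
    # Guard clause: exit early on bad usage instead of wrapping the
    # rest of the function in an else block.
    if len(argv) != 3:
        sys.exit("Usage: python dna.py data.csv sequence.txt")

    sequence = read_sequence(argv[2])
    # ... read the database and compare STR counts (omitted here)
    print(f"Read {len(sequence)} bases")
```

In a script, this would be called as `main(sys.argv)`. Because `sys.exit` ends the program on bad usage, everything after the check can stay at the function's top indentation level.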
Example Implementations (Worse vs. Better)
Opening Files
Worse Implementation
The example below opens a file but never closes it. It could also be improved by storing the subsequences in a variable.
```python
reader = csv.DictReader(open(sys.argv[1]))
```
Better Implementation
The example below uses the `with` keyword to open the file, handles data processing within the indented block beneath it, and stores data in the `people` and `strs` lists for later access.
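```python
import csv
import sys

people = []
# The file is closed automatically when the with block ends
with open(sys.argv[1]) as db_file:
    reader = csv.DictReader(db_file)
    # Every column after "name" is an STR
    strs = reader.fieldnames[1:]
    for row in reader:
        people.append(row)
```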
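To see what `strs` and `people` actually hold, the sketch below runs the same logic on an in-memory CSV through `io.StringIO` in place of the file named by `sys.argv[1]`. The sample names and counts are made up for illustration, and the header is assumed to start with a `name` column followed by one column per STR.

```python
import csv
import io

# Hypothetical sample database; a real run reads the file named by sys.argv[1].
SAMPLE_CSV = """name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
"""

people = []
with io.StringIO(SAMPLE_CSV) as db_file:
    reader = csv.DictReader(db_file)
    # Every column after "name" is an STR, so slice it off the header.
    strs = reader.fieldnames[1:]
    for row in reader:
        people.append(row)

print(strs)                                   # ['AGATC', 'AATG', 'TATC']
print(people[0]["name"], people[0]["AGATC"])  # Alice 2
```

Note that `DictReader` yields every value as a string (`"2"`, not `2`), which is why the match-checking code converts each count with `int()` before comparing.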
Find Longest Match
Worse Implementation
The example below uses a list to store subsequence matches (as opposed to a dictionary) and unnecessarily iterates over a range of indices (as opposed to over the elements themselves).
```python
longest_STRS = []
for i in range(len(reader.fieldnames) - 1):
    longest_STR_i = longest_match(DNA, reader.fieldnames[i+1])
    longest_STRS.append(longest_STR_i)
```
Better Implementation
The example below uses a dictionary to store subsequence matches (as key-value pairs) and iterates directly over the elements of `strs`.
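```python
# data processing
strs = reader.fieldnames[1:]
# ... rest of the function
matches = {}
# Map each STR name to its longest run of consecutive repeats.
# The loop variable is named subsequence so the built-in str is not shadowed.
for subsequence in strs:
    matches[subsequence] = longest_match(sequence, subsequence)
```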
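The loop above can also be written as a dictionary comprehension. To keep the sketch runnable on its own, it includes a stand-in `longest_match` written for illustration only, assuming (as the problem set describes) that the function returns the length of the longest run of back-to-back repeats of the subsequence; it is not the distribution code. The sequence and STR names are made up.

```python
def longest_match(sequence, subsequence):
    """Stand-in: length of the longest run of back-to-back copies of subsequence."""
    longest = 0
    size = len(subsequence)
    for start in range(len(sequence)):
        count = 0
        # Count consecutive copies beginning at this position; a slice that
        # runs past the end is shorter than subsequence, which ends the loop.
        while sequence[start + count * size : start + (count + 1) * size] == subsequence:
            count += 1
        longest = max(longest, count)
    return longest


sequence = "AGATCAGATCAGATCTTAATGAATG"
strs = ["AGATC", "AATG", "TATC"]

# One entry per STR: the key is the STR name, the value its longest run.
matches = {subsequence: longest_match(sequence, subsequence) for subsequence in strs}
print(matches)  # {'AGATC': 3, 'AATG': 2, 'TATC': 0}
```

The comprehension builds the same dictionary as the explicit loop; which form to use is mostly a matter of readability.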
Check For Matches
Worse Implementation
The example below does not short-circuit: it compares every STR for every person, even after a mismatch has already ruled that person out.
```python
for person in people:
    count = 0
    for str in strs:
        if int(person[str]) == matches[str]:
            count += 1
    if count == len(strs):
        print(person["name"])
        sys.exit(0)
print("No match")
sys.exit(0)
```
Better Implementation
Using `strs` and `matches` as defined above, the example below short-circuits: it stops checking a person at the first STR that does not match.
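```python
for person in people:
    match = True
    for subsequence in strs:
        # DictReader values are strings, so convert before comparing
        if int(person[subsequence]) != matches[subsequence]:
            match = False
            break  # one mismatch rules this person out
    if match:
        print(person["name"])
        sys.exit(0)
print("No match")
sys.exit(0)
```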
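Python's built-in `all()` short-circuits as well: it stops consuming its generator at the first false value, so the inner loop and the `match` flag can be folded into one expression. In this minimal sketch, the helper name `find_match` is hypothetical and returns the name instead of printing and exiting, which makes it easy to test; the `people`, `strs`, and `matches` values are made-up stand-ins for the ones built earlier.

```python
def find_match(people, strs, matches):
    """Return the name of the first person whose STR counts all match, else None."""
    for person in people:
        # all() stops at the first STR whose count differs (short-circuits).
        if all(
            int(person[subsequence]) == matches[subsequence]
            for subsequence in strs
        ):
            return person["name"]
    return None


people = [
    {"name": "Alice", "AGATC": "2", "AATG": "8", "TATC": "3"},
    {"name": "Bob", "AGATC": "3", "AATG": "2", "TATC": "0"},
]
strs = ["AGATC", "AATG", "TATC"]
matches = {"AGATC": 3, "AATG": 2, "TATC": 0}

print(find_match(people, strs, matches) or "No match")  # Bob
```

For Alice, `all()` stops after the first comparison (2 vs. 3) without checking her remaining STRs, which is exactly the short-circuiting behavior the rubric rewards.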