Problem Set 6: DNA
Evidence of a 5-point Submission
- Stores the subsequences in a variable using the .fieldnames attribute and a [1:] slice
- Opens files using the with context manager, and handles all necessary variables within the file context
- Stores the return of the longest_match function in a dictionary
- Checks for matches using a method that will short-circuit on checking a person when it finds one STR that does not match
Evidence of a 4-point Submission
- Opens files using the with context manager, may use a few unnecessary variables while reading in data or may declare necessary variables outside of the file context
- Stores the return of the longest_match function in a dictionary
- Checks for matches using a method that will short-circuit on checking a person when it finds one STR that does not match
Evidence of a 3-point Submission
- Opens files and closes after use
- May store the return of the longest_match function in a list
- May use an unnecessarily complex data structure (nested dictionary)
Evidence of a 2-point Submission
- Does not close a file after reading in data (only applicable if the with statement is not used)
- Uses a CSV reader to read the DNA file, which is a plaintext file.
- Iterates over indices, perhaps using the range function, when list elements themselves would suffice, as via for _ in _.
- Wraps the rest of the main function in an unnecessary else after error handling
- Wraps the rest of the main function inside of a with statement, keeping a file open for an unnecessarily long time
- Does not short-circuit when checking for matches
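As an illustration of reading the DNA file as plain text while keeping the file context short, here is a minimal sketch. The helper name read_sequence and the temporary sample file are assumptions made for this example; in the actual program the path would come from the command line.

```python
import os
import tempfile


def read_sequence(path):
    """Read a plaintext DNA sequence, closing the file as soon as it is read."""
    with open(path) as dna_file:
        # read() returns the whole file as one string; strip() drops the trailing newline.
        sequence = dna_file.read().strip()
    # The file is already closed here, so later work does not keep it open.
    return sequence


# Hypothetical sample file standing in for the sequence file named on the command line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AGATCAGATCTTAATG\n")

sequence = read_sequence(tmp.name)
os.remove(tmp.name)
print(sequence)  # AGATCAGATCTTAATG
```

No CSV reader is involved, and only the read itself sits inside the with block.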
Evidence of a 1-point Submission
- Opens files in a mode other than "r" (read-only mode)
- Hard-codes subsequences
- Does not use the longest_match function
- Checks for matches in a way that is hard to understand, may loop through each person or sequence more than once
Example Implementations (Worse vs. Better)
Opening Files
Worse Implementation
The example below opens a file but never closes it. It could also be improved by storing the subsequences in a variable.
reader = csv.DictReader(open(sys.argv[1]))
Better Implementation
The example below uses the with keyword to open the file, handles data processing within the indented section beneath, and uses the people and strs lists to store data for access later.
import csv
import sys

people = []
with open(sys.argv[1]) as db_file:
    reader = csv.DictReader(db_file)
    # The first column is the name; the remaining columns are the STRs.
    strs = reader.fieldnames[1:]
    for row in reader:
        people.append(row)
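To see what the slice yields, the sketch below runs the same pattern on an in-memory CSV. The sample header and rows are made up for illustration, and io.StringIO stands in for the database file that would normally be opened from sys.argv[1].

```python
import csv
import io

# Hypothetical sample data standing in for the database CSV file.
sample_csv = "name,AGATC,AATG,TATC\nAlice,2,8,3\nBob,4,1,5\n"

people = []
with io.StringIO(sample_csv) as db_file:
    reader = csv.DictReader(db_file)
    # fieldnames is the header row; [1:] drops the leading "name" column.
    strs = reader.fieldnames[1:]
    for row in reader:
        people.append(row)

print(strs)               # ['AGATC', 'AATG', 'TATC']
print(people[0]["name"])  # Alice
```

Note that csv.DictReader returns every value as a string, which is why the matching code later converts counts with int().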
Find Longest Match
Worse Implementation
The example below uses a list to store subsequence matches (as opposed to a dictionary) and unnecessarily iterates over a range (as opposed to over the elements).
longest_STRS = []
for i in range(len(reader.fieldnames) - 1):
    longest_STR_i = longest_match(DNA, reader.fieldnames[i+1])
    longest_STRS.append(longest_STR_i)
Better Implementation
The example below uses a dictionary to store subsequence matches (as key-value pairs) and iterates over the elements of strs.
# data processing
strs = reader.fieldnames[1:]
# ... rest of the function
matches = {}
for str in strs:
    matches[str] = longest_match(sequence, str)
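The same dictionary can also be built with a comprehension. The sketch below is runnable on its own, under two assumptions: longest_match here is a simplified stand-in written for this example (not the function supplied with the problem set), and the sequence and STR names are invented.

```python
def longest_match(sequence, subsequence):
    """Simplified stand-in: longest run of consecutive repeats of subsequence."""
    longest = 0
    step = len(subsequence)
    for start in range(len(sequence)):
        count = 0
        # Count back-to-back copies of subsequence beginning at start.
        while sequence[start + count * step : start + (count + 1) * step] == subsequence:
            count += 1
        longest = max(longest, count)
    return longest


# Hypothetical inputs for illustration.
sequence = "AGATCAGATCTTAATGAATGAATG"
strs = ["AGATC", "AATG"]

# Keys are the STR names; values are the longest run found for each.
matches = {s: longest_match(sequence, s) for s in strs}
print(matches)  # {'AGATC': 2, 'AATG': 3}
```

The loop variable is named s in this sketch so that the built-in str stays available; that naming is a choice made here, not a rubric requirement.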
Check For Matches
Worse Implementation
The example below does not short-circuit: it compares every STR for each person, even after one has failed to match.
for person in people:
    count = 0
    for str in strs:
        if int(person[str]) == matches[str]:
            count += 1
    if count == len(strs):
        print(person["name"])
        sys.exit(0)
print("No match")
sys.exit(0)
Better Implementation
Using strs as defined above, the example below will short-circuit.
for person in people:
    match = True
    for str in strs:
        if int(person[str]) != matches[str]:
            match = False
            break
    if match:
        print(person["name"])
        sys.exit(0)
print("No match")
sys.exit(0)
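Another way to get the same short-circuit behavior is the built-in all() with a generator expression, which stops evaluating at the first comparison that is False. The sketch below is one possible variant, not the required approach; the helper name find_match and the sample data are assumptions made for illustration.

```python
# Hypothetical data in the shape produced by the earlier examples.
strs = ["AGATC", "AATG"]
matches = {"AGATC": 2, "AATG": 3}
people = [
    {"name": "Alice", "AGATC": "4", "AATG": "3"},
    {"name": "Bob", "AGATC": "2", "AATG": "3"},
]


def find_match(people, strs, matches):
    """Return the first matching person's name, or None if nobody matches."""
    for person in people:
        # all() stops consuming the generator at the first False comparison.
        if all(int(person[s]) == matches[s] for s in strs):
            return person["name"]
    return None


print(find_match(people, strs, matches) or "No match")  # Bob
```

Returning the name instead of calling sys.exit inside the loop keeps the check easy to test in isolation.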