Counter and namedtuple
Collections
- Counter - For counting how many times each item appears in a collection :)
- namedtuple - For writing self-documenting code!
collections.Counter
A Counter is a dictionary subclass that maps each key to an integer count. Counters are very useful when you need to count how often each element appears in a collection, for example the word counts of a portion of text.
In [1]:
from collections import Counter, namedtuple
from pathlib import Path
from pprint import pprint
import requests
import csv
In [2]:
# Here we count the colors in a list
color_count = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue', 'yellow'])
print('color_count:', color_count, '\n')
# Values can also be passed explicitly
color_count_2 = Counter(red=5, green=2, blue=7, orange=1)
print('color_count_2:', color_count_2, '\n')
# Counter objects can also be combined with operators
# Addition (counts are summed per key)
print('addition:', color_count + color_count_2, '\n')
# Subtraction (only keys with a positive result are kept)
print('subtraction:', color_count - color_count_2, '\n')
# Intersection
print('intersection:', color_count & color_count_2, '\n')
# Union
print('union:', color_count | color_count_2, '\n')
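The operator results above are easier to follow on a smaller pair of Counters. This is a minimal sketch with made-up counts (the names `a` and `b` are not from the notebook); it shows that subtraction and intersection drop any key whose result is zero or negative, and that a missing key reads as 0.

```python
from collections import Counter

# Made-up counts, chosen so every operator drops or keeps at least one key
a = Counter(red=2, blue=3, green=1)
b = Counter(red=5, blue=1, orange=1)

added = a + b        # counts are summed per key
subtracted = a - b   # counts are subtracted; results <= 0 are dropped
common = a & b       # minimum of each count (intersection)
either = a | b       # maximum of each count (union)

print(added)         # red=7, blue=4, green=1, orange=1
print(subtracted)    # blue=2, green=1 (red and orange went negative)
print(common)        # red=2, blue=1
print(either)        # red=5, blue=3, green=1, orange=1

# Looking up a missing key returns 0 instead of raising KeyError,
# and the lookup does not add the key to the Counter
print(a['purple'])
```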
In [3]:
# Let's do a word count on Dr. Seuss's Yertle the Turtle
r = requests.get('http://www.spunk.org/texts/prose/sp000212.txt')
yertle = Counter(r.text.split())
print('Most common 10 words:')
print('\n'.join(map(str, yertle.most_common(10))), '\n')
print('Least common 10 words:')
print('\n'.join(map(str, yertle.most_common()[:-11:-1])))
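One caveat with splitting on whitespace is that 'The' and 'the', or 'sat.' and 'sat', are counted as different words. The sketch below shows one possible way to normalize the text first. It uses a short made-up sentence rather than the downloaded file, and the regular expression is an assumption about what should count as a word.

```python
import re
from collections import Counter

# Stand-in text (made up for this sketch), not the downloaded file
sample = "The turtle sat. The turtle sighed, and the pond was still."

# A plain split keeps case and punctuation: 'The' != 'the', 'sat.' != 'sat'
naive = Counter(sample.split())

# Lowercase first, then keep only runs of letters and apostrophes
normalized = Counter(re.findall(r"[a-z']+", sample.lower()))

print(naive['the'], normalized['the'])   # 1 3
print(normalized.most_common(2))         # [('the', 3), ('turtle', 2)]
```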
collections.namedtuple
namedtuples allow you to write self-documenting code. They're most useful when you iterate over streams of data (like the rows in a csv) where you don't want to create plain tuples and refer to their fields by numeric index, but where creating a dictionary for every row may not be necessary. You can combine them with Counters to do cool stuff :)
In [4]:
Person = namedtuple('Person', ['name', 'age'])
amy = Person('Amy', 31)
print(amy, '\n')
bob = Person(name='Bob', age=17)
print(bob, '\n')
susan = Person(**{'name': 'Susan', 'age': 45})
print(susan, '\n')
people = [
    ('Aaron', 56),
    ('Wilfred', 89),
    ('Bertha', 2)
]
pprint(list(map(Person._make, people)))
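A namedtuple is still an ordinary tuple underneath, which is worth seeing directly. The following sketch reuses the Person definition from the cell above; `older_amy` is a name introduced here for illustration only.

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'age'])
amy = Person('Amy', 31)

# Still a regular tuple: indexing and unpacking keep working
name, age = amy
print(amy[0] == amy.name == name)   # True

# Immutable like any tuple: assigning to a field raises AttributeError
try:
    amy.age = 32
except AttributeError:
    print('fields cannot be reassigned')

# _replace returns a new instance and leaves the original untouched
older_amy = amy._replace(age=32)
print(amy, older_amy)
```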
In [5]:
def clean_line(line):
    """
    Return a version of the string that is more compatible
    with the creation of namedtuples.
    """
    clean_word = lambda string: string.strip('\n ?').replace(' ', '_').replace('-', '_')
    return list(map(clean_word, line.split(',')))
# This data is from the consumer complaints
# dataset found on catalog.data.gov
complaints_file = Path('..', 'files', 'Consumer_Complaints.csv')
with complaints_file.open() as infile:
    header = clean_line(infile.readline())
    Complaint = namedtuple('Complaint', header)
    complaints = list(map(Complaint._make, csv.reader(infile)))
issues = Counter([complaint.Issue for complaint in complaints])
print('Most common issues:')
pprint(issues.most_common(3))
print()
companies = Counter([complaint.Company for complaint in complaints])
print('Companies with the most complaints:')
pprint(companies.most_common(3))
print()
# namedtuples can be converted to dictionaries with their ._asdict() method
first_complaint = complaints[0]
print(first_complaint._asdict()['Issue'], '\n')
# It's normally better to refer to the namedtuple's attribute, however
print(first_complaint.Issue, '\n')
# We can also use the ._fields attribute to see all the other attributes
# of a particular namedtuple
print('Complaint fields:')
pprint(first_complaint._fields)
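The same csv + namedtuple + Counter pattern can be tried without the data file. This is a small sketch that reads made-up rows from an in-memory string; the `Row` type, the column values, and the company names are all invented for illustration. It also reads the header through csv.reader, which is one way to cope with quoted column names.

```python
import csv
import io
from collections import Counter, namedtuple

# Made-up rows standing in for a real CSV file
raw = io.StringIO(
    "Product,Issue,Company\n"
    "Mortgage,Loan servicing,Acme Bank\n"
    "Credit card,Billing disputes,Acme Bank\n"
    "Mortgage,Loan servicing,Example Credit Union\n"
)

reader = csv.reader(raw)
# The first row is the header; make each name a valid identifier
header = [name.strip().replace(' ', '_').replace('-', '_') for name in next(reader)]
Row = namedtuple('Row', header)
rows = [Row._make(fields) for fields in reader]

# Counter accepts any iterable, so a generator expression is enough
issues = Counter(row.Issue for row in rows)
companies = Counter(row.Company for row in rows)
print(issues.most_common(1))      # [('Loan servicing', 2)]
print(companies.most_common(1))   # [('Acme Bank', 2)]
```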
In [6]:
# This data is from the New York City Leading Causes of Death
# dataset found on catalog.data.gov
death_file = Path('..', 'files', 'New_York_City_Leading_Causes_of_Death.csv')
with death_file.open() as infile:
    header = clean_line(infile.readline())
    Death = namedtuple('Death', header)
    # map() is lazy, so build the list while the file is still open
    deaths = list(map(Death._make, csv.reader(infile)))
print('The most common cause of death is:')
print(Counter([death.Cause_of_Death for death in deaths]).most_common(1))