博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
3. Dictionaries and Sets
阅读量:5079 次
发布时间:2019-06-12

本文共 11742 字,大约阅读时间需要 39 分钟。

1. Generic Mapping Types

  • The collections.abc module provides the Mapping and MutableMapping ABCs to formalize the interfaces of dict and similar types (in Python 2.6 to 3.2, these classes are imported from the collections module, and not from collections.abc).
  • All mapping types in the standard library use the basic dict in their implementation, so they share the limitation that the keys must be hashable.

a = dict(one=1, two=2, three=3)b = {'one': 1, 'two': 2, 'three': 3}c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))d = dict([('two', 2), ('one', 1), ('three', 3)])e = dict({'three': 3, 'one': 1, 'two': 2})print(a == b == c == d == e)  # Truefrom collections import abcprint(isinstance(a, abc.Mapping))  # True# 1. The main value of the ABCs is documenting and formalizing the # minimal interfaces for mappings, and serving as criteria for isinstance# tests in code that needs to support mappings in a broad sense.print(hash((1, 2, (3, 4))))  # -2725224101759650258# 1. An object is hashable if it has a hash value which never changes# during its lifetime (it needs a __hash__() method), and can be compared# to other objects (it needs an __eq__() method). # 2. Hashable objects which compare equal must have the same hash value.# 3. (atomic immutable types / frozen set / tuple when all its items are hashable)# 4. User-defined types are hashable by default because their hash value is their# id() and they all compare not equal. If an object implements a custom __eq__# that takes into account its internal state, it may be hashable only if all its# attributes are immutable.

 2. Dict Comprehensions

DIAL_CODES = [	(86, 'China'),	(91, 'India'),	(1, 'United States')]print {country.upper(): code for code, country in DIAL_CODES if code > 66}# {'INDIA': 91, 'CHINA': 86}

 3. Overview of Common Mapping Methods

4. Mappings with Flexible Key Lookup

my_dict.get(key, default)# my_dict[key]  # KeyErrormy_dict.setdefault(key, []).append(new_value)# if key not in my_dict:# 	my_dict[key] = []# my_dict[key].append(new_value)# 1. setdefault returns the value, so it can# be updated without requiring a second search.import collectionsmy_dict = collections.defaultdict(list)my_dict[key].append(new_value)print my_dict.default_factory  # 
# 1. When instantiating a defaultdict, you provide a callable# that is used to produce a default value whenever __getitem__# is passed a nonexistent key argument.# 2. If no default_factory is provided, the usual KeyError# is raised for missing keys.# 3. The default_factory of a defaultdict is only invoked to# provide default values for __getitem__ calls, and not for# the other methods. (__missing__)class MyDict(dict): def __missing__(self, key): if isinstance(key, str): raise KeyError(key) return self[str(key)] def get(self, key, default=None): try: return self[key] except KeyError: return default def __contains__(self, key): return key in self.keys() or str(key) in self.keys() d = MyDict([('1', 'one'), ('4', 'four')])print d[1] # oneprint d[2] # KeyErrorprint d.get(3) # Noneprint 4 in d # True# 1. If you subclass dict and provide a __missing__ method, the# standard dict.__getitem__ will call it whenever a key is not found,# instead of raising KeyError. The __missing__ method is just called# by __getitem__. (i.e., for the d[k] operator).# 2. A better way to create a user-defined mapping type is to # sub‐class collections.UserDict instead of dict# 3. A search like k in my_dict.keys() is efficient in Python 3 even# for very large mappings because dict.keys() returns a view, which is# similar to a set, and containment checks in sets are as fast as in# dictionaries. In Python 2, dict.keys() returns a list, k in my_list# must scan the list.

 5. Variations of dict

  1. collections.OrderedDict
    • Maintains keys in insertion order. The popitem method of an OrderedDict pops the first item by default, but if called as my_odict.popitem(last=True), it pops the last item added.
  2. collections.ChainMap
    • Holds a list of mappings that can be searched as one. The lookup is performed on each mapping in order, and succeeds if the key is found in any of them. This is useful to interpreters for languages with nested scopes, where each mapping represents a scope context. The “ChainMap objects” section of the collections docs has several examples of ChainMap usage.
    • import builtinspylookup = ChainMap(locals(), globals(), vars(builtins))
  3. collections.Counter

    • A mapping that holds an integer count for each key. Updating an existing key adds to its count. This can be used to count instances of hashable objects (the keys) or as a multiset—a set that can hold several occurrences of each element.
    • import collectionsct = collections.Counter('abracadabra')print ct  # Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})ct.update('aaaaazzz')print ct  # Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})print ct.most_common(2)  # [('a', 10), ('z', 3)]
  4. collections.UserDict
    • A pure Python implementation of a mapping that works like a standard dict.
    • import collectionsclass MyDict2(collections.UserDict):	def __missing__(self, key):		if isinstance(key, str):			raise KeyError(key)		return self[str(key)]	def __contains__(self, key):		return str(key) in self.data	def __setitem__(self, key, item):		self.data[str(key)] = item   # stores all keys as strd = MyDict2([('1', 'one'), ('4', 'four')])print(d[1])  # oneprint(d[2])  # KeyErrorprint(d.get(3))  # Noneprint(4 in d)   # Trued.update([('1', '111')])print(d.get(1))  # 111# 1. The main reason why it’s preferable to subclass from UserDict# rather than from dict is that the built-in has some implementation# shortcuts that end up forcing us to override methods that we can# just inherit from UserDict with no problems.# 2. UserDict does not inherit from dict, but has an internal dict# instance, called data, which holds the actual items.

6. Immutable Mappings

from types import MappingProxyTyped = {1: 'A'}d_proxy = MappingProxyType(d)print(d_proxy)  # mappingproxy({1: 'A'})print(d_proxy[1])  # 'A'# d_proxy[2] = 'B'  # TypeErrord[2] = 'B'print(d_proxy)  # {1: 'A', 2: 'B'}print(d)  # {1: 'A', 2: 'B'}# 1. updates to the original mapping can be seen in the# mappingproxy, but changes cannot be made through it.

 7. Set Theory

l = ['spam', 'spam', 'eggs', 'spam']s1 = set(l)s2 = {'spam'}print(hash(frozenset(range(4))))  # 7327027465595056883print(s1)  # {'eggs', 'spam'}print(s1 & s2)  # {'spam'}print(s2.pop())  # spamprint(s2)  # set()s3 = {i for i in range(10) if i < 4}  # Set Comprehensionsprint(s3)  # {0, 1, 2, 3}# 1. Create s1 is slower than s2 because, to evaluate it, Python# has to look up the set name to fetch the constructor, then build# a list, and finally pass it to the constructor.# 2. Set elements must be hashable. The set type is not hashable,# but frozenset is.# 3. Given two sets a and b, a | b returns their union, a & b# computes the intersection, and a - b the difference.# 4. There’s no literal notation for the empty set, just use set().

 

8. dict and set Under the Hood

 8.1 Hash Tables in Dictionaries

  • A hash table is a sparse array (i.e., an array that always has empty cells). In standard data structure texts, the cells in a hash table are often called “buckets.” In a dict hash table, there is a bucket for each item, and it contains two fields: a reference to the key and a reference to the value of the item. Because all buckets have the same size, access to an individual bucket is done by offset.

  • Python tries to keep at least 1/3 of the buckets empty; if the hash table becomes too crowded, it is copied to a new location with room for more buckets.
  • To put an item in a hash table, the first step is to calculate the hash value of the item key, which is done with the hash() built-in function.
  • Hashes and equality:
    • The hash() built-in function works directly with built-in types and falls back to calling __hash__ for user-defined types. If two objects compare equal, their hash values must also be equal. (hash(1) == hash(1.0))
    • Ideally, objects that are similar but not equal should have hash values that differ widely.
    • Starting with Python 3.3, a random salt value is added to the hashes of str, bytes, and datetime objects. The salt value is constant within a Python process but varies between interpreter runs. The random salt is a security measure to prevent a DOS attack.
  • The hash table algorithm:
    • To fetch the value at my_dict[search_key], Python calls hash(search_key) to obtain the hash value of search_key and uses the least significant bits of that number as an offset to look up a bucket in the hash table (the number of bits used depends on the current size of the table). If the found bucket is empty, KeyError is raised. Otherwise the found bucket has an item—a found_key:found_value pair—and then Python checks whether search_key == found_key. If they match, that was the item sought: found_value is returned.
    • However, if search_key and found_key do not match, this is a hash collision. This happens because a hash function maps arbitrary objects to a small number of bits, and—in addition—the hash table is indexed with a subset of those bits. In order to resolve the collision, the algorithm then takes different bits in the hash, massages them in a par‐ticular way, and uses the result as an offset to look up a different bucket. If that is empty, KeyError is raised; if not, either the keys match and the item value is returned, or the collision resolution process is repeated.
    • when inserting items, Python may determine that the hash table is too crowded and rebuild it to a new location with more room. As the hash table grows, so does the number of hash bits used as bucket offsets, and this keeps the rate of collisions low.
    • even with millions of items in a dict, many searches happen with no collisions, and the average number of collisions per search is between one and two.

8.2 Practical Consequences of How dict Works

  • Keys must be hashable objects:
    • Hashable: 
      • It supports the hash() function via a hash() method that always returns the same value over the lifetime of the object.
      • It supports equality via an eq() method.
      • If a == b is True then hash(a) == hash(b) must also be True.
      • User-defined types are hashable by default because their hash value is their id() and they all compare not equal.
    • If you implement a class with a custom __eq__ method, you must also implement a suitable __hash__, because you must always make sure that if a == b is True then hash(a) == hash(b) is also True. Otherwise you are breaking an invariant of the hash table algo‐rithm, with the grave consequence that dicts and sets will not deal reliably with your objects. If a custom __eq__ depends on mutable state, then __hash__ must raise TypeError.

  • Dicts have significant memory overhead:
    • Because a dict uses a hash table internally, and hash tables must be sparse to work, they are not space efficient.
    • Replacing dicts with tuples reduces the memory usage in two ways: by removing the overhead of one hash table per record and by not storing the field names again with each record.
  • Key search is very fast:
    • trading space for time.
  • Key ordering depends on insertion order:
    • a dict built as dict([(key1, value1), (key2, value2)]) compares equal to dict([(key2, value2), (key1, value1)]), but their key ordering may not be the same if the hashes of key1 and key2 collide.
  • Adding items to a dict may change the order of existing keys:
    • In Python 3, the .keys(), .items(), and .values() methods return dictionary views, which behave more like sets than the lists returned by these methods in Python 2. Such views are also dynamic: they do not replicate the contents of the dict, and they immediately reflect any changes to the dict.

 

转载于:https://www.cnblogs.com/lb477/p/10925072.html

你可能感兴趣的文章
poj2752 Seek the Name, Seek the Fame
查看>>
软件开发和软件测试,我该如何选择?(蜗牛学院)
查看>>
基本封装方法
查看>>
bcb ole拖拽功能的实现
查看>>
生活大爆炸之何为光速
查看>>
bzoj 2456: mode【瞎搞】
查看>>
[Typescript] Specify Exact Values with TypeScript’s Literal Types
查看>>
[GraphQL] Reuse Query Fields with GraphQL Fragments
查看>>
Illustrated C#学习笔记(一)
查看>>
理解oracle中连接和会话
查看>>
两种最常用的Sticky footer布局方式
查看>>
Scrapy实战篇(三)之爬取豆瓣电影短评
查看>>
HDU 5510 Bazinga KMP
查看>>
[13年迁移]Firefox下margin-top问题
查看>>
Zookeeper常用命令 (转)
查看>>
Java程序IP v6与IP v4的设置
查看>>
RUP(Rational Unified Process),统一软件开发过程
查看>>
数据库链路创建方法
查看>>
Enterprise Library - Data Access Application Block 6.0.1304
查看>>
重构代码 —— 函数即变量(Replace temp with Query)
查看>>