What useful things can a Python crawler scrape? | Python programming for beginners

2024-09-02 23:01:05

Writing a sniffer in Python

When you’re just getting started with Python, essentially everything you’re learning is foundational. There isn’t much time or motivation to stop and ask “when will I use this?” while you’re learning basics like data types, conditionals, loops, and functions, because the answer is that, as a Python programmer, you’ll use all of these things all the time! Whether you are a data scientist building machine learning models or a software developer building the backend of a website, it’s unlikely you’ll be able to make any progress at all if you don’t know the basics.

But once you understand the essential foundations of Python, it can be challenging to tell which new tools and concepts will be helpful (and possibly even necessary) for your advancement, and which might just be a waste of your time.

In this story, I’ll walk you through:

  1. Review of SOLID software development principles

  2. Examples of “tacked-on” scenarios that are challenging to implement while following the SOLID principles

  3. How a decorator approach overcomes this challenge

  4. Using Python decorators written by others

  5. Writing your own decorators in Python

The code in this blog post can also be found in this Google Colaboratory notebook:

SOLID principles

If you’re planning on contributing to “real” software projects, not just one-off scripts or personal projects, it’s a good idea to be familiar with the SOLID principles.

[Image: Uncle Bob Consulting, LLC]

The SOLID principles were originally assembled by Robert C. Martin, a.k.a. “Uncle Bob” of Uncle Bob Consulting. To quote from his website:

The principles are mental cubby-holes. They give a name to a concept so that you can talk and reason about that concept. They provide a place to hang the feelings we have about good and bad code. They attempt to categorize those feelings into concrete advice.

The 5 SOLID principles are:

  • Single-Responsibility Principle

  • Open-Closed Principle

  • Liskov Substitution Principle

  • Interface Segregation Principle

  • Dependency Inversion Principle

The two we’re going to focus on in this blog post are the single-responsibility principle and the open-closed principle. These are the two principles that are most challenging to follow in the “tacked-on” scenarios described later.

Single-responsibility principle

Every component of a piece of software should only be responsible for a single set of functionality. In the context of the examples in this blog, a “component” will mean a Python function.

Open-closed principle

Already-written software should be “closed” for modification but “open” for extension. In other words, we should avoid re-writing existing code whenever possible, while ideally still being able to add extended functionality to that code.

“Tacked-on” software scenarios

While “SOLID” is a widely known term in software development, “tacked-on” is just a name I came up with for a category of scenarios where you have a perfectly functional software library, but a stakeholder now asks you to “tack on” more functionality that is related to the original functionality yet conceptually distinct from it.

Some examples of “tacked-on” software scenarios are:

  • Caching

  • Logging

  • Access control

  • Input validation

  • Tweaks to input or output format

In all of these cases, you want to have the original functionality, but also every time you invoke that original functionality, you want something else to happen on top of it.

Caching example

Let’s focus on caching as a specific example of a “tacked-on” scenario. In general, caching means that you save the result of a costly operation in a cache (usually implemented as some form of a dictionary/hashmap) so that later you can just look up that result instead of re-doing that costly operation.

If you’ve done any technical interview algorithms practice before, you’ve likely encountered the task of optimizing the recursive Fibonacci number algorithm. We’ll use it as a stand-in for a more realistic computing task. An example implementation of this algorithm (found in the Python docs) is:

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

Let’s assume that your coworker already wrote that code as part of a larger software project, as well as this code (standing in for a more realistic unit test suite) to test the functionality:

assert fib(25) == 75025

The problem with this implementation is that you end up re-computing the same values repeatedly. For example, computing fib(10) means computing fib(9) + fib(8). Then fib(9) = fib(8) + fib(7) and fib(8) = fib(7) + fib(6). Already you can see that fib(8) and fib(7) are each computed more than once, and this repetition cascades through the rest of the recursion. How can you make this code faster?

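To make the repetition concrete, here is a small sketch (not part of the original post) that tallies how often each argument is computed by the same naive recursion. The helper name `fib_counted` and the `calls` counter are illustrative additions.

```python
from collections import Counter

# Tally of how many times each n is computed (illustrative, not the original code).
calls = Counter()


def fib_counted(n):
    calls[n] += 1  # record every call, including repeated ones
    if n < 2:
        return n
    return fib_counted(n-1) + fib_counted(n-2)


result = fib_counted(10)
print(result)               # 55
print(calls[8], calls[7])   # 2 3
print(sum(calls.values()))  # 177 calls in total for a single fib(10)
```

A single fib(10) triggers 177 calls even though only 11 distinct values (0 through 10) are ever needed.
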
The “known” way to optimize your Fibonacci solution is to add caching: once fib(8) has been calculated, you never have to calculate it again; you just retrieve it from the cache. But how do you do that while following the SOLID principles?

Sub-optimal caching solutions

The most obvious/common way to address this is to make a cache outside of the function, then adapt the function to check the cache before doing the computation and add all computed values to the cache. Something like this:

cache = {}

def fib(n):
    if n < 2:
        return n
    if n in cache:
        return cache[n]
    result = fib(n-1) + fib(n-2)
    cache[n] = result
    return result

The test suite will still pass ✅

assert fib(25) == 75025

But what about the SOLID principles?

Single-responsibility

Now instead of a function that computes Fibonacci numbers, we have a function that computes Fibonacci numbers and manages adding and retrieving things from a cache.

Open-closed

We did not leave the existing function “closed”. If you looked at the Git blame for the function, 3 of the original lines of the function would be the same, 1 would be removed, and 6 would be added. You’ve re-written the majority of this whole function, and if you later decide to use a different kind of cache, you would need to “open up” the function again.

Another popular implementation of this would be to avoid creating a global cache by converting the function into a method of a class, and making the cache a member variable of that class. This has the same single-responsibility ❌ and open-closed ❌ issues as the previous solution, and additionally would require re-writing of the unit test ❌ (and all code that currently expects this to be a standalone function rather than a class method).

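For comparison, the class-based variant described above might look something like this sketch. The class name `FibCalculator` and its `cache` attribute are hypothetical; the point is that callers now need an instance, so the existing `fib(25)` test has to be rewritten.

```python
class FibCalculator:
    """Hypothetical class that keeps its cache as a member variable."""

    def __init__(self):
        self.cache = {}  # per-instance cache instead of a global dict

    def fib(self, n):
        if n < 2:
            return n
        if n in self.cache:
            return self.cache[n]
        result = self.fib(n-1) + self.fib(n-2)
        self.cache[n] = result
        return result


# The original test cannot stay as-is; it must create an instance first.
assert FibCalculator().fib(25) == 75025
```

Caching and computation still live in the same method, and every caller of the old standalone function has to change.
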
The decorator approach

The decorator approach is a great way to solve this kind of problem. It goes back to the classic “Gang of Four” Design Patterns book published in 1994.

[Image: Design Patterns: Elements of Reusable Object-Oriented Software]

While that book describes “composition over inheritance” in the context of designing OOP classes, we can apply the same idea here to the composition of functions.

In general, the decorator approach means you “wrap” your code in the tacked-on functionality: you create something composed of the original functionality, now decorated with the tacked-on functionality.

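As a rough sketch of what “wrapping” means for the caching example, the hypothetical `memoize` function below takes the unmodified `fib` and returns a new function that consults a dictionary before calling it. It is a simplified illustration that handles a single hashable argument, not how `functools.lru_cache` is actually implemented.

```python
def fib(n):
    # The original, unmodified function.
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)


def memoize(func):
    """Hypothetical caching wrapper for functions of one hashable argument."""
    cache = {}

    def wrapper(n):
        if n not in cache:
            cache[n] = func(n)  # compute once, then reuse
        return cache[n]

    return wrapper


# Rebinding the name means the recursive calls inside fib also go through the cache.
fib = memoize(fib)
assert fib(25) == 75025
```

The body of `fib` is untouched; all of the caching logic lives in `memoize`.
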
Because we don’t need to modify the existing code, by using a decorator approach we are able to “tack on” the new functionality while still following the single-responsibility and open-closed SOLID principles.

Clarification

There are many resources describing the Decorator Pattern, like this article. That term refers to a narrow kind of object-oriented “decorator class”, which could in theory be implemented using Python decorators. Decorator classes address the same kinds of scenarios we have been discussing, and the general idea is similar. But “decorators in Python” refers to a higher-level language construct that can apply to both functions and classes. This potential confusion around the name is acknowledged, but not resolved, in the 2003 Python Enhancement Proposal that introduced the decorator syntax (PEP 318).

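To illustrate the distinction, here is a hedged sketch of the object-oriented Decorator Pattern written in Python: a decorator class holds a reference to another object with the same interface and adds behavior around its method. The names `FibComponent` and `CachingFibDecorator` are hypothetical.

```python
class FibComponent:
    """Hypothetical component: computes Fibonacci numbers naively."""

    def compute(self, n):
        if n < 2:
            return n
        return self.compute(n-1) + self.compute(n-2)


class CachingFibDecorator:
    """Hypothetical pattern-style decorator: same interface, wraps another component."""

    def __init__(self, component):
        self._component = component
        self._cache = {}

    def compute(self, n):
        if n not in self._cache:
            self._cache[n] = self._component.compute(n)
        return self._cache[n]


decorated = CachingFibDecorator(FibComponent())
print(decorated.compute(20))  # 6765
```

Note that only the outer call is cached here: the component's recursive calls go to its own `compute`, not through the decorator object, which is one practical difference from rebinding a function name with a Python decorator.
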
Using Python decorators

Before we dive into writing our own custom Python decorator, let’s use one that has already been written for us. Conceptually this is similar to learning how to use existing functions like print() or len() before attempting to write functions for ourselves.

We’ll return to the Fibonacci numbers example from before:

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

Now let’s add an LRU cache from the built-in functools module (built-in meaning we don’t need to install any libraries beyond base Python; we just need to import it):

import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

…and that’s it! 🤯

The test suite still passes ✅

Single-responsibility principle

The fib function, as written, is still just a Fibonacci number function. It is not a function combining Fibonacci numbers and caching.

Open-closed principle

We left the original function “closed”. If someone went to look at the Git blame, it would still be fully attributed to the original author. But at the same time, we “opened” it up for extension, since now it’s using caching.

Reality check: In case you are wondering if adding these two lines of code really “did anything”, I ran these two snippets using the %%timeit magic command and fib(25). The original code took 30.9 ms (30.9 milliseconds), and the code with caching took 6 µs (6 microseconds). There are 1000 microseconds in a millisecond, meaning the performance of the code with caching was over 1000x faster than the original—pretty impressive for an addition of 2 lines of code!

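`%%timeit` is an IPython/Jupyter cell magic; outside a notebook, the standard-library `timeit` module can make a similar comparison. In this sketch the two versions are renamed `fib_plain` and `fib_cached` (names chosen here for illustration) so both can exist at once. Timings depend on your machine, so none are shown.

```python
import functools
import timeit


def fib_plain(n):
    if n < 2:
        return n
    return fib_plain(n-1) + fib_plain(n-2)


@functools.lru_cache(maxsize=None)
def fib_cached(n):
    if n < 2:
        return n
    return fib_cached(n-1) + fib_cached(n-2)


RUNS = 5
plain_seconds = timeit.timeit(lambda: fib_plain(25), number=RUNS)
cached_seconds = timeit.timeit(lambda: fib_cached(25), number=RUNS)
print(f"plain:  {plain_seconds / RUNS * 1000:.3f} ms per call")
print(f"cached: {cached_seconds / RUNS * 1e6:.3f} µs per call")
```

After the first call the cache is warm, so the cached timing mostly measures dictionary look-ups; repeated `%%timeit` runs behave the same way.
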
So…what just happened?

Mechanically, we just imported a module (functools), then added an @ symbol plus a function from within that module.

Conceptually, we “wrapped” the existing fib function in a cache decorator:

  1. The lru_cache decorator took in our function as an argument

  2. It added caching functionality

  3. It returned a new function composed of fib plus the caching logic

  4. Finally, the @ syntax re-assigned the name fib to the new function, so we could continue using the original interface

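The four steps above can be spelled out without the @ syntax. This sketch applies `functools.lru_cache(maxsize=None)` by hand and then uses two things the cached wrapper provides: the `cache_info()` method and the `__wrapped__` attribute, which points back to the undecorated function.

```python
import functools


def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)


# Steps 1-4: pass fib in, get a caching wrapper back, and rebind the name fib to it.
fib = functools.lru_cache(maxsize=None)(fib)

print(fib(25))              # 75025
print(fib.cache_info())     # hit/miss statistics collected by the wrapper
original = fib.__wrapped__  # the original function is still reachable
```
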
Using already-written decorators

Unfortunately, the Python developers do not maintain a global list of decorators available in base Python, but this GitHub repo contains a pretty good list of those decorators as well as decorators “in the wild” (i.e. located in libraries outside of Python itself):

(This repo has not been updated since 2017, let me know in the comments if you have found a better or complementary resource!)

Reality check: If you’re curious how often and in what context you’ll encounter decorators “in the real world”…in my work in software engineering and data science, I most frequently encounter decorators in a backend/full-stack web development context, especially Flask and Dash. Those frameworks use decorators to allow you to write functions that will be wrapped in the full web server functionality. It is essentially impossible to create an application using either of those frameworks without using decorators. Using a Flask decorator looks something like this (the “minimal” application from the Quickstart guide):

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

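To demystify what a line like `@app.route('/')` does, here is a deliberately tiny, hypothetical router. It is not Flask’s implementation; it only shows the common pattern of a decorator factory (a function that takes arguments and returns a decorator) that records the decorated function and hands it back unchanged. The class `TinyRouter` and its `dispatch` method are illustrative names.

```python
class TinyRouter:
    """Hypothetical stand-in for a web framework's routing table (not Flask)."""

    def __init__(self):
        self.routes = {}  # maps URL path -> view function

    def route(self, path):
        # route(path) returns the actual decorator.
        def decorator(func):
            self.routes[path] = func  # register the view under its path
            return func               # return the original function unchanged
        return decorator

    def dispatch(self, path):
        # Look up the view for a path and call it, as a server might on a request.
        return self.routes[path]()


app = TinyRouter()


@app.route('/')
def hello_world():
    return 'Hello, World!'


print(app.dispatch('/'))  # Hello, World!
```
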
Using decorators that someone else has written is great! You can get some excellent functionality right “out of the box”: just import (if needed), add the @ symbol syntax, and your function is now decorated with added functionality. There are excellent caching, logging, and access control decorators already out there. It can be acceptable to leave decorators as an unopened “black box” if someone has already written a decorator that accomplishes what you need to accomplish!

But sometimes, you might want to use this technique in a context where nobody has written a decorator that does the thing you want to do. So let’s go ahead and explore how you might write a custom decorator.

Python functions as “first-class” objects

Before we can jump into writing custom decorators, we need to review how Python functions are “first-class” objects. This concept will be very familiar if you’ve used JavaScript for front-end web development (think callbacks and event listeners), and very foreign if you’ve only used a language like Ruby or Java, which doesn’t treat functions this way.

Functions in Python:

  • Can be stored in data structures

  • Can be assigned to variables

  • Can be passed to other functions

  • Can be nested

  • Can capture local state

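Two of these properties, being stored in data structures and capturing local state, are not demonstrated in the sections that follow, so here is a short sketch of each. The names `operations` and `make_counter` are illustrative.

```python
# Functions stored in a data structure (a dict), then looked up and called by key.
operations = {
    "double": lambda x: x * 2,
    "negate": lambda x: -x,
    "to_str": str,
}
print(operations["double"](21))  # 42


# A nested function capturing local state: each counter remembers its own count.
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the enclosing function's variable
        count += 1
        return count

    return increment


counter = make_counter()
counter()
print(counter())  # 2
```
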
(Check out this tutorial for more explanation and examples of these properties.)

We’re going to focus on three of these properties (being assigned to variables, being passed to other functions, and being nested), since they are key to understanding how decorators work.

Assigning functions to variables, and passing functions to other functions

Here are some short/trivial examples to show what these properties mean. First, let’s set up some functions:

def squared(num):
    return num ** 2

def cubed(num):
    return num ** 3

print(squared(5))  # prints 25
print(cubed(5))  # prints 125

Pretty straightforward. We have two functions, each of which takes a single argument num and returns a polynomial transformation of num. squared squares it, and cubed cubes it.

Now to demonstrate the “first-class” functionality:

def func_plus_two(polynomial_func, num):
    return polynomial_func(num) + 2

first_func = squared
second_func = cubed

print(func_plus_two(first_func, 5))  # prints 27
print(func_plus_two(second_func, 5))  # prints 127

So, we are assigning functions to variables, in this case assigning squared to first_func and cubed to second_func. (There is no particularly useful reason for doing this, it’s just for demonstration purposes.)

Then, maybe more interestingly, we are passing those functions to another function, first passing first_func then second_func into func_plus_two. That function calls the function that was passed into it, then returns the result of the original function + 2. So, instead of 25 and 125, we get 27 and 127, when num = 5. This structure (passing a function into another function) is also called a higher-order function, i.e. we could say “func_plus_two is a higher-order function”.

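`func_plus_two` is a made-up example, but Python’s built-ins use the same idea: `map` calls a function on every item, and `sorted` accepts a function through its `key` parameter. This sketch redefines `squared` and `cubed` from above so it runs on its own.

```python
def squared(num):
    return num ** 2


def cubed(num):
    return num ** 3


# map() is a higher-order function: it receives squared and calls it on each item.
print(list(map(squared, [1, 2, 3])))  # [1, 4, 9]

# sorted() calls the key function on each item and orders items by the results.
print(sorted([-3, 2, -1], key=squared))  # [-1, 2, -3]

# Functions can be looped over and called like any other value.
for func in (squared, cubed):
    print(func.__name__, func(5))
```
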
Nested functions

It’s even more of a “stretch” to connect this example to something realistically useful, but bear with me, it becomes important when we actually get to writing decorators!

Consider this example of nested functions:

def nth_degree(num, n):
    def squared_inner(num):
        return num ** 2

    def cubed_inner(num):
        return num ** 3

    if n == 2:
        return squared_inner(num)
    elif n == 3:
        return cubed_inner(num)

Now instead of having squared and cubed as functions in the global scope, we have functions squared_inner and cubed_inner that only exist within the scope of the nth_degree function. In other words, if we tried to run squared_inner(5) in the global scope, it wouldn’t return 25, it would throw a NameError: name 'squared_inner' is not defined. To invoke (i.e. call) this function, we could use code like:

nth_degree(4, 2)  # 4 squared, returns 16
nth_degree(2, 3)  # 2 cubed, returns 8

In the first example, we passed in an n of 2, so nth_degree called the squared_inner function on 4 and returned 16. In the second example, we passed in an n of 3, so nth_degree called the cubed_inner function on 2 and returned 8. In reality we could easily re-write this code to avoid inner functions altogether, but I hope you understand the main takeaway that a function can have another function nested inside itself, then return something based on that nested function.

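Decorators rely on one more step: returning the inner function itself instead of calling it. In this sketch (the name `make_power` is hypothetical), the returned inner function still remembers the `n` it was created with, which is the same mechanism a decorator’s wrapper uses to remember `func`.

```python
def make_power(n):
    # A new inner function is created on each call to make_power.
    def power(num):
        return num ** n  # n comes from the enclosing call
    return power         # return the function itself, not power(...)


square = make_power(2)
cube = make_power(3)
print(square(4))  # 16
print(cube(2))    # 8
```
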
Writing our own comma-adding decorator

Ok, now that we’ve completed the review of first-class Python functions, let’s write a ⭐ decorator ⭐️ with custom functionality!

Let’s say your boss comes to you with this task:

“Our users are having a hard time reading these large numbers. At a glance, is that long run of zeros 1 million or 10 million? Let’s take out the guesswork and put in some commas, like they’re used to with Comma Style format in Excel. So they see something like 1,000,000 instead of 1000000.”

(This is inspired by a real task I had to do as a software developer, back when I worked in software consulting at Crowe. Accounting folks love their commas!)

You are maintaining a library with 5 functions, all of which currently return a string representation of an integer to be displayed in the user interface. You need to update the output format of those functions while continuing to follow the single-responsibility and open-closed principles.

Since you work in a test-driven company, the software developer in test has already started the project by adapting the unit tests to match the new requirements.

Old unit tests:

assert one_million() == "1000000"
assert one_billion() == "1000000000"
assert times_100(7) == "700"
assert minus_10000(150000) == "140000"
assert multiply(300, 800) == "240000"

New unit tests:

assert one_million() == "1,000,000"
assert one_billion() == "1,000,000,000"
assert times_100(7) == "700"
assert minus_10000(150000) == "140,000"
assert multiply(300, 800) == "240,000"

The decorator you decide to write has this overall structure:

def add_commas(func):
    def add_commas_wrapper():
        # call original function
        # add in the commas
        # return the result
        ...  # placeholder until the steps are filled in
    return add_commas_wrapper

In other words, it is a wrapper function that takes in the original function as an argument, then returns an inner function that will call the original function and also add in the commas.

Simplest version of decorator (no arguments)

Rather than trying to write a decorator with complete functionality all at once, let’s start with something very basic, which only handles the library functions without any arguments.

def add_commas(func):
    def add_commas_wrapper():
        original_string = func()
        # needs to be int for string formatting
        original_int = int(original_string)
        # we are ignoring locale, using default thousands sep
        return f'{original_int:,}'
    return add_commas_wrapper

Now that we have that function, we can apply it to the existing one_million and one_billion functions without changing their bodies:

def one_million():
    return "1000000"

def one_billion():
    return "1000000000"

one_million = add_commas(one_million)
one_billion = add_commas(one_billion)

Now we are passing two of the unit tests:

assert one_million() == "1,000,000"  # ✅
assert one_billion() == "1,000,000,000"  # ✅
assert times_100(7) == "700"  # ✅ "accidentally" passing, <1000
assert minus_10000(150000) == "140,000"  # ❌ returns "140000"
assert multiply(300, 800) == "240,000"  # ❌ returns "240000"

Adding the syntactic sugar

You might be wondering…what about that @ symbol we were using earlier? How is this a “decorator” like functools.lru_cache if there’s no @ symbol? Well, the @ symbol here is a form of syntactic sugar.

Syntactic sugar is a concept that is not specific to Python. It means syntax that shortens a common bit of code, usually to make it easier to read and/or harder to get wrong.

Let’s revise the previous code snippet to use decorator syntactic sugar:

@add_commas
def one_million():
    return "1000000"

@add_commas
def one_billion():
    return "1000000000"

It’s the same number of lines of code, and it’s actually doing the exact same thing as the previous snippet. But it’s a little cleaner (with fewer parentheses and repetitions of the name of the function), and someone looking at the function definitions (lines starting with def) will be immediately aware that some decoration is happening beyond what they can see in the function, in case later they’re trying to debug the function. In the previous syntax, the decoration is separated from the definition in a way that could end up being confusing, especially with more or longer functions than these trivial examples.

Accommodating arguments

What if we just try to add this decorator to the times_100 function?

@add_commas
def times_100(num):
    return str(num * 100)

print(times_100(7))

[Screenshot: the call fails with a TypeError raised from add_commas_wrapper, which accepts 0 positional arguments but was given 1]

😱 what happened? Well, if we look back at the definition of add_commas_wrapper, it’s:

def add_commas(func):
    def add_commas_wrapper():
        original_string = func()
        # needs to be int for string formatting
        original_int = int(original_string)
        # we are ignoring locale, using default thousands sep
        return f'{original_int:,}'
    return add_commas_wrapper

Just like the error message said, that function takes 0 arguments. How can we adapt it to work with the num argument of times_100 while not breaking the functionality of one_million and one_billion, which don’t take any arguments?

We could add some kind of custom logic, like the if statements in the nth_degree nested function, but that means we might need to repeatedly modify add_commas to handle different numbers of parameters. We shouldn’t need to do that, since really all this function needs to do is start with a string and put commas in that string…it shouldn’t matter how many arguments were passed in to the original function.

Fortunately there is a technique to accept an arbitrary number of arguments (i.e. variable-length arguments) in Python! The traditional syntax for this is *args, **kwargs, meaning 0 or more positional arguments (args) then 0 or more keyword arguments (kwargs). So that would work for one_million (0 arguments of any kind), times_100 (1 positional argument), multiply (2 positional arguments), and any other number of positional or keyword arguments.

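To see what a wrapper defined with `*args, **kwargs` actually receives, this small sketch (the function name `show_args` is hypothetical) simply returns what it collected: positional arguments arrive as a tuple, keyword arguments as a dict, and `func(*args, **kwargs)` unpacks them again unchanged.

```python
def show_args(*args, **kwargs):
    # args is a tuple of positional arguments; kwargs is a dict of keyword arguments.
    return args, kwargs


print(show_args())               # ((), {})
print(show_args(7))              # ((7,), {})
print(show_args(300, num2=800))  # ((300,), {'num2': 800})


def multiply(num1, num2):
    return str(num1 * num2)


# Unpacking forwards exactly the same arguments to another function.
args, kwargs = show_args(300, num2=800)
print(multiply(*args, **kwargs))  # 240000
```
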
Let’s add variable-length arguments:

def add_commas(func):
    def add_commas_wrapper(*args, **kwargs):
        original_string = func(*args, **kwargs)
        # needs to be int for string formatting
        original_int = int(original_string)
        # we are ignoring locale, using default thousands sep
        return f'{original_int:,}'
    return add_commas_wrapper

And add the decorator syntax to all the functions:

@add_commas
def one_million():
    return "1000000"

@add_commas
def one_billion():
    return "1000000000"

@add_commas
def times_100(num):
    return str(num * 100)

@add_commas
def minus_10000(num):
    return str(num - 10000)

@add_commas
def multiply(num1, num2):
    return str(num1 * num2)

Now we are passing all tests!

assert one_million() == "1,000,000"  # ✅
assert one_billion() == "1,000,000,000"  # ✅
assert times_100(7) == "700"  # ✅
assert minus_10000(150000) == "140,000"  # ✅
assert multiply(300, 800) == "240,000"  # ✅

🎉 🎉 🎉 🎉 🎉

One more thing

If you want to use decorators in production, there’s one more thing you’ll want to include, although it’s irrelevant for passing these particular tests. You might have noticed that the earlier error message referred to add_commas_wrapper, not times_100. In that example it wasn’t particularly important, but imagine someone tries to use a float rather than an int, or does something else that breaks the logic inside add_commas. Then we would want introspection to show us the original function, not the wrapper.

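You can find more details about what’s happening, and how to address it manually, in this tutorial, but luckily there is yet again a function inside the built-in functools module that can handle this for us! It is called wraps, and the documentation can be found here. Applied as @functools.wraps(func) inside the decorator, it copies the original function’s metadata (such as __name__ and __doc__) onto the wrapper and stores the original function in the wrapper’s __wrapped__ attribute.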

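The difference is easy to check. In this sketch, two hypothetical pass-through decorators (`plain_decorator` and `wrapping_decorator`) do nothing but forward calls; only the second uses `functools.wraps(func)`. Note that `wraps` must be called with the original function as its argument.

```python
import functools


def plain_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def wrapping_decorator(func):
    @functools.wraps(func)  # copy func's metadata onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain_decorator
def times_100(num):
    """Multiply num by 100 and return it as a string."""
    return str(num * 100)


@wrapping_decorator
def times_1000(num):
    """Multiply num by 1000 and return it as a string."""
    return str(num * 1000)


print(times_100.__name__)   # wrapper
print(times_1000.__name__)  # times_1000
print(times_1000.__doc__)   # Multiply num by 1000 and return it as a string.
```
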
The final version of our code with wraps added is:

import functools

def add_commas(func):
    @functools.wraps(func)  # wraps needs the original function as its argument
    def add_commas_wrapper(*args, **kwargs):
        original_string = func(*args, **kwargs)
        # needs to be int for string formatting
        original_int = int(original_string)
        # we are ignoring locale, using default thousands sep
        return f'{original_int:,}'
    return add_commas_wrapper

@add_commas
def one_million():
    return "1000000"

@add_commas
def one_billion():
    return "1000000000"

@add_commas
def times_100(num):
    return str(num * 100)

@add_commas
def minus_10000(num):
    return str(num - 10000)

@add_commas
def multiply(num1, num2):
    return str(num1 * num2)

Recap

Reviewing our updated code

The test suite passes, meaning we have successfully implemented the specified “tacked-on” functionality ✅

Single-responsibility principle

Our computational functions are still just doing computation (standing in for the more sophisticated computations or queries you would do in a real application). We also have a wrapper function that adjusts the output format of those functions. It is not concerned with how many arguments the original function takes; it always converts a string representation of an integer into a format with commas.

Open-closed principle

We did not make any changes to the contents or the call signature of the original functions. But, by adding 15 lines of additional code, we were able to “open” the code for extension to meet the new stakeholder requirements.

Reviewing what we learned

Some kinds of scenarios, which I call “tacked-on” scenarios, can be challenging to implement without violating the SOLID principles, particularly the single-responsibility principle and the open-closed principle. The decorator approach, and Python decorators in particular, are helpful for tackling this challenge.

In some cases, someone has already written a decorator that handles the “tacked-on” scenario you need, especially for common scenarios like caching, logging, and access control. In other cases, you need to write your own decorator for custom functionality.

In order to write your own decorator, you need to recall that Python functions are “first-class” objects, particularly that they can be assigned to variables, can be passed to other functions, and can be nested. With that knowledge in mind, and the help of syntactic sugar and functools.wraps to ensure proper introspection, you can write your own decorators to handle these “tacked-on” tasks while following the SOLID principles.

Thanks for reading, and let me know in the comments if you know of any other notable uses of decorators or additional tutorials!

Translated from: https://medium.com/swlh/decorators-in-python-why-and-how-to-use-them-and-write-your-own-c1da4ed9f3a9
