develop_skills¶
- Author
captcha developers group
- Date
2019-08-05T08:10:11.588674+08:00
write_the_doc¶
- Author
RYefccd
- Date
2019-08-08T08:26:11.588674+08:00
Documentation environment and dependencies¶
pip install sphinx
quickstart¶
(sphinx) ubuntu@pytorch:~/sphinx$ sphinx-quickstart demo/
Welcome to the Sphinx 2.1.2 quickstart utility.
Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).
Selected root path: demo/
You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/n) [n]: y
The project name will occur in several places in the built documentation.
> Project name: myproject
> Author name(s): fccd
> Project release []: 0.0.1
If the documents are to be written in a language other than English,
you can select a language here by its language code. Sphinx will then
translate text that it generates into that language.
For a list of supported codes, see
https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-language.
> Project language [en]:
Creating file demo/source/conf.py.
Creating file demo/source/index.rst.
Creating file demo/Makefile.
Creating file demo/make.bat.
Finished: An initial directory structure has been created.
You should now populate your master file demo/source/index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.
(sphinx) ubuntu@pytorch:~/sphinx$ ls demo/
Makefile build make.bat source
(sphinx) ubuntu@pytorch:~/sphinx/demo$ make html
Running Sphinx v2.1.2
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
no targets are out of date.
build succeeded.
The HTML pages are in build/html.

Initial configuration¶
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- Project information -----------------------------------------------------
project = 'myproject'
copyright = '2019, fccd'
author = 'fccd'
# The full version, including alpha/beta/rc tags
release = '0.0.1'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
Modifying the configuration¶
Read the Docs style theme¶
Dependency:
pip install sphinx-rtd-theme
Configuration change:
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'alabaster'
html_theme = 'sphinx_rtd_theme'

Markdown support¶
Sphinx supports reStructuredText by default. To also accept Markdown sources, install the dependency below and update conf.py.
Dependency:
pip install recommonmark
Configuration change:
# markdown support
from recommonmark.parser import CommonMarkParser
source_parsers = {'.md': CommonMarkParser}
source_suffix = ['.rst', '.md']
Markdown table support¶
Thanks to 胡达聪 for the help.
recommonmark does not currently render Markdown tables. To support them, add sphinx_markdown_tables to the extensions list in conf.py.
Dependency:
pip install sphinx-markdown-tables
Configuration change:
extensions = [
    'sphinx_markdown_tables',
]
ipynb (notebook) file support¶
Thanks to 胡达聪 for the help.
Dependency:
pip install nbsphinx
Configuration change:
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'nbsphinx',
    'sphinx.ext.mathjax',
]
Chinese search support¶
Thanks to 黄奇鹏 for the configuration.
Dependency:
pip install jieba
Configuration change:
# Enable Chinese full-text search
html_search_language = 'zh'
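Each snippet in the sections above assigns `extensions` on its own, so in a real conf.py the entries have to be merged into a single list. The fragment below is a minimal sketch of such a merged configuration. It assumes that all the packages from the sections above are installed, and that the installed recommonmark version can be enabled by listing `'recommonmark'` as an extension (an alternative to the `source_parsers` form); check the versions you actually use before relying on it.

```python
# conf.py fragment (sketch): settings from the sections above, merged.
extensions = [
    'recommonmark',            # assumption: recommonmark loaded as an extension
    'sphinx_markdown_tables',  # Markdown table rendering
    'nbsphinx',                # Jupyter notebook sources
    'sphinx.ext.mathjax',      # math rendering for notebooks
]

# Accept both reStructuredText and Markdown sources.
source_suffix = ['.rst', '.md']

# Read the Docs theme and Chinese search.
html_theme = 'sphinx_rtd_theme'
html_search_language = 'zh'
```

Keeping one `extensions` list avoids a later assignment silently overwriting an earlier one.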
Building a Chinese PDF¶
https://docs.readthedocs.io/en/stable/guides/pdf-non-ascii-languages.html
For docs that are not written in Chinese or Japanese, and if your build fails from a Unicode error, then try xelatex as the latex_engine instead of the default pdflatex in your conf.py:
latex_engine = 'xelatex'
When Read the Docs detects that your documentation is in Chinese or Japanese, it automatically adds some defaults for you.
For Chinese projects, it appends to your conf.py these settings:
latex_engine = 'xelatex'
latex_use_xindy = False
latex_elements = {
    'preamble': '\\usepackage[UTF8]{ctex}\n',
}
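autobuild¶
Thanks to 曹佳豪 for sharing this tip.
While writing docs, you have to run make html every time you want to see the rendered HTML. To edit more efficiently, use sphinx-autobuild: whenever a source document or conf.py changes, it triggers a rebuild automatically.
Dependency:
pip install sphinx-autobuild
Usage:
sphinx-autobuild source build/html -H 0.0.0.0 -p 8000
Then open http://localhost:8000 in a browser.
Alternatively, add a target to the Makefile (recipe lines must start with a tab):
livehtml:
	mkdir -p $(BUILDDIR)/html/
	sphinx-autobuild "$(SOURCEDIR)" "$(BUILDDIR)/html" -H 0.0.0.0 -p 8000
and run:
make livehtml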
Daily documentation workflow¶
Start the automatic build:
sphinx-autobuild source build/html -p 8000 -H 0.0.0.0
# or
make livehtml  # see the Makefile configuration above
Write the documents:
(sphinx) ubuntu@pytorch:~/sphinx/demo$ touch source/restructText_demo.rst
(sphinx) ubuntu@pytorch:~/sphinx/demo$ touch source/markdown_StackEdit.md
...
(sphinx) ubuntu@pytorch:~/sphinx/demo$ tree -L 2 .
.
├── Makefile
├── make.bat
└── source
    ├── _static
    ├── _templates
    ├── birthday-paradox.png
    ├── conf.py
    ├── index.rst
    ├── markdown_StackEdit.md
    └── restructText_demo.rst
Saving (Ctrl+S) triggers an automatic rebuild.
View the result at http://localhost:8000/.

Documentation examples¶
rst syntax reference: https://3vshej.cn/rstSyntax/index.html
Personal notes: https://write-docs.readthedocs.io/en/latest/index.html
python-cookbook: https://python-cookbook-3rd-edition.readthedocs.io/zh_CN/latest/
unittest and pytest¶
- Author
RYefccd
- Date
2019-08-15T09:41:59.179550+08:00
Testing¶
Unit testing¶
def my_demo_func(a, b):
    tmp = []
    if a > 6 and b > 9:
        tmp.append("F")
    else:
        tmp.append("T")
    print(tmp)
Statement coverage¶
Statement coverage requires executing all executable statements in the source code at least once. It is used to calculate and measure the number of statements in the source code which can be executed given the requirements.
Pass in arguments a and b and observe which statements are covered.
Given a=7, b=10, every line runs except the else branch (2 lines):
def my_demo_func(a, b):
    tmp = []
    if a > 6 and b > 9:
        tmp.append("F")
    else:
        tmp.append("T")
    print(tmp)
Statement Coverage: 5/7 = 71%
Statement coverage helps to find:
- Unused Statements
- Dead Code
- Unused Branches
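To make the idea of "which lines ran" concrete, here is a minimal sketch that records executed lines with `sys.settrace`. The helper `executed_offsets` is a hypothetical name introduced for illustration, and the demo function returns its list instead of printing it. Real projects would normally use a coverage tool such as the pytest-cov plugin; Python also reports no line event for a bare `else:`, so the counts can differ slightly from a manual line count.

```python
import sys


def my_demo_func(a, b):
    tmp = []
    if a > 6 and b > 9:
        tmp.append("F")
    else:
        tmp.append("T")
    return tmp


def executed_offsets(func, *args):
    """Run func(*args) and return the executed line offsets,
    counted from the `def` line (offset 0)."""
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hits


covered_true = executed_offsets(my_demo_func, 7, 10)   # takes the if branch
covered_false = executed_offsets(my_demo_func, 1, 10)  # takes the else branch
print(sorted(covered_true), sorted(covered_false))
```

Offset 3 is the `tmp.append("F")` line and offset 5 is the `tmp.append("T")` line, so each call covers exactly one of the two.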
Decision coverage¶
Decision coverage reports the true and false outcomes of each Boolean expression. In this coverage, expressions can sometimes get complicated. Therefore, it is very hard to achieve 100% coverage.
For the Boolean expression a > 6 and b > 9 taken as a whole, construct test cases that make the entire expression evaluate to true and to false.
Given a=7, b=10 (a > 6 and b > 9 is True), the if body runs and the else branch does not:
Statement Coverage: 5/7 = 71%
Given a=1, b=10 (a > 6 and b > 9 is False), the else branch runs and the if body does not:
Statement Coverage: 6/7 = 86%
分支覆盖(Branch Coverage)¶
In branch coverage, every outcome from a code module is tested. For example, if the outcomes are binary, you need to test both True and False outcomes.
It helps you to ensure that every possible branch from each decision condition is executed at least a single time.
By using the branch coverage method, you can also measure the fraction of independent code segments. It also helps you to find out which sections of code don't have any branches.
Branch coverage is calculated as the number of executed branches divided by the total number of branches.

def my_demo_func(a, b):
    tmp = []
    if a > 6 and b > 9:
        tmp.append("F")
    elif a > 2:
        pass
    else:
        tmp.append("T")
    print(tmp)
In real-world testing, branch coverage is what we care about most: which branches were not covered, and why.
- Allows you to validate all the branches in the code
- Helps you to ensure that no branch leads to any abnormality of the program's operation
- Branch coverage removes issues which happen because of statement coverage testing
- Allows you to find those areas which are not tested by other testing methods
- It allows you to find a quantitative measure of code coverage
- Branch coverage ignores branches inside the Boolean expressions
Condition coverage¶

The Boolean expression a > 6 and b > 9 contains two conditions, a > 6 and b > 9.
test | a > 6 | b > 9 |
---|---|---|
a=3, b=3 | F | F |
a=3, b=13 | F | T |
a=9, b=3 | T | F |
a=9, b=13 | T | T |
Condition coverage: across the four tests, each condition takes both the value T and the value F.
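The table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names `conditions` and `condition_coverage_reached` are made up for this example and are not part of any testing library.

```python
def conditions(a, b):
    """Evaluate the two atomic conditions of `a > 6 and b > 9` separately."""
    return (a > 6, b > 9)


def condition_coverage_reached(tests):
    """True if every condition was observed as both True and False."""
    outcomes = [conditions(a, b) for a, b in tests]
    return all({o[i] for o in outcomes} == {True, False} for i in range(2))


table = [(3, 3), (3, 13), (9, 3), (9, 13)]
for a, b in table:
    # test case, the two condition outcomes, and the overall decision
    print((a, b), conditions(a, b), a > 6 and b > 9)

print(condition_coverage_reached(table))
print(condition_coverage_reached([(3, 13), (9, 3)]))
print(condition_coverage_reached([(9, 13)]))
```

Note that the pair (3, 13) and (9, 3) already exercises both values of both conditions, yet the whole expression is False in both cases. Condition coverage therefore does not by itself guarantee decision coverage.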
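pytest¶
- Convenient assert statements (no need to remember the various self.assert* methods)
- Automatic discovery of test modules and test functions
- Modular fixtures that make it easier to organize the test structure
- Compatible with unittest test cases, so existing tests keep working unchanged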
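The difference in assertion style can be seen side by side in a small sketch. The function `add_two` here is a local stand-in defined only for this example; both test styles are collected by pytest when the file name matches its discovery pattern (for example `test_*.py`).

```python
import unittest


def add_two(num1, num2):
    return num1 + num2


class AddTwoTestCase(unittest.TestCase):
    # unittest style: assertion helpers are methods on self
    def test_add_two(self):
        self.assertEqual(add_two(1, 2), 3)


def test_add_two():
    # pytest style: a plain function and a bare assert
    assert add_two(1, 2) == 3
```

With the bare `assert`, pytest rewrites the assertion so that a failure still reports both sides of the comparison.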
example¶
(server18) ryefccd@fccd:~/workspace/pytest_demo$ tree -L 2
.
├── myproject
│   ├── handler.py
│   ├── __init__.py
│   ├── mathexample.py
│   └── __pycache__
├── requirement_dev.txt
└── tests
    ├── __init__.py
    ├── __pycache__
    ├── test_math_opration.py
    └── test_tornado_client.py
4 directories, 7 files
Regular module¶
Application code
mathexample.py¶
'''
Created on 2019-08-15

@author: ryefccd
'''


def add_two(num1, num2):
    return num1 + num2


def sub_two(num1, num2):
    return num1 - num2
Tests
test_math_opration.py¶
'''
Created on 2019-08-15

@author: ryefccd
'''
import pytest

from myproject import mathexample


@pytest.fixture(scope='module')
def resource_a_setup(request):
    print('\nresources_a_setup()')

    def resource_a_teardown():
        # a database connection could be released here
        print('\nresources_a_teardown()')
    request.addfinalizer(resource_a_teardown)
    return 1234567


def test_add_two(resource_a_setup):
    add = mathexample.add_two(resource_a_setup, 2)
    assert add == 1234567 + 2


def test_sub_two():
    substract = mathexample.sub_two(1, 2)
    assert substract == -1
    print("fccdny")


# @pytest.mark.skip(reason='failure')
def test_add_two_failure():
    # fails on purpose to show pytest's failure report
    add = mathexample.add_two(1, 2)
    assert add == 4


if __name__ == '__main__':
    pass
Running the tests
(server18) ryefccd@fccd:~/workspace/pytest_demo$ pytest tests/test_math_opration.py
Test session starts (platform: linux, Python 3.5.2, pytest 5.0.1, pytest-sugar 0.9.2)
rootdir: /home/ryefccd/workspace/pytest_demo
plugins: sugar-0.9.2, metadata-1.8.0, allure-pytest-2.7.1, xdist-1.29.0, cov-2.7.1, forked-1.0.2, tornado-0.8.0, html-1.20.0
collecting ...
 tests/test_math_opration.py ✓✓                                   67% ██████▋

―――――――――――――――――――― test_add_two_failure ――――――――――――――――――――

    def test_add_two_failure():
        add = mathexample.add_two(1, 2)
>       assert add == 4
E       assert 3 == 4

tests/test_math_opration.py:35: AssertionError

 tests/test_math_opration.py ⨯                                   100% ██████████

Results (0.12s):
       2 passed
       1 failed
         - tests/test_math_opration.py:33 test_add_two_failure
Web framework¶
Depends on pytest-tornado:
pip install pytest-tornado
Application code
handler.py¶
'''
Created on 2018-08-19

@author: ryefccd
'''
import json
import asyncio

import aioredis
import tornado.httpserver
import tornado.web

from myproject.mathexample import add_two, sub_two

SERVER_REDIS_ADDRESS = ['192.168.1.200', 6379]


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        a = int(self.get_argument("a", "6"))
        b = int(self.get_argument("b", "2"))
        num_sum = add_two(a, b)
        num_delta = sub_two(a, b)
        res = {"sum": num_sum, "delta": num_delta, "a": a, "b": b}
        self.write(json.dumps(res))


application = tornado.web.Application([
    (r"/", MainHandler),
])


def init_app(ioloop, application):
    # use the asyncio loop passed in, not a module-level global
    redis_conn = ioloop.run_until_complete(
        aioredis.create_redis_pool(SERVER_REDIS_ADDRESS, db=0))
    application.settings["REDIS_CONN"] = redis_conn


if __name__ == '__main__':
    std_loop = asyncio.get_event_loop()
    init_app(std_loop, application)
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8989)
    std_loop.run_forever()
Tests
test_tornado_client.py¶
'''
Created on 2019-08-15

@author: ryefccd
'''
import json

import pytest

from myproject import handler


@pytest.fixture
def app():
    return handler.application


@pytest.fixture(scope="function")
def db_init(io_loop, app):
    std_ioloop = io_loop.asyncio_loop
    handler.init_app(std_ioloop, app)
    # populate the database here
    yield
    # clean up the database here
    conn = app.settings["REDIS_CONN"]
    conn.close()
    std_ioloop.run_until_complete(conn.wait_closed())


@pytest.mark.gen_test
def test_tornao_request_success(http_client, base_url):
    url = base_url + "?a=7&b=2"
    print("url:", url)
    print("base_url:", base_url)
    print("http_client:", http_client)
    response = yield from http_client.fetch(url)
    assert response.code == 200


@pytest.mark.gen_test
def test_tornao_request_fail(http_client, base_url):
    url = base_url + "?a=7&b=2"
    print("url:", url)
    print("base_url:", base_url)
    print("http_client:", http_client)
    response = yield from http_client.fetch(url)
    res = json.loads(response.body.decode())
    print(res)
    assert response.code == 200
    # fails on purpose: 7 + 2 is 9, not 10
    assert res["sum"] == 10


if __name__ == '__main__':
    pass
Running the tests
(server18) ryefccd@fccd:~/workspace/pytest_demo$ pytest tests/test_tornado_client.py
Test session starts (platform: linux, Python 3.5.2, pytest 5.0.1, pytest-sugar 0.9.2)
rootdir: /home/ryefccd/workspace/pytest_demo
plugins: sugar-0.9.2, metadata-1.8.0, allure-pytest-2.7.1, xdist-1.29.0, cov-2.7.1, forked-1.0.2, tornado-0.8.0, html-1.20.0
collecting ...
 tests/test_tornado_client.py ✓                                   50% █████

―――――――――――――――――――― test_tornao_request_fail ――――――――――――――――――――

http_client = <tornado.simple_httpclient.SimpleAsyncHTTPClient object at 0x7f1393bfd358>, base_url = 'http://localhost:34575'

    @pytest.mark.gen_test
    def test_tornao_request_fail(http_client, base_url):
        url = base_url + "?a=7&b=2"
        print("url:", url)
        print("base_url:", base_url)
        print("http_client:", http_client)
        response = yield from http_client.fetch(url)
        res = json.loads(response.body.decode())
        print(res)
        assert response.code == 200
>       assert res["sum"] == 10
E       assert 9 == 10

tests/test_tornado_client.py:49: AssertionError
-------------------- Captured stdout call --------------------
url: http://localhost:34575?a=7&b=2
base_url: http://localhost:34575
http_client: <tornado.simple_httpclient.SimpleAsyncHTTPClient object at 0x7f1393bfd358>
{'a': 7, 'delta': 5, 'sum': 9, 'b': 2}

 tests/test_tornado_client.py ⨯                                   100% ██████████

Results (0.16s):
       1 passed
       1 failed
         - tests/test_tornado_client.py:39 test_tornao_request_fail
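pytest tips¶
pytest --help # show help
- -v
Verbose output
- -s
Do not capture standard output (print calls in test cases are shown)
- -l
When a test fails, print the local variables of the test function
- -k EXPRESSION
Run only the tests whose names match "EXPRESSION"
- -x, --exitfirst
Stop at the first failure (the most urgently needed option when maintaining many test cases)
- --lf, --last-failed
Rerun only the tests that failed in the last run
- --ff, --failed-first
Run all tests, but run the previous failures first
- --pdb
Drop into the pdb debugger when a test fails
Reference: pytest introduction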
Written with StackEdit by RYefccd in 2019-11-27T14:00:58.752295+08:00.
logging¶
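Introduction¶
Logging is a way of tracking events that happen while software runs. Developers add logging calls to their code to indicate that certain events have occurred.
In our company, developers debug with logs, testers use logs to verify logic and features, operations monitor logs for alerting, and data teams analyze logs for behavioral preferences.
Logs serve the following purposes:
- Information gathering
- Troubleshooting
- Sampling statistics
- Auditing
Logging is therefore essential. As developers, we need to take it seriously and do it well.
logging and its components¶
Python ships a standard logging module for standardized log recording. It makes logging more convenient, supports filtering by level, and records extra information such as the time and the originating module.
- Loggers
- Filters
- Handlers
- Formatters
- LogRecord
Component interaction diagram:
logging flow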
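Logger¶
Logger is the main class that drives the logging flow: log records are created, transformed, and filtered here. Loggers form a tree-shaped hierarchy. The default logger is called the root logger, and every other logger descends from it. Levels of the hierarchy are separated by ".".
In the example below, spam is the parent logger of spam.foo, and spam.foo is the parent logger of spam.foo.bar (think of it as a prefix tree).
import logging
spam = logging.getLogger("spam")
spam_foo = logging.getLogger("spam.foo")
spam_foo_bar = logging.getLogger("spam.foo.bar")
spam_foo_bar.info("the message 1")
spam.info("the message 2")
Hierarchy¶
A Logger only creates the log record (LogRecord) and passes it up to its ancestor loggers. Using the code above as an example, and assuming the INFO level is enabled:
"the message 1" is handled by four loggers: spam.foo.bar, spam.foo, spam, and the root logger.
"the message 2" is handled by two loggers: spam and the root logger.
To stop a logger from passing records to its parent, set its propagate attribute to False.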
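The propagation rule can be checked with a small experiment. This is a sketch, not production code: `ListHandler` is a hypothetical helper defined here that simply stores the messages it receives, so that we can see which loggers handled which record.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: collects the messages it receives in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handlers = {}
for name in ("spam", "spam.foo", "spam.foo.bar"):
    handlers[name] = ListHandler()
    logging.getLogger(name).addHandler(handlers[name])

# Child loggers have no level of their own and inherit INFO from spam.
logging.getLogger("spam").setLevel(logging.INFO)

logging.getLogger("spam.foo.bar").info("the message 1")
logging.getLogger("spam").info("the message 2")

# With propagate = False on spam.foo, records from spam.foo.bar
# still reach spam.foo but go no further up the tree.
logging.getLogger("spam.foo").propagate = False
logging.getLogger("spam.foo.bar").info("the message 3")

for name, handler in handlers.items():
    print(name, handler.messages)
```

Only one LogRecord is created per call; the same record is offered to the handlers of each logger on the way up.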
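level¶
A Logger discards records below its effective level. The root logger defaults to WARNING; other loggers default to NOTSET and inherit the effective level of their nearest ancestor that has a level set.
Level | Numeric value |
---|---|
CRITICAL | 50 |
ERROR | 40 |
WARNING | 30 |
INFO | 20 |
DEBUG | 10 |
NOTSET | 0 |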
Handlers¶
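The Logger produces log records; Handlers are responsible for the final output. The example below writes to a file (the debug call is filtered out because the logger's level is INFO):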
import logging
logger = logging.getLogger("example")
logger.setLevel(level=logging.INFO)
handler = logging.FileHandler('example.log')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.info('info logs')
logger.debug('debugging logs')
logger.warning('warning logs')
logger.info('final info logs')
# Log output
2019-11-26 16:39:35,005 - example - INFO - info logs
2019-11-26 16:39:35,005 - example - WARNING - warning logs
2019-11-26 16:39:35,005 - example - INFO - final info logs
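Handlers in the logging module:
- StreamHandler: writes log output to a stream such as standard output (sys.stdout) or standard error (sys.stderr).
- FileHandler: as shown above, writes logs to a file.
- NullHandler: a no-op handler; it accepts records and outputs nothing.
Handlers in logging.handlers:
- RotatingFileHandler: rotates the log file by size.
- TimedRotatingFileHandler: rotates the log file at timed intervals.
- SysLogHandler: sends logs to syslog.
- SMTPHandler: sends logs to a given email address.
- HTTPHandler: sends logs over HTTP.
Besides these common handlers, there are handlers based on TCP and UDP, and others that write to various components. By subclassing the Handler base class you can implement further custom handlers, for example one that sends logs directly to Kafka or to a database.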
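As a rough illustration of a custom handler, the sketch below subclasses `logging.Handler` and forwards each record as a JSON string to a callable. `CallbackHandler` and its `send` parameter are hypothetical names; `send` stands in for whatever client call a real integration would use (a Kafka producer, a database insert, and so on), which is outside the scope of this example.

```python
import json
import logging


class CallbackHandler(logging.Handler):
    """Hypothetical handler: forwards each record as JSON to `send`.

    `send` stands in for a real client call; it is not part of the
    logging module.
    """

    def __init__(self, send, level=logging.NOTSET):
        super().__init__(level)
        self.send = send

    def emit(self, record):
        try:
            payload = json.dumps({
                "name": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
            })
            self.send(payload)
        except Exception:
            # standard way for a handler to report its own failures
            self.handleError(record)


sent = []
logger = logging.getLogger("example.custom")
logger.setLevel(logging.INFO)
logger.propagate = False          # keep the demo output isolated
logger.addHandler(CallbackHandler(sent.append))

logger.info("order %s created", 42)
logger.debug("filtered out")      # below INFO, never reaches the handler
print(sent)
```

Only `emit` has to be implemented; level filtering, locking, and formatter support come from the base class.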
Formatters && LogRecord¶
The Handler controls where logs go; the Formatter specifies what each line looks like. The format used in the example above is:
# Format:
'%(asctime)s - %(name)s - %(levelname)s - %(message)s'
The resulting log output:
2019-11-26 16:39:35,005 - example - INFO - info logs
2019-11-26 16:39:35,005 - example - WARNING - warning logs
2019-11-26 16:39:35,005 - example - INFO - final info logs
- %(asctime)s: the local time
- %(name)s: the logger's name
- %(levelname)s: the log level
- %(message)s: the actual log message
Format fields correspond one-to-one to LogRecord attributes. See: https://docs.python.org/3/library/logging.html#logrecord-attributes
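Other LogRecord attributes can be used in the same way. The following sketch writes to an in-memory stream and uses `%(funcName)s` and `%(lineno)d` in addition to the fields above; `%(asctime)s` is left out only so that the output is stable. The logger name `example.format` and the function `checkout` are made up for the illustration.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    '%(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s'))

logger = logging.getLogger("example.format")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)


def checkout():
    # funcName and lineno are filled in from this call site
    logger.info("info logs")


checkout()
print(stream.getvalue(), end="")
```

The printed line starts with the logger name and level, followed by the calling function and the line number of the `logger.info` call.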
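The logging module in detail¶
Use logging instead of print¶
Many people use print in place of logging during development and remove the print statements before release. As code grows deeper and logic more complex, print cannot keep up with changing log format requirements, and you may even forget to remove some of the print statements. With all kinds of output mixed together, troubleshooting from logs becomes ever harder.
Loggers are singletons¶
Requesting the same Logger name from different modules returns the same instance:
spam = logging.getLogger("spam")
If modules a and b both obtain the spam Logger with this statement, they get the same instance.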
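This is easy to verify in an interpreter session; the short sketch below also shows that a dotted child name is attached to the existing parent instance.

```python
import logging

a = logging.getLogger("spam")
b = logging.getLogger("spam")

# Both names refer to one and the same Logger object.
print(a is b)

# A child created afterwards points at that same parent instance.
child = logging.getLogger("spam.foo")
print(child.parent is a)
```

Because of this, configuration applied to a logger in one module (level, handlers) is visible everywhere that logger name is used.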
Recording exception stack traces in logs¶
Record the exception stack trace in the log:
import logging
import traceback
try:
c = 7 / 0
except Exception as e:
logging.error("Exception occurred:%s", traceback.format_exc())
Prefer the more Pythonic approach:
import logging
try:
c = 7 / 0
except Exception as e:
logging.error("Exception occurred", exc_info=True)
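Use log rotation in production¶
In production, a program produces logs continuously. To keep log files from filling the disk, rotate them by file size or by time; this also reduces the maintenance burden.
Use sensible log levels¶
Choose log levels deliberately to avoid a flood of logs. Think about which level each message belongs to, so that people with different needs can trace the logic later.
Use NullHandler in libraries¶
For a library intended for use by others, warning messages generally end up on stderr (standard error). If the application uses standard error for something else and does not want any third-party library to write logs there, or anywhere else, the library can attach a placeholder "null" handler that does nothing with the records it receives, as shown below: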
# foo.py
import logging
logging.getLogger('foo').addHandler(logging.NullHandler())
Reference: https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library
It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements.
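Use __name__ as the Logger name¶
This keeps Logger names consistent with module names. Formatters can then show the module and line number that produced each record, which makes it easy to locate the code. As in the earlier example, dotted module names also line up with the logger hierarchy:
spam_foo = logging.getLogger("spam.foo")
You can therefore attach a Handler only to the parent Logger, and logs from all of its submodules are still output.
Use basicConfig in applications¶
Applications have a convenient way to initialize the root logger: logging.basicConfig. Call it when the application starts; thanks to the logger hierarchy, the root logger then collects the records emitted by every logger.
If the root logger has no handler defined, the module-level functions debug, info, warning, error, and critical call basicConfig automatically. By default, output goes to standard error (stderr).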
import logging
logging.basicConfig(filename='example.log',level=logging.DEBUG)
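Writing logs from multiple processes¶
Our production tornado service starts multiple processes. If logging is configured at startup and size-based rotation is enabled, several processes write to the same log file. When one process writes a record and finds that rotation is needed, it rotates the current file and creates a new one. The other processes do not know this and keep writing to the old, renamed file (Linux identifies an open file by its file descriptor, a number, not by its name). When another process writes a record and also decides that rotation is needed, it rotates the current log again and starts yet another new file. The result is scrambled logs.
Writing logs from a single process¶
The code: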
import os
import sys
import logging
from logging.handlers import RotatingFileHandler
rotate_hd = RotatingFileHandler(filename="/tmp/whtest/test.log",
                                maxBytes=1*1024, backupCount=2)
f_format = logging.Formatter('%(asctime)s - %(name)s - %(process)d - %(message)s')
rotate_hd.setFormatter(f_format)
logging.basicConfig(handlers=[rotate_hd], level=logging.DEBUG)
logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')
pid = os.getpid()
for i in range(100):
    logging.info("info message")
Result:
ryefccd@fccd:/tmp/whtest$ ll
total 152
drwxrwxr-x 2 ryefccd ryefccd 4096 Nov 27 12:10 ./
drwxrwxrwt 15 root root 135168 Nov 27 12:09 ../
-rw-rw-r-- 1 ryefccd ryefccd 990 Nov 27 12:10 test.log
-rw-rw-r-- 1 ryefccd ryefccd 990 Nov 27 12:10 test.log.1
-rw-rw-r-- 1 ryefccd ryefccd 990 Nov 27 12:10 test.log.2
Writing logs from multiple processes¶
The code:
import os
import sys
import logging
from logging.handlers import RotatingFileHandler
rotate_hd = RotatingFileHandler(filename="/tmp/whtest/test.log",
                                maxBytes=1*1024, backupCount=2)
f_format = logging.Formatter('%(asctime)s - %(name)s - %(process)d - %(message)s')
rotate_hd.setFormatter(f_format)
logging.basicConfig(handlers=[rotate_hd], level=logging.DEBUG)
logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')
# pid = os.getpid()
for _ in range(3):
    pid = os.fork()
for i in range(100):
    logging.info("info message")
Result:
--- Logging error ---
Traceback (most recent call last):
File "/home/ryefccd/python3.5/python3.5.2/lib/python3.5/logging/handlers.py", line 72, in emit
self.doRollover()
File "/home/ryefccd/python3.5/python3.5.2/lib/python3.5/logging/handlers.py", line 169, in doRollover
os.rename(sfn, dfn)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/whtest/test.log.1' -> '/tmp/whtest/test.log.2'
Call stack:
File "logging_test.py", line 21, in <module>
logging.info("info message")
Message: 'info message'
Arguments: ()
ryefccd@fccd:/tmp/whtest$ ll
total 152
drwxrwxr-x 2 ryefccd ryefccd 4096 Nov 27 12:20 ./
drwxrwxrwt 15 root root 135168 Nov 27 12:19 ../
-rw-rw-r-- 1 ryefccd ryefccd 540 Nov 27 12:20 test.log
-rw-rw-r-- 1 ryefccd ryefccd 972 Nov 27 12:20 test.log.1
-rw-rw-r-- 1 ryefccd ryefccd 540 Nov 27 12:20 test.log.2
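Final solution¶
The simplest and most reliable approach is one log file per process: initialize logging after the fork and write to a different log file per process ID.
Alternatively, RotatingFileHandler could check the current file name as well as the size when rotating: check the file name first and, if it is no longer the original name, switch back to writing to the original file name. (Not implemented yet.)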
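A minimal sketch of the first approach follows. The function name `init_process_logging`, the logger name `worker`, and the file naming scheme `<base_name>.<pid>.log` are assumptions made for this example; the idea is only that each worker calls such a function after the fork, so that no two processes share a rotating file.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def init_process_logging(log_dir, base_name="test"):
    """Attach a rotating handler whose file name contains the current PID.

    Meant to be called in each worker process after fork. The naming
    scheme `<base_name>.<pid>.log` is an assumption of this sketch.
    """
    path = os.path.join(log_dir, "%s.%d.log" % (base_name, os.getpid()))
    handler = RotatingFileHandler(path, maxBytes=1 * 1024, backupCount=2)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s - %(name)s - %(process)d - %(message)s'))
    logger = logging.getLogger("worker")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    return logger, path


# Demo in a temporary directory instead of /tmp/whtest.
log_dir = tempfile.mkdtemp()
logger, log_path = init_process_logging(log_dir)
for _ in range(100):
    logger.info("info message")
print(sorted(os.listdir(log_dir)))
```

Each process then rotates only its own files, at the cost of having to merge or ship several files per service.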
best practice¶
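- Log records should be meaningful.
- Log records should preferably include context.
- Logs should be structured and organized into levels so that they are easy to parse and read.
- Logs should contain neither too little nor too much information.
- Complex applications can write to separate log files per application or per module.
- Adapt logging to the development, testing, and production environments so that different people can collaborate through the logs.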
debug¶
pdb(ipdb)¶
demo¶
start.py
import foo
import bar
tmp1 = "123"
tmp2 = "3f6"
print("tmp1 is num: %s by foo.is_num func" % foo.is_num(tmp1))
print("tmp2 is num: %s by foo.is_num func" % foo.is_num(tmp2))
print("tmp1 is num: %s by bar.is_num func" % bar.is_num(tmp1))
print("tmp2 is num: %s by bar.is_num func" % bar.is_num(tmp2))
foo.py
def is_num(x):
    flag = x.isnumeric()
    return flag
bar.py
def is_num(x):
    start = ord('0')
    end = ord('9')
    flag = True
    for i in x:
        num = ord(i)
        if not start <= num <= end:
            flag = False
            break
    return flag
introduction¶
Two ways to enter the debugger
import pdb; pdb.set_trace()
python -m pdb start.py
use ipdb
pip install ipdb
python -m ipdb start.py
printing expressions
> /home/ryefccd/env/server18/debug/start.py(6)<module>()
5 tmp1 = "123"
----> 6 tmp2 = "3f6"
7
ipdb> p tmp1
'123'
ipdb> p tmp2
*** NameError: name 'tmp2' is not defined
stepping through code with n (next) and s (step)
ipdb> ll
1
2 import foo
3 import bar
4
5 tmp1 = "123"
6 tmp2 = "3f6"
7
----> 8 print("tmp1 is num: %s by foo.is_num func" % foo.is_num(tmp1))
9 print("tmp2 is num: %s by foo.is_num func" % foo.is_num(tmp2))
10
11 print("tmp1 is num: %s by bar.is_num func" % bar.is_num(tmp1))
12 print("tmp2 is num: %s by bar.is_num func" % bar.is_num(tmp2))
ipdb> n
tmp1 is num: True by foo.is_num func
> /home/ryefccd/env/server18/debug/start.py(9)<module>()
8 print("tmp1 is num: %s by foo.is_num func" % foo.is_num(tmp1))
----> 9 print("tmp2 is num: %s by foo.is_num func" % foo.is_num(tmp2))
10
ipdb> s
--Call--
> /home/ryefccd/env/server18/debug/foo.py(1)is_num()
----> 1 def is_num(x):
2 flag = x.isnumeric()
3 return flag
using breakpoints
ipdb> w
/home/ryefccd/python3.5/python3.5.2/lib/python3.5/bdb.py(431)run()
430 try:
--> 431 exec(cmd, globals, locals)
432 except BdbQuit:
<string>(1)<module>()
> /home/ryefccd/env/server18/debug/start.py(11)<module>()
10
---> 11 print("tmp1 is num: %s by bar.is_num func" % bar.is_num(tmp1))
12 print("tmp2 is num: %s by bar.is_num func" % bar.is_num(tmp2))
ipdb> ll
...
8 print("tmp1 is num: %s by foo.is_num func" % foo.is_num(tmp1))
9 print("tmp2 is num: %s by foo.is_num func" % foo.is_num(tmp2))
10
---> 11 print("tmp1 is num: %s by bar.is_num func" % bar.is_num(tmp1))
12 print("tmp2 is num: %s by bar.is_num func" % bar.is_num(tmp2))
ipdb> b bar:3
Breakpoint 1 at /home/ryefccd/env/server18/debug/bar.py:3
ipdb> c
> /home/ryefccd/env/server18/debug/bar.py(3)is_num()
2 start = ord('0')
1---> 3 end = ord('9')
4 flag = True
continuing execution with unt (until)
ipdb> ll
1 def is_num(x):
2 start = ord('0')
1---> 3 end = ord('9')
4 flag = True
5 for i in x:
6 num = ord(i)
7 if not start <= num <= end:
8 flag = False
9 break
10 return flag
ipdb> l
ipdb> unt 8
> /home/ryefccd/env/server18/debug/bar.py(10)is_num()
8 flag = False
9 break
---> 10 return flag
ipdb> p flag
True
displaying expressions
> /home/ryefccd/env/server18/debug/bar.py(3)is_num()
2 start = ord('0')
1---> 3 end = ord('9')
4 flag = True
ipdb> ll
1 def is_num(x):
2 start = ord('0')
1---> 3 end = ord('9')
4 flag = True
5 for i in x:
6 num = ord(i)
7 if not start <= num <= end:
8 flag = False
9 break
10 return flag
ipdb> display i, num
display i, num: ** raised NameError: name 'i' is not defined **
ipdb> b 7
Breakpoint 2 at /home/ryefccd/env/server18/debug/bar.py:7
ipdb> b
Num Type Disp Enb Where
1 breakpoint keep yes at /home/ryefccd/env/server18/debug/bar.py:3
breakpoint already hit 1 time
2 breakpoint keep yes at /home/ryefccd/env/server18/debug/bar.py:7
ipdb> c
> /home/ryefccd/env/server18/debug/bar.py(7)is_num()
6 num = ord(i)
2---> 7 if not start <= num <= end:
8 flag = False
display i, num: ('1', 49) [old: ** raised NameError: name 'i' is not defined **]
ipdb> c
> /home/ryefccd/env/server18/debug/bar.py(7)is_num()
6 num = ord(i)
2---> 7 if not start <= num <= end:
8 flag = False
display i, num: ('2', 50) [old: ('1', 49)]
ipdb> c
> /home/ryefccd/env/server18/debug/bar.py(7)is_num()
6 num = ord(i)
2---> 7 if not start <= num <= end:
8 flag = False
display i, num: ('3', 51) [old: ('2', 50)]
finding the caller of a function(where)
ipdb> w
/home/ryefccd/python3.5/python3.5.2/lib/python3.5/bdb.py(431)run()
430 try:
--> 431 exec(cmd, globals, locals)
432 except BdbQuit:
<string>(1)<module>()
/home/ryefccd/env/server18/debug/start.py(11)<module>()
10
---> 11 print("tmp1 is num: %s by bar.is_num func" % bar.is_num(tmp1))
12 print("tmp2 is num: %s by bar.is_num func" % bar.is_num(tmp2))
> /home/ryefccd/env/server18/debug/bar.py(7)is_num()
6 num = ord(i)
2---> 7 if not start <= num <= end:
8 flag = False
cheatsheet¶
Help
Use h(elp) or ? to list all commands.
Control
n(ext) -> Continue execution until the next line in the current function is reached or it returns.
s(tep) -> Execute the current line, stop at the first possible occasion (either in a function that is called or on the next line in the current function).
r(eturn) -> Continue execution until the current function returns.
u(p) and d(own) -> Move the current frame count (default one) levels up/down in the stack trace (to an older/newer frame).
c(ont(inue)) can be useful if you have multiple breakpoints, it continues execution until a next breakpoint is encountered.
unt(il) [lineno] -> Without argument, continue execution until the line with a number greater than the current one is reached. -> useful to get out of a for loop.
b(reak) [lineno] and cl(ear) to set / clear a break point in the current file (it even accepts a condition).
Printing context
l(ist) -> List source code for the current file. Without arguments, list 11 lines around the current line or continue the previous listing.
w(here) -> Print a stack trace, with the most recent frame at the bottom. An arrow indicates the current frame, which determines the context of most commands. -> handy for web frameworks
bt -> Get a stack trace of the functions that have been called so far.
pp expression -> Like the p command, except the value of the expression is pretty-printed using the pprint module -> very useful for nested data structures.
Limitations compared with gdb¶
pdb cannot attach to a process that is already running.
References¶
https://realpython.com/python-debugging-pdb/#displaying-expressions
gdb¶
gdb is a debugger maintained by the GNU project. It supports debugging programs written in many languages.
Use cases¶
There are types of bugs that are difficult to debug from within Python:
segfaults (not uncaught Python exceptions)
hung processes (in cases where you can't get a Python traceback or debug with pdb)
out of control daemon processes
Installation¶
ubuntu
sudo apt-get install gdb python3-dbg
centos
sudo yum install gdb python-debuginfo
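Usage¶
In practice, you usually attach to a process with the command below when one of the problems listed above occurs.
gdb python3 -p <PID>
Note: this python must be the interpreter for which the debug symbols above were installed.
First, start the process: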
ryefccd@fccd:~/env/server18/debug$ python3.4 start.py
tmp1 is num: True by foo.is_num func
tmp2 is num: False by foo.is_num func
Then attach to the process to debug it:
ryefccd@fccd:~/env/server18/debug$ ps -ef f|grep python3.4
ryefccd 18600 17607 0 17:02 pts/39 S+ 0:00 | | \_ grep --color=auto python3.4
ryefccd 18477 24591 0 17:01 pts/41 S+ 0:00 | \_ python3.4 start.py
ryefccd@fccd:~/env/server18/debug$ sudo gdb python3.4 -p 18477
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3.4...Reading symbols from /usr/lib/debug//usr/bin/python3.4m...done.
done.
Attaching to program: /usr/bin/python3.4, process 18477
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libpthread-2.19.so...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libutil.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libutil-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libutil.so.1
Reading symbols from /lib/x86_64-linux-gnu/libexpat.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libexpat.so.1
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f26034c28f3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
File "/home/ryefccd/env/server18/debug/bar.py", line 6, in is_num
time.sleep(30)
File "start.py", line 11, in <module>
print("tmp1 is num: %s by bar.is_num func" % bar.is_num(tmp1))
(gdb) py-locals
x = '123'
start = 48
end = 57
flag = True
time = <module at remote 0x7f2603a5c548>
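The contents of bar.py are not shown in these notes. The following is a hypothetical reconstruction that would leave locals like the ones py-locals prints above (x, start, end, flag, and a function-local time import). The delay parameter and the "12a" input are assumptions added so that the sketch finishes quickly.

```python
def is_num(x, delay=30):
    """Return True if x is non-empty and every character is an ASCII digit."""
    import time  # imported inside the function, so `time` appears among the locals

    start = ord("0")  # 48
    end = ord("9")    # 57
    flag = bool(x) and all(start <= ord(ch) <= end for ch in x)
    time.sleep(delay)  # a process blocked here is what gdb would attach to
    return flag


if __name__ == "__main__":
    # delay=0 keeps the demo fast; use the default to leave time for attaching gdb.
    for name, value in (("tmp1", "123"), ("tmp2", "12a")):
        print("%s is num: %s by bar.is_num func" % (name, is_num(value, delay=0)))
```

With the default delay the process sleeps long enough for gdb to attach and for py-bt to show the time.sleep frame.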
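Reference links¶
Indexing and advanced queries¶
This material has already been presented by 曹佳豪. It will later be written up as an rst document to make it easier to maintain and update.
2018-05-25 gt-day reference answers: pandas¶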
[1]:
import pandas as pd
[4]:
import io
# Columns: 学年 = school year, 学号 = student ID, 科目 = subject, 分数 = score
datas_csv = io.StringIO("""
学年,学号,科目,分数
2011-2012,1,Chinese,97
2011-2012,2,Chinese,80
2011-2012,3,Chinese,85
2011-2012,4,Chinese,86
2011-2012,5,Chinese,91
2011-2012,6,Chinese,79
2011-2012,7,Chinese,91
2011-2012,1,English,70
2011-2012,2,English,80
2011-2012,3,English,94
2011-2012,4,English,72
2011-2012,5,English,94
2011-2012,6,English,96
2011-2012,7,English,77
2011-2012,1,math,72
2011-2012,2,math,90
2011-2012,3,math,89
2011-2012,5,math,72
2011-2012,6,math,91
2011-2012,7,math,66
2011-2012,4,math,60
2012-2013,6,Chinese,95
2012-2013,1,Chinese,85
2012-2013,7,Chinese,83
2012-2013,3,Chinese,78
2012-2013,2,Chinese,76
2012-2013,5,Chinese,76
2012-2013,4,Chinese,72
2012-2013,5,English,95
2012-2013,2,English,91
2012-2013,6,English,90
2012-2013,4,English,86
2012-2013,3,English,78
2012-2013,1,English,72
2012-2013,7,English,72
2012-2013,5,math,94
2012-2013,7,math,94
2012-2013,3,math,92
2012-2013,4,math,89
2012-2013,1,math,88
2012-2013,2,math,76
2012-2013,6,math,70
""")
[5]:
df = pd.read_csv(datas_csv)
df
[5]:
| | 学年 | 学号 | 科目 | 分数 |
---|---|---|---|---|
0 | 2011-2012 | 1 | Chinese | 97 |
1 | 2011-2012 | 2 | Chinese | 80 |
2 | 2011-2012 | 3 | Chinese | 85 |
3 | 2011-2012 | 4 | Chinese | 86 |
4 | 2011-2012 | 5 | Chinese | 91 |
5 | 2011-2012 | 6 | Chinese | 79 |
6 | 2011-2012 | 7 | Chinese | 91 |
7 | 2011-2012 | 1 | English | 70 |
8 | 2011-2012 | 2 | English | 80 |
9 | 2011-2012 | 3 | English | 94 |
10 | 2011-2012 | 4 | English | 72 |
11 | 2011-2012 | 5 | English | 94 |
12 | 2011-2012 | 6 | English | 96 |
13 | 2011-2012 | 7 | English | 77 |
14 | 2011-2012 | 1 | math | 72 |
15 | 2011-2012 | 2 | math | 90 |
16 | 2011-2012 | 3 | math | 89 |
17 | 2011-2012 | 5 | math | 72 |
18 | 2011-2012 | 6 | math | 91 |
19 | 2011-2012 | 7 | math | 66 |
20 | 2011-2012 | 4 | math | 60 |
21 | 2012-2013 | 6 | Chinese | 95 |
22 | 2012-2013 | 1 | Chinese | 85 |
23 | 2012-2013 | 7 | Chinese | 83 |
24 | 2012-2013 | 3 | Chinese | 78 |
25 | 2012-2013 | 2 | Chinese | 76 |
26 | 2012-2013 | 5 | Chinese | 76 |
27 | 2012-2013 | 4 | Chinese | 72 |
28 | 2012-2013 | 5 | English | 95 |
29 | 2012-2013 | 2 | English | 91 |
30 | 2012-2013 | 6 | English | 90 |
31 | 2012-2013 | 4 | English | 86 |
32 | 2012-2013 | 3 | English | 78 |
33 | 2012-2013 | 1 | English | 72 |
34 | 2012-2013 | 7 | English | 72 |
35 | 2012-2013 | 5 | math | 94 |
36 | 2012-2013 | 7 | math | 94 |
37 | 2012-2013 | 3 | math | 92 |
38 | 2012-2013 | 4 | math | 89 |
39 | 2012-2013 | 1 | math | 88 |
40 | 2012-2013 | 2 | math | 76 |
41 | 2012-2013 | 6 | math | 70 |
[3]:
# Question 1: show each student's all-time highest score in every subject (long format)
df.groupby(['学号', '科目'])['分数'].max()
[3]:
学号 科目
1 Chinese 97
English 72
math 88
2 Chinese 80
English 91
math 90
3 Chinese 85
English 94
math 92
4 Chinese 86
English 86
math 89
5 Chinese 91
English 95
math 94
6 Chinese 95
English 96
math 91
7 Chinese 91
English 77
math 94
Name: 分数, dtype: int64
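What the groupby and max above compute can be sketched in plain Python: keep one running maximum per (student ID, subject) key. The rows below are the six records for student 1 from the data set; the variable names are illustrative.

```python
# (school year, student ID, subject, score) for student 1, copied from the data above
rows = [
    ("2011-2012", 1, "Chinese", 97),
    ("2011-2012", 1, "English", 70),
    ("2011-2012", 1, "math", 72),
    ("2012-2013", 1, "Chinese", 85),
    ("2012-2013", 1, "English", 72),
    ("2012-2013", 1, "math", 88),
]

best = {}
for _year, student, subject, score in rows:
    key = (student, subject)
    # Keep the larger of the stored maximum and the current score.
    best[key] = max(score, best.get(key, score))

for key in sorted(best):
    print(key, best[key])
```

The school year is ignored on purpose: the question asks for the highest score across all years.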
[4]:
# Question 1: show each student's all-time highest score in every subject (wide, pivoted format)
pd.pivot_table(df, index='学号', columns='科目', values='分数', aggfunc='max')
[4]:
学号 | Chinese | English | math |
---|---|---|---|
1 | 97 | 72 | 88 |
2 | 80 | 91 | 90 |
3 | 85 | 94 | 92 |
4 | 86 | 86 | 89 |
5 | 91 | 95 | 94 |
6 | 95 | 96 | 91 |
7 | 91 | 77 | 94 |
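The long and wide answers are two views of the same result. As a small sketch, unstacking the 科目 level of the long-format Series gives the same values as pivot_table; the sample uses a few rows from the data set so that it runs on its own.

```python
import io

import pandas as pd

# A subset of the data above, small enough to check by eye.
sample = pd.read_csv(io.StringIO("""学年,学号,科目,分数
2011-2012,1,Chinese,97
2011-2012,1,English,70
2011-2012,2,Chinese,80
2011-2012,2,English,80
2012-2013,1,Chinese,85
2012-2013,1,English,72
2012-2013,2,Chinese,76
2012-2013,2,English,91
"""))

long_form = sample.groupby(["学号", "科目"])["分数"].max()
wide_from_long = long_form.unstack("科目")  # move the 科目 index level into columns
wide_from_pivot = pd.pivot_table(
    sample, index="学号", columns="科目", values="分数", aggfunc="max"
)

print(wide_from_long)
print(wide_from_pivot)
```

Which one to use is mostly a matter of taste: unstack is convenient when the long-format result already exists, pivot_table when starting from the raw frame.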
[5]:
# Question 2: sort by 学年, 学号, 科目
df.sort_values(['学年', '学号', '科目'])
[5]:
| | 学年 | 学号 | 科目 | 分数 |
---|---|---|---|---|
0 | 2011-2012 | 1 | Chinese | 97 |
7 | 2011-2012 | 1 | English | 70 |
14 | 2011-2012 | 1 | math | 72 |
1 | 2011-2012 | 2 | Chinese | 80 |
8 | 2011-2012 | 2 | English | 80 |
15 | 2011-2012 | 2 | math | 90 |
2 | 2011-2012 | 3 | Chinese | 85 |
9 | 2011-2012 | 3 | English | 94 |
16 | 2011-2012 | 3 | math | 89 |
3 | 2011-2012 | 4 | Chinese | 86 |
10 | 2011-2012 | 4 | English | 72 |
20 | 2011-2012 | 4 | math | 60 |
4 | 2011-2012 | 5 | Chinese | 91 |
11 | 2011-2012 | 5 | English | 94 |
17 | 2011-2012 | 5 | math | 72 |
5 | 2011-2012 | 6 | Chinese | 79 |
12 | 2011-2012 | 6 | English | 96 |
18 | 2011-2012 | 6 | math | 91 |
6 | 2011-2012 | 7 | Chinese | 91 |
13 | 2011-2012 | 7 | English | 77 |
19 | 2011-2012 | 7 | math | 66 |
22 | 2012-2013 | 1 | Chinese | 85 |
33 | 2012-2013 | 1 | English | 72 |
39 | 2012-2013 | 1 | math | 88 |
25 | 2012-2013 | 2 | Chinese | 76 |
29 | 2012-2013 | 2 | English | 91 |
40 | 2012-2013 | 2 | math | 76 |
24 | 2012-2013 | 3 | Chinese | 78 |
32 | 2012-2013 | 3 | English | 78 |
37 | 2012-2013 | 3 | math | 92 |
27 | 2012-2013 | 4 | Chinese | 72 |
31 | 2012-2013 | 4 | English | 86 |
38 | 2012-2013 | 4 | math | 89 |
26 | 2012-2013 | 5 | Chinese | 76 |
28 | 2012-2013 | 5 | English | 95 |
35 | 2012-2013 | 5 | math | 94 |
21 | 2012-2013 | 6 | Chinese | 95 |
30 | 2012-2013 | 6 | English | 90 |
41 | 2012-2013 | 6 | math | 70 |
23 | 2012-2013 | 7 | Chinese | 83 |
34 | 2012-2013 | 7 | English | 72 |
36 | 2012-2013 | 7 | math | 94 |
[6]:
# Question 3: sort by 学年, 科目, 学号
df.sort_values(['学年', '科目', '学号'])
[6]:
| | 学年 | 学号 | 科目 | 分数 |
---|---|---|---|---|
0 | 2011-2012 | 1 | Chinese | 97 |
1 | 2011-2012 | 2 | Chinese | 80 |
2 | 2011-2012 | 3 | Chinese | 85 |
3 | 2011-2012 | 4 | Chinese | 86 |
4 | 2011-2012 | 5 | Chinese | 91 |
5 | 2011-2012 | 6 | Chinese | 79 |
6 | 2011-2012 | 7 | Chinese | 91 |
7 | 2011-2012 | 1 | English | 70 |
8 | 2011-2012 | 2 | English | 80 |
9 | 2011-2012 | 3 | English | 94 |
10 | 2011-2012 | 4 | English | 72 |
11 | 2011-2012 | 5 | English | 94 |
12 | 2011-2012 | 6 | English | 96 |
13 | 2011-2012 | 7 | English | 77 |
14 | 2011-2012 | 1 | math | 72 |
15 | 2011-2012 | 2 | math | 90 |
16 | 2011-2012 | 3 | math | 89 |
20 | 2011-2012 | 4 | math | 60 |
17 | 2011-2012 | 5 | math | 72 |
18 | 2011-2012 | 6 | math | 91 |
19 | 2011-2012 | 7 | math | 66 |
22 | 2012-2013 | 1 | Chinese | 85 |
25 | 2012-2013 | 2 | Chinese | 76 |
24 | 2012-2013 | 3 | Chinese | 78 |
27 | 2012-2013 | 4 | Chinese | 72 |
26 | 2012-2013 | 5 | Chinese | 76 |
21 | 2012-2013 | 6 | Chinese | 95 |
23 | 2012-2013 | 7 | Chinese | 83 |
33 | 2012-2013 | 1 | English | 72 |
29 | 2012-2013 | 2 | English | 91 |
32 | 2012-2013 | 3 | English | 78 |
31 | 2012-2013 | 4 | English | 86 |
28 | 2012-2013 | 5 | English | 95 |
30 | 2012-2013 | 6 | English | 90 |
34 | 2012-2013 | 7 | English | 72 |
39 | 2012-2013 | 1 | math | 88 |
40 | 2012-2013 | 2 | math | 76 |
37 | 2012-2013 | 3 | math | 92 |
38 | 2012-2013 | 4 | math | 89 |
35 | 2012-2013 | 5 | math | 94 |
41 | 2012-2013 | 6 | math | 70 |
36 | 2012-2013 | 7 | math | 94 |
[7]:
# Question 4: sort by 学年, 科目, 分数
df.sort_values(['学年', '科目', '分数'])
[7]:
| | 学年 | 学号 | 科目 | 分数 |
---|---|---|---|---|
5 | 2011-2012 | 6 | Chinese | 79 |
1 | 2011-2012 | 2 | Chinese | 80 |
2 | 2011-2012 | 3 | Chinese | 85 |
3 | 2011-2012 | 4 | Chinese | 86 |
4 | 2011-2012 | 5 | Chinese | 91 |
6 | 2011-2012 | 7 | Chinese | 91 |
0 | 2011-2012 | 1 | Chinese | 97 |
7 | 2011-2012 | 1 | English | 70 |
10 | 2011-2012 | 4 | English | 72 |
13 | 2011-2012 | 7 | English | 77 |
8 | 2011-2012 | 2 | English | 80 |
9 | 2011-2012 | 3 | English | 94 |
11 | 2011-2012 | 5 | English | 94 |
12 | 2011-2012 | 6 | English | 96 |
20 | 2011-2012 | 4 | math | 60 |
19 | 2011-2012 | 7 | math | 66 |
14 | 2011-2012 | 1 | math | 72 |
17 | 2011-2012 | 5 | math | 72 |
16 | 2011-2012 | 3 | math | 89 |
15 | 2011-2012 | 2 | math | 90 |
18 | 2011-2012 | 6 | math | 91 |
27 | 2012-2013 | 4 | Chinese | 72 |
25 | 2012-2013 | 2 | Chinese | 76 |
26 | 2012-2013 | 5 | Chinese | 76 |
24 | 2012-2013 | 3 | Chinese | 78 |
23 | 2012-2013 | 7 | Chinese | 83 |
22 | 2012-2013 | 1 | Chinese | 85 |
21 | 2012-2013 | 6 | Chinese | 95 |
33 | 2012-2013 | 1 | English | 72 |
34 | 2012-2013 | 7 | English | 72 |
32 | 2012-2013 | 3 | English | 78 |
31 | 2012-2013 | 4 | English | 86 |
30 | 2012-2013 | 6 | English | 90 |
29 | 2012-2013 | 2 | English | 91 |
28 | 2012-2013 | 5 | English | 95 |
41 | 2012-2013 | 6 | math | 70 |
40 | 2012-2013 | 2 | math | 76 |
39 | 2012-2013 | 1 | math | 88 |
38 | 2012-2013 | 4 | math | 89 |
37 | 2012-2013 | 3 | math | 92 |
35 | 2012-2013 | 5 | math | 94 |
36 | 2012-2013 | 7 | math | 94 |
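All the sorts above are ascending. If a ranking with the highest score first is wanted instead, sort_values accepts one ascending flag per column. A minimal sketch on the math rows of students 1 to 3, taken from the data set:

```python
import io

import pandas as pd

sample = pd.read_csv(io.StringIO("""学年,学号,科目,分数
2011-2012,1,math,72
2011-2012,2,math,90
2011-2012,3,math,89
2012-2013,1,math,88
2012-2013,2,math,76
2012-2013,3,math,92
"""))

# 学年 and 科目 ascending, 分数 descending within each (学年, 科目) group
ranked = sample.sort_values(["学年", "科目", "分数"], ascending=[True, True, False])
print(ranked)
```

The list passed to ascending must have the same length as the list of sort columns.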
2018-05-25 gt-day reference answers: Spark¶
[1]:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
[2]:
spark = SparkSession.builder.getOrCreate()
[3]:
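# Read the source data
# inferSchema=True makes 学号 and 分数 numeric; with header=True alone every column is read as a string
df = spark.read.csv('/Users/cjh/Downloads/demo_data.csv', header=True, inferSchema=True)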
df.show(n=100)
+---------+----+-------+----+
| 学年|学号| 科目|分数|
+---------+----+-------+----+
|2011-2012| 1|Chinese| 97|
|2011-2012| 2|Chinese| 80|
|2011-2012| 3|Chinese| 85|
|2011-2012| 4|Chinese| 86|
|2011-2012| 5|Chinese| 91|
|2011-2012| 6|Chinese| 79|
|2011-2012| 7|Chinese| 91|
|2011-2012| 1|English| 70|
|2011-2012| 2|English| 80|
|2011-2012| 3|English| 94|
|2011-2012| 4|English| 72|
|2011-2012| 5|English| 94|
|2011-2012| 6|English| 96|
|2011-2012| 7|English| 77|
|2011-2012| 1| math| 72|
|2011-2012| 2| math| 90|
|2011-2012| 3| math| 89|
|2011-2012| 5| math| 72|
|2011-2012| 6| math| 91|
|2011-2012| 7| math| 66|
|2011-2012| 4| math| 60|
|2012-2013| 6|Chinese| 95|
|2012-2013| 1|Chinese| 85|
|2012-2013| 7|Chinese| 83|
|2012-2013| 3|Chinese| 78|
|2012-2013| 2|Chinese| 76|
|2012-2013| 5|Chinese| 76|
|2012-2013| 4|Chinese| 72|
|2012-2013| 5|English| 95|
|2012-2013| 2|English| 91|
|2012-2013| 6|English| 90|
|2012-2013| 4|English| 86|
|2012-2013| 3|English| 78|
|2012-2013| 1|English| 72|
|2012-2013| 7|English| 72|
|2012-2013| 5| math| 94|
|2012-2013| 7| math| 94|
|2012-2013| 3| math| 92|
|2012-2013| 4| math| 89|
|2012-2013| 1| math| 88|
|2012-2013| 2| math| 76|
|2012-2013| 6| math| 70|
+---------+----+-------+----+
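Spark's CSV reader leaves every column as a string unless a schema is supplied or inferSchema is enabled, so it is worth checking the column types before calling max or sort on scores. The plain-Python sketch below shows how string comparison differs from numeric comparison; the three values are hypothetical and chosen to include one-, two- and three-digit scores.

```python
# Hypothetical scores, not taken from the data set above.
scores_as_text = ["97", "100", "9"]
scores_as_int = [int(s) for s in scores_as_text]

# Strings compare character by character, integers compare by value.
print(max(scores_as_text), max(scores_as_int))
print(sorted(scores_as_text), sorted(scores_as_int))
```

In this data set every score has two digits, so both orderings happen to agree; a score of 100 or a single-digit score would break that.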
[4]:
# Spark DataFrame sorting is ascending by default
# Each student's all-time highest score in every subject (long format)
df.groupBy('学号', '科目').agg(F.max('分数').alias('最高分')).sort('学号').show(n=100)
+----+-------+------+
|学号| 科目|最高分|
+----+-------+------+
| 1|English| 72|
| 1| math| 88|
| 1|Chinese| 97|
| 2|Chinese| 80|
| 2|English| 91|
| 2| math| 90|
| 3| math| 92|
| 3|Chinese| 85|
| 3|English| 94|
| 4|English| 86|
| 4| math| 89|
| 4|Chinese| 86|
| 5|Chinese| 91|
| 5| math| 94|
| 5|English| 95|
| 6|Chinese| 95|
| 6|English| 96|
| 6| math| 91|
| 7|Chinese| 91|
| 7|English| 77|
| 7| math| 94|
+----+-------+------+
[5]:
# Each student's all-time highest score in every subject (wide format)
df.groupBy('学号').pivot('科目', values=['Chinese', 'English', 'math']).agg(F.max('分数')).sort('学号').show()
+----+-------+-------+----+
|学号|Chinese|English|math|
+----+-------+-------+----+
| 1| 97| 72| 88|
| 2| 80| 91| 90|
| 3| 85| 94| 92|
| 4| 86| 86| 89|
| 5| 91| 95| 94|
| 6| 95| 96| 91|
| 7| 91| 77| 94|
+----+-------+-------+----+
[6]:
# Sort by 学年, 学号, 科目
df.sort('学年', '学号', '科目').show(n=100)
+---------+----+-------+----+
| 学年|学号| 科目|分数|
+---------+----+-------+----+
|2011-2012| 1|Chinese| 97|
|2011-2012| 1|English| 70|
|2011-2012| 1| math| 72|
|2011-2012| 2|Chinese| 80|
|2011-2012| 2|English| 80|
|2011-2012| 2| math| 90|
|2011-2012| 3|Chinese| 85|
|2011-2012| 3|English| 94|
|2011-2012| 3| math| 89|
|2011-2012| 4|Chinese| 86|
|2011-2012| 4|English| 72|
|2011-2012| 4| math| 60|
|2011-2012| 5|Chinese| 91|
|2011-2012| 5|English| 94|
|2011-2012| 5| math| 72|
|2011-2012| 6|Chinese| 79|
|2011-2012| 6|English| 96|
|2011-2012| 6| math| 91|
|2011-2012| 7|Chinese| 91|
|2011-2012| 7|English| 77|
|2011-2012| 7| math| 66|
|2012-2013| 1|Chinese| 85|
|2012-2013| 1|English| 72|
|2012-2013| 1| math| 88|
|2012-2013| 2|Chinese| 76|
|2012-2013| 2|English| 91|
|2012-2013| 2| math| 76|
|2012-2013| 3|Chinese| 78|
|2012-2013| 3|English| 78|
|2012-2013| 3| math| 92|
|2012-2013| 4|Chinese| 72|
|2012-2013| 4|English| 86|
|2012-2013| 4| math| 89|
|2012-2013| 5|Chinese| 76|
|2012-2013| 5|English| 95|
|2012-2013| 5| math| 94|
|2012-2013| 6|Chinese| 95|
|2012-2013| 6|English| 90|
|2012-2013| 6| math| 70|
|2012-2013| 7|Chinese| 83|
|2012-2013| 7|English| 72|
|2012-2013| 7| math| 94|
+---------+----+-------+----+
[7]:
# Sort by 学年, 科目, 学号
df.sort('学年', '科目', '学号').show(n=100)
+---------+----+-------+----+
| 学年|学号| 科目|分数|
+---------+----+-------+----+
|2011-2012| 1|Chinese| 97|
|2011-2012| 2|Chinese| 80|
|2011-2012| 3|Chinese| 85|
|2011-2012| 4|Chinese| 86|
|2011-2012| 5|Chinese| 91|
|2011-2012| 6|Chinese| 79|
|2011-2012| 7|Chinese| 91|
|2011-2012| 1|English| 70|
|2011-2012| 2|English| 80|
|2011-2012| 3|English| 94|
|2011-2012| 4|English| 72|
|2011-2012| 5|English| 94|
|2011-2012| 6|English| 96|
|2011-2012| 7|English| 77|
|2011-2012| 1| math| 72|
|2011-2012| 2| math| 90|
|2011-2012| 3| math| 89|
|2011-2012| 4| math| 60|
|2011-2012| 5| math| 72|
|2011-2012| 6| math| 91|
|2011-2012| 7| math| 66|
|2012-2013| 1|Chinese| 85|
|2012-2013| 2|Chinese| 76|
|2012-2013| 3|Chinese| 78|
|2012-2013| 4|Chinese| 72|
|2012-2013| 5|Chinese| 76|
|2012-2013| 6|Chinese| 95|
|2012-2013| 7|Chinese| 83|
|2012-2013| 1|English| 72|
|2012-2013| 2|English| 91|
|2012-2013| 3|English| 78|
|2012-2013| 4|English| 86|
|2012-2013| 5|English| 95|
|2012-2013| 6|English| 90|
|2012-2013| 7|English| 72|
|2012-2013| 1| math| 88|
|2012-2013| 2| math| 76|
|2012-2013| 3| math| 92|
|2012-2013| 4| math| 89|
|2012-2013| 5| math| 94|
|2012-2013| 6| math| 70|
|2012-2013| 7| math| 94|
+---------+----+-------+----+
[28]:
# Sort by 学年, 科目, 分数
df.sort('学年', '科目', '分数').show(n=100)
+---------+----+-------+----+
| 学年|学号| 科目|分数|
+---------+----+-------+----+
|2011-2012| 6|Chinese| 79|
|2011-2012| 2|Chinese| 80|
|2011-2012| 3|Chinese| 85|
|2011-2012| 4|Chinese| 86|
|2011-2012| 5|Chinese| 91|
|2011-2012| 7|Chinese| 91|
|2011-2012| 1|Chinese| 97|
|2011-2012| 1|English| 70|
|2011-2012| 4|English| 72|
|2011-2012| 7|English| 77|
|2011-2012| 2|English| 80|
|2011-2012| 3|English| 94|
|2011-2012| 5|English| 94|
|2011-2012| 6|English| 96|
|2011-2012| 4| math| 60|
|2011-2012| 7| math| 66|
|2011-2012| 5| math| 72|
|2011-2012| 1| math| 72|
|2011-2012| 3| math| 89|
|2011-2012| 2| math| 90|
|2011-2012| 6| math| 91|
|2012-2013| 4|Chinese| 72|
|2012-2013| 2|Chinese| 76|
|2012-2013| 5|Chinese| 76|
|2012-2013| 3|Chinese| 78|
|2012-2013| 7|Chinese| 83|
|2012-2013| 1|Chinese| 85|
|2012-2013| 6|Chinese| 95|
|2012-2013| 1|English| 72|
|2012-2013| 7|English| 72|
|2012-2013| 3|English| 78|
|2012-2013| 4|English| 86|
|2012-2013| 6|English| 90|
|2012-2013| 2|English| 91|
|2012-2013| 5|English| 95|
|2012-2013| 6| math| 70|
|2012-2013| 2| math| 76|
|2012-2013| 1| math| 88|
|2012-2013| 4| math| 89|
|2012-2013| 3| math| 92|
|2012-2013| 5| math| 94|
|2012-2013| 7| math| 94|
+---------+----+-------+----+