Let_go
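Life is like a game of chess, and I am willing to be a pawn. My moves may be slow, but who has ever seen me take a step back?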

Automatically Generating Word Reports with Python

2019/04/29 Python scripts



Preface

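This post briefly records how to use a Python script to parse an XML file and insert the parsed data, as variables, into a .docx file. The main reason for doing this is that at work I need to turn the output of certain tools (.xml files) into .docx files and deliver them as reports. Such reports usually follow a fixed format, and only some of the data has to be filled in dynamically. So we can represent the dynamic fields as variables, have a Python script parse the XML file and insert the data where those variables are, and produce the final document. This saves us from tedious copy-and-paste work and greatly improves efficiency.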

XML Parsing

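First step: how do we parse an XML file with Python?
There are two approaches that I know of:
1. Use the SAX (Simple API for XML) parser from the Python standard library. SAX uses an event-driven model: while parsing the XML it fires a series of events and calls user-defined callback functions to handle them. The advantage is that the file is read as a stream, which is fast and uses little memory; the drawback is that the user has to write the callback functions.
2. Use DOM (Document Object Model, available in the standard library as xml.dom.minidom), which loads the whole XML file into memory and parses it into a tree; the XML is then processed by operating on that tree. Because all of the XML data has to be mapped into memory, this approach is slower and uses more memory, but it feels more flexible than SAX.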
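To make the difference concrete, here is a minimal sketch (not from the original post) that counts the <error> elements of a small inline document both ways. SAMPLE, ErrorCounter and the count_with_* helpers are hypothetical names for illustration; xml.sax.parseString and xml.dom.minidom.parseString are standard-library calls.

```python
import xml.sax
import xml.dom.minidom

# Hypothetical inline sample shaped like the cppcheck output used later on
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<results version="2"><errors>
<error id="nullPointer" severity="error"/>
<error id="resourceLeak" severity="error"/>
</errors></results>"""


class ErrorCounter(xml.sax.ContentHandler):
    """SAX: the parser pushes events to us; we keep only a running count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "error":
            self.count += 1


def count_with_sax(data):
    handler = ErrorCounter()
    xml.sax.parseString(data, handler)  # streams through the document once
    return handler.count


def count_with_dom(data):
    tree = xml.dom.minidom.parseString(data)  # builds the whole tree in memory
    return len(tree.getElementsByTagName("error"))


print(count_with_sax(SAMPLE), count_with_dom(SAMPLE))
```

With SAX the counting logic has to live inside a callback; with DOM the tree is queried after parsing, which is where the extra flexibility (and memory cost) comes from.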

Parsing XML with SAX

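SAX is event-driven, and parsing an XML document with SAX involves two parts: a parser and an event handler.
The parser reads the XML file and sends events to the event handler at key points, for example at the start or end of an element.
The event handler responds to those events, mainly by calling the callback methods the user has registered.
Some of the key functions (1-4 are methods of xml.sax.ContentHandler, 5-6 are functions of the xml.sax module):
1. When the document starts: startDocument()
2. When the end of the document is reached: endDocument()
3. When an XML start tag is encountered: startElement(name, attrs)
4. When an XML end tag is encountered: endElement(name)
5. Create a new parser object: make_parser()
6. Create a SAX parser and parse an XML document: parse()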
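A minimal sketch of the callbacks above, assuming a tiny inline document instead of a file, that records which callbacks fire and in what order. TraceHandler is a hypothetical name; make_parser(), setContentHandler() and parse() are standard xml.sax calls, and parse() also accepts a file-like object such as io.BytesIO.

```python
import io
import xml.sax


class TraceHandler(xml.sax.ContentHandler):
    """Records each callback so the event order is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # attrs works like a read-only dict of the tag's attributes
        self.events.append("startElement(%s, id=%s)" % (name, attrs.get("id")))

    def endElement(self, name):
        self.events.append("endElement(%s)" % name)


handler = TraceHandler()
parser = xml.sax.make_parser()       # create a new parser (an XMLReader)
parser.setContentHandler(handler)    # register the event handler
parser.parse(io.BytesIO(b'<error id="nullPointer"><symbol>p</symbol></error>'))

for event in handler.events:
    print(event)
```

Running it prints startDocument first, then startElement/endElement pairs nested the same way as the tags, and endDocument last.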
Example:
Target XML file, log.xml:

<?xml version="1.0" encoding="UTF-8"?>
<results version="2">
<errors>
<error id="arrayIndexOutOfBounds" severity="error" msg="Array &apos;a[10]&apos; accessed at index 10, which is out of bounds." verbose="Array &apos;a[10]&apos; accessed at index 10, which is out of bounds." cwe="119">
<location file=" /cpp/arrayindexoutofbounds.cpp" line="11" info="Array index out of bounds"/>
<location file=" /cpp/arrayindexoutofbounds.cpp" line="7" info="Assignment &apos;max=10&apos;, assigned value is 10"/>
<symbol>a</symbol>
</error>
<error id="returnDanglingLifetime" severity="error" msg="Returning pointer to local variable &apos;sz&apos; that will be invalid when returning." verbose="Returning pointer to local variable &apos;sz&apos; that will be invalid when returning." cwe="562">
<location file=" /cpp/autovar.cpp" line="5"/>
<location file=" /cpp/autovar.cpp" line="3" info="Variable created here."/>
<location file=" /cpp/autovar.cpp" line="5" info="Array decayed to pointer here."/>
</error>
<error id="bufferAccessOutOfBounds" severity="error" msg="Buffer is accessed out of bounds: sz" verbose="Buffer is accessed out of bounds: sz">
<location file=" /cpp/bufferaccessoutofbounds.cpp" line="5"/>
<symbol>sz</symbol>
</error>
<error id="nullPointer" severity="error" msg="Null pointer dereference" verbose="Null pointer dereference" cwe="476">
<location file=" /cpp/nonthreadsafefunc.cpp" line="5" info="Null pointer dereference"/>
</error>
<error id="resourceLeak" severity="error" msg="Resource leak: pFile" verbose="Resource leak: pFile" cwe="775">
<location file=" /cpp/resourceleak.cpp" line="8"/>
<symbol>pFile</symbol>
</error>
<error id="stlOutOfBounds" severity="error" msg="When ii==foo.size(), foo[ii] is out of bounds." verbose="When ii==foo.size(), foo[ii] is out of bounds." cwe="788">
<location file=" /cpp/stloutofbounds.cpp" line="7"/>
<symbol>foo</symbol>
</error>
</errors>
</results>

Parsing script:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import xml.sax  # import the xml.sax library first


class XMLHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""  # name of the element currently being handled

    def startElement(self, tag, attributes):
        # called by the parser for every start tag
        self.CurrentData = tag
        if tag == "error":
            print("\n\n\n********** error **********")
            self.echoInfo("id", attributes, "id:")
            self.echoInfo("severity", attributes, "severity:")
            self.echoInfo("msg", attributes, "msg:")
            self.echoInfo("verbose", attributes, "verbose:")
            self.echoInfo("cwe", attributes, "cwe:")
        if tag == "location":
            self.echoInfo("file", attributes, "location.file:")
            self.echoInfo("line", attributes, "location.line:")
            self.echoInfo("info", attributes, "location.info:")

    def echoInfo(self, name, attributes, label):
        # print the attribute value, or "None" if the tag does not have it
        if name in attributes:
            print(label, attributes[name])
        else:
            print(label + " None")


if __name__ == "__main__":
    parser = xml.sax.make_parser()  # create an XMLReader
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)  # disable namespace processing
    Handler = XMLHandler()
    parser.setContentHandler(Handler)
    parser.parse("log.xml")  # parse the target XML file
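The script above only prints attributes, so the text inside <symbol> (a, sz, pFile and so on) is never read, and nothing is kept for the report. Below is a minimal sketch, not the author's code, of a handler that collects each <error> into a dict and buffers <symbol> text in characters(), which the parser may call several times for a single text node. ErrorCollector and collect_errors are hypothetical names, and the record layout is an assumption.

```python
import xml.sax


class ErrorCollector(xml.sax.ContentHandler):
    """Turns each <error> element into a dict instead of printing it."""

    def __init__(self):
        super().__init__()
        self.errors = []         # finished error records
        self._current = None     # record of the <error> being parsed
        self._in_symbol = False  # True while inside a <symbol> element
        self._text = []          # text chunks of the current <symbol>

    def startElement(self, name, attrs):
        if name == "error":
            self._current = {
                "id": attrs.get("id"),
                "severity": attrs.get("severity"),
                "msg": attrs.get("msg"),
                "cwe": attrs.get("cwe"),  # None when the attribute is absent
                "locations": [],
                "symbols": [],
            }
        elif name == "location" and self._current is not None:
            self._current["locations"].append(
                (attrs.get("file"), attrs.get("line"), attrs.get("info")))
        elif name == "symbol":
            self._in_symbol = True
            self._text = []

    def characters(self, content):
        # one text node may arrive in several chunks, so only buffer here
        if self._in_symbol:
            self._text.append(content)

    def endElement(self, name):
        if name == "symbol":
            if self._current is not None:
                self._current["symbols"].append("".join(self._text).strip())
            self._in_symbol = False
        elif name == "error" and self._current is not None:
            self.errors.append(self._current)
            self._current = None


def collect_errors(source):
    """source: a file name such as "log.xml", or the XML content as bytes."""
    handler = ErrorCollector()
    if isinstance(source, bytes):
        xml.sax.parseString(source, handler)
    else:
        xml.sax.parse(source, handler)
    return handler.errors
```

With the log.xml above, collect_errors("log.xml") would return six records; the returnDanglingLifetime record has three locations and no symbols.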

Output

********** error **********
id: arrayIndexOutOfBounds
severity: error
msg: Array 'a[10]' accessed at index 10, which is out of bounds.
verbose: Array 'a[10]' accessed at index 10, which is out of bounds.
cwe: 119
location.file: /cpp/arrayindexoutofbounds.cpp
location.line: 11
location.info: Array index out of bounds
location.file: /cpp/arrayindexoutofbounds.cpp
location.line: 7
location.info: Assignment 'max=10', assigned value is 10
[................................skip...................................]
********** error **********
id: stlOutOfBounds
severity: error
msg: When ii==foo.size(), foo[ii] is out of bounds.
verbose: When ii==foo.size(), foo[ii] is out of bounds.
cwe: 788
location.file: /cpp/stloutofbounds.cpp
location.line: 7
location.info: None

Parsing XML with DOM

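Document Object Model (DOM): a DOM parser loads the entire XML file into memory at once and stores every element of the document in a tree structure. The tree can then be processed through the API that DOM provides, with different functions for reading or modifying the content of the document. One drawback of this approach is that it uses a lot of memory.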
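Because the whole tree sits in memory, DOM can also change the document, not just read it. A minimal sketch with xml.dom.minidom on an inline document; the <note> element is hypothetical and is added only to show the calls (getAttribute, setAttribute, createElement, createTextNode, appendChild, toxml).

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    b'<results version="2"><errors>'
    b'<error id="nullPointer" severity="error"/>'
    b'</errors></results>')

err = doc.getElementsByTagName("error")[0]
error_id = err.getAttribute("id")            # read an attribute

err.setAttribute("severity", "warning")      # change an attribute in place
note = doc.createElement("note")             # build a new child element
note.appendChild(doc.createTextNode("reviewed"))
err.appendChild(note)                        # attach it under <error>

xml_text = doc.documentElement.toxml()       # serialize the modified tree
print(error_id)
print(xml_text)
```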
Example:
Same target file, different parser:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import xml.dom.minidom

# Open the target XML file with the minidom parser
DOMTree = xml.dom.minidom.parse("log.xml")
collection = DOMTree.documentElement  # the root element, <results>
if collection.hasAttribute("version"):
    print("Root element: %s, version: %s" % (collection.tagName, collection.getAttribute("version")))

# Get all <error> elements
errors = collection.getElementsByTagName("error")
for err in errors:
    print("********* error **********")
    # <error> attributes: print only the ones that are present
    for name in ("id", "severity", "msg", "verbose", "cwe"):
        if err.hasAttribute(name):
            print("%s: %s" % (name, err.getAttribute(name)))
    # an <error> can contain several <location> children; print "null" for missing attributes
    for loc in err.getElementsByTagName("location"):
        for name in ("file", "line", "info"):
            if loc.hasAttribute(name):
                print("%s: %s" % (name, loc.getAttribute(name)))
            else:
                print("%s: null" % name)

Output

Root element: results, version: 2
********* error **********
id: arrayIndexOutOfBounds
severity: error
msg: Array 'a[10]' accessed at index 10, which is out of bounds.
verbose: Array 'a[10]' accessed at index 10, which is out of bounds.
cwe: 119
file: /cpp/arrayindexoutofbounds.cpp
line: 11
info: Array index out of bounds
file: /cpp/arrayindexoutofbounds.cpp
line: 7
info: Assignment 'max=10', assigned value is 10
[..................................skip......................................]
********* error **********
id: stlOutOfBounds
severity: error
msg: When ii==foo.size(), foo[ii] is out of bounds.
verbose: When ii==foo.size(), foo[ii] is out of bounds.
cwe: 788
file: /cpp/stloutofbounds.cpp
line: 7
info: null
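Like the SAX script, the DOM script above only reads attributes. In DOM, the text inside <symbol> is stored in child Text nodes, so it has to be gathered from childNodes. A minimal sketch on an inline document; element_text and symbols_by_error are hypothetical helper names.

```python
import xml.dom.minidom
from xml.dom import Node


def element_text(node):
    """Concatenate the Text nodes directly under an element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == Node.TEXT_NODE).strip()


def symbols_by_error(dom):
    """Map each <error> id to the text of its <symbol> children."""
    result = {}
    for err in dom.getElementsByTagName("error"):
        result[err.getAttribute("id")] = [
            element_text(sym) for sym in err.getElementsByTagName("symbol")]
    return result


dom = xml.dom.minidom.parseString(
    b'<results><errors>'
    b'<error id="resourceLeak"><symbol>pFile</symbol></error>'
    b'<error id="nullPointer"/>'
    b'</errors></results>')
print(symbols_by_error(dom))
```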

Automatic Word Report Generation

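This step uses a Python library, docxtpl, which fills the placeholder fields defined in a given Word template with content. It is typically used to put the results produced by tools into a Word template and so generate a work report.
First install this third-party library with the following command: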

pip install docxtpl

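Usage:
[Figure: Word template file CodeScan.docx] docxtpl fills Jinja2-style tags, so the fields in the template are written as {{ date_1 }}, {{ version_1 }}, {{ total_1 }} and so on, one per key of the context dictionary in the script.

Python script: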

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import time
from docxtpl import DocxTemplate


class CreateDocx:
    def __init__(self, TemplateFileName, NewFileName):
        self.TemplateFileName = TemplateFileName
        self.NewFileName = NewFileName

    def post(self):
        tpl = DocxTemplate(self.TemplateFileName)  # load the template file
        localtime = time.asctime(time.localtime(time.time()))
        # each key must match a {{ ... }} placeholder in the template
        context = {'date_1': localtime, 'version_1': 'v1.0', 'total_1': '0xFFFF',
                   'error_1': '0xFFFF', 'warning_1': '0xFFFF', 'style_1': '0xFFFF'}
        tpl.render(context)  # fill in the data
        tpl.save('%s_%d.docx' % (self.NewFileName, int(time.time())))  # save the output file

    def writeData(self):
        pass  # placeholder, not implemented yet


if __name__ == "__main__":
    Docx = CreateDocx('./CodeScan.docx', 'CodeScan')
    Docx.post()
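The script above fills every count with the placeholder '0xFFFF' and leaves writeData() empty. One way to connect it to the XML parsing is sketched below, under a loud assumption: that total_1, error_1, warning_1 and style_1 are meant to hold the total number of findings and the count per severity attribute (the post does not say what they contain). build_context is a hypothetical helper, not part of docxtpl; it only uses the standard library, so it can be tested without a Word template.

```python
import time
import xml.dom.minidom
from collections import Counter


def build_context(xml_source, version="v1.0"):
    """Build a docxtpl context dict from a cppcheck-style XML report.

    Assumption: the *_1 count fields hold the number of <error> elements
    in total and per severity attribute.
    """
    if isinstance(xml_source, bytes):
        dom = xml.dom.minidom.parseString(xml_source)
    else:
        dom = xml.dom.minidom.parse(xml_source)  # e.g. "log.xml"
    severities = Counter(e.getAttribute("severity")
                         for e in dom.getElementsByTagName("error"))
    return {
        "date_1": time.asctime(),
        "version_1": version,
        "total_1": sum(severities.values()),
        "error_1": severities["error"],      # Counter returns 0 for missing keys
        "warning_1": severities["warning"],
        "style_1": severities["style"],
    }
```

Inside post(), the hard-coded dictionary could then be replaced with tpl.render(build_context("log.xml")). For the log.xml above this gives total_1 = 6 and error_1 = 6, since all six entries have severity="error".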

[Figure: the generated Word report]

Summary

Just a quick note; otherwise I'd feel like I have no idea what I did each day. Sigh...
Full SAX API documentation: https://docs.python.org/3/library/xml.sax.html
Full DOM API documentation: https://docs.python.org/3/library/xml.dom.html

Author: Let_go

Link: http://github.com/2019/04/29/python自动化生成Word报告/

Copyright: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stating additionally.