Automating Word Report Generation with Python
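Preface
This is a brief note on how to use a Python script to parse an XML file and insert the parsed data, as variables, into a docx file. The motivation is that at work I need to convert the output of certain tools (.xml files) into docx files and deliver them as reports. Such reports usually follow a fixed format, and only some of the data has to be filled in dynamically. We can therefore represent those dynamic fields as variables, parse the XML file with a Python script, insert the data at the variable fields, and generate the final document. This removes tedious copy-and-paste work and saves a good deal of time.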
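XML parsing
The first step: how do we parse an XML file in Python? I know of two approaches, both available in the standard library.
1. The SAX (Simple API for XML) parser. SAX uses an event-driven model: while parsing the XML it fires events and invokes user-defined callback functions to process the file. Because it reads the XML as a stream, it is usually fast and uses little memory, but the user has to write the callback functions.
2. DOM (Document Object Model). The XML file is loaded into memory and parsed into a tree, and you work with the file by operating on that tree. Because the whole document has to be held in memory, this tends to be slower and more memory-hungry, but it feels more flexible than SAX.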
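To make the contrast concrete, here is a minimal sketch that pulls the same information out of a small document with both approaches. The sample XML and the names ErrorIdCollector, ids_with_sax, and ids_with_dom are made up for this illustration; only the xml.sax and xml.dom.minidom calls come from the standard library.

```python
import xml.sax
from xml.dom import minidom

# A tiny, made-up document used only for this illustration.
SAMPLE = b"""<results version="2">
  <errors>
    <error id="nullPointer" severity="error"/>
    <error id="resourceLeak" severity="error"/>
  </errors>
</results>"""


class ErrorIdCollector(xml.sax.ContentHandler):
    """SAX: the parser calls startElement() for each opening tag it streams past."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "error":
            self.ids.append(attrs.get("id"))


def ids_with_sax(data):
    handler = ErrorIdCollector()
    xml.sax.parseString(data, handler)
    return handler.ids


def ids_with_dom(data):
    # DOM: the whole document is loaded into a tree first, then queried.
    tree = minidom.parseString(data)
    return [e.getAttribute("id") for e in tree.getElementsByTagName("error")]


print(ids_with_sax(SAMPLE))
print(ids_with_dom(SAMPLE))
```

Both functions return the same list of ids. The SAX version never holds the full document, while the DOM version can be queried again later without re-parsing.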
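Parsing XML with SAX
SAX is event-driven, and parsing an XML document with it involves two parts: the parser and the event handler. The parser reads the XML file and sends events to the event handler at key moments, such as when an element starts or ends. The event handler responds to the events it receives, mainly by running the callback methods the user has defined.
Some of the key functions:
1. At the start of the document: startDocument()
2. At the end of the document: endDocument()
3. On an XML opening tag: startElement(name, attrs)
4. On an XML closing tag: endElement(name)
5. Create a new parser object: xml.sax.make_parser()
6. Create a SAX parser and parse an XML document: xml.sax.parse()
Example:
Target XML file: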
<?xml version="1.0"  encoding="UTF-8" ?>
<results version="2" >
    <errors>
        <error id="arrayIndexOutOfBounds"  severity="error"  msg="Array 'a[10]' accessed at index 10, which is out of bounds."  verbose="Array 'a[10]' accessed at index 10, which is out of bounds."  cwe="119" >
            <location file=" /cpp/arrayindexoutofbounds.cpp"  line="11"  info="Array index out of bounds" />
            <location file=" /cpp/arrayindexoutofbounds.cpp"  line="7"  info="Assignment 'max=10', assigned value is 10" />
            <symbol>a</symbol>
        </error>
        <error id="returnDanglingLifetime"  severity="error"  msg="Returning pointer to local variable 'sz' that will be invalid when returning."  verbose="Returning pointer to local variable 'sz' that will be invalid when returning."  cwe="562" >
            <location file=" /cpp/autovar.cpp"  line="5" />
            <location file=" /cpp/autovar.cpp"  line="3"  info="Variable created here." />
            <location file=" /cpp/autovar.cpp"  line="5"  info="Array decayed to pointer here." />
        </error>
        <error id="bufferAccessOutOfBounds"  severity="error"  msg="Buffer is accessed out of bounds: sz"  verbose="Buffer is accessed out of bounds: sz" >
            <location file=" /cpp/bufferaccessoutofbounds.cpp"  line="5" />
            <symbol>sz</symbol>
        </error>
        <error id="nullPointer"  severity="error"  msg="Null pointer dereference"  verbose="Null pointer dereference"  cwe="476" >
            <location file=" /cpp/nonthreadsafefunc.cpp"  line="5"  info="Null pointer dereference" />
        </error>
        <error id="resourceLeak"  severity="error"  msg="Resource leak: pFile"  verbose="Resource leak: pFile"  cwe="775" >
            <location file=" /cpp/resourceleak.cpp"  line="8" />
            <symbol>pFile</symbol>
        </error>
        <error id="stlOutOfBounds"  severity="error"  msg="When ii==foo.size(), foo[ii] is out of bounds."  verbose="When ii==foo.size(), foo[ii] is out of bounds."  cwe="788" >
            <location file=" /cpp/stloutofbounds.cpp"  line="7" />
            <symbol>foo</symbol>
        </error>
    </errors>
</results>
Parsing script:
#!/usr/bin/env python3
import xml.sax                          # the xml.sax library must be imported first


class XMLHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""

    def startElement(self, tag, attributes):
        # Called by the parser for every opening tag
        self.CurrentData = tag
        if tag == "error":
            print("\n\n\n********** error **********")
            for name in ("id", "severity", "msg", "verbose", "cwe"):
                self.echoInfo(name, attributes, name + ":")
        elif tag == "location":
            for name in ("file", "line", "info"):
                self.echoInfo(name, attributes, "location." + name + ":")

    # The text inside <symbol> is not printed by this script; capturing it
    # would require overriding characters() and endElement() as well.

    def echoInfo(self, tag, attributes, label):
        # Print the attribute value, or "None" when the attribute is missing
        if tag in attributes:
            print(label, attributes[tag])
        else:
            print(label + " None")


if __name__ == "__main__":
    parser = xml.sax.make_parser()          # create an XMLReader
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)   # turn off namespace processing
    parser.setContentHandler(XMLHandler())
    parser.parse("log.xml")                 # parse the target XML
Output:
********** error **********
id: arrayIndexOutOfBounds
severity: error
msg: Array 'a[10]' accessed at index 10, which is out of bounds.
verbose: Array 'a[10]' accessed at index 10, which is out of bounds.
cwe: 119
location.file:  /cpp/arrayindexoutofbounds.cpp
location.line: 11
location.info: Array index out of bounds
location.file:  /cpp/arrayindexoutofbounds.cpp
location.line: 7
location.info: Assignment 'max=10', assigned value is 10
[................................skip...................................]
********** error **********
id: stlOutOfBounds
severity: error
msg: When ii==foo.size(), foo[ii] is out of bounds.
verbose: When ii==foo.size(), foo[ii] is out of bounds.
cwe: 788
location.file:  /cpp/stloutofbounds.cpp
location.line: 7
location.info: None
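The script above prints attributes as it goes and ignores the text inside symbol elements. For a report it is often handier to collect the data first. The following is a minimal sketch of that idea, assuming the same element names as the target file; the class name ErrorCollector, the helper collect_errors, and the dictionary layout are choices made for this illustration.

```python
import xml.sax

# Two entries shaped like the ones in the target file.
SAMPLE = b"""<results version="2"><errors>
<error id="resourceLeak" severity="error" cwe="775">
  <location file=" /cpp/resourceleak.cpp" line="8"/>
  <symbol>pFile</symbol>
</error>
<error id="nullPointer" severity="error" cwe="476">
  <location file=" /cpp/nonthreadsafefunc.cpp" line="5" info="Null pointer dereference"/>
</error>
</errors></results>"""


class ErrorCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.errors = []      # one dict per <error>
        self._current = None  # the <error> currently being built
        self._text = None     # text buffer, a list only while inside <symbol>

    def startElement(self, name, attrs):
        if name == "error":
            self._current = {"id": attrs.get("id"), "cwe": attrs.get("cwe"),
                             "locations": [], "symbol": None}
        elif name == "location" and self._current is not None:
            self._current["locations"].append(
                (attrs.get("file", "").strip(), attrs.get("line")))
        elif name == "symbol":
            self._text = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks, so accumulate.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "symbol":
            if self._current is not None:
                self._current["symbol"] = "".join(self._text)
            self._text = None
        elif name == "error":
            self.errors.append(self._current)
            self._current = None


def collect_errors(data):
    handler = ErrorCollector()
    xml.sax.parseString(data, handler)
    return handler.errors


for err in collect_errors(SAMPLE):
    print(err["id"], err["symbol"], err["locations"])
```

The element text arrives through characters(), not through the attributes argument, which is why the handler buffers it between the opening and closing symbol tags.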
Parsing XML with DOM
Document Object Model (DOM): a DOM parser loads the entire XML file into memory at once and keeps every element of the document in a tree structure. You can then walk that tree through the API that DOM provides, using its functions to read or modify the document's content. The drawback is higher memory use.
Example: the same target file, parsed a different way:
#!/usr/bin/env python3
import xml.dom.minidom

# Open the target XML file with the minidom parser
DOMTree = xml.dom.minidom.parse("log.xml")
collection = DOMTree.documentElement      # the root element, <results>

# Get all <error> elements
errors = collection.getElementsByTagName("error")
for err in errors:
    print("********* error **********")
    # Print each attribute only if the <error> element carries it
    for name in ("id", "severity", "msg", "verbose", "cwe"):
        if err.hasAttribute(name):
            print("%s: %s" % (name, err.getAttribute(name)))
    # Print every <location>, showing 'null' for missing attributes
    for loc in err.getElementsByTagName("location"):
        for name in ("file", "line", "info"):
            if loc.hasAttribute(name):
                print("%s: %s" % (name, loc.getAttribute(name)))
            else:
                print("%s: null" % name)
Output:
********* error **********
id: arrayIndexOutOfBounds
severity: error
msg: Array 'a[10]' accessed at index 10, which is out of bounds.
verbose: Array 'a[10]' accessed at index 10, which is out of bounds.
cwe: 119
file:  /cpp/arrayindexoutofbounds.cpp
line: 11
info: Array index out of bounds
file:  /cpp/arrayindexoutofbounds.cpp
line: 7
info: Assignment 'max=10', assigned value is 10
[..................................skip......................................]
********* error **********
id: stlOutOfBounds
severity: error
msg: When ii==foo.size(), foo[ii] is out of bounds.
verbose: When ii==foo.size(), foo[ii] is out of bounds.
cwe: 788
file:  /cpp/stloutofbounds.cpp
line: 7
info: null
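The DOM description mentions modifying the document as well as reading it. Here is a small sketch of both, on a made-up fragment shaped like the target file: it reads the text of each symbol element and strips the leading space from the file attribute. The variable names are illustrative only.

```python
from xml.dom import minidom

# A fragment shaped like one entry of the target file.
SAMPLE = """<results version="2">
  <errors>
    <error id="resourceLeak" severity="error">
      <location file=" /cpp/resourceleak.cpp" line="8"/>
      <symbol>pFile</symbol>
    </error>
  </errors>
</results>"""

tree = minidom.parseString(SAMPLE)

# Read: the text of <symbol> is a child text node, not an attribute.
# (An empty <symbol/> would have no firstChild; this sketch assumes text is present.)
symbols = [node.firstChild.data for node in tree.getElementsByTagName("symbol")]

# Modify: rewrite the attribute in place on the in-memory tree.
for loc in tree.getElementsByTagName("location"):
    loc.setAttribute("file", loc.getAttribute("file").strip())

# Serialize the modified tree back to an XML string.
cleaned = tree.documentElement.toxml()

print(symbols)
print(tree.getElementsByTagName("location")[0].getAttribute("file"))
```

Because the whole tree stays in memory, the change is visible to any later query and can be written back out with toxml().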
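Generating the Word report
This step uses a Python library, docxtpl, which fills the placeholder fields defined in a Word template with content. It is typically used to pour the results produced by some tool into a Word template to generate a work report. First install the third-party library, for example with pip:
pip install docxtpl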
Usage
Word template file:
Python script:
#!/usr/bin/env python3
import time

from docxtpl import DocxTemplate


class CreateDocx:
    def __init__(self, TemplateFileName, NewFileName):
        self.TemplateFileName = TemplateFileName
        self.NewFileName = NewFileName

    def post(self):
        tpl = DocxTemplate(self.TemplateFileName)    # load the template file
        localtime = time.asctime(time.localtime(time.time()))
        # Keys must match the variable names used in the template;
        # the '0xFFFF' strings are placeholder values
        context = {'date_1': localtime, 'version_1': 'v1.0', 'total_1': '0xFFFF',
                   'error_1': '0xFFFF', 'warning_1': '0xFFFF', 'style_1': '0xFFFF'}
        tpl.render(context)                          # fill in the data
        tpl.save(self.NewFileName + '_' + str(time.time()) + '.docx')   # save the output file

    def writeData(self):
        pass


if __name__ == "__main__":
    Docx = CreateDocx('./CodeScan.docx', 'CodeScan')
    Docx.post()
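The script above fills the template with fixed placeholder values. One way to connect it to the XML parsing shown earlier is to build the context dictionary from the parsed file. The sketch below uses only the standard library and rests on assumptions: the function name build_context is made up, and treating total_1, error_1, warning_1, and style_1 as counts per severity is a guess at what the template fields mean. The "warning" and "style" entries in the sample are invented for the demonstration.

```python
import time
from collections import Counter
from xml.dom import minidom

# Made-up input: the "warning" and "style" entries are invented for this sketch.
SAMPLE = """<results version="2"><errors>
<error id="nullPointer" severity="error"/>
<error id="resourceLeak" severity="error"/>
<error id="exampleWarning" severity="warning"/>
</errors></results>"""


def build_context(xml_text, version="v1.0"):
    """Return a dict whose keys match the template variables used above."""
    tree = minidom.parseString(xml_text)
    errors = tree.getElementsByTagName("error")
    # Count <error> elements per severity; missing severities count as 0.
    by_severity = Counter(e.getAttribute("severity") for e in errors)
    return {
        "date_1": time.asctime(),
        "version_1": version,
        "total_1": len(errors),
        "error_1": by_severity["error"],
        "warning_1": by_severity["warning"],
        "style_1": by_severity["style"],
    }


context = build_context(SAMPLE)
print(context["total_1"], context["error_1"], context["warning_1"], context["style_1"])

# With docxtpl installed, the dictionary could then be passed to the template:
#     tpl = DocxTemplate("./CodeScan.docx")
#     tpl.render(context)
```

Keeping the context-building step separate from the rendering step means it can be checked without a Word template at hand.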
Resulting Word file:
Summary
Just a quick note for the record; otherwise it feels like I have no idea what I did each day. Sigh.
Full SAX API reference: https://docs.python.org/3/library/xml.sax.html
Full DOM API reference: https://docs.python.org/3/library/xml.dom.html