A Complete Guide to Image Format Conversion and OCR Text Recognition in Python

Author: 新兰 | 2025.10.10 19:21

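Summary: This article explains how to convert image formats and run OCR text recognition in Python. It covers format conversion with the Pillow library, text recognition with Tesseract OCR, and complete code examples for processing image data efficiently.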

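In digital office work and data processing, image format conversion and text recognition are two frequent needs. Python's library ecosystem handles both: image format conversion (for example, JPG to PNG) and OCR (optical character recognition). This article walks through both tasks and provides reusable code examples.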

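1. Image Format Conversion: Pillow in Depth

1.1 Core Features of Pillow

Pillow is one of the most widely used image processing libraries in Python and can read or write more than 30 image formats. Its core features include:

- Format conversion: converting between common formats such as JPG, PNG, BMP, and GIF
- Image processing: cropping, rotation, resizing, color space conversion
- Metadata handling: reading and modifying an image's EXIF data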
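
The image processing and metadata features listed above can be tried in a few lines. The following is a minimal sketch that builds a small image in memory (so no input file is assumed) and applies a crop, rotation, resize, and grayscale conversion, then reads the EXIF container.

```python
from PIL import Image

# Build a small in-memory RGB image so the example needs no input file.
img = Image.new("RGB", (200, 100), color=(255, 0, 0))

cropped = img.crop((10, 10, 110, 60))   # box is (left, upper, right, lower)
rotated = img.rotate(90, expand=True)   # expand=True grows the canvas to fit
resized = img.resize((100, 50))         # new size is (width, height)
gray = img.convert("L")                 # RGB -> 8-bit grayscale

# getexif() returns a dict-like object keyed by numeric EXIF tag IDs.
exif = img.getexif()

print(cropped.size, rotated.size, resized.size, gray.mode, len(exif))
```

Each of these calls returns a new image and leaves `img` unchanged, so the steps can be chained or applied independently.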

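1.2 Basic Format Conversion

from PIL import Image


def convert_image_format(input_path, output_path, target_format):
    """
    Convert an image to another format.
    :param input_path: path of the input image
    :param output_path: path of the output image
    :param target_format: target format (e.g. 'PNG', 'JPEG')
    """
    try:
        with Image.open(input_path) as img:
            # JPEG cannot store alpha or palette data, so convert those modes to RGB first
            if target_format.upper() == "JPEG" and img.mode not in ("RGB", "L"):
                img = img.convert("RGB")
            # Save in the requested format
            img.save(output_path, format=target_format)
            print(f"Converted: {input_path} -> {output_path}")
    except Exception as e:
        print(f"Conversion failed: {e}")


# Example: convert a JPG to PNG
convert_image_format("input.jpg", "output.png", "PNG")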
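
Going the other way (a PNG with transparency to JPEG) needs extra care, because JPEG has no alpha channel. Simply dropping the alpha channel exposes whatever color is stored under the transparent pixels, which is often black. A common alternative is to composite the image onto a solid background first. The sketch below assumes a white background; the helper name `flatten_to_rgb` is made up for illustration and is not part of Pillow.

```python
from PIL import Image


def flatten_to_rgb(img, background=(255, 255, 255)):
    """Composite an image with transparency onto a solid background."""
    rgba = img.convert("RGBA")
    canvas = Image.new("RGBA", rgba.size, background + (255,))
    # alpha_composite blends rgba over canvas using rgba's alpha channel
    return Image.alpha_composite(canvas, rgba).convert("RGB")


# Demo: a fully transparent image with one opaque red pixel
demo = Image.new("RGBA", (4, 4), (0, 0, 0, 0))
demo.putpixel((0, 0), (255, 0, 0, 255))
flat = flatten_to_rgb(demo)
print(flat.mode, flat.getpixel((0, 0)), flat.getpixel((3, 3)))
```

The result can then be saved with `flat.save("output.jpg", format="JPEG")`.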

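1.3 Advanced Conversion Techniques

- **Batch conversion**: iterate over a folder with os.listdir() to process files in bulk
```python
import os


def batch_convert(input_dir, output_dir, target_format):
    # Create the output folder if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        # Only JPG sources are handled here; extend the tuple for other formats
        if filename.lower().endswith(('.jpg', '.jpeg')):
            input_path = os.path.join(input_dir, filename)
            output_path = os.path.join(
                output_dir,
                os.path.splitext(filename)[0] + f".{target_format.lower()}")
            # Reuses convert_image_format() from section 1.2
            convert_image_format(input_path, output_path, target_format)
```

- **Quality control** (JPEG only):
```python
# Pillow's default quality is 75; values above 95 are discouraged (much larger files, little visible gain)
img.save("output.jpg", format="JPEG", quality=95)
```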
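
To see what the `quality` parameter actually trades off, it helps to measure the encoded size at a few settings. The following sketch encodes a synthetic test pattern into memory; the helper name `jpeg_size` and the pattern itself are illustrative assumptions, and real photos will give different numbers.

```python
from io import BytesIO

from PIL import Image


def jpeg_size(img, quality):
    """Return the encoded JPEG size in bytes at the given quality."""
    buffer = BytesIO()
    img.save(buffer, format="JPEG", quality=quality)
    return len(buffer.getvalue())


# Deterministic pattern with plenty of detail, so the example needs no input file
width, height = 128, 128
pixels = bytes((x * 7 + y * 13 + c * 50) % 256
               for y in range(height) for x in range(width) for c in range(3))
sample = Image.frombytes("RGB", (width, height), pixels)

for q in (30, 75, 95):
    print(f"quality={q}: {jpeg_size(sample, q)} bytes")
```

Running a measurement like this on representative images is a simple way to pick a quality value for a batch job instead of guessing.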

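2. OCR Text Recognition: Integrating Tesseract

2.1 Installing and Configuring Tesseract OCR

Install Tesseract:
- Windows: download the installer and add the install directory to PATH
- macOS: brew install tesseract
- Linux: sudo apt install tesseract-ocr (base package; language data such as tesseract-ocr-chi-sim is installed separately)
Install the Python wrapper: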
```
pip install pytesseract
```

Configure the path (needed on Windows when tesseract.exe is not on PATH):

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
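
A hardcoded path only works on one machine. One way to make the configuration portable is to look on PATH first and fall back to known install locations. The sketch below is an assumption about how this could be organized; `find_tesseract` and `configure_pytesseract` are hypothetical helper names, not part of pytesseract.

```python
import os
import shutil


def find_tesseract(extra_candidates=()):
    """Return the path of a tesseract executable, or None if none is found."""
    # 1. Anything already on PATH wins
    found = shutil.which("tesseract")
    if found:
        return found
    # 2. Fall back to explicitly listed locations
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract(extra_candidates=()):
    """Point pytesseract at the located binary; returns the path used (or None)."""
    import pytesseract  # imported lazily so find_tesseract() works without it

    path = find_tesseract(extra_candidates)
    if path:
        pytesseract.pytesseract.tesseract_cmd = path
    return path


print(find_tesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"]))
```

A return value of `None` is a clear signal to show an installation hint before any OCR call fails with a less readable error.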

2.2 Basic Text Recognition

import pytesseract
from PIL import Image


def ocr_image(image_path, lang='chi_sim+eng'):
    """
    Recognize text in an image.
    :param image_path: path of the image
    :param lang: language data to use (default: Simplified Chinese + English)
    :return: recognized text, or None on failure
    """
    try:
        with Image.open(image_path) as img:
            text = pytesseract.image_to_string(img, lang=lang)
            return text
    except Exception as e:
        print(f"OCR failed: {e}")
        return None


# Example: recognize Chinese and English text in an image
result = ocr_image("text_image.png")
print(result)
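
`image_to_string` returns plain text only. When per-word confidence matters, pytesseract also provides `image_to_data`, which with `output_type=pytesseract.Output.DICT` returns parallel lists under keys such as `text` and `conf`. The sketch below filters a dictionary of that shape by confidence. The helper name `confident_words` and the cutoff of 60 are illustrative assumptions, and the sample dictionary is hand-written rather than real OCR output.

```python
def confident_words(data, min_conf=60.0):
    """Keep words whose confidence is at least min_conf.

    `data` is expected to look like the result of
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may arrive as int, float, or str depending on the version;
        # layout-only rows carry a confidence of -1 and empty text
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


# Hand-written sample in the same shape, so the example runs without Tesseract
sample = {"text": ["", "Hello", "w0rld", "OCR"], "conf": ["-1", 96, 41.5, "88"]}
print(confident_words(sample))
```

With a real image, `data` would come from `pytesseract.image_to_data(img, lang='chi_sim+eng', output_type=pytesseract.Output.DICT)`.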

2.3 Recognition Tuning Techniques

- **Preprocessing**: binarization, denoising, contrast adjustment
```python
def preprocess_image(image_path):
    with Image.open(image_path) as img:
        # Convert to grayscale
        gray = img.convert('L')
    # Binarize: pixels darker than the threshold become black, the rest white
    threshold = 150
    return gray.point(lambda x: 0 if x < threshold else 255)


# Use the preprocessed image
processed_img = preprocess_image("text_image.png")
text = pytesseract.image_to_string(processed_img, lang='chi_sim+eng')
```

- **Region recognition**: restrict OCR to a region (box format: left x, top y, right x, bottom y)
```python
def ocr_region(image_path, box, lang='eng'):
    with Image.open(image_path) as img:
        region = img.crop(box)
        return pytesseract.image_to_string(region, lang=lang)


# Example: recognize text inside the (100, 100, 300, 200) region
region_text = ocr_region("image.png", (100, 100, 300, 200))
```
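
The preprocessing item above also names denoising and contrast adjustment, which Pillow can do without extra dependencies. The following is a minimal sketch using a median filter and `ImageOps.autocontrast`; the function name `enhance_for_ocr`, the 3x3 filter size, and the threshold of 150 are assumptions that should be tuned per image source.

```python
from PIL import Image, ImageFilter, ImageOps


def enhance_for_ocr(img, threshold=150):
    """Grayscale -> median denoise -> contrast stretch -> binarize."""
    gray = ImageOps.grayscale(img)
    denoised = gray.filter(ImageFilter.MedianFilter(size=3))  # removes isolated specks
    stretched = ImageOps.autocontrast(denoised)               # spreads values over 0..255
    return stretched.point(lambda v: 0 if v < threshold else 255)


# Demo on a synthetic low-contrast image: dark-gray "ink" block on a light-gray page
page = Image.new("RGB", (40, 40), (180, 180, 180))
page.paste((90, 90, 90), (10, 10, 30, 30))
result = enhance_for_ocr(page)
print(result.mode, result.getpixel((20, 20)), result.getpixel((2, 2)))
```

Stretching the contrast before thresholding makes a fixed threshold less sensitive to scans that are uniformly too dark or too light.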

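3. Full Project: Format Conversion and OCR Combined

3.1 Project Layout

project/
├── input/                  # original images
├── output/
│   ├── converted/          # images after format conversion
│   └── ocr_results/
│       └── processed/      # preprocessed images fed to OCR
└── main.py                 # main program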

3.2 Full Implementation

import os

from PIL import Image
import pytesseract


class ImageProcessor:
    def __init__(self, tesseract_cmd=None):
        # Only override the Tesseract path when one is given, e.g. on Windows:
        # r'C:\Program Files\Tesseract-OCR\tesseract.exe'
        if tesseract_cmd:
            pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    def convert_format(self, input_path, output_dir, target_format):
        """Convert an image to target_format and return the new path."""
        os.makedirs(output_dir, exist_ok=True)
        filename = os.path.basename(input_path)
        new_filename = os.path.splitext(filename)[0] + f".{target_format.lower()}"
        output_path = os.path.join(output_dir, new_filename)
        with Image.open(input_path) as img:
            # JPEG cannot store alpha or palette data
            if target_format.upper() == "JPEG" and img.mode not in ("RGB", "L"):
                img = img.convert("RGB")
            img.save(output_path, format=target_format)
        return output_path

    def ocr_image(self, image_path, output_dir=None, lang='chi_sim+eng'):
        """Recognize text; if output_dir is given, binarize first and save that image."""
        with Image.open(image_path) as img:
            if output_dir:
                processed_dir = os.path.join(output_dir, "processed")
                os.makedirs(processed_dir, exist_ok=True)
                # Preprocess: grayscale, then fixed-threshold binarization
                threshold = 150
                img = img.convert('L').point(lambda x: 0 if x < threshold else 255)
                processed_path = os.path.join(processed_dir, os.path.basename(image_path))
                img.save(processed_path)
            return pytesseract.image_to_string(img, lang=lang)

    def process_batch(self, input_dir, output_base_dir, target_format="PNG"):
        """Convert and OCR every supported image in a folder."""
        convert_dir = os.path.join(output_base_dir, "converted")
        ocr_dir = os.path.join(output_base_dir, "ocr_results")
        results = []
        for filename in sorted(os.listdir(input_dir)):
            if filename.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp')):
                input_path = os.path.join(input_dir, filename)
                # 1. Format conversion
                converted_path = self.convert_format(input_path, convert_dir, target_format)
                # 2. OCR
                text = self.ocr_image(converted_path, ocr_dir)
                results.append({
                    "original": filename,
                    "converted": os.path.basename(converted_path),
                    "text": text
                })
        return results


# Usage example
if __name__ == "__main__":
    processor = ImageProcessor()  # pass tesseract_cmd=... if Tesseract is not on PATH
    results = processor.process_batch(
        input_dir="input",
        output_base_dir="output",
        target_format="PNG"
    )
    # Print the recognition results
    for result in results:
        print(f"\nFile: {result['original']}")
        print(f"Converted: {result['converted']}")
        print("Recognized text:")
        print(result['text'][:200] + "...")  # show only the first 200 characters
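
The batch method returns a list of dictionaries, which is convenient to persist for later review. One possible approach, sketched below, writes them to a UTF-8 JSON file. The function name `save_results`, the output file name, and the sample record are all illustrative assumptions.

```python
import json
import os
import tempfile


def save_results(results, path):
    """Write OCR results to a UTF-8 JSON file and return the number of records."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
        json.dump(results, f, ensure_ascii=False, indent=2)
    return len(results)


# Hand-written sample in the shape returned by process_batch()
sample_results = [
    {"original": "scan01.jpg", "converted": "scan01.png", "text": "发票 Invoice 2025"},
]
out_path = os.path.join(tempfile.gettempdir(), "ocr_results.json")
print(save_results(sample_results, out_path))
```

Keeping the original and converted file names next to the text makes it easy to trace a bad recognition result back to its source image.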

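4. Use Cases and Optimization Advice

4.1 Typical Use Cases

- Document digitization: turning scans of paper documents into editable text
- Data collection: extracting structured data from web page screenshots and report images
- Process automation: combining with RPA to process invoices and contracts automatically

4.2 Performance Optimization Advice

Language data:
- Chinese recognition: download chi_sim.traineddata and place it in Tesseract's tessdata directory
- Mixed languages: use lang='chi_sim+eng'
Processing speed:
- Downscale large images before recognition
- Use multiple threads for batch jobs
Accuracy:
- Train a custom model for a specific scenario
- Combine with OpenCV for more sophisticated preprocessing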
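
The two speed tips above can be sketched as small helpers. `shrink_for_ocr` and `run_parallel` are hypothetical names, and the 2000-pixel limit and four workers are arbitrary starting points. Threads are a reasonable fit here because pytesseract runs the Tesseract binary as a separate process for each call.

```python
from concurrent.futures import ThreadPoolExecutor

from PIL import Image


def shrink_for_ocr(img, max_side=2000):
    """Return a copy whose longer side is at most max_side (aspect ratio kept)."""
    copy = img.copy()
    # thumbnail() resizes in place and never enlarges
    copy.thumbnail((max_side, max_side))
    return copy


def run_parallel(func, items, workers=4):
    """Apply func to every item with a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


big = Image.new("L", (6000, 3000), 255)
print(shrink_for_ocr(big).size)
print(run_parallel(len, ["a.png", "bb.png"]))
```

In a real batch, `func` would be an OCR function such as `ocr_image` and `items` a list of image paths. Shrinking too aggressively makes small text unreadable, so the size limit is worth checking against sample documents.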

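5. Troubleshooting Common Problems

5.1 Low Recognition Accuracy

Causes: poor image quality, unusual fonts, missing language data

Solution:

# Set a PSM (page segmentation mode)
text = pytesseract.image_to_string(
    img,
    lang='chi_sim+eng',
    config='--psm 6'  # assume a single uniform block of text
)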
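
Mode 6 is only one of several page segmentation modes, and picking the one that matches the layout often matters more than any other setting. The sketch below maps a few commonly used modes to readable nicknames. The nicknames and the helper `psm_config` are made up for illustration; the numeric values and their meanings follow the list printed by `tesseract --help-psm`.

```python
# A few page segmentation modes, as listed by `tesseract --help-psm`
PSM_MODES = {
    "auto": 3,          # fully automatic page segmentation, no OSD (default)
    "single_block": 6,  # a single uniform block of text
    "single_line": 7,   # a single text line
    "single_word": 8,   # a single word
    "sparse": 11,       # sparse text, in no particular order
}


def psm_config(layout):
    """Build the config string for pytesseract from a layout nickname."""
    return f"--psm {PSM_MODES[layout]}"


print(psm_config("single_line"))
```

The returned string is passed as the `config` argument, for example `pytesseract.image_to_string(img, config=psm_config("single_line"))` for a cropped form field.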

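5.2 Format Conversion Fails

Common causes:
- Corrupted image
- Unsupported format
- Insufficient memory

Debugging tip:

try:
    with Image.open(input_path) as img:
        img.verify()  # check file integrity; reopen the file before using the image again
except Exception as e:
    print(f"Image verification failed: {e}")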
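
To tell the listed causes apart, the check can be extended to report what kind of failure occurred. The following is a sketch under the assumption of Pillow 7.0 or later, where `UnidentifiedImageError` is available; the function name `diagnose_image` and its return labels are illustrative.

```python
import os
import tempfile

from PIL import Image, UnidentifiedImageError


def diagnose_image(path):
    """Return 'ok', 'missing', 'unrecognized', or 'corrupt' for an image file."""
    if not os.path.isfile(path):
        return "missing"
    try:
        with Image.open(path) as img:
            img.verify()          # structural check; the object is unusable afterwards
        with Image.open(path) as img:
            img.load()            # reopen and force a full decode of the pixel data
    except UnidentifiedImageError:
        return "unrecognized"     # not an image, or a format Pillow cannot read
    except Exception:
        return "corrupt"          # recognized format, but the data is damaged
    return "ok"


# Demo with throwaway files
tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, "good.png")
bad = os.path.join(tmp_dir, "bad.png")
Image.new("RGB", (8, 8), "white").save(good)
with open(bad, "wb") as f:
    f.write(b"this is not image data")
print(diagnose_image(good), diagnose_image(bad),
      diagnose_image(os.path.join(tmp_dir, "none.png")))
```

Running a check like this before a batch conversion lets the bad files be skipped and logged instead of aborting the whole run.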

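6. Advanced Extensions

6.1 Advanced Preprocessing with OpenCV

import cv2
import pytesseract
from PIL import Image


def cv_preprocess(image_path):
    # Read the image (cv2.imread returns None instead of raising on failure)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Denoise (h=10, templateWindowSize=7, searchWindowSize=21)
    denoised = cv2.fastNlMeansDenoising(gray, None, 10, 7, 21)
    # Binarize; with THRESH_OTSU the threshold is chosen automatically and the 0 is ignored
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


# Convert the OpenCV result to a PIL image for OCR
processed = cv_preprocess("image.png")
pil_img = Image.fromarray(processed)
text = pytesseract.image_to_string(pil_img)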
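
The `THRESH_OTSU` flag replaces a hand-picked threshold with one computed from the image histogram. To show the idea behind it, here is a plain-Python sketch of Otsu's method on a Pillow grayscale image. It is an illustration of the technique, not OpenCV's implementation, and the function name `otsu_threshold` is an assumption.

```python
from PIL import Image


def otsu_threshold(gray):
    """Return the Otsu threshold (0-255) for a mode 'L' image."""
    hist = gray.histogram()               # 256 pixel counts
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0                              # pixel count at or below t
    sum_bg = 0                            # intensity sum at or below t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (unnormalized); Otsu picks the t that maximizes it
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Synthetic bimodal image: dark text-like block (60) on a light page (200)
page = Image.new("L", (20, 20), 200)
page.paste(60, (5, 5, 15, 15))
t = otsu_threshold(page)
binary = page.point(lambda v: 255 if v > t else 0)
print(t, binary.getpixel((10, 10)), binary.getpixel((0, 0)))
```

Because the threshold adapts to each image, this approach tends to cope better with scans of varying brightness than the fixed value of 150 used in section 2.3.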

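6.2 Exposing OCR as a Web Service

Create a simple OCR API with Flask:

import base64
from io import BytesIO

import pytesseract
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

@app.route('/ocr', methods=['POST'])
def ocr_api():
    data = request.get_json(silent=True) or {}
    if 'image' not in data:
        return jsonify({"error": "missing 'image' field"}), 400
    try:
        # Accept both a data URL ("data:image/png;base64,...") and bare base64
        encoded = data['image'].split(',', 1)[-1]
        img = Image.open(BytesIO(base64.b64decode(encoded)))
        text = pytesseract.image_to_string(img, lang='chi_sim+eng')
    except Exception as e:
        return jsonify({"error": str(e)}), 400
    return jsonify({"text": text})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)  # development server, listening on all interfaces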
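
For completeness, here is a sketch of a client for this endpoint that uses only the standard library. The endpoint reads a JSON body whose `image` field is a base64 data URL, so the client builds exactly that. The helper names `build_payload` and `post_image` are assumptions, as is the URL in the usage note.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes, mime="image/png"):
    """Wrap raw image bytes in the data-URL JSON shape the /ocr endpoint reads."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"image": f"data:{mime};base64,{encoded}"}


def post_image(url, image_path):
    """POST an image file to the OCR endpoint and return the decoded JSON reply."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_payload(f.read())).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Only the payload is built here, so the example runs without a server
payload = build_payload(b"\x89PNG")
print(payload["image"])
```

With the Flask app running locally, a call such as `post_image("http://127.0.0.1:5000/ocr", "text_image.png")` would return a dictionary containing the `text` field.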

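7. Summary and Outlook

Python is well suited to image processing and OCR: combining Pillow and Tesseract covers both format conversion and text recognition. In practice, keep the following in mind:

- Choose preprocessing methods that fit the scenario
- Configure language data properly to improve recognition accuracy
- Plan for performance when processing in batches

Future directions include:

- Integrating deep learning models (such as CRNN)
- Real-time OCR on video streams
- Integration with cloud OCR services

The code and approaches in this article can be used in real projects, and developers can adjust and extend them to fit their own needs.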
