The Complete Guide to Python Image Processing: Format Conversion and OCR Text Recognition in Practice

Author: 很菜不狗 | 2025.10.10 19:21 | Views: 2

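Summary: This article shows how to convert image formats and run OCR text recognition in Python. It covers format conversion with the Pillow library, installing, configuring, and tuning Tesseract OCR, and includes complete code examples and practical advice.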

1. Image Format Conversion: The Power of Pillow

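Format conversion is a basic image-processing need, and Python's Pillow library offers a simple, efficient way to do it. Pillow is the maintained fork of the original Python Imaging Library (PIL) and is still imported under the name PIL. It reads and writes dozens of formats, including JPEG, PNG, BMP, GIF, and TIFF, and can convert between them.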
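
To see which formats a particular Pillow installation can handle, its plugin registry can be inspected. This is a minimal sketch: the helper name formats_by_extension is hypothetical, and the exact set of entries depends on the Pillow version and the codecs it was built with.

```python
from PIL import Image

def formats_by_extension():
    """Map each file extension Pillow knows about to its format name."""
    # registered_extensions() loads the available plugins and returns
    # a dict such as {'.png': 'PNG', '.jpg': 'JPEG', ...}
    return dict(Image.registered_extensions())

if __name__ == "__main__":
    exts = formats_by_extension()
    for ext in ('.jpg', '.png', '.bmp', '.gif', '.tiff'):
        print(ext, '->', exts.get(ext, 'not available'))
```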

1.1 Basic Format Conversion

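from PIL import Image

def convert_image_format(input_path, output_path, output_format):
    """
    Convert an image to another format.
    :param input_path: path of the input image
    :param output_path: path of the output image
    :param output_format: target format (e.g. 'JPEG', 'PNG')
    """
    try:
        with Image.open(input_path) as img:
            # JPEG cannot store alpha or palette data, so convert to RGB first.
            # This simply drops transparency; section 1.2 composites it instead.
            if output_format.upper() == 'JPEG' and img.mode not in ('RGB', 'L', 'CMYK'):
                img = img.convert('RGB')
            # Save in the requested format
            img.save(output_path, format=output_format)
            print(f"Converted: {input_path} -> {output_path}")
    except Exception as e:
        print(f"Conversion failed: {e}")

# Usage example
convert_image_format('input.png', 'output.jpg', 'JPEG')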

1.2 Advanced Conversion Techniques

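Batch conversion: walk a folder with os.listdir() to process many files in one run
Quality control: JPEG output accepts a quality parameter; Pillow's scale runs from 0 to 95 with a default of 75, and values above 95 are discouraged
Transparency: when converting PNG to JPEG, either composite the image onto a background color or drop the alpha channel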

import os
from PIL import Image

def batch_convert(input_dir, output_dir, output_format, quality=95):
    os.makedirs(output_dir, exist_ok=True)
    for filename in os.listdir(input_dir):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp')):
            input_path = os.path.join(input_dir, filename)
            output_path = os.path.join(output_dir,
                                      os.path.splitext(filename)[0] + f'.{output_format.lower()}')
            try:
                with Image.open(input_path) as img:
                    if output_format.upper() == 'JPEG':
                        if img.mode in ('RGBA', 'LA'):
                            # Composite images with an alpha channel onto a white background
                            background = Image.new('RGB', img.size, (255, 255, 255))
                            background.paste(img, mask=img.split()[-1])
                            img = background
                        elif img.mode not in ('RGB', 'L', 'CMYK'):
                            # Palette and other modes cannot be written as JPEG directly
                            img = img.convert('RGB')
                    # quality only affects formats that use it (e.g. JPEG)
                    img.save(output_path, format=output_format, quality=quality)
            except Exception as e:
                print(f"Failed to process {filename}: {e}")
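
The effect of the quality parameter mentioned above is easiest to judge by measuring output sizes. The following sketch encodes a synthetic test pattern in memory at several quality levels; the helper names jpeg_size and make_test_pattern are hypothetical, and real photographs will give different absolute numbers.

```python
import io
from PIL import Image

def jpeg_size(img, quality):
    """Return the size in bytes of img encoded as JPEG at the given quality."""
    buffer = io.BytesIO()
    img.save(buffer, format='JPEG', quality=quality)
    return len(buffer.getvalue())

def make_test_pattern(size=256):
    """Build a synthetic grayscale image with fine detail (an XOR texture)."""
    pixels = bytes((x ^ y) & 0xFF for y in range(size) for x in range(size))
    return Image.frombytes('L', (size, size), pixels)

if __name__ == "__main__":
    pattern = make_test_pattern()
    for q in (30, 75, 95):
        print(f"quality={q}: {jpeg_size(pattern, q)} bytes")
```

Running the same measurement on a representative sample of your own images is a reasonable way to pick a default for batch jobs.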

2. OCR Text Recognition: Working with Tesseract

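Tesseract is an open-source OCR engine whose development Google sponsored for many years. It supports more than 100 languages and is a common choice for text recognition in Python.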

2.1 Installation and Basic Configuration

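Install Tesseract:
- Windows: download the installer and add the install directory to PATH
- Mac: brew install tesseract
- Linux: sudo apt install tesseract-ocr (base package) or sudo apt install tesseract-ocr-[lang] (a specific language)
Install the Python wrapper: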
```
pip install pytesseract
```

2.2 Basic Recognition

import pytesseract
from PIL import Image

def ocr_with_tesseract(image_path, lang='chi_sim+eng'):
    """
    Basic OCR function.
    :param image_path: path of the image
    :param lang: language packs to use (Simplified Chinese + English by default)
    :return: recognized text, or None on failure
    """
    try:
        with Image.open(image_path) as img:
            return pytesseract.image_to_string(img, lang=lang)
    except Exception as e:
        print(f"OCR failed: {e}")
        return None

# Usage example
result = ocr_with_tesseract('text_image.png')
print(result)

2.3 Improving Recognition

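Image preprocessing:
- Binarization: increases the contrast between text and background
- Denoising: apply a Gaussian blur or a median filter
- Rotation correction: detect and fix skewed images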

import cv2
import pytesseract

def preprocess_image(image_path):
    # Read the image (OpenCV loads color images in BGR channel order)
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Missing or unreadable image: {image_path}")
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Binarize with Otsu's method, which picks the threshold automatically
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Optional: remove speckle noise with a 3x3 median filter
    denoised = cv2.medianBlur(binary, 3)
    return denoised

def advanced_ocr(image_path, lang='chi_sim+eng'):
    processed_img = preprocess_image(image_path)
    # pytesseract accepts NumPy arrays, so no temporary file is needed
    text = pytesseract.image_to_string(processed_img, lang=lang)
    return text
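
The preprocessing code above covers binarization and denoising but leaves out rotation correction. For pages turned by a multiple of 90 degrees, one option is Tesseract's orientation and script detection (OSD), which pytesseract exposes as image_to_osd. The sketch below rests on several assumptions: the helper names are hypothetical, the osd language data must be installed, the Rotate value is taken to be a clockwise correction, and small skew angles are not handled at all.

```python
import re

def parse_osd_rotation(osd_text):
    """Extract the 'Rotate' value (in degrees) from Tesseract OSD text output."""
    match = re.search(r'^Rotate:\s*(\d+)', osd_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("No 'Rotate:' line found in OSD output")
    return int(match.group(1))

def pil_rotation_angle(rotate_clockwise):
    """Convert a clockwise correction into the counter-clockwise angle
    that PIL's Image.rotate() expects."""
    return (360 - rotate_clockwise) % 360

def auto_rotate(image_path):
    """Return a copy of the image turned upright according to OSD."""
    # Imported here so the pure helpers above work without the OCR stack
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        osd_text = pytesseract.image_to_osd(img)
        # Assumption: 'Rotate' is the clockwise correction Tesseract suggests
        angle = pil_rotation_angle(parse_osd_rotation(osd_text))
        return img.rotate(angle, expand=True)
```

OSD tends to need a reasonable amount of text on the page to be reliable, so it is worth checking the reported confidence before trusting the result on sparse images.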

Region recognition: restricting OCR to a specified area can improve accuracy

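import pytesseract
from PIL import Image

def ocr_specific_area(image_path, coordinates, lang='chi_sim+eng'):
    """
    Recognize text in a specified region of an image.
    :param coordinates: (x1, y1, x2, y2), the top-left and bottom-right corners
    """
    with Image.open(image_path) as img:
        area = img.crop(coordinates)
        return pytesseract.image_to_string(area, lang=lang)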
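
Region recognition works best when the crop box is valid and Tesseract is told what kind of layout to expect. The sketch below clamps a box to the image bounds and, for regions known to hold one line of text, passes the page segmentation mode --psm 7. The helper names clamp_box and ocr_single_line are hypothetical, and whether mode 7 helps depends on the input.

```python
def clamp_box(box, image_size):
    """Clip (x1, y1, x2, y2) to the image bounds and validate it."""
    x1, y1, x2, y2 = box
    width, height = image_size
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x1 >= x2 or y1 >= y2:
        raise ValueError(f"Box {box} does not overlap a {width}x{height} image")
    return (x1, y1, x2, y2)

def ocr_single_line(image_path, box, lang='chi_sim+eng'):
    """OCR one region that is expected to contain a single line of text."""
    # Imported here so clamp_box can be used without the OCR stack installed
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        area = img.crop(clamp_box(box, img.size))
        # --psm 7 tells Tesseract to treat the image as a single text line
        return pytesseract.image_to_string(area, lang=lang, config='--psm 7')
```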

3. Complete Project: A Format Conversion + OCR Pipeline

import os
from PIL import Image
import pytesseract
import cv2

class ImageProcessor:
    def __init__(self, ocr_lang='chi_sim+eng'):
        self.ocr_lang = ocr_lang

    def convert_format(self, input_path, output_path, output_format, quality=95):
        """Core format-conversion method."""
        try:
            with Image.open(input_path) as img:
                if output_format.upper() == 'JPEG' and img.mode in ('RGBA', 'LA'):
                    # Composite the alpha channel onto a white background
                    background = Image.new('RGB', img.size, (255, 255, 255))
                    background.paste(img, mask=img.split()[-1])
                    background.save(output_path, format='JPEG', quality=quality)
                else:
                    img.save(output_path, format=output_format, quality=quality)
                return True
        except Exception as e:
            print(f"Format conversion error: {e}")
            return False

    def preprocess_for_ocr(self, image_path):
        """Preprocessing for OCR: grayscale, then Otsu binarization."""
        img = cv2.imread(image_path)
        if img is None:
            raise ValueError(f"Missing or unreadable image: {image_path}")
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return binary

    def recognize_text(self, image_path):
        """Core text-recognition method; returns None on failure."""
        try:
            processed = self.preprocess_for_ocr(image_path)
            # pytesseract accepts NumPy arrays, so no temporary file is needed
            return pytesseract.image_to_string(processed, lang=self.ocr_lang)
        except Exception as e:
            print(f"OCR error: {e}")
            return None

    def process_pipeline(self, input_path, output_format, output_text_path=None):
        """Full processing pipeline."""
        # 1. Format conversion
        base_name = os.path.splitext(os.path.basename(input_path))[0]
        converted_path = f"{base_name}_converted.{output_format.lower()}"
        if not self.convert_format(input_path, converted_path, output_format):
            return None
        # 2. Text recognition
        text = self.recognize_text(converted_path)
        # 3. Save the recognized text
        if output_text_path and text:
            with open(output_text_path, 'w', encoding='utf-8') as f:
                f.write(text)
        return {
            'converted_image': converted_path,
            'recognized_text': text,
            # Report OCR failures instead of always claiming success
            'status': 'success' if text is not None else 'ocr_failed'
        }

# Usage example
processor = ImageProcessor(ocr_lang='chi_sim+eng')
result = processor.process_pipeline(
    input_path='input_doc.png',
    output_format='JPEG',
    output_text_path='output_text.txt'
)
print(result)

4. Practical Advice

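Performance:
- Downscale large images before recognition (a width of at most 2000px is suggested)
- Use multiple threads for batch jobs
- For documents with a fixed layout, consider training a custom Tesseract model in advance
Accuracy:
- For Chinese text, use chi_sim (Simplified) or chi_tra (Traditional)
- For images with complex backgrounds, try adjusting the binarization threshold
- For tables, detect the ruling lines first, then recognize the image region by region
Error handling:
- Check that input files exist
- Add a retry mechanism for transient errors
- Log each processing step so problems are easier to trace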
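
Several of these suggestions (threads for batch jobs, file existence checks, retries, logging) can be combined in one small batch runner. The following is a sketch using only the standard library; the names run_with_retry and run_batch, the logger name, and the default retry counts are all hypothetical. It retries on any exception, which is a simplification: real code would usually retry only errors known to be transient.

```python
import logging
import os
import time
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("image_pipeline")

def run_with_retry(worker, path, retries=2, delay=0.5):
    """Call worker(path), retrying on exceptions; return None if all attempts fail."""
    if not os.path.isfile(path):
        logger.error("File not found: %s", path)
        return None
    # One initial attempt plus `retries` further attempts
    for attempt in range(1, retries + 2):
        try:
            return worker(path)
        except Exception as exc:
            logger.warning("Attempt %d failed for %s: %s", attempt, path, exc)
            if attempt <= retries:
                time.sleep(delay)
    logger.error("Giving up on %s", path)
    return None

def run_batch(worker, paths, max_workers=4, retries=2, delay=0.5):
    """Process paths in a thread pool; return {path: result or None}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: run_with_retry(worker, p, retries, delay), paths)
        return dict(zip(paths, results))
```

Here worker can be any function that takes a path and raises on failure, for example a thin wrapper around pytesseract.image_to_string. Threads are a reasonable fit because pytesseract runs the tesseract executable as a separate process, so most of the work happens outside the Python interpreter.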

5. Extended Features

PDF to images, then OCR:
```python
import os
from pdf2image import convert_from_path  # requires Poppler to be installed

def pdf_to_text(pdf_path, output_text_path, dpi=300, lang='chi_sim+eng'):
    # Render each PDF page to a PIL image
    images = convert_from_path(pdf_path, dpi=dpi)
    full_text = []
    processor = ImageProcessor(lang)  # the class defined in section 3

    for i, image in enumerate(images):
        temp_path = f'temp_page_{i}.png'
        image.save(temp_path, 'PNG')
        try:
            text = processor.recognize_text(temp_path)
            if text:
                full_text.append(text)
        finally:
            os.remove(temp_path)  # always clean up the temporary page image
    if full_text:
        with open(output_text_path, 'w', encoding='utf-8') as f:
            f.write('\n'.join(full_text))
```

Mixed-language recognition:
```python
# Install an extra language pack (e.g. Japanese)
# sudo apt install tesseract-ocr-jpn
# Then pass lang='jpn+eng'
```

6. Common Problems and Solutions

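Tesseract executable not found:
- On Windows, set pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
Low accuracy on Chinese text:
- Confirm that the Chinese language pack is installed
- Try lang='chi_sim' (Simplified) or 'chi_tra' (Traditional)
- Add preprocessing steps for low-quality images
Out of memory:
- Read large files in chunks
- Close image objects as soon as they are no longer needed
- Increase system swap space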
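
The first two problems can often be diagnosed before any OCR runs. The sketch below checks whether the tesseract executable is on PATH and whether the requested language data is installed. The helper names are hypothetical; it assumes a pytesseract release recent enough to provide get_languages, and it does not account for a tesseract_cmd that was set manually.

```python
import shutil

def missing_languages(lang, installed):
    """Return the codes in a 'chi_sim+eng'-style string that are not installed."""
    requested = [code for code in lang.split('+') if code]
    available = set(installed)
    return [code for code in requested if code not in available]

def check_tesseract(lang='chi_sim+eng'):
    """Print a short diagnosis of the local Tesseract setup."""
    # Imported here so missing_languages stays usable without pytesseract
    import pytesseract

    binary = shutil.which('tesseract')
    if binary is None:
        print("tesseract is not on PATH; set pytesseract.pytesseract.tesseract_cmd")
        return False
    print(f"Found tesseract at {binary}")
    missing = missing_languages(lang, pytesseract.get_languages(config=''))
    if missing:
        print(f"Missing language data: {', '.join(missing)}")
        return False
    print("All requested languages are installed")
    return True
```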

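With the approach described in this article, developers can quickly build a complete pipeline from image format conversion to text recognition. In real projects, tune the preprocessing parameters and recognition strategy to the task at hand; for business-critical scenarios, consider combining this setup with a commercial OCR API for higher accuracy.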
