Extracting text from a PDF with pdfminer

pip3 install pdfminer2

import io

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage


def convert_pdf_to_txt(path):
    """Return the text of every page of the PDF at `path` as one string."""
    rsrcmgr = PDFResourceManager()  # shared store for fonts and images
    retstr = io.StringIO()          # in-memory buffer that collects the text
    laparams = LAParams()           # default layout-analysis settings
    device = TextConverter(rsrcmgr, retstr, codec='utf-8', laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    # The context manager closes the file even if a page fails to parse.
    with open(path, 'rb') as fp:
        # An empty page set with maxpages=0 means "process every page".
        for page in PDFPage.get_pages(fp, set(), maxpages=0,
                                      password="",
                                      caching=True,
                                      check_extractable=True):
            interpreter.process_page(page)

    device.close()
    text = retstr.getvalue()
    retstr.close()
    return text

path = "beautiful soup.pdf"
print(convert_pdf_to_txt(path))
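
The function above always reads the whole document. As a rough sketch of how the `pagenos` argument of `PDFPage.get_pages` can limit extraction to a few pages, the variant below converts a 1-based page range into a set of page indices. It assumes `pagenos` is zero-based (as in pdfminer's own `pdf2txt.py` tool); the names `to_pagenos` and `convert_page_range` are hypothetical helpers, not part of pdfminer.

```python
import io


def to_pagenos(first, last):
    """Convert an inclusive 1-based page range to a set of 0-based indices."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return set(range(first - 1, last))


def convert_page_range(path, first, last):
    """Extract text only from pages first..last (1-based, inclusive)."""
    # Imported here so to_pagenos() stays usable without pdfminer installed.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    device = TextConverter(rsrcmgr, retstr, codec='utf-8', laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    with open(path, 'rb') as fp:
        # Assumption: pages whose index is not in the set are skipped.
        for page in PDFPage.get_pages(fp, to_pagenos(first, last)):
            interpreter.process_page(page)

    device.close()
    return retstr.getvalue()
```

For example, `convert_page_range("beautiful soup.pdf", 1, 3)` would return the text of the first three pages only, which can be noticeably faster than converting a long document in full.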


Last modified: April 2, 2020


