20200318 2208

 

00 Overview

1/ I want to recognize the text in a PDF. I found no OCR tool that reads a PDF directly, so the PDF has to be converted to images first.

 

01 If the PDF already contains text, the text can be extracted directly.

 

Code

#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.

; pdftotext.exe must sit in the same folder as this script.
IfNotExist, %A_ScriptDir%\pdftotext.exe
{
    MsgBox, 16, ConvertPDF2TXT, pdftotext.exe not in Script directory
    ExitApp
}

ZaehlerOk := 0   ; number of converted files
ZaehlerNOk := 0  ; number of skipped files

Loop, Files, %A_ScriptDir%\*.pdf
{
    ; File name without its extension, e.g. "report.pdf" -> "report".
    SplitPath, A_LoopFileName, , , , NameTXT
    IfExist, %A_ScriptDir%\%NameTXT%.txt
    {
        MsgBox, 36, ConvertPDF2TXT, File %NameTXT%.txt exists. Overwrite?
        IfMsgBox, No
        {
            ZaehlerNOk += 1  ; count the skipped file and move on
            continue
        }
    }
    ; Convert the PDF to a .txt file with the same base name.
    RunWait, "%A_ScriptDir%\pdftotext.exe" -table "%A_LoopFileFullPath%" "%A_ScriptDir%\%NameTXT%.txt", , Hide
    ZaehlerOk += 1
}

MsgBox, 0, ConvertPDF2TXT, %ZaehlerOk% File(s) converted`n%ZaehlerNOk% File(s) ignored
ExitApp

 

Note: the pdftotext program must be installed beforehand. The script expects pdftotext.exe in the same folder as the script.

 

02 Image text recognition (OCR) program

http://capture2text.sourceforge.net/

Capture2Text is free and licensed under the terms of the GNU General Public License.

 

It can be run from the command line, so it can be driven from AutoHotkey (AH).
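As a rough sketch of what that could look like, the script below loops over the JPG files in the script folder and passes each one to Capture2Text. It rests on several assumptions that are not from the original note: that the command-line build is named Capture2Text_CLI.exe and sits next to the script, that `-i` takes the input image and `-o` the output text file (check against the program's `--help` output), and that the images already exist as .jpg files.

```autohotkey
#NoEnv
SetWorkingDir %A_ScriptDir%

; Assumption: the command-line build of Capture2Text sits next to this script.
C2T := A_ScriptDir . "\Capture2Text_CLI.exe"
IfNotExist, %C2T%
{
    MsgBox, 16, OCRImages, Capture2Text_CLI.exe not in Script directory
    ExitApp
}

Count := 0  ; number of images handed to the OCR program

Loop, Files, %A_ScriptDir%\*.jpg
{
    ; File name without its extension, e.g. "page1.jpg" -> "page1".
    SplitPath, A_LoopFileName, , , , NameTXT
    ; Assumed flags: -i = input image, -o = output text file.
    RunWait, "%C2T%" -i "%A_LoopFileFullPath%" -o "%A_ScriptDir%\%NameTXT%.txt", , Hide
    Count += 1
}

MsgBox, 0, OCRImages, %Count% image(s) processed
ExitApp
```

The counter only records how many images were passed to the program, not whether recognition succeeded. Open the resulting .txt files to check the quality.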

 

03 PDF > JPG program

https://www.weenysoft.com/pdf-to-image-converter-command-line.html

https://www.weenysoft.com/free-pdf-to-image-converter.html

I tried it, but it does not work.

 

 

Posted by Weneedu


Source: https://privatedevelopnote.tistory.com/81 [개인노트]