PDF - problems viewing search terms in preview window

X1® Professional Client Version - Raptor builds (both Beta and Release Candidates)

Postby artberg » Wed Aug 03, 2011 4:15 pm

I've been working on this problem with Greg Dawes. He suggested that I post my issue here.

I have several hundred pdf documents which I need to be able to search. The documents were created by scanning paper copies. I ran OCR using Adobe Acrobat. X1 searches the documents correctly, but does not display or highlight the search terms in the preview window. That means I cannot easily go through the documents found in the search to see which one(s) I want.

The problem seems to be with the scanning. I have taken several word processing documents and printed them directly to pdf. Those documents display correctly. The problem appears only in scanned documents.

Has anyone else encountered this problem? If so, do you know of a fix or workaround?

Thanks for any help you can give.
artberg
Posts: 2
Joined: Wed Aug 03, 2011 3:58 pm

Re: PDF - problems viewing search terms in preview window

Postby Kenward » Thu Aug 04, 2011 7:43 am

I see this all the time. It is a consequence of the way in which you create these files.

The "print to PDF" files are "native" PDF files: the text is part of the document itself.

The scanned and OCR'd files are actually images with a text overlay. Unfortunately, the text and the image don't always line up.
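To picture why a highlight can go missing, here is a minimal Python sketch. It is not X1's actual logic, which isn't documented in this thread: it simply treats each OCR'd word as a box laid over the scanned image and measures how much that box overlaps the printed word underneath. The `Box` class, the `overlap_ratio` and `highlight_lands_on_word` helpers, the coordinates and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned word box in page units: left, bottom, right, top."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp negative widths/heights to zero so "no overlap" has zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_ratio(ocr: Box, printed: Box) -> float:
    """Intersection-over-union of the OCR text box and the printed word in the scan."""
    inter = Box(
        max(ocr.x0, printed.x0),
        max(ocr.y0, printed.y0),
        min(ocr.x1, printed.x1),
        min(ocr.y1, printed.y1),
    ).area()
    union = ocr.area() + printed.area() - inter
    return inter / union if union > 0 else 0.0


def highlight_lands_on_word(ocr: Box, printed: Box, threshold: float = 0.5) -> bool:
    """Assumed rule: a highlight drawn at the OCR box only 'hits' if it mostly covers the word."""
    return overlap_ratio(ocr, printed) >= threshold


printed = Box(100, 700, 160, 712)  # where a search term appears in the scanned image
aligned = Box(101, 700, 159, 712)  # OCR text placed right on top of the word
shifted = Box(100, 680, 160, 692)  # OCR text drifted below the word

print(highlight_lands_on_word(aligned, printed))  # True
print(highlight_lands_on_word(shifted, printed))  # False
```

In both cases the word is "found" by a text search, because the OCR text exists; only the position of the highlight differs. Intersection-over-union is just one common way to compare two boxes, and a real viewer may use a different rule entirely.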

It may be that Acrobat's OCR isn't as good as "proper" OCR software. If you open these files in Acrobat, can you find the term you want? If so, that at least tells you the OCR was accurate enough.

Unfortunately, X1's response to these files varies. Some files display highlights in X1, others do not.

You are fortunate in having only hundreds of these files. I have 12,000 or so.

You may have more luck if you run these files through better OCR software. I use either OmniPage or the version of it embedded in PaperPort.
MK
X1 Search 8.5.2 - Build 6001si (64-bit)
Windows 10 Pro 64-bit | Windows 10 Home 32-bit
No, I have nothing to do with X1, just a user since 2004.
Kenward
X1 Guru
Posts: 4065
Joined: Tue Apr 20, 2004 2:35 am
Location: UK

Re: PDF - problems viewing search terms in preview window

Postby artberg » Thu Aug 04, 2011 8:22 am

Thanks for your prompt response to my post.

Actually, I probably have as many pdf files as you do, since I scan everything that comes into or goes out of my office. The hundreds I mentioned are only those in a current project.

Yes, I have no problem finding search terms using Adobe. So the scanning/OCR process seems to be working.

Since you seem to have had success using PaperPort for your OCR, I tried scanning a document as a "searchable PDF" rather than as an image file. Unfortunately, X1 still didn't highlight the search terms. However, following your suggestion, I did the OCR through PaperPort using the OmniPro option, and it worked!

Since I've never used that feature before and didn't really know what I was doing, the process seemed cumbersome. I set up a new converter option as "pdf with image substitutes." That gave me two copies of the file, a TIF and a PDF. The PDF worked in X1. Is there a way to avoid getting the TIF? Perhaps I was using the wrong converter option. Can you perform the OmniPage conversions on multiple files as a batch, without having to look at each one individually?

Anyway, thanks to your suggestions I'm ahead of where I was yesterday, although not entirely where I want to be. If you have any other suggestions, I'd be grateful.

Art

Can you describe in more detail how you use PaperPort to get files that work?

Re: PDF - problems viewing search terms in preview window

Postby Kenward » Thu Aug 04, 2011 11:15 am

Glad you are getting somewhere.

I also own the latest version of OmniPage.

I usually leave it to PaperPort to create the searchable PDF files. It is just a lot easier.

Maybe I will see if OmniPage does a better job.
MK

Re: PDF - problems viewing search terms in preview window

Postby Chris Wheaton » Thu Aug 04, 2011 11:35 am

Art, I've been following this thread and have been assisting Mr. Dawes on your support case. Could you do me a favor? If possible, please reply to the case you are working on with Mr. Dawes and attach a copy of the PDF you OCR'd using PaperPort/OmniPro. I want to compare it to the originals. Thank you.
Sincerely,

Chris Wheaton
Technical Support
Customer Care Rep.
______________________

www.X1.com
Chris Wheaton
X1 Rep
Posts: 299
Joined: Tue Dec 23, 2008 1:00 pm
Location: Pasadena, CA

Re: PDF - problems viewing search terms in preview window

Postby Kenward » Thu Aug 04, 2011 1:55 pm

The file at the end of this link displays and searches fine in PDF readers, but X1 will not highlight the word "graphene".

By contrast, this file does highlight.

Both were created by scanning into PaperPort, and both claim to be the same PDF version.
MK

Re: PDF - problems viewing search terms in preview window

Postby DaVo » Tue Aug 01, 2017 5:02 am

Hi all,

I'm currently trying out X1 Search version 8.5.2. Unfortunately, I'm experiencing the same problem described here six years ago. As I mainly need the indexer for searching within scanned PDFs, the fact that X1 still shows this behaviour is a show-stopper for me.
So I would really love to hear that this is just a user error and not a bug that still hasn't been fixed.

I'm also trying out Copernic Desktop Search and Lookeen, and both of them DO highlight correctly on the same files. I just like X1 Search more in terms of UI and usability.

Hope to hear "good news" :)

DaVo
Posts: 4
Joined: Tue Aug 01, 2017 4:54 am

Re: PDF - problems viewing search terms in preview window

Postby Kenward » Wed Aug 02, 2017 8:22 am

DaVo wrote: Unfortunately, I'm experiencing the same problem as described here 6 years ago.

Which problem?

Failure to find and highlight text in scanned files?

X1 is rumoured to be working on new versions of the file viewer technology it uses. I don't know if that will address this issue. I am more interested in getting it to copy text from a PDF file as text and not gibberish.
MK

Re: PDF - problems viewing search terms in preview window

Postby DaVo » Mon Aug 07, 2017 4:45 am

Yes, exactly.

But I have to say that I'm currently evaluating X1 Search and comparing it to competitors such as Copernic Desktop Search and Lookeen. And BOTH of the competitors DO highlight the search terms correctly in the same scanned files. I really like X1 Search more because of its better GUI and faster search results, but as I'm mainly searching my "paperless-office-self-scanned-PDFs", this bug/behaviour/missing feature of X1 could still force me to use the alternatives. :(

Can anyone tell us more about the rumor you mentioned?

Re: PDF - problems viewing search terms in preview window

Postby Kenward » Tue Aug 08, 2017 3:47 am

MK

