OCR'd PDFs are unrecognized by X1

Postby pdf dependent » Mon Jul 24, 2006 12:09 pm

OCR'd PDFs are unrecognized by X1. These PDFs are text-searchable in Acrobat. The OCR is done by ABBYY software in a package from Fujitsu. (Don't confuse file name search with text search; it is text search that does not work.)
pdf dependent
 
Posts: 14
Joined: Mon Jul 24, 2006 11:55 am

Postby BillChapman » Mon Jul 24, 2006 12:40 pm

I have no problem finding text in my OCR'd PDF files with X1.
BillChapman
X1 Super User
 
Posts: 136
Joined: Wed Nov 17, 2004 7:19 pm

Which software do you use for your OCR?

Postby pdf dependent » Mon Jul 24, 2006 12:54 pm

Which software do you use for your OCR, and what version of Adobe do you use?
pdf dependent
 
Posts: 14
Joined: Mon Jul 24, 2006 11:55 am

Postby BillChapman » Mon Jul 24, 2006 1:11 pm

Yes, Acrobat Professional 7.08.
BillChapman
X1 Super User
 
Posts: 136
Joined: Wed Nov 17, 2004 7:19 pm

Postby Kenward » Tue Jul 25, 2006 8:32 am

Silly question, but you have, of course, run the Adobe OCR on the files you want to index? And checked that they really do have the text attached?
MK
X1 Search 8.6.1 - Build 6003fa (64-bit)
Windows 10 Pro 64-bit | Windows 10 Home 32-bit
No, I have nothing to do with X1, just a user since 2004.
Kenward
X1 Guru
 
Posts: 4149
Joined: Tue Apr 20, 2004 2:35 am
Location: UK

Postby BillChapman » Tue Jul 25, 2006 9:30 am

Yes, the OCR'd files are indexed, and I can find text in them with X1.
BillChapman
X1 Super User
 
Posts: 136
Joined: Wed Nov 17, 2004 7:19 pm

Do they really have text attached?

Postby pdf dependent » Tue Jul 25, 2006 9:45 am

Frankly, I'm not sure how to determine that other than to run an Acrobat text search, which I have done and which does find text.
pdf dependent
 
Posts: 14
Joined: Mon Jul 24, 2006 11:55 am
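
A rough way to check whether a PDF "really has the text attached", independent of Acrobat or X1, is to look for text-drawing operators in the file's content streams. The sketch below is only a heuristic using Python's standard library: the function name `has_text_layer` is made up for illustration, it only understands uncompressed and Flate-compressed streams, and it can misjudge unusual files (for example, text stored behind other stream filters). A dedicated PDF library would be more reliable.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A text object (BT ... ET) that contains a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"BT\b.*?(?:Tj|TJ)\b.*?ET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw text on the page."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Most content streams are Flate (zlib) compressed.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data. Only scan it if it looks like plain ASCII,
            # so that binary image data is not mistaken for text operators.
            if not raw.isascii():
                continue
            data = raw
        if TEXT_OP_RE.search(data):
            return True
    return False
```

For a file on disk, something like `has_text_layer(open("scan.pdf", "rb").read())` gives a quick yes/no answer, where `scan.pdf` is a placeholder name. A result of False suggests an image-only scan, which may explain a file that a search tool lists but cannot search inside.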

Postby BillChapman » Tue Jul 25, 2006 11:29 am

Another thing you might want to do is to add the flags column to your Files list display. The contents of that column will tell you the status of each file (indexed, skipped, etc.). My OCR'd PDF files are identified as indexed, and indeed I am able to search for and find text within them. Those in which X1 is unable to find text are marked as skipped.
BillChapman
X1 Super User
 
Posts: 136
Joined: Wed Nov 17, 2004 7:19 pm

Updated the ABBYY software (ScanSnap) and it is now working

Postby pdf dependent » Tue Jul 25, 2006 1:40 pm

However, it will not highlight my search terms. Search terms are highlighted in other PDFs.
pdf dependent
 
Posts: 14
Joined: Mon Jul 24, 2006 11:55 am

Re: Updated the ABBYY software (ScanSnap) and it is now working

Postby Kenward » Tue Jul 25, 2006 2:12 pm

pdf dependent wrote: However, it will not highlight my search terms. Search terms are highlighted in other PDFs.


This probably has something to do with the way in which the PDF files store text and image. The viewer (which I'd guess is a bought-in part of X1) has to match up the two.

In a regular PDF file, there is no image/overlay issue.

Not even sure that Acrobat is that good at handling OCR'd files.
Kenward
X1 Guru
 
Posts: 4149
Joined: Tue Apr 20, 2004 2:35 am
Location: UK

Postby BillChapman » Tue Jul 25, 2006 3:50 pm

I see now what you are saying. X1 finds and displays the OCR'd PDFs, but doesn't highlight the text being searched for. That is the way it works on my system too. The files are indexed, X1 does search them and does find and display the ones with the text I'm searching for, but it doesn't highlight that text as it does in PDFs created directly from web pages, Word files, etc.
BillChapman
X1 Super User
 
Posts: 136
Joined: Wed Nov 17, 2004 7:19 pm

Postby Kenward » Wed Jul 26, 2006 1:49 am

BillChapman wrote: I see now what you are saying. X1 finds and displays the OCR'd PDFs, but doesn't highlight the text being searched for. That is the way it works on my system too. The files are indexed, X1 does search them and does find and display the ones with the text I'm searching for, but it doesn't highlight that text as it does in PDFs created directly from web pages, Word files, etc.



PDF files are a world unto themselves. They come in various flavours. Not all of them are compatible. While Adobe "owns" the standard, it is open in that you do not have to pay them royalties if you produce software that works on PDF files. There is a good explanation here:

http://en.wikipedia.org/wiki/.pdf

I am a regular user, and beta tester, of Nuance PaperPort software. This scans, creates and manages PDF files. The variations between different files cause constant anguish. In particular, people get very confused by the difference between indexed and searchable PDF files.

An indexed file is one that PaperPort can find, but that it cannot search within. A searchable file means that you can find where words appear within a document.

To get searchable files in PaperPort, you also have to have third-party OCR software, such as OmniPage, which also comes from Nuance.

The use of third-party search tools, such as X1, is a regular topic in the PaperPort community. A lot of people use X1 and its variants, so there is plenty of experience of getting the two to work together.
Kenward
X1 Guru
 
Posts: 4149
Joined: Tue Apr 20, 2004 2:35 am
Location: UK
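
The distinction drawn above, between a file that can be found and a file that can be searched within, can be illustrated with a toy example. This is a sketch only: it does not describe how PaperPort or X1 work internally, and the function names and sample documents are invented for illustration.

```python
from collections import defaultdict


def build_name_index(docs):
    """Set of lowercase file names: enough to *find* a file by name."""
    return {name.lower() for name in docs}


def build_text_index(docs):
    """Inverted index: word -> {file name: [positions of that word]}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        if text is None:
            # Image-only PDF with no text layer: nothing to search within.
            continue
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(name, []).append(pos)
    return index


# Hypothetical documents: None stands for a scan that was never OCR'd.
docs = {
    "invoice.pdf": None,
    "letter.pdf": "dear sir the invoice is attached",
}

names = build_name_index(docs)
words = build_text_index(docs)
print("invoice.pdf" in names)       # the scan can be found by name
print(words.get("invoice", {}))     # only letter.pdf can be searched within
```

In this model the un-OCR'd scan is "indexed" in the sense that it shows up by file name, while only the file with a text layer is "searchable", because the text index can say where a word occurs.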

Postby askwong » Tue Aug 22, 2006 11:21 pm

There are scanner hardware packages that come bundled with both OCR software and the X1 program.

Dynamite Desktop Document Scanners
08.23.06

www.pcmag.com/article2/0,1895,2006864,00.asp
www.pcmag.com/article2/0,1895,1992804,00.asp

...
And what a nice collection of software it is: ScanSoft PaperPort 10 for document management, ScanSoft OmniPage Pro 14 for industrial-strength optical character recognition (OCR), X1 Enterprise Client 5.2 for indexing files and retrieving them by searching for any text in the file, NewSoft Presto! Bizcard 5 for business cards, and a Twain driver so you can scan from virtually any program with a scan command. Rounding the assortment off is Arcsoft Scrapbook Suite, which includes a photo-editing program, but I can't recommend scanning photos on a sheet-fed scanner; the rollers tend to leave marks on the originals.

Some of these programs are a generation behind the latest and greatest versions, but even at one generation behind, PaperPort, OmniPage, and (the current) X1 are a terrific trio for small-office document management. Between them, you can scan, OCR scan, organize, and index your files, then find them as quickly as you can type the text you're looking for. And the Visioneer One Touch software, which sits in the system tray, lets you bring up a menu from which you can easily pick where to scan to—e-mail, a fax program, a printer, a searchable PDF file, and more.
...
askwong
X1 Super User
 
Posts: 118
Joined: Mon May 22, 2006 11:29 pm

Postby Kenward » Wed Aug 23, 2006 1:30 am

Some of these programs are a generation behind the latest and greatest versions, but even at one generation behind, PaperPort, OmniPage, and (the current) X1 are a terrific trio for small-office document management.


And if the only thing you want to do is to create searchable PDF files, you can save money by staying "behind the latest and greatest versions". For example, I have it on good authority – that is, someone in the company – that old versions of OmniPage work with PaperPort just fine to create searchable PDFs.

That could save quite a lot of dollars as the earlier versions are out there at knock-down prices.
Kenward
X1 Guru
 
Posts: 4149
Joined: Tue Apr 20, 2004 2:35 am
Location: UK

