Best Regards, Phil
From: For users of askSam: A Free-form Information Manager
[mailto:[log in to unmask]] On Behalf Of Phil Schnyder
Sent: Wednesday, February 12, 2003 5:02 PM
To: [log in to unmask]
Subject: Re: Betr.: failed import of pdf file
TextPipe only works during import (and only on Text and HTML files). But it
does have pre-set functions to remove blank lines, remove spaces at the
beginning of lines, remove HTML tags, etc.
So although you can create filters for specific types of files, there are
also generic functions that will work on any type of file.
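
To make the generic functions mentioned above concrete, here is a rough Python sketch of the same three cleanups (removing blank lines, removing leading spaces, removing HTML tags). This is not TextPipe and not how TextPipe is implemented; the function names are hypothetical, and the tag removal is a simple regular expression rather than a real HTML parser.

```python
import re


def remove_html_tags(text: str) -> str:
    """Crude tag removal: delete anything that looks like <...>."""
    return re.sub(r"<[^>]+>", "", text)


def strip_leading_spaces(text: str) -> str:
    """Remove spaces and tabs at the beginning of every line."""
    return "\n".join(line.lstrip(" \t") for line in text.splitlines())


def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())


def clean(text: str) -> str:
    """Apply all three filters (hypothetical helper, not a TextPipe function)."""
    # Tags go first so that lines left empty by tag removal are dropped too.
    return remove_blank_lines(strip_leading_spaces(remove_html_tags(text)))


if __name__ == "__main__":
    sample = "<p>  Hello</p>\n\n   <b>world</b>\n"
    print(clean(sample))
```

Because none of these steps depends on how a particular file is laid out, the same chain can be run on any text or HTML file before import.
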
Hope this helps.
Perry, FL (The Silicon Swamp)
askSam: Turn Information into Searchable Databases
Try it at: http://www.askSam.com
On Wed, 12 Feb 2003 10:46:05 -0800, Bonnie Britt wrote:
>This is a characteristic of PDF files, and there's no obvious way
>to "make it pretty" in askSam or any other software outside of
>its native PDF format without wasting amazing amounts of time. If
>it is a text-based PDF (and not a graphic) I import the PDF file,
>warts and all, into askSam, then link to the original on the hard
>drive as a backup. I don't bother to clean up the pdf import
>since the text is there only to be searched. If and when it does
>come up of interest in a search, then it makes sense to look at
>the original for context, tables and images. If you wanted to
>clean it up a bit to save space etc., you could search through
>that imported document with ^p^p^p and replace that with ^p^p to
>eliminate some excess blank lines. That takes only a second.
>Is anyone using the new TextPipe thingie, and if so, does it save time when
>cleanup is desirable? Is it useful only when importing documents formatted
>in the same manner? Or, does it require new sets of instructions for new
>types of documents formatted every which way?
>From: "Frank Thomas" <[log in to unmask]>
>| But there is another reason that made me hesitate to import pdf-files:
>| You need to post-treat the text quite a lot as there are numerous empty
>| lines, change of character size or style etc. And of course the
>| question of tables and of images.
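
The ^p^p^p to ^p^p replacement suggested in the thread can be sketched outside askSam as well. The Python below assumes that ^p stands for a paragraph break and that line endings have already been normalized to "\n"; the function names are hypothetical. It shows why a single pass removes only some of the excess blank lines, and how a pattern that matches whole runs removes all of them.

```python
import re


def replace_once(text: str) -> str:
    """One pass of the ^p^p^p -> ^p^p replacement (three breaks become two)."""
    return text.replace("\n\n\n", "\n\n")


def collapse_blank_lines(text: str) -> str:
    """Reduce any run of three or more line breaks to exactly two."""
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    imported = "First paragraph\n\n\n\n\nSecond paragraph"
    print(repr(replace_once(imported)))
    print(repr(collapse_blank_lines(imported)))
```

A single pass leaves longer runs only partly shortened, so in a search-and-replace dialog the same replacement may need to be repeated until nothing more is found.
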