Sorry about the earlier blank post...
One of the best filters included with TextPipe extracts email addresses.
If you have a file with email addresses scattered throughout in irregular
formats (e.g., some with the name before the address, others buried in
other text), you run it through TextPipe and end up with a clean listing
of email addresses only. I have found this filter very useful.
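For anyone curious what this kind of extraction involves, here is a rough
Python sketch of the idea. It is not TextPipe's own filter, and I don't know
how TextPipe does it internally; the pattern, the function name
extract_emails and the sample text are all made up for illustration, and the
pattern is a deliberate simplification of real address syntax.

```python
import re

# Simplified pattern (an assumption for illustration): it covers common
# addresses but is not a full implementation of RFC 5322 address syntax.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return the unique addresses found in text, in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.finditer(text):
        address = match.group(0)
        key = address.lower()  # treat differently-cased duplicates as one address
        if key not in seen:
            seen.add(key)
            result.append(address)
    return result

# Hypothetical input: names before addresses, addresses buried in other text.
sample = (
    "Jane Doe <jane.doe@example.com>, see also bob@example.org.\n"
    "Contact: JANE.DOE@example.com or sales@mail.example.net today"
)
print("\n".join(extract_emails(sample)))
```

The sketch keeps the first spelling of each address and drops later
duplicates, which is what gives the "clean listing" effect.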
Best Regards, Phil
______________________________
Phillip Rodokanakis
Consultant and Authorized Reseller
askSam Systems, Inc.
Tel.: 703-766-0500 (Office)
FAX: 703-736-0817
Email: [log in to unmask]
-----Original Message-----
From: For users of askSam: A Free-form Information Manager
[mailto:[log in to unmask]] On Behalf Of Phil Schnyder
Sent: Wednesday, February 12, 2003 5:02 PM
To: [log in to unmask]
Subject: Re: Betr.: failed import of pdf file
Bonnie,
TextPipe only works during import (and only on Text and HTML files). But it
does have pre-set functions to remove blank lines, remove spaces at the
beginning of lines, remove HTML tags, etc.
So although you can create filters for specific types of files, there are
also generic functions that will work on any type of file.
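To give a feel for what the generic functions mentioned above do, here is a
small Python sketch of the same three cleanups (remove HTML tags, remove
spaces at the beginning of lines, remove blank lines). This is only an
illustration, not how TextPipe implements them; the function names and the
sample text are hypothetical, and the tag removal is a naive regular
expression rather than a real HTML parser.

```python
import re

def strip_html_tags(text):
    # Naive tag removal (assumption: simple, well-formed tags only).
    return re.sub(r"<[^>]+>", "", text)

def strip_leading_spaces(text):
    # Remove spaces and tabs at the beginning of every line.
    return "\n".join(line.lstrip(" \t") for line in text.split("\n"))

def remove_blank_lines(text):
    # Drop lines that are empty or contain only whitespace.
    return "\n".join(line for line in text.split("\n") if line.strip())

def clean(text):
    # Apply the generic steps in order, regardless of where the text came from.
    for step in (strip_html_tags, strip_leading_spaces, remove_blank_lines):
        text = step(text)
    return text

# Hypothetical messy import.
raw = "<p>First line</p>\n\n\n   indented line\n\t<b>bold</b> text\n\n"
print(clean(raw))
```

Because each step is independent of the document layout, the same chain can
be applied to any text or HTML import, which is the point of the generic
functions as opposed to filters written for one specific type of file.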
Hope this helps.
Phil
askSam Systems
Perry, FL (The Silicon Swamp)
http://www.askSam.com/
850-584-6590
__________________________________________________________________
askSam: Turn Information into Searchable Databases
Try it at: http://www.askSam.com
On Wed, 12 Feb 2003 10:46:05 -0800, Bonnie Britt wrote:
>This is a characteristic of PDF files, and there's no obvious way
>to "make it pretty" in askSam or any other software outside of
>its native PDF format without wasting amazing amounts of time. If
>it is a text-based PDF (and not a graphic) I import the PDF file,
>warts and all, into askSam, then link to the original on the hard
>drive as a backup. I don't bother to clean up the pdf import
>since the text is there only to be searched. If and when it does
>come up of interest in a search, then it makes sense to look at
>the original for context, tables and images. If you wanted to
>clean it up a bit to save space etc., you could search through
>that imported document with ^p^p^p and replace that with ^p^p to
>eliminate some excess blank lines. That takes only a second.
>
>Is anyone using the new TextPipe thingie, and if so, does it save time
>when cleanup is desirable? Is it useful only when importing documents
>formatted in the same manner? Or does it require new sets of
>instructions for new kinds of documents formatted every which way?
>Bonnie Britt
>
>
>From: "Frank Thomas" <[log in to unmask]>
>
>| But there is another reason that made me hesitate to import PDF files:
>| you need to post-treat the text quite a lot, as there are numerous empty
>| lines, changes of character size or style, etc. And of course there is
>| the question of tables and images.
>| Thanks
>| /Frank
>|