ASKSAM-L Archives

ASKSAM-L@LISTSERV.VT.EDU


ASKSAM-L  February 2003

Subject:

Re: Betr.: failed import of pdf file

From:

JulianFlor <[log in to unmask]>

Reply-To:

For users of askSam: A Free-form Information Manager

Date:

Thu, 13 Feb 2003 13:36:10 +0100

I would suggest using an extra tool such as a good text editor (NoteTab,
UltraEdit, TextPad): copy the content of the askSam document (that is,
the text of the PDF file) into the editor and clean it up there with
"search & replace". With the option "regular expressions" activated,
the following converts any run of two, three or more spaces into one:

Search for: "  +"
(two spaces and the +-sign, without "")

replace by: " "
(ONE space)
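
For anyone who would rather script this step than run it in an editor, here is a minimal Python sketch of the same replacement. It is only an illustration: the function name `collapse_spaces` is made up for this example, and it assumes the text has already been copied out of askSam into a plain string.

```python
import re

def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    # "  +" is two literal spaces followed by "+":
    # one space, then one or more further spaces.
    return re.sub(r"  +", " ", text)

print(collapse_spaces("PDF   text  with    gaps"))  # prints: PDF text with gaps
```

Single spaces and line breaks are left alone, so the step is safe to repeat.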

Reformatting broken lines (deleting line feeds at the ends of lines)
in NoteTab: highlight the text you want to reformat and press Ctrl-J.
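
Outside an editor, roughly the same result can be sketched in Python. This is an assumption about what the join step should achieve, not a description of how NoteTab implements it: lines inside a paragraph are joined with a space, and blank lines are treated as paragraph boundaries. The name `join_broken_lines` is hypothetical.

```python
import re

def join_broken_lines(text: str) -> str:
    """Join the lines of each paragraph; blank lines still separate paragraphs."""
    # A paragraph boundary is a line break, optional whitespace, and another line break.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within a paragraph, trim each line and glue the lines together with one space.
    joined = [" ".join(line.strip() for line in p.splitlines()) for p in paragraphs]
    return "\n\n".join(joined)

sample = "PDF text often arrives\nbroken in mid\nsentence.\n\nNext paragraph\nhere."
print(join_broken_lines(sample))
```

Note that this sketch also reduces several blank lines between paragraphs to one, and that it does not rejoin words hyphenated across a line break.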

Converting many blank lines into one (without regular expressions
activated) would be:

search: ^p^p^p
replace by: ^p^p

and repeat this operation if necessary (askSam itself is able to do
this, but I have not had much luck with its global operator - it is
not very stable: sometimes it works, sometimes not).
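
The "repeat if necessary" part is easy to automate. The Python sketch below shows two ways to do it, under the assumption that ^p corresponds to a plain "\n" line break (text saved on Windows may use "\r\n" instead and would need converting first). Both function names are invented for this example.

```python
import re

def squeeze_blank_lines_by_repeating(text: str) -> str:
    """Mirror the manual approach: replace three line breaks with two until none are left."""
    while "\n\n\n" in text:
        text = text.replace("\n\n\n", "\n\n")
    return text

def squeeze_blank_lines(text: str) -> str:
    """Same result in one pass: any run of three or more line breaks becomes two."""
    return re.sub(r"\n{3,}", "\n\n", text)

sample = "first paragraph\n\n\n\n\nsecond paragraph"
print(squeeze_blank_lines(sample))
```

Either way, at most one blank line remains between paragraphs.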

Julian

PS: I know that this is only an uncomfortable workaround, but as
someone said, PDF files are very difficult to export or reformat with
almost any external tool. The only exception I know of is dtSearch
(version 5.xx or higher), where you can toggle the option "show search
results in the internal viewer"; what you get is a simple but clean
text output (extracted from the original PDF file, no matter how
complex the original layout may be).




On Wed, 12 Feb 2003 17:01:53 -0500, Phil Schnyder wrote:
>Bonnie,
>
>TextPipe only works during import (and only on Text and HTML files).
>But it does have pre-set functions to remove blank lines, remove
>spaces at the beginning of lines, remove HTML tags, etc.
>
>So although you can create filters for specific types of files,
>there are also generic functions that will work on any type of file.
>
>Hope this helps.
>
>Phil
>
>
>askSam Systems
>Perry, FL (The Silicon Swamp)
>http://www.askSam.com/
>850-584-6590
>__________________________________________________________________
>askSam: Turn Information into Searchable Databases
>Try it at: http://www.askSam.com
>
>
>
>On Wed, 12 Feb 2003 10:46:05 -0800, Bonnie Britt wrote:
>>This is a characteristic of PDF files, and there's no obvious way
>>to "make it pretty" in askSam or any other software outside of
>>its native PDF format without wasting amazing amounts of time. If
>>it is a text-based PDF (and not a graphic) I import the PDF file,
>>warts and all, into askSam, then link to the original on the hard
>>drive as a backup. I don't bother to clean up the pdf import
>>since the text is there only to be searched. If and when it does
>>come up of interest in a search, then it makes sense to look at
>>the original for context, tables and images. If you wanted to
>>clean it up a bit to save space etc., you could search through
>>that imported document with ^p^p^p and replace that with ^p^p to
>>eliminate some excess blank lines. That takes only a second.
>>
>>Is anyone using the new Textpipe thingie, and if so, does it save
>>time when cleanup is desirable? Is it useful only when importing
>>documents formatted in the same manner? Or, does it require new
>>sets of instructions for new kinds of documents formatted every
>>which way?
>>Bonnie Britt
>>
>>
>>From: "Frank Thomas" <[log in to unmask]>
>>
>>| But there is another reason that made me hesitate to import
>>| pdf-files: You need to post-treat the text quite a lot as there
>>| are numerous empty lines, change of character size or style
>>| etc. And of course the question of tables and of images.
>>| Thanks
>>| /Frank
>>|


--
JulianFlor, [log in to unmask] on 13.02.2003
