[Tshwanelex-l] importing extra PCDATA not in original xml import file
David Joffe
david.joffe at tshwanedje.com
Wed Nov 7 12:27:20 EST 2012
Hi Sargon,
Hmm, it seems the XML importer is treating the whitespace between
tags (i.e. the indentation characters) as content. For example,
if you have:
<Sense>
    <Definition>
then it is incorrectly picking up the spacing character(s) in front
of <Definition> as content instead of ignoring them. I'm not sure
why; it shouldn't be doing this, and we'll have a look at the cause.
As a temporary workaround, if possible, modify the XML file before
importing it to remove the extra spacing, e.g. put each entry on a
single line:
<Sense><Definition>...
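
For a file with thousands of entries, the workaround above could be scripted rather than done by hand. The following is only a rough sketch using Python's standard xml.etree.ElementTree module; the function names are made up for illustration, and the sample uses just the <Sense> and <Definition> elements mentioned above, since the real structure of the import file is not shown in this thread. It removes text between tags that consists solely of whitespace and leaves all other text alone.

```python
import xml.etree.ElementTree as ET


def strip_whitespace_nodes(elem):
    """Recursively drop text and tail values that contain only whitespace."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    if elem.tail is not None and not elem.tail.strip():
        elem.tail = None
    for child in elem:
        strip_whitespace_nodes(child)


def compact_xml(source):
    """Return the XML in `source` with indentation-only text removed."""
    root = ET.fromstring(source)
    strip_whitespace_nodes(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample entry; the real file will have more elements.
sample = "<Sense>\n    <Definition>water</Definition>\n</Sense>"
compacted = compact_xml(sample)
print(compacted)
```

For a real file, ET.parse() followed by tree.write(path, encoding="utf-8", xml_declaration=True) would be the file-based equivalent, and writing UTF-8 should keep Syriac and Arabic text intact. Note that this also empties any element whose content is deliberately a single space, so it is worth checking a few entries before importing the full set.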
- David
On 4 Nov 2012 at 11:30, Sargon Hasso wrote:
From: "Sargon Hasso" <dshasso at gmail.com>
To: <tshwanelex-l at mailman.yale.edu>
Date sent: Sun, 4 Nov 2012 11:30:29 -0600
Subject: Re: [Tshwanelex-l] importing extra PCDATA not in original xml
import file
> I must have missed attaching the xml file.
> From: Sargon Hasso [mailto:dshasso at gmail.com]
> Sent: Sunday, November 04, 2012 11:13 AM
> To: 'tshwanelex-l at mailman.yale.edu'
> Subject: importing extra PCDATA not in original xml import file
> I am importing lemma entries from an XML file, following the instructions in the TLex manual;
> however, after each sense I am seeing an extra item, marked up as PCDATA, that shows up
> as a blank entry, e.g. ' '.
>
>
> I am enclosing my xml file for reference. How do I get rid of this extra entry?
> This XML file is just experimental; I am planning to import more than 6000 entries, so it is not
> just a few entries that I could clean up manually.
> This is an English-Syriac-Arabic dictionary. Syriac, like Arabic, is an RTL script.
> Regards,
> Sargon