[Tshwanelex-l] importing extra PCDATA not in original xml import file

David Joffe david.joffe at tshwanedje.com
Wed Nov 7 12:27:20 EST 2012


Hi Sargon,

Hmm, it seems as though the XML importer is picking up the
whitespace (i.e. the indentation characters) as content. For
example, if you have:

<Sense>
    <Definition>

then it incorrectly treats the whitespace in front of <Definition>
as content instead of ignoring it. I'm not sure why; it shouldn't be
doing this, and we'll have a look at why it's happening. In the
meantime, a temporary workaround, if possible, is to remove the
extra whitespace from the XML before importing it, e.g. put each
entry on a single line:

<Sense><Definition>...
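
With several thousand entries, doing this by hand isn't practical, so
a small script may help. The following is only a minimal sketch in
Python using the standard xml.etree.ElementTree module; it is not
part of TLex. The function names and file names are made up for
illustration. It assumes that whitespace-only text between tags is
never meaningful in your data, and note that ElementTree drops
comments and any DOCTYPE when re-writing the file, and that the
output ends up on a single line overall rather than one line per
entry.

```python
import xml.etree.ElementTree as ET


def strip_whitespace_nodes(elem):
    """Recursively remove text and tail strings that are only whitespace."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    if elem.tail is not None and not elem.tail.strip():
        elem.tail = None
    for child in elem:
        strip_whitespace_nodes(child)


def compact_string(xml_text):
    """Return xml_text with indentation between tags removed."""
    root = ET.fromstring(xml_text)
    strip_whitespace_nodes(root)
    return ET.tostring(root, encoding="unicode")


def compact_file(src_path, dst_path):
    """Read src_path, strip indentation, and write the result to dst_path."""
    tree = ET.parse(src_path)
    strip_whitespace_nodes(tree.getroot())
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical sample using the element names from the example above.
    sample = "<Sense>\n    <Definition>abc</Definition>\n</Sense>"
    print(compact_string(sample))
```

For a real file you would call something like
compact_file("import.xml", "import_compact.xml") (both names are
placeholders) and then import the second file. Real text content,
including Syriac or Arabic, is left untouched; only text that consists
entirely of whitespace is removed.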

 - David


On 4 Nov 2012 at 11:30, Sargon Hasso wrote:

From:	"Sargon Hasso" <dshasso at gmail.com>
To:	<tshwanelex-l at mailman.yale.edu>
Date sent:	Sun, 4 Nov 2012 11:30:29 -0600
Subject:	Re: [Tshwanelex-l] importing extra PCDATA not in original xml
	import file

>     I must have missed attaching the xml file.
>     From: Sargon Hasso [mailto:dshasso at gmail.com]
>     Sent: Sunday, November 04, 2012 11:13 AM
>     To: 'tshwanelex-l at mailman.yale.edu'
>     Subject: importing extra PCDATA not in original xml import file
>     I am importing lemma entries from an XML file, following the instructions in the TLex manual; 
>     however, I am seeing an extra entry after each sense, marked up as PCDATA, that shows up 
>     as a blank entry, e.g. ' '.
> 
> 
>     I am enclosing my xml file for reference. How do I get rid of this extra entry?
>     This XML file is just experimental; I am planning to import more than 6000 entries, so it is not 
>     just a few entries that I could clean up manually.
>     This is an English-Syriac-Arabic dictionary. Syriac, like Arabic, is an RTL script.
>     Regards,
>     Sargon



