[Tshwanelex-l] importing extra PCDATA not in original xml import file
Sargon Hasso
dshasso at gmail.com
Wed Nov 7 20:08:31 EST 2012
Yep, that did it. I noticed that all the whitespace (with the exception of line feeds) consisted of space characters. I had used xmlpad to construct the sample XML file. In Notepad++ I was able to see all the space characters, and I converted them to tabs, since I had noticed that TLex uses tabs for indentation when it exports. After that, I was able to import cleanly.
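
For a file with thousands of entries, the same space-to-tab conversion could be scripted rather than done by hand. The following is only a minimal sketch in Python, not the procedure used above (that was done manually in Notepad++). It assumes the indentation consists purely of leading space characters, two per level; the function name spaces_to_tabs and the indent_width parameter are hypothetical and are not part of TLex.

```python
import re


def spaces_to_tabs(text, indent_width=2):
    """Replace each run of leading spaces with tabs.

    indent_width is an assumed number of spaces per indentation level;
    adjust it to match the editor that produced the file.
    """
    def convert(match):
        n = len(match.group(0))
        # Round up so that a partial indent level still becomes one tab
        # and no space characters are left in the indentation.
        return "\t" * ((n + indent_width - 1) // indent_width)

    # Only spaces at the very start of a line are touched; spaces inside
    # element content are left alone.
    return re.sub(r"^ +", convert, text, flags=re.MULTILINE)


# Element names taken from the example in this thread.
sample = "<Sense>\n  <Definition>some text</Definition>\n</Sense>\n"
converted = spaces_to_tabs(sample)
```

Lines that mix tabs and spaces in their indentation are not handled by this sketch, and it has not been checked against the TLex importer itself.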
Sargon
On Nov 7, 2012, at 11:27 AM, "David Joffe" <david.joffe at tshwanedje.com> wrote:
> Hi Sargon,
>
> Hmm, it seems as though the XML importer is picking up the
> 'whitespace' - i.e. the 'indentation' characters - as content, e.g.
> if you have:
>
> <Sense>
>   <Definition>
>
> then it is incorrectly picking up the spacing character(s) in front
> of the <Definition> as content, instead of ignoring them. I'm not
> sure why, it shouldn't be doing this (we'll have a look at why it's
> happening), but a temporary workaround, if possible, is to modify
> the XML to be imported to remove the extra spacing, e.g. have it all
> on one line per entry:
>
> <Sense><Definition>...
>
> - David
>
>
> On 4 Nov 2012 at 11:30, Sargon Hasso wrote:
>
> From: "Sargon Hasso" <dshasso at gmail.com>
> To: <tshwanelex-l at mailman.yale.edu>
> Date sent: Sun, 4 Nov 2012 11:30:29 -0600
> Subject: Re: [Tshwanelex-l] importing extra PCDATA not in original xml
> import file
>
>> I must have missed attaching the xml file.
>> From: Sargon Hasso [mailto:dshasso at gmail.com]
>> Sent: Sunday, November 04, 2012 11:13 AM
>> To: 'tshwanelex-l at mailman.yale.edu'
>> Subject: importing extra PCDATA not in original xml import file
>> I am importing lemma entries from an XML file, following the instructions in the TLex manual;
>> however, after each sense I am seeing an extra entry, marked up as PCDATA, that shows up
>> as a blank, e.g. ' '.
>>
>>
>> I am enclosing my xml file for reference. How do I get rid of this extra entry?
>> This XML file is just experimental; I am planning to import more than 6000 entries, so this is
>> not a matter of a few entries that I could clean up manually.
>> This is an English-Syriac-Arabic dictionary. Syriac, like Arabic, is an RTL script.
>> Regards,
>> Sargon
>
>