Re: You're probably right but I think->
I wish I hadn't posted that. I still have to consider adding to the compressed data,
Let's just forget I posted at all, shall we?
I can post this with all the crap already on here...
>The i18n movement which started some years ago solves a lot, but not everything.
>With it, only output is guaranteed to match the best gettext will find. What about the input?
>Multibyte strings, produced by input parsers like kinput2 or ami in an 8bit or 7bit
> are hard to handle and crack easily (if you press the delete button, it removes only half a
> kinput2 and ami cannot run together in one terminal, because code pages intersect. Start
> sequences are one solution, but a bad one and one especially not meant for the long run.
> Imagine a document full of different languages; if I want a function that gives a line length
> this doc, it will be the hell,
I say that this is too bad. But if you do make one, make one that I can easily store like this.
Byte order be damned - everyone has to do the work.
>and I haven't even mentioned what will happen when new languages
> with new start and end sequences are implemented.
I don't see what will happen.
>Also, we have so many applications which handle text and formatting.
> Integration of multiple language parsers into them may take 5 times more than implementing
> the problem-specific algorithms. I think something like Microsoft's IME, a central
> solution, is needed here. Unfortunately, IME is not Open Source, and is therefore
Are we supposed to look at what is out there?
What does IME do?
I don't have the time to check this out, but:
Let's look at it like this:
I want to keep as much of my text file in memory at once as I can. I am writing a spell
and it can't take up my whole machine. I want to be able to look up Western words and
They are in my database and I want that in memory as much as I can fit.
: I want to store as Multi Byte Characters (like MBCS only 64 bit).
: most of my data can be compressed because I am a Pentium 4 on a programmer's desk
CPU cycles to burn.
: I need to be able to traverse this with a pointer.
: I am going to traverse it only once anyway, no need to expand it.
: My data is in records (equiv. to LineFeed, '\0' at end of string, etc.)
C++ will do the job, but I have to use functions on my pointer class.
I want to write this code but once, and be able to read this stuff forwards and
backwards in any \
Sounds like a CORBA implementation of an interface called characterPtr is in order,
one that everyone will use.
Things I need:
I use things like / and \ and + and ~ that everyone with a computer likes to use.
These would be nice to fit in with 1 byte characters so I can say in my compression that:
the next 116 characters are going to be single byte characters.
These would also be nice to fit in with 2 byte data of the next set of 115 characters.
My program wants to recognize this character, so I will call it:
1111 1111 1111 11XX (substitute the ASCII code for '/' for XX) (8 bytes)
it can be stored as :
And, I already know how many bytes long it is, and the Motorola or Intel storage format (my
'compression' told me this).
When I look at it using my characterPtr class, it always looks like 1111 1111 1111 11XX.
It is always kept stored compressed. I have to traverse it.
Problem : I don't use CORBA, or I can't because I program in bash.
Solution : write the equivalent of the characterPtr in your language to access the whole
of the newly defined character set that is always, wherever it is stored or transmitted,
using the same compression scheme:
If I have data that is 100 strings, each with an average length of 16 characters, and all
characters are 7 bit,
each string takes up about 20 bytes when encoded. The extra 4 characters tell me the length of
the string and that
it is all single byte and stored in Intel format (doesn't matter for single byte, but I know
Everyone in the world knows how to read those extra 4 characters and so to them (us) this
data looks like
sets of 64 bit characters.
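To make the 16 + 4 = 20 byte arithmetic concrete, here is a rough C++ sketch of one such record. The post only says the 4 extra characters carry the length, the character width and the byte order; the exact header layout below (2 bytes of count, 1 byte of width, 1 byte of order) and the names encodeRecord, decodeRecord, kIntel and kMotorola are my own assumptions for illustration, not a defined format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 4-byte record header (this layout is an assumption):
//   bytes 0-1: character count, low byte first
//   byte  2  : stored bytes per character (1, 2, 4 or 8)
//   byte  3  : payload byte order, 0 = Intel (little-endian), 1 = Motorola (big-endian)
constexpr std::size_t kHeaderSize = 4;

enum ByteOrder : std::uint8_t { kIntel = 0, kMotorola = 1 };

// Pack 64-bit characters into a record using `width` bytes per character.
// The caller picks a width large enough for every character in the record.
std::vector<std::uint8_t> encodeRecord(const std::vector<std::uint64_t>& chars,
                                       unsigned width, ByteOrder order) {
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    assert(chars.size() <= 0xFFFF);
    std::vector<std::uint8_t> out;
    out.reserve(kHeaderSize + chars.size() * width);
    out.push_back(static_cast<std::uint8_t>(chars.size() & 0xFF));
    out.push_back(static_cast<std::uint8_t>((chars.size() >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>(width));
    out.push_back(order);
    for (std::uint64_t c : chars) {
        // Every character must fit in the chosen width.
        assert(width == 8 || (c >> (8 * width)) == 0);
        for (unsigned i = 0; i < width; ++i) {
            // Intel order stores the low byte first, Motorola the high byte first.
            unsigned shift = (order == kIntel) ? i : (width - 1 - i);
            out.push_back(static_cast<std::uint8_t>((c >> (8 * shift)) & 0xFF));
        }
    }
    return out;
}

// Expand a record back into the 64-bit characters every reader agrees on.
std::vector<std::uint64_t> decodeRecord(const std::vector<std::uint8_t>& rec) {
    assert(rec.size() >= kHeaderSize);
    std::size_t count = rec[0] | (static_cast<std::size_t>(rec[1]) << 8);
    unsigned width = rec[2];
    bool intel = (rec[3] == kIntel);
    assert(rec.size() == kHeaderSize + count * width);
    std::vector<std::uint64_t> chars;
    chars.reserve(count);
    for (std::size_t n = 0; n < count; ++n) {
        const std::uint8_t* p = rec.data() + kHeaderSize + n * width;
        std::uint64_t c = 0;
        for (unsigned i = 0; i < width; ++i) {
            unsigned shift = intel ? i : (width - 1 - i);
            c |= static_cast<std::uint64_t>(p[i]) << (8 * shift);
        }
        chars.push_back(c);
    }
    return chars;
}
```

With this layout, 100 such strings come to about 2000 bytes, against 100 x 16 x 8 = 12800 bytes if every character were stored at its full 64 bits.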
Solution 2 : Write a String class that uses the base of this pointer so that it can traverse
the compression backwards.
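A minimal sketch of what such a pointer could look like in C++, assuming the character width and byte order of a record are already known from its header. The class name CharacterPtr, its constructor parameters and the helper readBackwards are hypothetical, chosen only to illustrate the idea. Because every character in a record has the same stored width, stepping backwards is just an index decrement, and the data is never expanded in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cursor over the payload of one compressed record.
// Dereferencing always yields a 64-bit character, whatever the stored width.
class CharacterPtr {
public:
    // `payload` points at the first stored character; `count` characters of
    // `width` bytes each follow. `intel` selects little-endian byte order.
    CharacterPtr(const std::uint8_t* payload, std::size_t count,
                 unsigned width, bool intel, std::size_t index = 0)
        : payload_(payload), count_(count), width_(width),
          intel_(intel), index_(index) {
        assert(width == 1 || width == 2 || width == 4 || width == 8);
        assert(index <= count);
    }

    // Widen the stored bytes to a 64-bit character on the fly.
    std::uint64_t operator*() const {
        assert(index_ < count_);
        const std::uint8_t* p = payload_ + index_ * width_;
        std::uint64_t c = 0;
        for (unsigned i = 0; i < width_; ++i) {
            unsigned shift = intel_ ? i : (width_ - 1 - i);
            c |= static_cast<std::uint64_t>(p[i]) << (8 * shift);
        }
        return c;
    }

    CharacterPtr& operator++() { assert(index_ < count_); ++index_; return *this; }
    CharacterPtr& operator--() { assert(index_ > 0); --index_; return *this; }
    bool atBegin() const { return index_ == 0; }
    bool atEnd() const { return index_ == count_; }

private:
    const std::uint8_t* payload_;
    std::size_t count_;
    unsigned width_;
    bool intel_;
    std::size_t index_;
};

// Read a record back to front, starting from a cursor placed at its end.
std::vector<std::uint64_t> readBackwards(CharacterPtr end) {
    std::vector<std::uint64_t> out;
    while (!end.atBegin()) {
        --end;
        out.push_back(*end);
    }
    return out;
}
```

A String class could hold one of these cursors at each end of a record and hand them out the way std::string hands out iterators.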
The computer can no longer define the language, we must do that part.