The byte 0x0A means 0x00 *only* when it is found in nano's internal
representation of a file's data, not when it occurs in a file name.
This fixes the second part of https://savannah.gnu.org/bugs/?49867.
That is: elide a second test from the most travelled path: a valid
character. This adds a second call of mblen() when parse_mbchar()
is called on a terminating zero, but that should never happen.
It is quicker to do a handful of superfluous compares at the end of
each line than it is to compute and keep track of and compare the
remaining line length the whole time.
The typical line is some sixty characters long, the typical search
string ten characters -- with a shorter search string the speedup is
even higher: some fifteen percent. Only when the string is longer
than half the average line length does searching become slower with
this new method.
All this for a UTF-8 locale. For a C locale it makes no difference.
Now that mbstrncasecmp() does the right thing, there is no need any
more to verify that only a valid multibyte sequence was matched.
(See https://savannah.gnu.org/bugs/?45579 for a test case.)
Also, this will make it possible to search for invalid sequences.
(Currently it isn't possible to enter a search string with invalid
characters, but... a user might edit the search history file. And
if pasting at the prompt is implemented, it will be trivial to enter
invalid sequences if you have a file that contains them.)
Persisting might lead to count 'n' reaching zero, which would mean that
the needle has matched, which is wrong when one of the strings contains
an invalid or incomplete multibyte sequence.
That is: don't run towlower() on the two differing bytes when having
reached the end of one of the strings.
This fixes https://savannah.gnu.org/bugs/?48700.
In the bargain, don't do the conversion to lowercase twice.
Furthermore, persist when encountering invalid byte sequences --
until finding bytes that differ.
The needle is never part of the hay -- it is always a separate string.
(And even if needle and haystack were identical, the routine works fine,
the case does not need special treatment.)
This allows the user to specify which other characters, besides the
default alphanumeric ones, should be considered as part of a word, so
that word operations like Ctrl+Left and Ctrl+Right will pass them by.
Using this option overrides the option --wordbounds.
This fulfills https://savannah.gnu.org/bugs/?47283.
Invalid multibyte sequences get depicted with the Replacement Character,
and unassigned codepoints are shown as if they were a space. Both have
a width of one.
When running in a non-UTF locale, and when strncasecmp() suffers from
the same defect as strncmp(), make sure not to pass a length with the
high bit set.
Instead of parsing every multibyte character twice, first with
parse_mbchar() and then with mbtowc(), just let mbtowc() do all
the work. This makes searching for a fixed string twice as fast.
This also gets rid of four variables and lots of memory allocations.
(And, more importantly: it stops nano messing up the internal state
of the multibyte-to-wide character conversion, and thus would make
the calls to mbtowc_reset() superfluous.)
(The measurable effect (during long searches, for example) is zero, though.)
git-svn-id: svn://svn.savannah.gnu.org/nano/trunk/nano@5773 35c25a1d-7b9e-4130-9fde-d3aeb78583b8
Apparently there is /some/ state somewhere after all. Don't have time
now to figure out where exactly.
git-svn-id: svn://svn.savannah.gnu.org/nano/trunk/nano@5369 35c25a1d-7b9e-4130-9fde-d3aeb78583b8
but only as far back as such a character can possibly be.
Speedup suggested by Mark Majeres.
git-svn-id: svn://svn.savannah.gnu.org/nano/trunk/nano@5147 35c25a1d-7b9e-4130-9fde-d3aeb78583b8