Discussion:
[Development] Hardcoded strings and qstricmp comparison
Andy Shaw
2018-11-14 08:34:50 UTC
Hi!

In connection with a support case I have been working on, it was brought to my attention that there may be problems connected to using qstricmp and other functions that, for one reason or another, expect Latin-1 strings. The concern is that we encode our source code as UTF-8, so in theory we may not be doing enough to ensure that the strings we write explicitly inside Qt are treated as Latin-1. It could be that in reality this will never be a problem; if that is the case, please give me the background so I can pass it on too.

For user code, I understand that we can just tell them to do something like:

qstricmp(str, QLatin1String("a").latin1());

and that would ensure it is correctly seen as a Latin-1 encoded string. If this is how it should be done, then shouldn't we change our own usage in the Qt code to do the same thing? Or am I missing something?

Kind Regards,
Andy
Edward Welbourne
2018-11-14 10:03:22 UTC
Well, as long as the source string is actually printable ASCII, there
should be no problem, as UTF-8 and Latin-1 agree on those.
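As a minimal illustration of that agreement, the following standard C++ sketch (no Qt involved; `encodeUtf8`, `encodeLatin1` and `encodingsAgree` are hypothetical helpers, and the UTF-8 encoder only covers U+0000..U+00FF) encodes a code point both ways and compares the resulting bytes:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (illustration only): UTF-8 encoding of a code point
// in the Latin-1 range U+0000..U+00FF.
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0xFF && "this sketch only covers the Latin-1 range");
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                  // one byte: 0xxxxxxx
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte: 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation: 10xxxxxx
    }
    return out;
}

// Hypothetical helper: Latin-1 stores each code point U+0000..U+00FF as one byte
// with the same value.
std::string encodeLatin1(char32_t cp) {
    assert(cp <= 0xFF && "not representable in Latin-1");
    return std::string(1, static_cast<char>(cp));
}

// True when both encodings produce exactly the same bytes for cp.
bool encodingsAgree(char32_t cp) {
    return encodeUtf8(cp) == encodeLatin1(cp);
}
```

`encodingsAgree(cp)` holds for every cp below 0x80 and fails for every cp from 0x80 to 0xFF, where UTF-8 needs two bytes (U+00FF is `\xC3\xBF`) and Latin-1 only one (`\xFF`).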

If the source string contains bytes > 127, such bytes should be encoded
using suitable escapes. If the string is to be read as Latin-1 but the
source file is encoded in UTF-8, the character displayed in the editor
is not the character the raw bytes will actually be read as.

So I'm not quite sure what problem you're referring to, but if a string
in the source is meant to be Latin-1, it shouldn't be entered in literal
form in the source code, it should be entered using escapes,

QLatin1Char yUmlaut('\xff');

for example.
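To see why the escape matters, this standard C++ sketch reads both the escaped byte and the bytes of a literally typed ÿ as Latin-1. `decodeLatin1` and `checkEscapes` are hypothetical names, and the literal case assumes the compiler reads the source as UTF-8 and also stores narrow literals as UTF-8:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: read a byte string as Latin-1. Every byte 0x00..0xFF
// maps to the code point with the same value, so decoding never fails; it
// silently yields the wrong characters if the bytes were really UTF-8.
std::u32string decodeLatin1(const std::string &bytes) {
    std::u32string out;
    for (unsigned char b : bytes)
        out += static_cast<char32_t>(b);
    return out;
}

void checkEscapes() {
    // Escaped, as in QLatin1Char yUmlaut('\xff'): exactly one byte, 0xFF,
    // whatever encoding the source file itself is saved in.
    const std::string escaped = "\xff";
    assert(escaped.size() == 1);
    assert(decodeLatin1(escaped) == U"\u00FF");             // y with diaeresis

    // Typed literally in a UTF-8 source file (assuming UTF-8 narrow literals),
    // U+00FF becomes two bytes. They are spelled as escapes here so this
    // sketch does not depend on its own file encoding.
    const std::string typedLiterally = "\xC3\xBF";
    assert(typedLiterally.size() == 2);
    assert(decodeLatin1(typedLiterally) == U"\u00C3\u00BF"); // read as two characters
}
```

Read as Latin-1, the literal form comes out as "Ã¿" rather than "ÿ"; only the escaped form is the single byte a Latin-1 API expects.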

Eddy.
Andy Shaw
2018-11-14 10:13:51 UTC
I am speaking purely theoretically: there may not be a problem, and I don't know enough to be sure, which is why I am asking. I think this clarifies it enough for me, though. As long as we only use ASCII characters (<= 127), there is no problem on our side. Users just have to be aware, when they use these functions, that their source code will be UTF-8 encoded rather than Latin-1, and so should escape characters > 127.

Andy
Thiago Macieira
2018-11-14 16:29:25 UTC
Post by Andy Shaw
> qstricmp(str, QLatin1String("a").latin1());
What we should do is actually have QLatin1String overloads of the functions in
question. We already have quite a few in qstringalgorithms.h, including this
pearl of wisdom:

Q_REQUIRED_RESULT bool isLatin1(QLatin1String s) Q_DECL_NOTHROW;
// in qstring.h

The thing is, those older functions like qstricmp are meant to be used with
QByteArray, which is documented to operate on Latin1 (see isUpper and
toUpper).
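A rough sketch of what a Latin-1 contract means for a case-insensitive comparison, written in standard C++ rather than against Qt. `Latin1View`, `latin1ToLower`, `compareIgnoringCase` and `checkCaseFolding` are hypothetical names, the case table covers only the Latin-1 letters, and this is not how Qt implements qstricmp. The tag type stands in for the QLatin1String overloads suggested above, in that it converts nothing and only labels the bytes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical tag type in the spirit of QLatin1String: it converts nothing,
// it only records in the signature that the bytes are meant as Latin-1.
struct Latin1View {
    std::string_view bytes;
};

// Latin-1 lower-casing of one byte: ASCII A-Z and U+00C0..U+00DE map to the
// byte 0x20 higher, except U+00D7 (multiplication sign), which has no case.
unsigned char latin1ToLower(unsigned char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
        return static_cast<unsigned char>(c + 0x20);
    return c;
}

// Hypothetical qstricmp-style comparison (not Qt's code; null pointers are not
// modelled): returns <0, 0 or >0, ignoring case under Latin-1 rules.
int compareIgnoringCase(Latin1View a, Latin1View b) {
    const std::size_t n = std::min(a.bytes.size(), b.bytes.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int ca = latin1ToLower(static_cast<unsigned char>(a.bytes[i]));
        const int cb = latin1ToLower(static_cast<unsigned char>(b.bytes[i]));
        if (ca != cb)
            return ca - cb;
    }
    if (a.bytes.size() == b.bytes.size())
        return 0;
    return a.bytes.size() < b.bytes.size() ? -1 : 1;  // shorter prefix sorts first
}

void checkCaseFolding() {
    // "ÉTÉ" vs "été" as Latin-1 bytes: equal ignoring case.
    assert(compareIgnoringCase(Latin1View{"\xC9T\xC9"}, Latin1View{"\xE9t\xE9"}) == 0);
    // The same words as UTF-8 bytes (what literals typed in a UTF-8 source
    // file would hold): Latin-1 folding no longer matches them.
    assert(compareIgnoringCase(Latin1View{"\xC3\x89T\xC3\x89"},
                               Latin1View{"\xC3\xA9t\xC3\xA9"}) != 0);
}
```

The second check is the practical risk raised at the start of the thread: Latin-1 case folding is only meaningful on Latin-1 bytes, so non-ASCII literals handed to such a function need to be written as escapes.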
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center