
Accented Characters

PostPosted: Wed Sep 26, 2007 2:11 am
by FORTUETA
Hello!!!

People who use a language with accented characters have a problem with X1 in versions after 5.2.3.

"Habitación" and "Habitacion" should be the same word for X1 (in versions after 5.2.3 they are treated as different words).

I can't buy X1 for my company if it doesn't work properly!!!!

I think that a large part of the non-English users of X1 need this fix.

Please fix it!!!!
(I'm still using version 5.2.3)

PostPosted: Wed Sep 26, 2007 6:43 am
by w0qj
If anyone else feels this feature (Unicode search support) is needed, please feel free to contribute to this discussion thread!

[This discussion moved to Feature Request Forum]

PostPosted: Wed Sep 26, 2007 11:47 am
by Tod
To clarify - X1 does actually support some Unicode indexing and querying. What the original poster is requesting is that X1 strip out accents when indexing, so that Habitación and Habitacion are both returned as results of the same query.
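To make the request concrete, here is a minimal Python sketch of accent stripping at index time. It is only an illustration, not X1's actual code: the helper names (strip_accents, fold, build_index) and the two sample files are assumptions made up for this example. The same folding is applied to the indexed words and to the query, so both spellings land on one key.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(term: str) -> str:
    """Key used for both indexing and querying: accent-free and case-folded."""
    return strip_accents(term).casefold()


def build_index(docs: dict) -> dict:
    """Map each folded word to the set of document names containing it."""
    index = defaultdict(set)
    for name, body in docs.items():
        for word in body.split():
            index[fold(word)].add(name)
    return index


# Hypothetical sample documents, one per spelling.
docs = {"a.txt": "Habitación doble", "b.txt": "Habitacion simple"}
index = build_index(docs)

# Both queries are folded to the same key, so they return the same files.
hits_accented = index[fold("Habitación")]
hits_plain = index[fold("Habitacion")]
```

With this approach the accent information is discarded when the index is built, which is why a query cannot later tell the two spellings apart.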

Some users have reported this as a bug and insisted that X1 properly index accents. Please provide your response to this - should X1 index and therefore require you to query with accents?

PostPosted: Sun Oct 07, 2007 11:35 am
by gt13
This feature is not only needed, it is mandatory for everyone who uses accented characters.

The reason is that accented letters are written almost at random in files. This is due to the facts that:
- old text (ASCII) files were not accented
- accented characters are more difficult to type on the keyboard
- uppercase letters are generally not accented
- everybody understands the text even if it is not accented
- very often people do not know the correct spelling and the kind of accent to use

Taking "e" as an example, in French you can find 10 characters: e é è ê ë E É È Ê Ë
The same problem applies to "a", "i", "o", "u" and "c" (c ç C Ç)

It means that it is IMPOSSIBLE to find accented words if the search engine distinguishes between all these characters.

Let us take an example: suppose that you are searching for the word "fenêtrées".
There are 4 "e", and even if you suppose that the only possible accented ones are the second and the third, and that reasonably the person who wrote it could only have used (e é è ê), you have 4x4=16 searches to do!
And what if your search contains several accented words?
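The 4x4=16 count can be checked with a short Python sketch. The function name spelling_variants and the choice of positions are assumptions for illustration only; the four forms (e é è ê) and the two variable positions are the ones assumed in the example above.

```python
import unicodedata
from itertools import product

# The four forms of "e" assumed in the example, in composed (NFC) form.
E_FORMS = unicodedata.normalize("NFC", "eéèê")


def spelling_variants(word: str, positions: list, forms: str = E_FORMS) -> list:
    """Every spelling obtained by trying each form at each listed position."""
    chars_template = list(unicodedata.normalize("NFC", word))
    variants = []
    for combo in product(forms, repeat=len(positions)):
        chars = list(chars_template)
        for pos, ch in zip(positions, combo):
            chars[pos] = ch
        variants.append("".join(chars))
    return variants


# In "fenêtrées" the second and third "e" sit at indexes 3 and 6.
variants = spelling_variants("fenêtrées", [3, 6])
print(len(variants))  # number of separate searches needed
```

Each extra ambiguous letter multiplies the number of searches by four again, so a query with several accented words quickly becomes impractical to enumerate by hand.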

Presently, search engines do not distinguish between lowercase and uppercase characters. Why isn't it possible to define equivalences between several characters instead of 2? In the same way as (e E) are equivalent, it should be possible to decide that (e é è ê ë E É È Ê Ë) are equivalent in a search process.
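As a rough illustration of such equivalences, here is a Python sketch built on a plain lookup table, in the same spirit as case-insensitive matching. The names EQUIVALENCES, make_table, canonical and same_word are hypothetical, and the table only contains the two classes listed above; a real one would cover every letter.

```python
import unicodedata

# Each canonical letter maps to the characters treated as equivalent to it.
# Only the two classes mentioned in this post are listed (illustrative).
EQUIVALENCES = {
    "e": "eéèêëEÉÈÊË",
    "c": "cçCÇ",
}


def make_table(equivalences: dict) -> dict:
    """Build a translation table usable with str.translate."""
    table = {}
    for canonical_letter, members in equivalences.items():
        for ch in unicodedata.normalize("NFC", members):
            table[ord(ch)] = canonical_letter
    return table


TABLE = make_table(EQUIVALENCES)


def canonical(text: str) -> str:
    """Replace every member of an equivalence class by its canonical letter."""
    return unicodedata.normalize("NFC", text).translate(TABLE)


def same_word(a: str, b: str) -> bool:
    """True when the two words differ only inside the equivalence classes."""
    return canonical(a) == canonical(b)
```

Because the table is ordinary data, it could be edited or emptied by the user, which would cover the case where the distinction between the characters has to be kept.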

Of course, for some purposes, it would also be useful to deactivate this feature, and make a distinction between all the characters.

This is the point of view of the user.
OK, it is perhaps not easy to implement, but it ought to be.
And as long as it is not, this software cannot be used in French, Spanish, German, the languages of northern Europe, and many others.

This is the reason why many of us stay with X1 version 5.2.3 (Build 1852bz-bs) (Released Friday, August 26, 2005, that means more than 2 years ago!), which works fine from this point of view.
LATER EDIT (Feb. 24th, 2008): in fact, this claim seems not to be exact: later posts in this thread show that some diacritics still cause problems.

And it is not the first time that this problem is pointed out: see for instance point ii) in http://forums.x1.com/viewtopic.php?p=5762

Gerard

PostPosted: Tue Oct 09, 2007 1:17 pm
by bluegecko
I work extensively with multilingual documents, and must say I agree with Gerard. The ideal solution would be to offer an option (check box) beside the search field to "ignore accents".

Two simple test cases:

1. Searching for the French word for tea, "thé", obviously I'd not want to ignore accents, otherwise I'd get a list of virtually every file on my hard disk...

2. On the other hand, transliterations (for instance from Arabic) will often have accents all over the place, with all manner of different schemes, some valid, some not, plus typos and each author's foibles coming into play. For example, Mauritania's main port town is Nouadhibou, also spelled Nouâdhibou, Nouâdhiboû, Nouádhibou, etc etc, plus the same using macrons instead of circumflexes. Needless to say, to find Nouadhibou, I'd need the option to ignore accents.
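The two test cases can be sketched in Python as a per-query switch. This is only an illustration of the check-box idea: the function search, its ignore_accents parameter and the sample word list are assumptions, not anything X1 provides.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(words: list, term: str, ignore_accents: bool) -> list:
    """Return the words matching term, with or without accent folding."""

    def key(s: str) -> str:
        s = unicodedata.normalize("NFC", s).casefold()
        return strip_accents(s) if ignore_accents else s

    wanted = key(term)
    return [w for w in words if key(w) == wanted]


# Hypothetical word list covering both test cases.
words = ["the", "thé", "Nouadhibou", "Nouâdhibou", "Nouâdhiboû",
         "Nouádhibou", "Nouādhibou"]

tea_only = search(words, "thé", ignore_accents=False)        # accents matter
all_spellings = search(words, "Nouadhibou", ignore_accents=True)  # accents ignored
```

With the switch off, "thé" matches only itself; with it on, the plain spelling of the port matches the circumflex, acute and macron variants as well.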

Thanks

How many people in the world have this problem?

PostPosted: Wed Jan 09, 2008 1:57 am
by FORTUETA
Hi!!!
Thank you everybody for your answers.

Gerard, you gave a very good explanation of the problem.

I agree with bluegecko on the solution: a check box with "ignore fu* accents"

Will X1 ever respond to this problem? Will they fix it?

How many people in the world have this problem? 30% of the world population? 1.800.000.000 people?

I have been waiting for years to have the problem fixed.

PostPosted: Mon Jan 28, 2008 9:46 am
by cwiekol
Hi there,
I'm a new X1 user and ready to buy the new 6.0 version. It's impressive in comparison with Copernic Desktop Search, but the problem with diacritic letters makes your search tool completely unusable for Poles (and not only Poles, I suppose). In my language the characters e ę E Ę, oóOÓ, aąAĄ, sśSŚ, lłLŁ, zżźZŻŹ, cćCĆ, nńNŃ should mean the same to a search engine.
Please let me know when I can expect this problem to be fixed. For now I will try to find an older release, as the earlier posters suggested, but I don't want to pay for it....
regards
Michal

PostPosted: Sat Feb 02, 2008 7:01 am
by cwiekol
Unfortunately older versions (I've got 5.2.18..) have the same bug.
I'll have to try Microsoft desktop search (sic!)..
I will try your product again in the future to find out whether you have fixed this problem...
regards
Michal

PostPosted: Sat Feb 02, 2008 12:23 pm
by cwiekol
[Image not preserved]

This is what we really need, and what only, i think, Microsoft's WDS has...

PostPosted: Sun Feb 03, 2008 12:43 pm
by gt13
If you want to go back to a free version without this "bug", go to the page below, and follow the light green instructions (there are some lines in English in the middle of the French ones) to get Yahoo! Desktop Search version 1.2 Build 1852je, which is almost identical to X1 Version 5.2.3 (Build 1852bz-bs)(Released Friday, August 26, 2005).
To my knowledge that is one of the best free desktop search tools, and it has no problem with diacritics.
http://snipurl.com/2iuez

PostPosted: Sun Feb 03, 2008 12:52 pm
by gt13
Here is a procedure I propose for testing your desktop search software.
The results are here (Excel file): http://snipurl.com/7fugo
And you can get instructions and the files in order to do the same here (zip file): http://snipurl.com/7fumt

I will add the results of Copernic Desktop Search soon.

PostPosted: Sun Feb 03, 2008 7:17 pm
by gt13
I just updated the test with some more features and the Copernic results.

PostPosted: Mon Feb 04, 2008 4:25 am
by cwiekol
[gt13]

Thanks for the help. I've installed this but it doesn't work for me.. the word "POZWOLIŁEM" is still treated as different from "POZWOLILEM". I will have to use Windows Desktop Search until X1 fixes this problem..
I think that many people don't even know that this problem exists (I didn't until last week)..

regards
Michal

PostPosted: Mon Feb 04, 2008 5:55 am
by gt13
@ cwiekol
It seems to prove that there is a difference between accented letters (used in French for instance) and some more complicated diacritics.
I will include "POZWOLIŁEM" in the next version of my tests! (=> LATER EDIT: it is now done)
Sorry

PostPosted: Thu Feb 07, 2008 2:22 am
by cwiekol
It seems to me that only Windows Desktop Search "knows" everything about diacritics..

Don't use "pozwoliłem"; use ŚRÓDŁĄCZĘ, which should be treated the same as SRODLACZE ;)

regards
Michal

PostPosted: Fri Feb 22, 2008 7:59 am
by cwiekol
It's a pity that no one from the X1 team is doing anything about this (or at least they are not telling us about it).
Please let me know whether you (the X1 team) are going to do anything about this issue..
regards
Michal

PostPosted: Sun Feb 24, 2008 10:15 am
by w0qj
As far as I know, Unicode support (eg: accented characters) is not even on the roadmap of future development for X1.

It may be a looong time, if ever, before there is Unicode support...

If folks feel so strongly about Unicode support, I suggest you start a petition thread in this same Feature Request Forum. Tks!

PostPosted: Sun Feb 24, 2008 2:03 pm
by gt13
@ pgk :
I fully agree. Ideally, the user could even modify this lookup table, and customize it to his/her needs.

@ w0qj :
And if there is a petition on this topic, I will subscribe!

Gerard

PostPosted: Sun Feb 24, 2008 2:56 pm
by Kenward
While coding this task might be easy, I wouldn't know, that is not the only thing that matters. If enabling this slows down X1 to such a state that it is unusable by everyone who does not need this feature, then I vote against it.

My guess is that were it a "no brainer" it would have happened long ago.

PostPosted: Mon Feb 25, 2008 10:54 am
by Tod
OK, first, X1 is looking into this and trying to figure out how best to re-activate this feature (since X1 used to ignore diacritics)

Second, it would not slow down indexing, but it would require a COMPLETE reindex when switching from indexing with diacritics to without diacritics and vice-versa. While some users might be OK with this, some other users could get very upset by it.
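To illustrate why switching requires a full reindex, here is a small Python sketch. It assumes a simple word-to-files index, which is a simplification and not a description of X1's internals; build_index and its ignore_accents parameter are hypothetical names.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs: dict, ignore_accents: bool) -> dict:
    """Map each stored key to the set of files containing that word."""
    index = defaultdict(set)
    for name, body in docs.items():
        for word in body.split():
            key = unicodedata.normalize("NFC", word).casefold()
            if ignore_accents:
                key = strip_accents(key)  # the accent is discarded here
            index[key].add(name)
    return dict(index)


docs = {"a.txt": "habitación", "b.txt": "habitacion"}
folded = build_index(docs, ignore_accents=True)   # one merged key
exact = build_index(docs, ignore_accents=False)   # two separate keys
```

The folded index holds a single key for both files, so it no longer records which file had the accent. Going from one mode to the other therefore means reading the documents again rather than converting the existing index.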

Third, the overall consideration is not the time it takes to code the feature, but the customer support questions - how do we expose the setting, what kind of warning boxes do you have to go through, and what possible nasty bugs might emerge.

We will definitely let you know if we make the change and will ask you to act as the first beta group.

PostPosted: Mon Feb 25, 2008 2:23 pm
by Kenward
Great. Now that we know that you are on the case, you may well be flooded out with volunteers to do some testing for you.

If I can make a, probably insulting and almost certainly unwelcome, observation, this sort of failure to recognise that the rest of the world does things differently, with its funny accents and stuff, hobbles too many IT businesses in the USA.

Microsoft went global years ago, which is why it is quicker to offer such features.

If X1 can offer a decent "diacritics" option, it could open up large and growing markets. Get ahead of the pack and "they will come".

PostPosted: Thu Mar 06, 2008 7:48 am
by cwiekol
Thank god someone from X1 has responded to this issue. Now I will check this forum more often to look out for this beta. I'm tired of using Microsoft desktop search (what a slow, limited, stupid tool)!!
regards
Michal

PostPosted: Mon Mar 31, 2008 7:20 am
by cwiekol
I've just bought an X1 licence because I'm sick of using WDS and Copernic DS. I hope I will soon see a new X1 beta release with a "diacritics indexing option"!!
regards
Michal

Is Tod an X1 employee?

PostPosted: Tue Apr 08, 2008 4:10 am
by FORTUETA
Hello friends:

I haven't looked at this forum since the first days.

Is Tod an X1 employee? Is X1 really fixing this bug?

(I'm not a native English speaker, and it is difficult for me to understand all the users' posts.)

If X1 develops a beta version with this feature, I can test it too.
I have been waiting for it for years, and many Spanish businesses that I have shown X1 to declined to buy it because of this problem.

Tod is an employee (now I know)

PostPosted: Tue Apr 08, 2008 4:19 am
by FORTUETA
Tod is an employee (now I know).

Can anybody from the X1 team tell us about the development of the accented characters fix?

TY

Any news on accented characters?

PostPosted: Sun Jul 12, 2009 7:27 am
by FORTUETA
Hello friends,
Any news on accented characters?

PostPosted: Sun Jul 12, 2009 8:07 am
by tjh
v6.5 and above support double-byte characters. I assume (but don't know) that this means they support accented characters. Give it a test.

PostPosted: Sun Jul 12, 2009 9:00 am
by gt13
@ tjh,
I don't think so.
And I am fed up with testing new releases that do not improve anything from this point of view. Each test means some hours of wasted time, and very often it is even difficult to restore the old working version after the test!

But since you have it already installed, you can just test it easily,

either using this complete procedure:

gt13 wrote:I just propose something to test your Desktop Search softwares.
The result is here (Excel file): http://snipurl.com/7fugo
And you can get instructions and the files in order to do the same here (zip file): http://snipurl.com/7fumt


or a simplified one: just retrieve the ZIP file http://snipurl.com/7fumt , unpack, index, and search for the two words "unusualword eczé", and then for "unusualword ecze".

And tell us...

Thanks.
Gerard

PostPosted: Sun Jul 12, 2009 9:35 am
by tjh
Searching for "unusualword eczé" (with quotes around) returns Readme34.doc and Readme34.pdf

There's no need to even unpack, X1 searches in the zip file.

It seems to pass the tests that pack offers.

This is using Blackbird Beta II

PostPosted: Sun Jul 12, 2009 10:02 am
by gt13
Please unpack the zip, and don't use the quotes.
You may get a few more results.

Note that people in this topic would like to get all 38 files for each query:
unusualword eczé
unusualword ecze

That is why we say that these versions are unusable for people using accented letters!

Gerard

PostPosted: Sun Jul 12, 2009 10:18 am
by tjh
Ok, removing the quotes I get 30 results. i.e. it seems to find all instances of it.

I don't get results from file 04_indexation_test_DOS.txt, nor from the 14 XLS file, which is the one that tests comments.

But overall, it seems to work well. It returns more results than Windows Desktop Search 4 is returning. I'll install Copernic Professional later and give that a test as well if you like.

I can also make a VM available to you using VNC if you want to do some tests yourself.

Cheers,
Tim

PostPosted: Sun Jul 12, 2009 10:49 am
by gt13
Thanks for the test.

It seems that there are indeed some improvements since the last time I tested X1.
If you also get the IPTC data embedded in 17_indexation_test_IPTC.jpg, it becomes all the more interesting!

OK also for a VNC run. I tried to reach you on MSN, but your MSN Id seems to be outdated.

Gerard

PostPosted: Sun Jul 12, 2009 11:10 am
by tjh
Sorry, nothing for test 17!

I'm running a few various things in my VM, let me get it cleaned up and accessible remotely and I'll PM you some details.

Also, I've just turned on my MSN, tim[at]muppetz dot com is correct.

PostPosted: Sun Jul 12, 2009 12:26 pm
by gt13
After some testing, there is no miracle.
Indeed, the latest version of X1 finds files when one asks for the exact accented word, but looking for eczema will not find eczéma.

X1 is still unusable for people using accented languages.

PostPosted: Sun Jul 12, 2009 2:18 pm
by Kenward
What happens when you search for:

"eczema OR eczéma"

Seeking

"écosse OR ecosse"

works just fine here.

So, as I understand it, your beef is really that X1 cannot, of its own free will, interpret ȇ, ȅ, ȩ, è, é and/or ë as e.

I dive in here late, again, and I am not going to faff around running complicated tests. I just throw in this observation so that people who come across this discussion do not get the impression that x1 cannot handle accented characters. It can. But perhaps not in the way that everyone would like.

It does seem like a good idea to treat all the flavours of a character as a "basic" letter. But then what happens when someone wants to find écosse but not ecosse?

PostPosted: Sun Jul 12, 2009 3:24 pm
by gt13
We (non-English computer users) have been explaining the problem since 2005. You can go back to the first posts of this topic to understand the problem.

It is useless to try to explain your way of thinking to us. This topic could be renamed (I know, it is too long):
"As long as X1 is not able to handle correctly the problem of accented letters explained many times, X1 cannot be considered a SEARCH engine for non-English people, and consequently it cannot penetrate the business market".

For this reason, we do not understand why X1 does not make any effort to solve the problem.
Microsoft did the job (you can choose in Windows Search whether you want to distinguish between "e" and "é"). Windows Search is also able to search IPTC data embedded in pictures, which is also a great feature. For end users like me, it still lacks (for some time?) some important features, like Thunderbird email indexing (should be solved with the next Thunderbird 3 release).

Gerard

At last, accent option (v 6.5)

PostPosted: Sun Jul 12, 2009 11:24 pm
by FORTUETA
Hello, I have installed a new beta.

There is an option "ignore accents".

I ran a test (2 different *.txt files, one with the word "habitación", the other with "habitacion").

When I search for habitación I get 2 results.
When I search for habitacion I get 2 results too.

It seems to work.

I'll do more tests.

PostPosted: Mon Jul 13, 2009 1:12 am
by Kenward
You will find this feature in:

>>Options
>>Indexing

There is also an option:

"Insert word breaks around Asian language characters"

Here is the relevant section in the Help file:

From the indexing options on the right, under Indexing Options, click to select character options:

Ignore accents in characters: If you select this option, X1 will treat letters with accents the same as the letter without an accent. For example, é and e will both be considered to be the accent-less letter e. If you leave this option unchecked, X1 will differentiate between a letter with and without an accent during searches.

Insert word breaks around Asian language characters: If you select this option, X1 will insert spaces around Asian-character words. If you are performing an exact-term search, X1 will be able to locate a word for you. If you do not select this option, X1 may not be able to tell where a word with Asian characters ends for exact-term searches.

If X1 has already created an index of your items, you will see a message alerting you that a new index will need to be created. Click OK to allow X1 to clear the old index and create a new one incorporating your character settings.

I doubt if this will meet all the needs of those who want to conduct really subtle searches, but it might help those with lesser demands.

PostPosted: Tue Jul 14, 2009 12:19 am
by gt13
Thanks guys. Great news concerning accented letters !

I also noticed that the indexing of removable drives came back to X1, starting from X1® Professional Client Blackbird Beta II (Build 3840): http://forums.x1.com/viewtopic.php?t=4057

I will try the new version when I have some spare time.

Gerard

another disappointment

PostPosted: Mon Sep 14, 2009 3:56 am
by cwiekol
I've just installed 6.6.5 (Build 3904) in the hope that I could come back to X1,
and guess what.. it still doesn't recognize Polish diacritics (ąśżźćęółń)..

You should be ashamed of yourselves!

PostPosted: Mon Sep 14, 2009 4:53 am
by gt13
OK. Thanks for the feedback !

Re: Accented Characters

PostPosted: Sun Aug 29, 2010 10:00 am
by gt13
I have just run a test with the latest version, 6.7.1.
Of course, I used this setting (before building the index): Tools > Options > Indexing > "Ignore accents in characters" enabled

Nothing new with Polish diacritics: if there is POZWOLIŁEM in the file,
searching the exact word POZWOLIŁEM succeeds,
but searching the word POZWOLILEM (with L instead of Ł) fails.
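
One possible explanation, offered only as a guess since X1's implementation is not known to me: if the folding relies on Unicode decomposition, it cannot work for Ł, because Ł has no canonical decomposition into "L" plus a combining mark, unlike é or ą. The Python sketch below shows the effect and a workaround with an explicit table; the names strip_marks, EXTRA and fold_polish are hypothetical, and the table is illustrative rather than exhaustive.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Letters that do not decompose need an explicit mapping.
# Only the Polish stroked L is listed here (illustrative).
EXTRA = str.maketrans({"Ł": "L", "ł": "l"})


def fold_polish(text: str) -> str:
    """Strip decomposable accents, then map the remaining special letters."""
    return strip_marks(text).translate(EXTRA)


print(strip_marks("POZWOLIŁEM"))  # the Ł survives decomposition-based stripping
print(fold_polish("POZWOLIŁEM"))
print(fold_polish("ŚRÓDŁĄCZĘ"))
```

The other Polish letters (ą ć ę ń ó ś ź ż) do decompose, so only the stroked L needs the extra table in this sketch.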

For the detailed results, I have updated my test procedure with some new tests (OpenOffice and PowerPoint formats):
The result is here (Excel file): http://snipurl.com/7fugo
And you can get instructions and the files in order to do the same here (zip file): http://snipurl.com/7fumt

Gerard

Re: Accented Characters

PostPosted: Fri Nov 19, 2010 7:01 am
by cwiekol
hi Guys,

do we have any improvement on this issue?

I'm dreaming about this!!!!!

Re: Accented Characters

PostPosted: Tue Jun 18, 2013 11:15 pm
by cwiekol
hi there,
are there any changes in the handling of accented/diacritic characters in X1 8??

Re: Accented Characters

PostPosted: Thu Jun 20, 2013 10:44 am
by Greg Dawes
Yes! Have a look in the Menu > Options > Indexing tab. Then look in the Indexing Options section for:
- Ignore accents in characters

FYI - Like our older v6.7.x version, we've also added the option to:
- Insert word breaks around Asian language characters