Sorting the characters in a UTF-16 string in Java

tl;dr



Java uses two chars (a surrogate pair) to represent a UTF-16 character outside the Basic Multilingual Plane. Sorting the char[] with Arrays.sort breaks those pairs apart and scrambles the text. Should I convert char[] to int[], or is there a better way?



Details



Java strings are UTF-16, but the Character class wraps a single char (16 bits). A supplementary character such as an emoji therefore takes an array of two chars (32 bits).

Sorting the chars of such a String with the built-in sort corrupts the data.
(Arrays.sort uses dual-pivot quicksort, and Collections.sort delegates to Arrays.sort for the heavy lifting.)



To be specific, do you convert char[] to int[] or is there a better way to sort?



    import java.util.Arrays;

    public class Main {
        public static void main(String[] args) {
            int[] utfCodes = {128513, 128531, 128557};
            String emojis = new String(utfCodes, 0, 3);
            System.out.println("Initial String: " + emojis);

            // Sorting individual chars separates each surrogate pair.
            char[] chars = emojis.toCharArray();
            Arrays.sort(chars);
            System.out.println("Sorted String: " + new String(chars));
        }
    }

Output:

    Initial String: 😁😓😭
    Sorted String: ??😁??






java string sorting utf-16






asked 4 hours ago by dingy
edited 3 hours ago by jtahlborn












  • This is what we call a "Collation". You should use a library for this because there are many collations to choose from.

    – Guillaume F.
    3 hours ago
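The JDK itself ships a collation class, java.text.Collator, so the suggestion above can be tried without a third-party dependency. What follows is only a rough sketch for Java 8 or later: the names CollatedCharSort and sortWithCollator are made up for illustration, and comparing one code point at a time ignores multi-character collation elements, so treat it as a starting point rather than a complete collation solution.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CollatedCharSort {
    // Hypothetical helper: sorts the code points of s with the given
    // collator, comparing each code point as a one-code-point string.
    static String sortWithCollator(String s, Collator collator) {
        List<String> units = new ArrayList<>();
        s.codePoints().forEach(cp -> units.add(new String(Character.toChars(cp))));
        units.sort(collator); // Collator implements Comparator<Object>
        return String.join("", units);
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        // Collation looks at the letter before the case, unlike raw
        // code point order, where 'B' (66) precedes 'a' (97).
        System.out.println(english.compare("a", "B") < 0); // true
        System.out.println("a".compareTo("B") > 0);        // true
        System.out.println(sortWithCollator("bAaB", english));
    }
}
```

Which Locale and strength to use depends on the text being sorted; Locale.ENGLISH here is an arbitrary choice for the demonstration.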





























3 Answers
I looked around for a bit and couldn't find any clean ways to sort an array by groupings of two elements without the use of a library.



Luckily, the codePoints of the String are what you used to create the String itself in this example, so you can simply sort those and create a new String with the result.



    public class Main {
        public static void main(String[] args) {
            int[] utfCodes = {128531, 128557, 128513};
            String emojis = new String(utfCodes, 0, 3);
            System.out.println("Initial String: " + emojis);

            // Sort whole code points rather than individual chars.
            int[] codePoints = emojis.codePoints().sorted().toArray();
            System.out.println("Sorted String: " + new String(codePoints, 0, codePoints.length));
        }
    }

Output:

    Initial String: 😓😭😁
    Sorted String: 😁😓😭




I switched the order of the characters in your example because they were already sorted.







answered 3 hours ago by Jacob G. (edited 3 hours ago)












  • Haha.. my string was already sorted... I couldn't tell because I couldn't sort (pun intended). I should move to java8 =)

    – dingy
    57 mins ago

















If you are using Java 8 or later, then this is a simple way to sort the characters in a string while respecting (not breaking) multi-char codepoints:



    int[] codepoints = someString.codePoints().sorted().toArray();
    String sorted = new String(codepoints, 0, codepoints.length);


Prior to Java 8, I think you either need to use a loop to iterate the code points in the original string, or use a 3rd-party library method.
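For readers stuck on an older JDK, the loop-based approach might look like the sketch below. It relies only on methods that have been in the JDK since Java 5 (String.codePointCount, String.codePointAt, Character.charCount and the String(int[], int, int) constructor); the class and method names are invented for this example.

```java
import java.util.Arrays;

class CodePointSortLegacy {
    // Hypothetical helper: sorts a string by code point without streams.
    static String sortByCodePoint(String s) {
        int count = s.codePointCount(0, s.length());
        int[] codePoints = new int[count];
        // Walk the string one code point at a time. A supplementary
        // code point occupies two chars, so advance by charCount.
        for (int i = 0, n = 0; i < s.length(); n++) {
            int cp = s.codePointAt(i);
            codePoints[n] = cp;
            i += Character.charCount(cp);
        }
        Arrays.sort(codePoints);
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128531, 128557, 128513}, 0, 3);
        System.out.println("Sorted String: " + sortByCodePoint(emojis));
    }
}
```

This sorts by numeric code point value only, which is the same ordering the stream version produces.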




Fortunately, sorting the codepoints in a String is uncommon enough that the clunkiness and inefficiency of the solutions above are rarely a concern.



(When was the last time you tested for anagrams of emojis?)







answered 2 hours ago by Stephen C (edited 1 hour ago)












  • Thanks for reply. I was looking at Java 7's documentation, I should move to java 8. BTW, I am from China and making an app where I need to sort strings in Mandarin, just kidding, but it's a valid usecase. I stumbled upon it while I was trying to understand how Java works with UTF-16. Since other answers are same, I'll select the one which came earliest. Thanks again!

    – dingy
    1 hour ago











  • I didn't say invalid. I said uncommon. (And the fact that you had to make up a use-case only reinforces my point ... :-) )

    – Stephen C
    58 mins ago












  • Hmm.. sorry.. my bad =)

    – dingy
    55 mins ago











  • See also: chinese.stackexchange.com/questions/24053/chinese-anagrams. (First answer: "Why do you need that? We never use that in China.")

    – Stephen C
    27 mins ago


















We can't use char for Unicode, because a Java char is only a 16-bit UTF-16 code unit and cannot hold every Unicode code point.

In the early days of Java, Unicode code points were always 16 bits (fixed size at exactly one char). However, the Unicode specification changed to allow supplementary characters. That meant Unicode characters now have variable width and can be longer than one char. Unfortunately, it was too late to change Java's char implementation without breaking a ton of production code.
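The variable-width point can be seen by dumping the UTF-16 code units of the emoji string from the question. This is an illustrative sketch: SurrogatePairDemo, hexUnits and sortChars are names invented here. Each emoji is stored as a high surrogate followed by a low surrogate, and sorting the raw chars moves all the high surrogates to the front, which is why the pairs no longer line up.

```java
import java.util.Arrays;

class SurrogatePairDemo {
    // Formats each UTF-16 code unit of s as four hex digits.
    static String hexUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    // Sorts the raw chars, which is the step that breaks surrogate pairs.
    static String sortChars(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {0x1F601, 0x1F613, 0x1F62D}, 0, 3);
        System.out.println(emojis.length());                           // 6
        System.out.println(emojis.codePointCount(0, emojis.length())); // 3
        System.out.println(hexUnits(emojis));            // D83D DE01 D83D DE13 D83D DE2D
        System.out.println(hexUnits(sortChars(emojis))); // D83D D83D D83D DE01 DE13 DE2D
    }
}
```

In the sorted sequence only the third D83D is directly followed by a low surrogate (DE01), so only one emoji survives and the other four units are unpaired.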



So the best way to manipulate Unicode characters is by using code points directly, e.g., using String.codePointAt(index) or the String.codePoints() stream on JDK 1.8 and above.
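As a small illustration of those two APIs (a sketch for Java 8 or later, with made-up names CodePointAccessDemo and distinctSorted): codePointAt takes a char index, not a code point index, so asking for the position in the middle of a surrogate pair returns the lone low surrogate, while the codePoints() stream always yields whole code points that can be transformed and collected back into a string.

```java
class CodePointAccessDemo {
    // Hypothetical helper: keeps each distinct code point once, in
    // ascending code point order, and rebuilds a string from the result.
    static String distinctSorted(String s) {
        return s.codePoints()
                .distinct()
                .sorted()
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE01b"; // 'a', U+1F601, 'b'
        System.out.println(s.length());                            // 4
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f601
        // Index 2 points at the low surrogate, which is returned as-is.
        System.out.println(Integer.toHexString(s.codePointAt(2))); // de01
        System.out.println(distinctSorted("banana"));              // abn
    }
}
```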







share|improve this answer








New contributor




peekay is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this answer



share|improve this answer






New contributor




peekay is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









answered 2 hours ago by peekay
















  • Thanks for the reply. I completely missed the String::codePointAt API, and I think I should move to Java 8. Since the other answers say the same thing, I'll accept the one that came earliest.

    – dingy
    1 hour ago


























