How to correctly write a regular expression to match ASCII control chars
I would like to create a regular expression in elisp (in the standard 'read' form) that matches extended ASCII characters the same way PCRE does:
^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$
So I'm confused about \x7f-\xff. Is there a way to specify a range using something like \xhh?
regular-expressions
I think the answer depends on whether you're matching against unibyte or multibyte strings. Do you think À (which is undefined in ASCII, 0xC0 in latin-1 and Unicode, but encoded as 0xC380 in UTF-8) falls into the range 0x7F-0xFF?
– npostavs
5 hours ago
I think so. At least PCRE matched À as a character in the 0x7F-0xFF range. I need the same behavior from a standard Elisp regular expression.
– serghei
4 hours ago
And as far as I can see, À is defined in ASCII: ascii-code.com. 0xC0 is between 0x7F and 0xFF.
– serghei
4 hours ago
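To make npostavs's unibyte/multibyte point concrete, here is a small sketch of how the Elisp string read syntax decides which kind of string a literal becomes. The variable names are hypothetical; the behavior shown is the rule the Elisp manual gives for hex and Unicode escapes in string constants.

```elisp
;; Sketch: the string read syntax decides whether a literal is unibyte
;; or multibyte.  Variable names are hypothetical.

;; Only hex escapes below 256 and no other non-ASCII characters:
;; the reader makes a unibyte string holding the raw byte 255.
(defconst my-unibyte-literal "\xff")

;; A Unicode-style \u escape makes the string multibyte, holding the
;; character U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS).
(defconst my-multibyte-literal "\u00ff")

(multibyte-string-p my-unibyte-literal)    ; => nil
(multibyte-string-p my-multibyte-literal)  ; => t
(aref my-multibyte-literal 0)              ; => 255, the character ÿ

;; Converted to multibyte, the raw byte becomes an eight-bit "raw byte"
;; character, not ÿ, so it lies outside the Unicode range #x7f-#xff.
(aref (string-to-multibyte my-unibyte-literal) 0) ; => 4194303 (#x3fffff)
```

Writing the range endpoints with \u escapes, or as the literal characters themselves, keeps the pattern string multibyte.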
asked 5 hours ago by serghei (edited 4 hours ago)
1 Answer
You can use the range DEL-ÿ instead of \x7f-\xff, typing the actual DEL character in place of "DEL" (StackExchange cannot show that character and prints it as a space). DEL has codepoint 127 (decimal), #o177 (octal), and #x7f (hexadecimal).
That is, you can just insert the characters themselves in the regexp pattern.
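In Lisp code you do not have to type DEL and ÿ literally: the string read syntax can produce them for you. A minimal sketch, assuming you want PCRE's whole-string anchoring, so it uses \` and \' (written "\\`" and "\\'" inside a Lisp string) instead of ^ and $. The names my-identifier-regexp and my-identifier-p are hypothetical.

```elisp
;; Sketch: the PCRE pattern ^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$
;; as an Emacs Lisp string.  The reader turns \u007f and \u00ff into the
;; literal characters DEL and ÿ, so the regexp engine sees the same
;; DEL-ÿ range you would otherwise type by hand.  Names are hypothetical.
(defconst my-identifier-regexp
  ;; \\` and \\' anchor the whole string, playing the role of PCRE ^ and $.
  "\\`[a-zA-Z_\u007f-\u00ff][a-zA-Z0-9_\u007f-\u00ff]*\\'"
  "Match an identifier whose characters may also come from #x7f-#xff.")

(defun my-identifier-p (string)
  "Return 0 if STRING matches `my-identifier-regexp', else nil."
  ;; Bind `case-fold-search' so the result does not depend on user settings.
  (let ((case-fold-search nil))
    (string-match-p my-identifier-regexp string)))

(my-identifier-p "\u00c0bc") ; "Àbc" => 0
(my-identifier-p "_x1")      ; => 0
(my-identifier-p "1abc")     ; => nil, cannot start with a digit
(my-identifier-p "a-b")      ; => nil, "-" is not in the class
```

Typing DEL and ÿ directly into the string gives the same regexp; the escapes only make the source easier to read.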
One way to input such characters is C-x 8 RET. To search for any character in the range \x7f through \xff, you would type this at the C-M-s prompt (without the spaces):
[ C-x 8 RET # x 7 f RET - C-x 8 RET # x f f RET ]
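If you would rather not type the characters at all, the rx macro can express the same pattern with numeric codepoints. A minimal sketch, assuming Emacs 27 or later, where `any` accepts a (FROM . TO) cons as a character range; my-identifier-rx-regexp is a hypothetical name.

```elisp
;; Sketch, assuming Emacs 27+: the identifier pattern built with `rx'.
;; Inside `any', a cons (FROM . TO) is a character range, so the
;; endpoints are given as codepoints instead of literal characters.
(require 'rx)

(defconst my-identifier-rx-regexp
  (rx bos                                   ; start of the string
      (any "a-zA-Z_" (#x7f . #xff))         ; first character
      (zero-or-more (any "a-zA-Z0-9_" (#x7f . #xff)))
      eos)                                  ; end of the string
  "Hypothetical name: `rx' version of the identifier regexp.")

;; Bind `case-fold-search' so the result does not depend on user settings.
(let ((case-fold-search nil))
  (list (string-match-p my-identifier-rx-regexp "\u00c0bc")   ; "Àbc"
        (string-match-p my-identifier-rx-regexp "1abc")))
;; => (0 nil)
```

The rx form expands to an ordinary regexp string at load or compile time, so it can be used anywhere a regexp is expected.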
answered 2 hours ago by Drew (edited 2 hours ago)