Based on "Regex Until But Not Including", I'm trying to match all characters up until a word boundary.

For example - matching apple in the following string:

<pre><code><b>apple</b>&lt; </code></pre>

I'm doing that like this:

/a[^\b]+/

This should look for an "a" and then grab one or more characters that are not a word boundary, so I would expect the match to stop before the <, which comes right after the end of the word.

Demo in Regexr

Demo in StackSnippets

<!-- begin snippet: js hide: false --> <!-- language: lang-js -->
var input = [ "apple<", "apple/" ];
var myRegex = /a[^\b]+/;

for (var i = 0; i < input.length; i++) {
  console.log(myRegex.exec(input[i]));  
}
<!-- end snippet -->

A couple of other regexes I tried:

I can use a negated word boundary or a negated set with a regular word boundary:

  • /a[\B]+/
  • /a[^\b]+/

I can specify several possible word ending characters and use them in a negated set:

  • /a[^|"<>\-\\\/;:,.]+/

I can also look for a positive set and just restrict it to match regular letters:

  • /a[\w]+/
  • /a[a-zA-Z]+/

But I'd like to know how to do it for a word boundary if that's possible.

Here's MDN's listing of the word boundary and the characters that constitute it.

Word boundaries (\b) are not characters, but the empty string between a word character and a non-word character (or the start or end of the string). Moreover, since \b is not Unicode-aware in JavaScript, "word character" means only ASCII letters, digits, and the underscore.
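To make the "empty string" point concrete, here is a minimal sketch that marks each word boundary with a "|". The helper name markBoundaries is only an illustration, not something from the question, and the second input is "été" written with escapes so the accented letter is unambiguous.

```javascript
// Insert "|" at every word boundary; \b itself consumes no characters,
// so the original text is left intact around the markers.
function markBoundaries(text) {
  return text.replace(/\b/g, "|");
}

console.log(markBoundaries("apple<"));          // "|apple|<"

// "\u00e9" (é) is not an ASCII word character, so JavaScript sees
// boundaries on both sides of the "t" and none at the ends.
console.log(markBoundaries("\u00e9t\u00e9"));   // "é|t|é"
```

Nothing is removed from either string, which is what "zero-width" means in practice.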

Because of that, you

  • generally shouldn't use \b unless your data is some kind of computer language that can't possibly include Unicode
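  • can't usefully apply quantifiers to \b (an empty string times 10 is still one empty string)
  • can't negate \b with [^...] (it's not a character set, so it has no complement; \B is a separate zero-width assertion for "not at a word boundary")
  • can't include \b in a character set (in square brackets) since, again, it's not a character or character set; inside brackets, JavaScript reads \b as a backspace character instead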
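As a quick check of the last point: inside square brackets, JavaScript treats \b as the backspace character (U+0008), so [^\b] means "any character except backspace". A small sketch, using the inputs from the question (the variable name negatedBackspace is just for illustration):

```javascript
// [^\b] is "anything but backspace", so it happily consumes "<" and "/".
const negatedBackspace = /a[^\b]+/;

console.log(negatedBackspace.exec("apple<")[0]); // "apple<"
console.log(negatedBackspace.exec("apple/")[0]); // "apple/"

// [\b] matches a literal backspace character, not a word boundary.
console.log(/[\b]/.test("\u0008")); // true
console.log(/[\b]/.test("apple<")); // false
```

That is why the pattern in the question runs to the end of the string instead of stopping at the end of the word.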

Since \b doesn't actually add any characters to the match, you can safely append it to your regex:

/.+?\b/

will match all characters up until the first word boundary. It's in fact a superset of:

/\w+/

which is probably what you want, since you're interested only in the words, not the stuff in between.
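A short comparison of the two patterns, as a sketch: the first two inputs come from the question, while "<apple" and the helper firstMatch are additions for illustration only.

```javascript
// Return the first match of a regex as a string, or null if there is none.
function firstMatch(regex, text) {
  const match = regex.exec(text);
  return match ? match[0] : null;
}

const inputs = ["apple<", "apple/", "<apple"];

for (const text of inputs) {
  console.log(text, firstMatch(/.+?\b/, text), firstMatch(/\w+/, text));
}
// "apple<" -> "apple" and "apple"
// "apple/" -> "apple" and "apple"
// "<apple" -> "<"     and "apple"
```

The last input shows the difference: /.+?\b/ also matches runs of non-word characters that end at a boundary, while /\w+/ only ever returns the word itself.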