How to parse string with bold substring

cl254 on 8 Oct 2020
Answered: Walter Roberson on 8 Oct 2020
How can I parse text like:
str_input = "This is an example string for parsing bold portion, it will be very useful."
so that it returns a numeric vector (numvec) and a string vector (strvec)?
[numvec, strvec] = strscanbold(str_input)
numvec = [0 1 0 1 0 1]
(1 marks a bold substring and 0 marks a non-bold substring.)
strvec = ["This is", " an example", " string for", " parsing bold", " portion,", " it will be very useful."]

Answers (1)

Walter Roberson on 8 Oct 2020
Text itself is not bold. Text is a vector of character codes with an internal structure noting that the codes represent characters. There is no widely used standard for character sequences to indicate that bold must be turned on or off.
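As a small illustration of that point, the following sketch converts a character vector to its numeric codes and back. The variable names are only for illustration; nothing in the codes records how the text was rendered.

```matlab
% A char vector is just a sequence of character codes.
txt = 'bold';
codes = double(txt);            % numeric codes: 98 111 108 100

% Converting the codes back gives the same text; no formatting is stored.
roundTrip = char(codes);
disp(codes);
disp(isequal(roundTrip, txt));  % displays 1 (true)
```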
In earlier days, ISO 6429 specified "escape sequences" to indicate formatting. Those were fine for sending to end devices to indicate rendering, but they were not good for document processing.
Later, HTML introduced tags such as <b>text</b>, which had a number of advantages, but HTML had to evolve into a five- or six-part international standard that includes a programming language. Under that standard, you cannot tell for sure whether something is to be bolded without executing the program.
In short... text does not inherently have bold or italic or color or font style or font size or so on. And you cannot parse it to find the bold because the bold is not there.
Now, if the text includes an encoding that your system is going to interpret as bold, then that encoding can be detected. But first you need to know how the bold is encoded.
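For example, here is a minimal sketch of the requested strscanbold, under the assumption that bold is encoded with HTML-style <b>...</b> tags. The question does not say how the bold is encoded, so the tag convention is hypothetical, as are the assumptions that tags are well formed and not nested. A different encoding would need a different pattern.

```matlab
% Usage: the <b>...</b> markup in this input is an assumed encoding.
s = "This is<b> an example</b> string for<b> parsing bold</b> portion";
[numvec, strvec] = strscanbold(s);
disp(numvec);
disp(strvec);

function [numvec, strvec] = strscanbold(str_input)
% Split text into bold and non-bold segments.
% Assumes well-formed, non-nested <b>...</b> tags (not stated in the question).
    txt = char(str_input);

    % Splitting on every opening or closing tag yields segments that
    % alternate non-bold, bold, non-bold, ... starting with non-bold.
    parts = regexp(txt, '</?b>', 'split');

    % Segments at even positions sit between <b> and </b>.
    isBold = mod(0:numel(parts)-1, 2);

    % Drop empty segments, e.g. when the text starts or ends with a tag.
    keep = ~cellfun(@isempty, parts);
    numvec = isBold(keep);
    strvec = string(parts(keep));
end
```

Local functions in a script file need R2016b or later; in older releases, save the function in its own strscanbold.m file.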
