Split function for text (split(...) or strsplit(...))
Hello,
I have a PDF document. How can I divide it into whole paragraphs, each ending with a full stop, using split(...) or strsplit(...)? For instance, the text below is composed of two paragraphs, and I need to split it into those two paragraphs, divided at the full stop.
"
In economics the demand curve is the graphical representation of the relationship between the price and the quantity that consumers are willing to purchase. The curve shows how the price of a commodity or service changes as the quantity demanded increases. Every point on the curve is an amount of consumer demand and the corresponding market price. The graph shows the law of demand, which states that people will buy less of something if the price goes up and vice versa.
The slope of a linear demand curve is constant. The elasticity of demand changes continuously as one moves down the demand curve because the ratio of price to quantity continuously falls. At the point the demand curve intersects the y-axis PED is infinitely elastic, because the variable Q appearing in the denominator of the elasticity formula is zero there. At the point the demand curve intersects the x-axis PED is zero, because the variable P appearing in the numerator of the elasticity formula is zero there.[2] At one point on the demand curve PED is unitary elastic: PED equals one. Above the point of unitary elasticity is the elastic range of the demand curve (meaning that the elasticity is greater than one). Below is the inelastic range, in which the elasticity is less than one. The decline in elasticity as one moves down the curve is due to the falling P/Q ratio.
"
Thanks.
Answers (1)
Adam Danz
on 6 Jun 2019
Edited: Adam Danz
on 6 Jun 2019
Try this out. I don't have your data so I'm taking a shot in the dark. It may require a small tweak.
str = extractFileText(filename);
t = split(str{:},newline);
emptyLineIdx = cellfun(@isempty,t); %find empty rows
paraGroups = cumsum(emptyLineIdx)+1; %assign paragraph group number to each line
t(emptyLineIdx) = []; %get rid of the empty lines
paraGroups(emptyLineIdx) = [];
paraGroups = findgroups(paraGroups); %renumber groups as 1..N; splitapply errors if group numbers have gaps (e.g. after a leading or repeated empty line)
c = splitapply(@(x){strjoin(x,'\n')},t,paraGroups) % produce cell array; one element per paragraph.
I feel like there's a more direct way to do this but this approach should also work. I wonder if there's a "new paragraph" indicator in regular expressions.
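On the regular-expression idea: as far as I know there is no dedicated "new paragraph" token, but a blank line can be matched as a line break, optional whitespace, and another line break. Here is a minimal sketch under that assumption. The sample text in txt is made up for illustration; with a real file it would come from extractFileText, converted with char.

```matlab
% Hypothetical sample text: two paragraphs separated by one blank line.
txt = sprintf('First paragraph, line one.\nStill the first paragraph.\n\nSecond paragraph.\n');

% Split wherever a line break is followed by optional whitespace and another
% line break, i.e. at one or more blank lines. strtrim removes the trailing
% line break so that no empty piece is produced at the end.
paragraphs = regexp(strtrim(txt), '\n\s*\n', 'split');

% paragraphs is a cell array of character vectors, one element per paragraph.
```

This only helps if the extracted text really contains blank lines between paragraphs, which depends on the PDF.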
2 Comments
Stephen23
on 7 Jun 2019
Thanks Adam for your answer.
Yes, I want two pieces of text: the first one has to contain the first paragraph and the second one has to contain the second paragraph. The idea is that I can analyse the text by paragraphs, which are divided by a full stop (.).
rng('default')
filename = "Deamand.pdf";
str = extractFileText(filename);
data = readPDFFormData(filename);
newDocuments = strsplit(str, "?");
newDocuments_1 = erasePunctuation(newDocuments);
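If the paragraphs are separated only by a full stop followed by a line break (no blank line in between), one option is to split at exactly that point. This is a rough sketch, not tested on the actual PDF: the sample text in txt is invented, and a wrapped line that happens to end in a full stop would also be split there.

```matlab
% Hypothetical sample text: two paragraphs, each on its own line and each
% ending with a full stop.
txt = sprintf('Price falls. Demand rises.\nThe slope is constant. PED is zero.');

% Split only where a full stop is directly followed by a line break.
% (?<=\.) is a lookbehind, so the full stop stays with its paragraph;
% full stops in the middle of a line are left alone.
paragraphs = regexp(txt, '(?<=\.)\s*\n', 'split');

% Drop any empty pieces, e.g. from a trailing line break.
paragraphs = paragraphs(~cellfun(@isempty, paragraphs));
```

Each element of paragraphs can then be passed on to erasePunctuation or other analysis steps.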
.
.
.
.