How can I read a text file, then output a file with every word reversed, but punctuation and paragraphs in the same place?

Hi,
I have a text file with multiple paragraphs that I'm trying to read, and then reverse all the words while keeping the punctuation, capital letters and paragraphs in the same place. For example, "Hello" would change to "Olleh", and "That's" would change to "Stah't".
I am able to generate a cell array C with each word reversed using the following code, and can then create a text file that has the words one after another with a space between each, but all the punctuation and capitals change order as well.
fileID = fopen('myfile.txt');
A = textscan(fileID,'%s','delimiter',' ');   % A{1} is a cell array of space-separated words
B = A{1};
C = reverse(B);                              % reverses each word, punctuation included
fclose(fileID);
Thanks for any advice anyone is able to offer.
Aaron
Walter Roberson on 11 Sep 2017
First you need to define what a "word" is. This is a rather difficult thing to do.
For example, in the words "That's" and "week's", the apostrophes are not acting as punctuation, and likewise in "good-natured", the dash is not acting as punctuation. In each case, those are marks that are essential parts of the word.
For example, "week's" stands in for "weekes", as -es is the old form of indicating possession in English. So "week's" should be expanded to "weekes", then flipped to "sekeew", and then contracted again, perhaps to "s'keew". If you were to instead go to "keew's", you would have broken the word improperly.
For "good-natured", that is all one word, a "closed compound word", so it needs to be flipped as a whole, not part by part. So "derutan-doog" rather than "good-derutan".
For "it's", that is a contraction that stands in for "it is", not technically a compound word but acting effectively as an open compound word in this context. So it needs to be expanded to "it is" and then that reversed as a whole, "si ti", then re-contracted, "s'ti", not "ti's" .
There is no syntactic way to detect open compound words. Also, in common typography there is no syntactic way to distinguish closed compound words that use a hyphen from words that merely have a hyphen between them, with the hyphen acting as punctuation. In formal typography, different marks ("dashes") are used for closed compounds versus the punctuation dash.
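One way to see the point is that, in code, the regular expression you choose *is* your definition of a word. A small sketch using REGEXPREP's dynamic `${...}` replacement to flip each match (`$0` is the whole match); the sentence and both patterns are illustrative assumptions. Neither pattern keeps capital letters by position, and neither can tell a hyphenated compound from a hyphen used as punctuation.

```matlab
% Two possible definitions of "word", applied to the same sentence.
str = 'That''s a good-natured week''s work.';

% Definition 1: a word is a run of letters only, so ' and - split words.
out1 = regexprep(str, '[A-Za-z]+', '${fliplr($0)}');

% Definition 2: apostrophes and hyphens belong to the word, and the whole
% token is flipped, marks included.
out2 = regexprep(str, '[A-Za-z''-]+', '${fliplr($0)}');

disp(out1)   % tahT's a doog-derutan keew's krow.
disp(out2)   % s'tahT a derutan-doog s'keew krow.
```

Definition 2 yields "derutan-doog" and "s'keew", the forms argued for above, while Definition 1 yields the "keew's" form.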


Accepted Answer

Cedric on 10 Sep 2017
Edited: Cedric on 11 Sep 2017
If this is homework (about loops and conditional statements), the code below probably won't be suitable; otherwise, here is a way to achieve it that respects the upper/lower case and the positions of special characters (UPDATED, see EDITS at the bottom):
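str = 'Abc''d efg; hij'' 89-0! ABCD, efg.' ;
isUpper = str >= 'A' & str <= 'Z' ;                          % remember which characters are upper case
pos_noSpec = regexp( str, '[a-zA-Z0-9\s]', 'start' ) ;       % positions of letters, digits and white spaces
buffer = reverse( strsplit( str(pos_noSpec) )) ;             % split on white spaces, reverse each word
str(setdiff( pos_noSpec, regexp( str, '\s', 'start' ))) = [buffer{:}] ;  % write letters/digits back in place
str(isUpper) = upper( str(isUpper) ) ;                       % restore original upper case
str(~isUpper) = lower( str(~isUpper) ) ;                     % and original lower case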
This updates str to
Dcb'a gfe; jih' 09-8! DCBA, gfe.
I copy the initial and final versions below for comparison/check:
Initial: Abc'd efg; hij' 89-0! ABCD, efg.
Final : Dcb'a gfe; jih' 09-8! DCBA, gfe.
The code is a bit condensed, but I am happy to expand on it if needed. The idea is to first pick only the "normal" characters (letters and digits) and white spaces, split on white spaces, reverse each word, and then write the reversed letters back at the relevant positions. These positions are the positions of the "normal" characters (pos_noSpec minus the positions of white spaces, hence the call to SETDIFF). The last two lines restore the initial upper/lower case.
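To make the pick/split/reverse/redistribute idea concrete, the sketch below reruns the answer's steps on its example string, leaving off semicolons so each intermediate value is displayed. The intermediate variable names (kept, words, letters, targets) are illustrative additions, not part of the answer.

```matlab
% Intermediate values of the answer's steps, for its example string.
str = 'Abc''d efg; hij'' 89-0! ABCD, efg.';
pos_noSpec = regexp(str, '[a-zA-Z0-9\s]', 'start');
kept    = str(pos_noSpec)     % 'Abcd efg hij 890 ABCD efg'
words   = strsplit(kept)      % {'Abcd','efg','hij','890','ABCD','efg'}
buffer  = reverse(words)      % {'dcbA','gfe','jih','098','DCBA','gfe'}
letters = [buffer{:}]         % 'dcbAgfejih098DCBAgfe'
% Target positions: letters/digits only, white spaces removed.
targets = setdiff(pos_noSpec, regexp(str, '\s', 'start'));
assert(numel(targets) == numel(letters))  % one slot per letter or digit
```

Because punctuation never enters `kept`, it is never moved; only the letters and digits are redistributed.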
NOTES
As mentioned by ImageAnalyst, you can use FILEREAD to read your text file in one chunk:
str = fileread( 'myfile.txt' ) ;
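Since the question asks for file-to-file processing, here is a minimal sketch that chains FILEREAD, the answer's steps, and a write with FPRINTF. Assumptions: plain ASCII text (only A-Z, a-z and 0-9 count as word characters, so accented letters stay in place like punctuation), REVERSE requires R2016b or later, and the file names in TEMPDIR are hypothetical. The first lines only create a sample input so the sketch runs as-is.

```matlab
% Hypothetical file names; point inFile at your own text file instead.
inFile  = fullfile(tempdir, 'myfile.txt');
outFile = fullfile(tempdir, 'myfile_reversed.txt');

% Demo only: create a small two-paragraph input file.
fid = fopen(inFile, 'w');
fprintf(fid, 'Hello, world!\n\nThat''s all, folks.\n');
fclose(fid);

str = fileread(inFile);                               % whole file as one char row

isUpper    = str >= 'A' & str <= 'Z';                 % remember original case
pos_noSpec = regexp(str, '[a-zA-Z0-9\s]', 'start');   % letters, digits, white space
buffer     = reverse(strsplit(str(pos_noSpec)));      % reversed words
pos_word   = setdiff(pos_noSpec, regexp(str, '\s', 'start'));  % letters/digits only
str(pos_word) = [buffer{:}];                          % write letters back in place
str(isUpper)  = upper(str(isUpper));                  % restore case
str(~isUpper) = lower(str(~isUpper));

% 'w' (not 'wt') so line endings are written exactly as they were read.
fid = fopen(outFile, 'w');
fprintf(fid, '%s', str);
fclose(fid);
```

With the sample input, the output file contains "Olleh, dlrow!", a blank line, then "Stah't lla, sklof.". Line breaks are white space, so they are never moved and paragraphs stay intact.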
EDITS:
  • Call to SPRINTF replaced by concatenation of the comma-separated list (CSL) [buffer{:}].
  • White spaces in the call to SETDIFF are now detected with the regexp white-space pattern ( \s ).
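For readers unfamiliar with comma-separated lists: `buffer{:}` expands the cell into separate arguments, and square brackets concatenate them horizontally. A small sketch comparing it with a SPRINTF call of the form `sprintf('%s', buffer{:})` (an assumption about what the earlier version did; the cell contents are made up):

```matlab
% buffer{:} expands into a comma-separated list of char vectors.
buffer = {'dcbA', 'gfe', 'jih'};
viaConcat  = [buffer{:}];               % 'dcbAgfejih'
viaSprintf = sprintf('%s', buffer{:});  % same text via format reuse
same = isequal(viaConcat, viaSprintf)   % logical 1
```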
Cedric on 11 Sep 2017
Edited: Cedric on 11 Sep 2017
Well, my guess for your error is that you should replace
find( str == ' ' )
with
regexp( str, '\s', 'start' )
I am updating my answer accordingly. My first version would probably have worked, because the pattern was [a-zA-Z0-9 ], but then I thought that carriage returns and line breaks should be treated like white spaces, and I updated it to [a-zA-Z0-9\s]. In doing so, I forgot that in the call to SETDIFF we must eliminate the positions of everything matched by \s (which includes carriage returns and line breaks).
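The mismatch is easy to see on a two-line string: the line break is kept by the [a-zA-Z0-9\s] pattern but is not found by `str == ' '`, so the spaces-only version leaves one more target position than there are letters, and the assignment `str(...) = [buffer{:}]` fails on a size mismatch. A small illustration (the string is made up):

```matlab
% A space at position 3 and a line break at position 6.
str  = sprintf('ab cd\nef');
kept = regexp(str, '[a-zA-Z0-9\s]', 'start');   % all 8 positions
onlySpaces = find(str == ' ');                  % 3
allWhite   = regexp(str, '\s', 'start');        % [3 6]
nSpacesOnly = numel(setdiff(kept, onlySpaces))  % 7 slots, newline included
nAllWhite   = numel(setdiff(kept, allWhite))    % 6 slots, one per letter
```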
