How can I read a text file, then output a file with every word reversed, but punctuation and paragraphs in the same place?

Question

Aaron Fisher on 10 Sep 2017


Edited: Cedric on 11 Sep 2017

Hi,

I have a text file with multiple paragraphs that I'm trying to read and then reverse all the words, while keeping the punctuation, capital letters and paragraphs in the same place. For example, "Hello" would change to "Olleh", and "That's" would change to "Stah't".

I am able to generate a cell array C with each word reversed using the following code, and can then create a text file that has the words one after another with a space between them, but all the punctuation and capitals change order as well.

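fileID = fopen('myfile.txt');
A = textscan(fileID,'%s','delimiter',' ');
B = A{1};         % textscan returns a 1x1 cell; its content is the cell array of words
C = reverse(B);   % reverse the characters of each word
fclose(fileID);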

Thanks for any advice anyone is able to offer.

Aaron

3 Comments

Aaron Fisher on 10 Sep 2017

It's a homework task

Walter Roberson on 11 Sep 2017

First you need to define what a "word" is. This is a rather difficult thing to do.

http://www.eva.mpg.de/fileadmin/content_files/staff/haspelmt/pdf/WordSegmentation.pdf

For example in the words "That's" and "week's", the apostrophes are not acting as punctuation, and likewise in "good-natured", the dash is not acting as punctuation. In each case, those are marks that are essential parts of the word.

For example, "week's" stands in for weekes as -es is the old form of indicating possession in English. So "week's" should be expanded to "weekes" and then that should be flipped to "sekeew" and then that should be contracted again, perhaps to "s'keew". If you were to instead go to "keew's" then you would have broken the word improperly.

For "good-natured", that is all one word, a "closed compound word", so it needs to be flipped as a whole, not part by part. So "derutan-doog" rather than "good-derutan".

For "it's", that is a contraction that stands in for "it is", not technically a compound word but acting effectively as an open compound word in this context. So it needs to be expanded to "it is" and then that reversed as a whole, "si ti", then re-contracted, "s'ti", not "ti's" .

There is no syntactic way to detect open compound words. Also, in common typography there is no syntactic way to distinguish closed compound words that use a hyphen from words that merely have a hyphen between them acting as punctuation. In formal typography, different hyphens ("dashes") are used for closed compounds versus the punctuation dash.
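
To make the point concrete, here is a minimal sketch, assuming a "word" is defined purely by a character class. The sample text and both patterns are illustrative assumptions, not a linguistic definition, and upper/lower case is deliberately not handled. It only shows that the output depends on whether apostrophes and hyphens count as word characters.

```matlab
% Hypothetical sample text; both patterns are assumptions for illustration.
txt = 'That''s a good-natured dog.';

% Definition 1: a word is a run of letters/digits only,
% so apostrophes and hyphens split words.
r1 = regexprep(txt, '([a-zA-Z0-9]+)', '${fliplr($1)}');

% Definition 2: apostrophes and hyphens belong to the word,
% so "good-natured" is flipped as a whole.
r2 = regexprep(txt, '([a-zA-Z0-9''-]+)', '${fliplr($1)}');

disp(r1)   % tahT's a doog-derutan god.
disp(r2)   % s'tahT a derutan-doog god.
```

Neither definition reproduces the expand, flip and re-contract treatment described above; that would need a dictionary rather than a pattern.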


Answer 1

Cedric on 10 Sep 2017


Edited: Cedric on 11 Sep 2017

Assuming that, if this is homework, it is about loops and conditional statements (and hence that the code below won't be suitable as a submission), here is a way to achieve it that respects the upper/lower case and the position of special characters ( UPDATED, see EDITS at the bottom ):

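 str = 'Abc''d efg; hij'' 89-0! ABCD, efg.' ;
 % Remember which positions hold upper-case letters.
 isUpper       = str >= 'A' & str <= 'Z' ;
 % Positions of "normal" characters (letters, digits) and white spaces.
 pos_noSpec    = regexp( str, '[a-zA-Z0-9\s]', 'start' ) ;
 % Drop special characters, split on white spaces, reverse each word.
 buffer        = reverse( strsplit( str(pos_noSpec) )) ;
 % Write the reversed characters back at the non-white-space positions.
 str(setdiff( pos_noSpec, regexp( str, '\s', 'start' ))) = [buffer{:}] ;
 % Restore the original upper/lower-case pattern.
 str(isUpper)  = upper( str(isUpper) ) ;
 str(~isUpper) = lower( str(~isUpper) ) ;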

This updates str to

Dcb'a gfe; jih' 09-8! DCBA, gfe.

I copy the initial and final versions below for comparison/check:

 Initial:  Abc'd efg; hij' 89-0! ABCD, efg.
 Final  :  Dcb'a gfe; jih' 09-8! DCBA, gfe.

The code is a bit condensed, but I am happy to develop a little if needed. The idea is to pick only "normal" characters and spaces first, split on spaces, reverse words, and then redistribute letters at relevant positions. These positions are the positions of "normal" characters ( pos_noSpec minus positions of spaces, hence the call to SETDIFF ). The rest is for copying the initial upper/lower case.
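
Since the task may be about loops and conditional statements, the same idea can also be sketched with an explicit loop. This is an alternative written for illustration, not the code from this answer; the names out, idx and c are hypothetical. It collects the positions of letters and digits of the current word and writes them back in reverse order when a white space (or the end of the text) is reached.

```matlab
str = 'Abc''d efg; hij'' 89-0! ABCD, efg.';
out = str;
idx = [];                          % positions of letters/digits in the current word

for k = 1:numel(str) + 1
    if k <= numel(str)
        c = str(k);
    else
        c = ' ';                   % sentinel: treat the end of the text as a space
    end
    if (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
        idx(end+1) = k;            % normal character: remember its position
    elseif isspace(c)
        out(idx) = str(fliplr(idx));   % word finished: write it back reversed
        idx = [];
    end
    % any other character (punctuation) stays where it is
end

% Copy the original upper/lower-case pattern, position by position.
for k = 1:numel(str)
    if str(k) >= 'A' && str(k) <= 'Z'
        out(k) = upper(out(k));
    else
        out(k) = lower(out(k));
    end
end

disp(out)   % Dcb'a gfe; jih' 09-8! DCBA, gfe.
```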

NOTES

As mentioned by ImageAnalyst, you can use FILEREAD to read your text file in one chunk:

str = fileread( 'myfile.txt' ) ;
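
Putting the pieces together, a minimal sketch of the full read, reverse and write cycle could look as follows. The file names and the sample content are assumptions; a small input file is created in the temporary folder only so that the example is self-contained.

```matlab
% Hypothetical file names; a sample input is written first so the sketch runs on its own.
inFile  = fullfile(tempdir, 'myfile.txt');
outFile = fullfile(tempdir, 'myfile_reversed.txt');

fid = fopen(inFile, 'w');
fprintf(fid, 'Hello, world!\n\nThat''s all.\n');
fclose(fid);

str = fileread(inFile);            % whole file as one char row vector

% Same steps as in the answer above.
isUpper    = str >= 'A' & str <= 'Z';
pos_noSpec = regexp(str, '[a-zA-Z0-9\s]', 'start');
buffer     = reverse(strsplit(str(pos_noSpec)));
str(setdiff(pos_noSpec, regexp(str, '\s', 'start'))) = [buffer{:}];
str(isUpper)  = upper(str(isUpper));
str(~isUpper) = lower(str(~isUpper));

fid = fopen(outFile, 'w');
fprintf(fid, '%s', str);           % characters written as they are, line breaks included
fclose(fid);
```

With this sample, the output file contains "Olleh, dlrow!", an empty line, and "Stah't lla.", so the paragraph break stays in place.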

EDITS:

Call to SPRINTF replaced by concatenation of a comma-separated list ( [buffer{:}] ).
Detection of white spaces in call to SETDIFF using regexp white spaces ( \s ).

5 Comments

Cedric on 11 Sep 2017

Edited: Cedric on 11 Sep 2017

Anyhow, the best thing for understanding is to run the code line by line and see what each step does. Using my test string and developing each expression using temporary variables:

 >> pos_noSpec = regexp( str, '[a-zA-Z0-9\s]', 'start' )
 pos_noSpec =
     1     2     3     5     6     7     8     9    11    12    13    14    16    17    18    20    22    23    24    25    26    28    29    30    31

Here you see positions of all normal chars and white spaces.

 >> str_noSpec = str(pos_noSpec)
 str_noSpec =
 Abcd efg hij 890 ABCD efg

Picking characters at these positions generates a sub-string without special characters, which is what we want to reverse after splitting on white spaces.

 >> buffer = reverse( strsplit( str_noSpec ))
 buffer =
  1×6 cell array
    'dcbA'    'gfe'    'jih'    '098'    'DCBA'    'gfe'

Now we have the "words" reversed and we need to re-inject their characters in the original string at their positions, which are stored in pos_noSpec. Yet, pos_noSpec also contains the positions of white spaces, and we need to eliminate these. One way to do it is to compute a set difference between these positions and the position of white spaces:

 >> pos_noSpec_npSpace = setdiff( pos_noSpec, regexp( str, '\s', 'start' ))
 pos_noSpec_npSpace =
     1     2     3     5     7     8     9    12    13    14    17    18    20    23    24    25    26    29    30    31

where you can check that regexp( str, '\s', 'start' ) gives the position of white spaces. ( EDIT: I updated this part. ) So str at these positions must be the reversed characters stored in cell array buffer. We need to concatenate it into a single string to be able to write str(relevant pos) = new chars. The call to SPRINTF from my original answer worked but it was not smart; STRCAT would be better, or simply:

 >> str(pos_noSpec_npSpace) = [buffer{:}]
 str =
 dcb'A gfe; jih' 09-8! DCBA, gfe.

which is roughly what we want, except for the upper/lower cases. The other lines manage this and they are much simpler to understand, so I don't develop that.
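
For completeness, here is a small sketch of the case-restoring step on the first word only. The variable names original and flipped are hypothetical; the mask logic is the same as in the answer.

```matlab
original = 'Abc''d';     % text before reversal
flipped  = 'dcb''A';     % same span after re-injecting the reversed characters

% Logical mask of the positions that were upper case in the original text.
isUpper = original >= 'A' & original <= 'Z';

flipped(isUpper)  = upper(flipped(isUpper));    % force upper case where it was
flipped(~isUpper) = lower(flipped(~isUpper));   % lower case everywhere else

disp(flipped)            % Dcb'a
```

Applying lower to the apostrophe has no effect, which is why the mask does not need to exclude special characters.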

Cedric on 11 Sep 2017

Edited: Cedric on 11 Sep 2017

Well, my guess for your error is that you should replace

find( str == ' ' )

with

regexp( str, '\s', 'start' )

I am updating my answer accordingly. My first version would have worked I guess because the pattern was [a-zA-Z0-9 ], but then I thought that carriage returns and line breaks should be treated like white spaces and I updated it for [a-zA-Z0-9\s]. Doing this, I forgot that in the call to SETDIFF, we must eliminate positions of whatever is matched by \s (which includes carriage returns and line breaks).
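
A short sketch of the mismatch, using a hypothetical two-line string: the comparison with ' ' misses the line break, so SETDIFF leaves one position too many and the later assignment of the reversed characters cannot match in size.

```matlab
str = sprintf('ab cd\nef');                        % one space and one line break

posBlank = find(str == ' ');                       % 3: the space only
posWhite = regexp(str, '\s', 'start');             % [3 6]: space and line break

posKept = regexp(str, '[a-zA-Z0-9\s]', 'start');   % all 8 positions

wrong = setdiff(posKept, posBlank);   % 7 positions, still includes the line break
right = setdiff(posKept, posWhite);   % 6 positions, one per letter

fprintf('%d positions vs %d positions for 6 letters\n', numel(wrong), numel(right));
```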

Aaron Fisher on 11 Sep 2017

Thank you for that, that is fantastic! :) it makes sense :)
