How can I encode strings in an array as numbers?

I have a matrix of strings, and some of its columns contain text such as city names. I would like to convert these city names into numbers, and I wrote the code below. Because the matrix is made of strings, I first find how many unique strings exist in a column, for example the 9th column.
c=unique(matrix(:,9));
Then, for example, I find 3400 unique elements and assign a number to each one (the first column of c holds the city names and the second column holds the numbers 1 to 3400).
c(:,2)=1:3400;
The matrix has 1 million rows, so I then try to match each row against those numbers and create a new matrix that contains only numbers.
for i=1:1000000
for j=1:3400
if matrix(i,9)==c(j,1)
numbermatrix(i,9)=str2double(c(j,2));
end
end
end
This code works, but it takes a long time because it compares every entry with all 3400 possibilities.
Is there an easier and faster way to do this?
  1 Comment
MarKf on 2 Mar 2024
Ok, maybe you can make that a little more clear.
>"I have different strings in a matrix and some of the coloums of this matrix is strings"
So you have a string matrix. That is unlikely, so you have either a string array or, more likely, a cell array. You could have an actual char matrix if every entry has the same number of characters, with the shorter city names padded with spaces, but hopefully not. Or maybe city codes of the same length. Anyway, it does not matter, since it seems you are only interested in column 9 and unique works on all of the above. So maybe:
matcol9 = {'New York', 'London', 'Paris', 'San Francisco', 'Toronto', 'Sydney', 'Singapore', 'London', 'Paris'}'; % example cell; ["London", "Paris" etc] in case of string array
[c,ia,ic]=unique(matcol9); %If A is a vector, then C = A(ia) and A = C(ic).
c = 7×1 cell array
    {'London'       }
    {'New York'     }
    {'Paris'        }
    {'San Francisco'}
    {'Singapore'    }
    {'Sydney'       }
    {'Toronto'      }
So unique already has a built-in way to assign an index to each original element (the variable ic), and that may already be what you need, assuming that the 1 million rows belong to the same "matrix" (why are they rows now, though?). Otherwise ismember might be more helpful than comparing each element against every candidate in a loop.
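As a minimal sketch of the two options mentioned in this comment, the snippet below uses a small made-up list of cities (the variable names cities, otherNames, found and loc are illustrative, not from the original post). It shows the third output of unique acting as a numeric code for each entry, and ismember looking up names from a separate list in the same set of unique values.

```matlab
% Hypothetical sample data standing in for column 9 of the poster's matrix
cities = {'New York'; 'London'; 'Paris'; 'London'; 'Paris'};

% c holds the sorted unique names; ic holds one integer code per original entry
[c, ~, ic] = unique(cities);

% The codes map back to the original text, so nothing is lost
assert(isequal(c(ic), cities))

% Names from a different list can be encoded against the same c with ismember.
% loc is 0 where a name does not appear in c.
otherNames = {'Paris'; 'Tokyo'; 'London'};
[found, loc] = ismember(otherNames, c);

disp(ic.')    % codes for cities
disp(loc.')   % codes for otherNames, 0 marks an unknown name
```

Both calls are vectorized, so no explicit loop over rows or over the unique values is needed.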


Accepted Answer

Hassaan on 2 Mar 2024
Using containers.Map
% Assume 'matrix' is a string array or a cell array of character vectors
uniqueCities = unique(matrix(:,9));
% Create a map object whose keys are the city names and whose values are their unique integer identifiers
cityMap = containers.Map(cellstr(uniqueCities), 1:numel(uniqueCities));
% Preallocate the number matrix for efficiency
numberMatrix = zeros(size(matrix)); % Same size as 'matrix' so that column 9 can be indexed directly; adjust to your needs
% Now, iterate through each city name in your matrix and replace it with its corresponding number using the map
for i = 1:size(matrix,1)
    city = char(matrix(i,9)); % Character-vector key, whether 'matrix' is a string array or a cell array
    if isKey(cityMap, city)
        numberMatrix(i,9) = cityMap(city);
    end
end
Using the third output of unique
[uniqueCities, ~, idx] = unique(matrix(:,9));
% The 'idx' array contains, for each entry in 'matrix(:,9)', the index of its value in 'uniqueCities'
% Now, directly use these indices as your numeric representation
% 'matrix' holds text, so build a separate numeric matrix of the same size rather than copying 'matrix'
numberMatrix = zeros(size(matrix)); % Other columns stay 0 until they are encoded the same way
numberMatrix(:,9) = idx; % This fills the 9th column with the numerical identifiers
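To illustrate that the vectorized approach gives the same codes as the nested loops in the question, here is a small self-contained check. The 6-by-9 string array and the name loopCodes are made up for the example, and the sketch assumes the data is a string array.

```matlab
% Hypothetical sample data: a 6-by-9 string array whose 9th column holds city names
matrix = strings(6, 9);
matrix(:, 9) = ["Paris"; "London"; "Paris"; "Toronto"; "London"; "Paris"];

% Vectorized encoding: idx(i) is the position of matrix(i,9) in uniqueCities
[uniqueCities, ~, idx] = unique(matrix(:, 9));

% Reference result from a nested-loop comparison like the one in the question
loopCodes = zeros(size(matrix, 1), 1);
for i = 1:size(matrix, 1)
    for j = 1:numel(uniqueCities)
        if matrix(i, 9) == uniqueCities(j)
            loopCodes(i) = j;
        end
    end
end

% Both approaches assign the same code to every row
assert(isequal(idx, loopCodes))
disp(idx.')
```

The loop does one comparison per row per unique value, while the call to unique handles the whole column in a single step, which is where the speed difference on 1 million rows comes from.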

More Answers (1)

Steven Lord on 2 Mar 2024
If you have data where some of the columns are text and some are numbers, consider storing your data in a table array rather than trying to use numbers as proxies for the text data. In a table array, each variable has to have the same type of data but different variables can contain different types of data.
names = ["Walter Roberson"; "Image Analyst"; "Star Strider"];
reputation = [135067; 77260; 65504];
contributors = table(names, reputation)
contributors = 3×2 table
          names          reputation
    _________________    __________
    "Walter Roberson"    1.3507e+05
    "Image Analyst"           77260
    "Star Strider"            65504
Here the names variable is text while the reputation variable is numeric.
starStriderReputation = contributors{3, 'reputation'}
starStriderReputation = 65504
Alternately, instead of treating the names as data I could treat them as row names and use them to index into the table.
contributors2 = table(reputation, RowNames=names)
contributors2 = 3×1 table
                       reputation
                       __________
    Walter Roberson    1.3507e+05
    Image Analyst           77260
    Star Strider            65504
walterRobersonReputation = contributors2{"Walter Roberson", 1}
walterRobersonReputation = 135067
If your data is time-based, a timetable may be more useful than a table because it provides capabilities to change the time basis of your data or perform time-based analysis (for example, computing daily averages from data collected more frequently).
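As a rough sketch of the daily-average case, the snippet below builds a small timetable from made-up samples taken every 6 hours and resamples it with retime. The variable names t, value, tt and daily are illustrative only.

```matlab
% Hypothetical data: 8 samples, one every 6 hours, covering two days
t = datetime(2024, 3, 1) + hours(0:6:42)';
value = (1:8)';
tt = timetable(t, value);

% Aggregate to one row per day, taking the mean of the samples in each day
daily = retime(tt, 'daily', 'mean');

disp(daily)
```

Each day contains four samples here, so the result has two rows, one per calendar day.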
