utf-8 - Elixir: Counting Frequency of Words in a Text File in the Hangul Alphabet


I am working with data written in Hangul. I have a word frequency script that I have used on English txt files, but the script fails when I pass it a UTF-8 txt file containing Hangul characters. Specifically, it seems to read the characters as blank spaces. Here are the results, stored in a .csv file:

    , 290668
    1, 2
    2, 5
    3d, 1
    4, 1
    55, 1
    6, 1
    6mm, 2
    709, 2
    710, 1
    d, 1
    j, 87
    k, 1
    m, 14
    p, 19
    pd100, 1
    y, 1

Considering that the text in the file contains none of these characters, this seems to be a problem. How can I make the code read Hangul? Here is my current code:

    defmodule WordFrequency do
      # Read the file, count its words, and write the counts to a CSV file.
      def wordcount(readfile) do
        readfile
        |> words
        |> count
        |> tocsv
      end

      defp words(file) do
        file
        |> File.stream!
        |> Stream.map(&String.trim_trailing(&1))
        # Split on every character outside A-Z, a-z, 0-9 and _
        |> Stream.map(&String.split(&1, ~r{[^A-Za-z0-9_]}))
        |> Enum.to_list
        |> List.flatten
        |> Enum.map(&String.downcase(&1))
      end

      defp count(words) when is_list(words) do
        Enum.reduce(words, %{}, &update_count/2)
      end

      defp update_count(word, acc) do
        Map.update acc, String.to_atom(word), 1, &(&1 + 1)
      end

      defp tocsv(map) do
        File.open("wordfreqkor.csv", [:write, :utf8], fn(file) ->
          Enum.each(map, &IO.write(file, Enum.join(Tuple.to_list(&1), ", ") <> "\n"))
        end)
      end
    end

    WordFrequency.wordcount("myfile.txt")

Thanks for any advice!
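A likely cause, offered as a sketch and not a confirmed fix: the character class in the split pattern only accepts ASCII letters, digits, and the underscore, so every byte of a Hangul syllable counts as a separator and only empty strings are left to count. That would explain the large count next to the blank entry in the CSV output. The module name HangulSplitSketch and its two functions are hypothetical names used for illustration. The Unicode-aware pattern is an assumption about what "word" should mean here: any run of letters (\p{L}), numbers (\p{N}), or underscores, with the u modifier switching the regex to Unicode mode.

```elixir
defmodule HangulSplitSketch do
  # Hypothetical helper: splits the way the ASCII-only pattern does.
  # Without the `u` modifier the regex works on bytes, so each byte of a
  # Hangul syllable is treated as a separator.
  def split_ascii(line) do
    String.split(line, ~r/[^A-Za-z0-9_]/)
  end

  # Hypothetical helper: assumed Unicode-aware alternative.
  # \p{L} matches any letter and \p{N} any number; `trim: true` drops the
  # empty strings that appear at the edges of the line.
  def split_unicode(line) do
    String.split(line, ~r/[^\p{L}\p{N}_]+/u, trim: true)
  end
end

# The ASCII-only split leaves nothing but empty strings for Hangul input.
IO.inspect(HangulSplitSketch.split_ascii("안녕 세계"))

# The Unicode-aware split keeps the Hangul words intact.
IO.inspect(HangulSplitSketch.split_unicode("안녕 세계"))
```

If this matches the behaviour seen with the real file, the same pattern could replace the one inside the words function. It may also be worth keeping the words as strings when counting, since String.to_atom creates a new atom for every distinct word and atoms are never garbage collected.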

