Finding kth frequency letter in a string of characters
Finding kth frequency letter in a string of characters
I have a string say "aaabbacccd" here count[a]=4 count[b]=2 count[c]=3 and count[d]=1 . I have to find character with nth largest frequency. In the above string the character with 3rd highest frequency is b (because 1<2<3<4) and count[b]=2
The straight forward solution is storing character Vs frequency in a map and sorting the values using sorting collection by value method like below
public class MapUtil {
public static <K, V extends Comparable<? super V>> Map<K, V>
sortByValue(Map<K, V> map) {
List<Map.Entry<K, V>> list = new LinkedList<Map.Entry<K, V>>(map.entrySet());
Collections.sort( list, new Comparator<Map.Entry<K, V>>() {
public int compare(Map.Entry<K, V> o1, Map.Entry<K, V> o2) {
return (o1.getValue()).compareTo( o2.getValue() );
}
});
Map<K, V> result = new LinkedHashMap<K, V>();
for (Map.Entry<K, V> entry : list) {
result.put(entry.getKey(), entry.getValue());
}
return result;
}
}
I tried solving this problem using tree-map to maintain the characters in sorted order based on the count. But all i ended up with was violating the equality and compare to constraint thus my map ending with inconsistent values.
So can't this problem be solved using Tree-map or any other data structure in an optimal way ?
If I understand correctly, you want k'th smallest element in the array. There are algorithms which allow you to do that without sorting the array. This one should work for larger arrays in linear time which is an improvement over sorting.
– jrook
Sep 20 '17 at 19:29
@dasblinkenlight This problem has to find out kth largest frequency. Don't get how it is related to order statistics which is for unsorted array of integers
– Aarish Ramesh
Sep 20 '17 at 19:53
@Aarish Right. You start by creating an unsorted array of integers representing character counts. Then you compute k-order stat, and finally you find the character that has the corresponding count.
– dasblinkenlight
Sep 20 '17 at 19:57
6 Answers
6
Here is a streaming solution:
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
public class StackOverflow {
private static class S46330187 {
public static void main(String args) {
//prints b
System.out.println(kthMostFrequentChar("aaabbacccd",3));
//prints b
System.out.println(kthMostFrequentChar("aaabbacccbbbd",1));
//prints e
System.out.println(kthMostFrequentChar("aaabbbcccdddeee",5));
}
private static Character kthMostFrequentChar(final String string, final int kth) {
Map<Integer, Long> counts = string.chars()
.boxed()
.collect(Collectors.groupingBy(
Function.identity(),
Collectors.counting()
));
return counts.entrySet()
.stream()
.sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
.map(e->(char)e.getKey().intValue())
.limit(kth)
.reduce((l,r)->r)
.orElseThrow(IllegalArgumentException::new);
}
}
}
Using a TreeMap
won't be of much help because it is sorted on the keys, not on values.
TreeMap
Here's my simple algorithm:
Here's a working program basing this algorithm:
String str = "aabbddddccc";
Map<Character, Integer> map = new HashMap<>();
while (!str.isEmpty()) {
char ch = str.charAt(0);
String arr = str.split(Character.toString(ch));
map.put(ch, arr.length == 0 ? str.length() : arr.length - 1);
str = String.join("", arr);
}
System.out.println(map);
List<Integer> values = new ArrayList<>(map.values().stream().sorted().collect(Collectors.toSet()));
Collections.reverse(values);
map.keySet().stream().filter(e -> map.get(e) == values.get(3)).forEach(System.out::println);
This won't work if there are characters with same number of counts because of this map.keySet().stream().filter(e -> map.get(e) == values.get(3)).forEach(System.out::println);
– Aarish Ramesh
Sep 21 '17 at 5:31
@Aarish. Good catch. I just fixed it to first remove the duplicate frequencies by collecting the frequencies in a set rather than a list. See my updated my answer. I think you have already found the correct the answer from others' posts. But this will still be helpful to future readers.
– VHS
Sep 21 '17 at 14:00
Maintain two maps.
Read all the input chars one by one and update both maps. When done with your input, go through the TreeMap to get the Lists.
Starting from max frequency,
Solution in o(n) time and constant space.
Assumption it contains only charcters.
Pseudo code (writing from mobile , dont have IDE)
1) create a data structure of
Frequency
Char ch,
int v
2) create an array (frqy) of 52 length and type Frequency
2.1 initialize each of the 52 object with character a-zA-Z
3) itterate over each character (ch) in the string
3.1) check if (int)ch <=122 and >= 96
Then increment the frequency of the frqy array at index (int)ch - 96
3.3) check if (int)ch <=90 and >= 65
Then increment the frequency of the frqy array at index (int)ch - 65+26
4 ) sort the array based on frequency and now you have the answer
The assumption of only upper- and lower-case US English letters is naive, and not something that was included in the original question. This solution is not at all reasonable given the prevalence of Unicode in today's computing world, where there are on the order of 1.3 million possible characters.
– Jim Mischel
Sep 20 '17 at 20:39
Thanks for the informative comment without it I wouldn't know the number of charcters in todays conputational world.
– Geek
Sep 20 '17 at 21:07
Retrieve the Kth Value from the List.
import java.util.*;
import java.lang.*;
import java.io.*;
class Main{
static class Pair{
char first;
int second;
Pair(char first,int second){
this.first=first;
this.second=second;
}
}
public static void main(String args) {
TreeMap<Character,Integer> m = new TreeMap<>();
String s;
int k;
for(int i=0;i<s.length();i++){
char ch = s.charAt(i);
int x=0;
if(m.containsKey(ch))
x=m.get(ch);
m.put(ch,x+1);
}
ArrayList<Pair> list = new ArrayList<Pair>();
for(Map.Entry<Character,Integer> i : m.entrySet()){
w.println(i.getKey() +" : "+i.getValue());
list.add(new Pair(i.getKey(),i.getValue()));
}
Collections.sort(list, (a,b)->{if(a.second>=b.second) return -1; else return 1;});
System.out.println(list.get(k-1).first);
}
}
To ensure you have a kth Frequent element , you can use hashSet to count the number of unique frequencies and if it is possible to have the kth frequency.
live running code example :
https://ideone.com/p34SDC
Below is the simple solution without using java 8
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.TreeMap;
class TestClass {
public static void main(String args) throws Exception {
// ma("aaabbacccd",3);
// ma("aaabbacccbbbd",4);
ma("aabbcd", 1);
}
private static void ma(String s, int k) {
TreeMap<Character, Integer> map = new TreeMap<>();
for (int i = 0; i < s.length(); i++) {
if (map.containsKey(s.charAt(i))) {
map.put(s.charAt(i), map.get(s.charAt(i)) + 1);
} else {
map.put(s.charAt(i), 1);
}
}
System.out.println(map.toString());
Set<Entry<Character, Integer>> set = map.entrySet();
List<Entry<Character, Integer>> list = new ArrayList<>(set);
Collections.sort(list, new Comparator<Map.Entry<Character, Integer>>() {
public int compare(Map.Entry<Character, Integer> o1, Map.Entry<Character, Integer> o2) {
return o2.getValue().compareTo(o1.getValue());
}
});
Set<Integer> ls = new LinkedHashSet<>();
System.out.println("after sorting by value-----");
for (Entry<Character, Integer> e : list) {
System.out.println("key: " + e.getKey() + " value: " + e.getValue());
ls.add(e.getValue());
}
System.out.println("removed duplicate from value array --------" + ls.toString());
System.out.println(new ArrayList<>(ls).get(k - 1));
for (Entry<Character, Integer> e1 : list) {
if (e1.getValue().equals(new ArrayList<>(ls).get(k - 1))) {
System.out.println(e1.getKey());
break;
}
}
}
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Look up "order statistics in linear time".
– dasblinkenlight
Sep 20 '17 at 19:29