Efficient set union and intersection in C++

Given two sets set1 and set2, I need to compute the ratio of the size of their intersection to the size of their union. So far, I have the following code:
double ratio(const set<string>& set1, const set<string>& set2)
{
    if( set1.size() == 0 || set2.size() == 0 )
        return 0;
    set<string>::const_iterator iter;
    set<string>::const_iterator iter2;
    set<string> unionset;
    // compute intersection and union
    int len = 0;
    for (iter = set1.begin(); iter != set1.end(); iter++)
    {
        unionset.insert(*iter);
        if( set2.count(*iter) )
            len++;
    }
    for (iter = set2.begin(); iter != set2.end(); iter++)
        unionset.insert(*iter);
    return (double)len / (double)unionset.size();
}
It seems to be very slow (I'm calling the function about 3M times, always with different sets). The Python counterpart, on the other hand, is much faster:
def ratio(set1, set2):
    if not set1 or not set2:
        return 0
    return len(set1.intersection(set2)) / len(set1.union(set2))
Any idea about how to improve the C++ version (preferably without using Boost)?

It can be done in linear time, without new memory:
double ratio(const std::set<string>& set1, const std::set<string>& set2)
{
    if (set1.empty() || set2.empty()) {
        return 0.;
    }
    std::set<string>::const_iterator iter1 = set1.begin();
    std::set<string>::const_iterator iter2 = set2.begin();
    int union_len = 0;
    int intersection_len = 0;
    while (iter1 != set1.end() && iter2 != set2.end())
    {
        ++union_len;
        if (*iter1 < *iter2) {
            ++iter1;
        } else if (*iter2 < *iter1) {
            ++iter2;
        } else { // *iter1 == *iter2
            ++intersection_len;
            ++iter1;
            ++iter2;
        }
    }
    union_len += std::distance(iter1, set1.end());
    union_len += std::distance(iter2, set2.end());
    return static_cast<double>(intersection_len) / union_len;
}

You don't actually need to construct the union set. In Python terms, len(s1.union(s2)) == len(s1) + len(s2) - len(s1.intersection(s2)); the size of the union is the sum of the sizes of s1 and s2, minus the number of elements counted twice, which is the number of elements in the intersection. Thus, you can do
int len = 0;
for (const string &s : set1) {
    len += set2.count(s);
}
return ((double) len) / (set1.size() + set2.size() - len);
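The inclusion-exclusion identity above can also be sketched in Python. This is only an illustration of the idea, not the asker's code: the function name ratio is reused from the question, and iterating over the smaller set is an extra assumption of mine to keep the loop short.

```python
def ratio(set1, set2):
    """Return |set1 & set2| / |set1 | set2| without building the union."""
    if not set1 or not set2:
        return 0.0
    # Iterate over the smaller set so the membership loop does the least work.
    small, large = (set1, set2) if len(set1) <= len(set2) else (set2, set1)
    common = sum(1 for item in small if item in large)
    # |A union B| = |A| + |B| - |A intersect B|
    return common / (len(set1) + len(set2) - common)


print(ratio({"a", "b", "c"}, {"b", "c", "d"}))  # 2 shared out of 4 distinct -> 0.5
```

Only a counter is kept, so no temporary set is allocated per call, which matters when the function is called millions of times.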

Related

translating C++ for loop to python

I'm having a little trouble translating C++ to Python. The trouble is with the boolean conditions in the for loops for (Nbound = 1; Nbound < (Nobs + 1) && B < Beta; Nbound++) and for (Ndm = 0;(Ndm < (i + 1) && P3 > (0)) || PP == 0; Ndm++). I'm unsure how this would work in Python. I don't think Python allows boolean conditions in a for loop, so I think I would have to check them inside the loop with an if statement, but I'm not entirely sure. Thanks for your help!
Also, I've noticed a lot of uninitialized variables in this code, for example float PP. Is there a way of doing this in Python, or would I just assign it a value of 0 and change it later?
float Pf = 0; //The complement of Beta
float B = 0; //Beta
float P3;
float PP;
float Nbound = 1;
for (Nbound = 1; Nbound < (Nobs + 1) && B < Beta; Nbound++) {
int Ndm = 0;
int Nbgd = Nobs; //Setting Ndm=Nobs
Pf = 0; //Zeroing the placeholder for the sum
float exp; //A variable to store the exponential
for (int i = 0; i < (Nobs + 1); i++) //Summing over Nbgd+Ndm<NObs
{
P3 = 1;
PP = 0;
if (P1[Nbgd] > 0) {
for (Ndm = 0;(Ndm < (i + 1) && P3 > (0)) || PP == 0; Ndm++) {
//P3 = dist(Ndm, Nbound);
Pf = Pf + (P1[Nbgd] * P3); //Summing over the probability
PP = PP + P3;
}
}
}
}
}

For loops in Python are meant for iterating over objects. If you want a loop with a specific exit condition, use a while loop instead.
A for loop in C can be described as:
for (initialization_statement; condition_expression; update_statement)
{
    body_statement_1;
    body_statement_2;
    ...
    body_statement_n;
}
The corresponding loop in Python is:
initialization_statement
while condition_expression:
    body_statement_1
    body_statement_2
    ...
    body_statement_n
    update_statement
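As a concrete illustration of that pattern, here is how the outer loop from the question, for (Nbound = 1; Nbound < (Nobs + 1) && B < Beta; Nbound++), might be written. The values of n_obs and beta and the way b grows are hypothetical placeholders, since the question does not show them.

```python
# Hypothetical stand-in values; the question does not give the real ones.
n_obs = 10
beta = 0.95
b = 0.0

n_bound = 1                              # initialization_statement
while n_bound < n_obs + 1 and b < beta:  # condition_expression (&& becomes "and")
    b += 0.2                             # body: placeholder for the real update of b
    n_bound += 1                         # update_statement, last line of the body

print(n_bound, b)
```

One caveat with this translation: if the body uses continue, the update statement at the bottom is skipped, unlike in C, so it has to be repeated before each continue.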

Leetcode 200. Number of Islands TLE

Link to the question: https://leetcode.com/problems/number-of-islands/
Given a 2d grid map of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.
Example 1:
Input:
11110
11010
11000
00000
Output: 1
My logic is simply to run a DFS from every node and keep track of the connected components.
I am getting Time Limit Exceeded (TLE) on the 46th test case. Can someone help me optimize this code?
class Solution(object):
    def numIslands(self, grid):
        def in_grid(x, y):
            return 0 <= x < len(grid) and 0 <= y < len(grid[0])
        def neighbours(node):
            p, q = node
            dir = [(-1, 0), (0, 1), (1, 0), (0, -1)]
            return [(x, y) for x, y in [(p + i, q + j) for i, j in dir] if in_grid(x, y)]
        def dfs(node):
            visited.append(node)
            for v in neighbours(node):
                if grid[v[0]][v[1]]== "1" and v not in visited:
                    dfs(v)
        components = 0
        visited = []
        for i in range(len(grid)):
            for j in range(len(grid[0])):
                node = (i, j)
                if grid[i][j] == "1" and node not in visited:
                    components += 1
                    dfs(node)
        return components

I think your approach is correct. However, visited is a list, and searching a list for a value takes O(N) time, where N is the number of cells, so the overall time complexity is O(N^2). I would suggest using a set, which is backed by a hash table, instead of a list.
There are just two lines to revise:
visited = [] -> visited = set()
visited.append(node) -> visited.add(node)
I confirmed that the revised code is accepted. Now node not in visited takes O(1) on average, so the overall time complexity is O(N).
Like most LeetCode problems, the problem statement does not give any information about the size of the input data. But since your code gets TLE, we can assume that an O(N^2) solution is too slow.
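A minimal sketch of the set-based version follows. It differs from the asker's code in one respect that is my own substitution: the recursive dfs is replaced by an explicit stack, so large islands cannot hit Python's recursion limit. The name num_islands is hypothetical.

```python
def num_islands(grid):
    """Count connected groups of "1" cells, using a set for O(1) average lookups."""
    if not grid or not grid[0]:
        return 0
    rows, cols = len(grid), len(grid[0])
    visited = set()
    components = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] != "1" or (i, j) in visited:
                continue
            # A new, unvisited land cell starts a new island.
            components += 1
            visited.add((i, j))
            stack = [(i, j)]
            while stack:
                x, y = stack.pop()
                for dx, dy in ((-1, 0), (0, 1), (1, 0), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < rows and 0 <= ny < cols
                            and grid[nx][ny] == "1" and (nx, ny) not in visited):
                        visited.add((nx, ny))
                        stack.append((nx, ny))
    return components


print(num_islands(["11110", "11010", "11000", "00000"]))  # the question's example: 1
```

Each cell is added to visited at most once and each membership test is constant time on average, which is where the O(N) bound comes from.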

The reason you are getting TLE is that you are using a list to keep track of visited nodes. Searching a list for a value takes O(n) time in the worst case.
It is better to keep the visited/non-visited status of each node in a 2D matrix of boolean values or 0/1 integer values. This gives constant O(1) access time for looking up the status of a node.
class Solution {
boolean isSafe(char[][] grid, int[][] visited, int i, int j)
{
int n = grid.length;
int m = grid[0].length;
if((i<0 || i>n-1) || (j<0 || j>m-1))
return false;
return visited[i][j] == 0 && grid[i][j] == '1';
}
void DFS(char[][] grid, int[][] visited, int i, int j)
{
visited[i][j] = 1; //marked visited
int[] row = {-1, 0, 1, 0};
int[] column = {0, 1, 0, -1};
for(int k = 0; k<4; k++)
{
if(isSafe(grid, visited, i+row[k], j+column[k]))
{
DFS(grid, visited, i+row[k], j+column[k]);
}
}
}
int DFSUtil(char[][] grid)
{
int count = 0;
if(grid == null || grid.length == 0)
return count;
int n = grid.length; //rows
int m = grid[0].length; //columns
int[][] visited = new int[n][m];
for(int i = 0; i<n; i++)
for(int j = 0; j<m; j++)
{
if(grid[i][j]=='1' && visited[i][j] == 0)
{
DFS(grid, visited, i, j);
count++;
}
}
return count;
}
public int numIslands(char[][] grid) {
int result = DFSUtil(grid);
return result;
}
}

I solved it in Java with a DFS approach that is simple and easy to understand. Here is my code; it may help you:
public static int numIslands(char[][] grid) {
int countOfIslands = 0 ;
for (int i = 0; i <grid.length ; i++) {
for (int j = 0; j <grid[i].length ; j++) {
if(grid[i][j] == '1'){
DFS(grid,i,j);
countOfIslands++;
}
}
}
return countOfIslands;
}
public static void DFS(char[][] grid , int row , int col){
if(grid[row][col] == '0')
return;
grid[row][col] = '0';
// System.out.println("grid = " + Arrays.deepToString(grid) + ", row = " + row + ", col = " + col);
if(row+1 < grid.length)
DFS(grid,row+1,col);
if(row-1 >=0)
DFS(grid,row-1,col);
if(col+1 <grid[0].length)
DFS(grid,row,col+1);
if(col-1 >= 0)
DFS(grid,row,col-1);
}
If this is your first time hearing about DFS on a graph, see this reference:
DFS Approach

Modified Simple DFS Solution
class Solution {
public int numIslands(char[][] grid) {
int count = 0;
for (int i = 0; i < grid.length; i++) {
for (int j = 0; j < grid[i].length; j++) {
if (grid[i][j] != '0') {
count++;
shrink(grid, i, j);
}
}
}
return count;
}
private void shrink(char[][] grid, int i, int j) {
if (i < 0 || j < 0 || i >= grid.length || j >= grid[0].length || grid[i][j] == '0')
return;
grid[i][j] = '0';
shrink(grid, i, j+1);
shrink(grid, i, j-1);
shrink(grid, i+1, j);
shrink(grid, i-1, j);
}
}

Longest palindromic substring top down dynamic programming

Here is an algorithm for finding the longest palindromic substring of a string s using bottom-up dynamic programming. For each length j from 1 to n, it examines every substring of length j and checks whether it is a palindrome. The resulting time and space complexity is O(n^2).
def longestPalindrome(s):
    n = len(s)
    if n < 2:
        return s
    P = [[False for _ in range(n)] for _ in range(n)]
    longest = s[0]
    # j is the length of palindrome
    for j in range(1, n+1):
        for i in range(n-j+1):
            # if length is less than 3, checking s[i] == s[i+j-1] is sufficient
            P[i][i+j-1] = s[i] == s[i+j-1] and (j < 3 or P[i+1][i+j-2])
            if P[i][i+j-1] and j > len(longest):
                longest = s[i:i+j]
    return longest
I am trying to implement the same algorithm in top-down approach with memoization.
Question:
Is it possible to convert this algorithm to top-down approach?
There are many questions about longest palindromic substring, but they are mostly using this bottom-up approach. The answer in https://stackoverflow.com/a/29959104/6217326 seems to be the closest to what I have in mind. But the answer seems to be using different algorithm from this one (and much slower).

Here is my recursive solution:
Start with i = 0 and j = the last index of the string.
If the substring (i, j) is a palindrome, then the longest palindromic substring in that range has length j - i + 1.
Otherwise, recurse on (i+1, j) and (i, j-1) and take the maximum of the two.
The code will explain more.
The code is in Java, but I hope it gives an idea of how to implement it. @zcadqe wanted an idea of how to implement the top-down approach; I gave the idea and, as a bonus, Java code for better understanding. Anyone who knows Python can easily convert it.
public class LongestPalindromeSubstringWithSubStr {
static String str;
static int maxLen;
static int startLen;
static int endLen;
static int dp[][];// 0: not calculated, 1: i..j is a palindrome, -1: i..j is not a palindrome
static boolean isPal(int i, int j) {
if (dp[i][j] != 0) {
System.out.println("Res found for i:" + i + " j: " + j);
return (dp[i][j] == 1);
}
if (i == j) {
dp[i][j] = 1;
return true;
}
if (i + 1 == j) {// len 2
if (str.charAt(i) == str.charAt(j)) {
dp[i][j] = 1;
return true;
}
dp[i][j] = -1;
return false;
}
if (str.charAt(i) == str.charAt(j)) {
boolean res = isPal(i + 1, j - 1);
dp[i][j] = (res) ? 1 : -1; // cache negative results as -1, because 0 means "not calculated"
return res;
}
dp[i][j] = -1;
return false;
}
// update if whole string from i to j is palindrome
static void longestPalCalc(int i, int j) {
if (isPal(i, j)) {
if (j - i + 1 > maxLen) {// update res
maxLen = j - i + 1;
startLen = i;
endLen = j;
}
} else {
longestPalCalc(i + 1, j);
longestPalCalc(i, j - 1);
}
}
public static void main(String[] args) {
str = "abadbbda";
dp = new int[str.length()][str.length()];
longestPalCalc(0, str.length() - 1);
System.out.println("Longest: " + maxLen);
System.out.println(str.subSequence(startLen, endLen + 1));
}
}

The problem with the top-down approach here is that it is hard to get the topological order right. You can't simply run two for loops over (start, end) and memoize, because that order enumerates the substrings but is not the right topological order for palindromes: a palindrome of three characters always needs information about the palindrome inside it (of one character in this case). To know whether a _ _ a is a palindrome, you must first know whether _ _ is a palindrome. Thus the topological order you require is substrings of increasing length: x, x, xx, xx, xx, xxx, xxx, xxxx, xxxxx.
I'll post a top-down approach when I code or find one.

I tried to port Junaed's Java code to Python. It runs quite well on LeetCode but gets Memory Limit Exceeded on one of the test cases. If this can be modified further to get a better result, or if I missed something, please do correct me.
from functools import lru_cache
def longestPalindrome(self, s: str) -> str:
    @lru_cache(maxsize=None)
    def dp(i,j):
        if i==j:
            return True
        if i+1==j:
            if s[i]==s[j]:
                return True
            return False
        if s[i]==s[j]:
            return dp(i+1,j-1)
        return False
    self.maxlen=0
    @lru_cache(maxsize=None)
    def dp2(i,j):
        if dp(i,j):
            if (j-i+1 > self.maxlen):
                self.maxlen=j-i+1
                self.ans=s[i:j+1]
        else:
            dp2(i+1,j)
            dp2(i,j-1)
    self.ans=""
    i=0
    j=len(s)-1
    dp2(i,j)
    return self.ans

This problem can be solved by adding memoization to the brute-force approach.
We need to generate each substring, which takes O(n^2) time, and
we need to check whether each generated substring is a palindrome, which takes an additional O(n),
so in total the brute force has O(n^3) time complexity.
By storing the states we have already encountered, the time complexity can be reduced by a factor of n, so the total time complexity will be O(n^2).
Here's the solution:
class Solution:
    def longestPalindrome(self, s: str) -> str:
        memo = {}
        def isPalindrome(left,right):
            state = (left, right)
            if state in memo: return memo[state]
            if left >= right: return True
            if s[left] != s[right]: return False
            memo[state] = isPalindrome(left+1, right-1)
            return memo[state]
        N = len(s)
        result = ''
        for i in range(N):
            for j in range(i,N):
                if (j-i+1) > len(result) and isPalindrome(i,j):
                    result = s[i:j+1]
        return result
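To make the O(n^2) bound more tangible, here is a hedged variant of the same idea that swaps the hand-written memo dictionary for functools.lru_cache and also reports how many (left, right) states ended up cached. The function name longest_palindrome and the tuple return value are my own choices for illustration.

```python
from functools import lru_cache


def longest_palindrome(s):
    """Return (longest palindromic substring, number of cached states)."""

    @lru_cache(maxsize=None)
    def is_pal(left, right):
        # Empty and single-character ranges are palindromes by definition.
        if left >= right:
            return True
        # The recursion pulls in the inner substring on demand, so no
        # explicit evaluation order has to be arranged by the caller.
        return s[left] == s[right] and is_pal(left + 1, right - 1)

    best = ""
    for i in range(len(s)):
        for j in range(i, len(s)):
            if j - i + 1 > len(best) and is_pal(i, j):
                best = s[i:j + 1]
    return best, is_pal.cache_info().currsize


print(longest_palindrome("abadbbda"))
```

The number of cached states never exceeds n^2 for a string of length n, which matches the space bound of the bottom-up table. For very long strings the recursion depth (about n/2) may exceed Python's default limit, so treat this as a sketch rather than a submission-ready solution.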

#include<iostream>
#include<string>
#include<vector>
using namespace std;
bool isPalindrome(string str, int startIdx, int stopIdx, vector<vector<int>>& T) {
const int i = startIdx;
const int j = stopIdx - 1;
if (i == (j + 1)) {
return true;
}
if (i >= j) {
return false;
}
if (T[i][j] == -1) {
if (str[i] == str[j]) {
T[i][j] = isPalindrome(str, startIdx + 1, stopIdx - 1, T);
}
else {
T[i][j] = 0;
}
}
return (T[i][j] == 1);
}
string getLongestStr(string str, int startIdx, int stopIdx, vector<vector<int>>& T) {
if (isPalindrome(str, startIdx, stopIdx, T)) {
return str.substr(startIdx, (stopIdx - startIdx));
}
else {
string str1 = getLongestStr(str, startIdx + 1, stopIdx, T);
string str2 = getLongestStr(str, startIdx, stopIdx - 1, T);
return str1.size() > str2.size() ? str1 : str2;
}
return "";
}
string getLongestStr(string str) {
const int N = str.size();
vector<vector<int>> T(N, vector<int>(N, -1));
return getLongestStr(str, 0, N, T);
}
int main() {
string str = "forgeeksskeegfor";
//string str = "Geeks";
cout << getLongestStr(str) << endl;
return 0;
}

maximum of gcd of huge list of number [duplicate]

What is the fastest way to compute the greatest common divisor of n numbers?

Without recursion:
int result = numbers[0];
for(int i = 1; i < numbers.length; i++){
result = gcd(result, numbers[i]);
}
return result;
For very large arrays, it might be faster to use the fork-join pattern, where you split your array and calculate gcds in parallel. Here is some pseudocode:
int calculateGCD(int[] numbers){
if(numbers.length <= 2){
return gcd(numbers);
}
else {
INVOKE-IN-PARALLEL {
left = calculateGCD(extractLeftHalf(numbers));
right = calculateGCD(extractRightHalf(numbers));
}
return gcd(left,right);
}
}
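For comparison, the sequential fold from the first snippet can be sketched in Python with the standard math.gcd. The early exit once the running result reaches 1 is an addition of mine, not part of the answer above; the name gcd_of_list is hypothetical.

```python
from math import gcd


def gcd_of_list(numbers):
    """Fold gcd over the numbers, stopping early once the result is 1."""
    result = 0  # gcd(0, x) == x, so 0 is a neutral starting value
    for value in numbers:
        result = gcd(result, value)
        if result == 1:
            break  # no later element can make the gcd larger than 1
    return result


print(gcd_of_list([60, 90, 45]))  # 15
```

Because gcd is associative, the same fold could be split across halves as in the fork-join pseudocode; whether that pays off depends on the array size and the cost of copying the halves.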

You may want to sort the numbers first and compute the gcd recursively starting from the smallest two numbers.

C++17
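I have written this function for calculating the gcd of n numbers using __gcd(int a, int b), a non-standard helper that GCC's libstdc++ provides in <algorithm>. Since C++17, the standard alternative is std::gcd from <numeric>.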
int gcd(vector<int> vec, int vsize)
{
int gcd = vec[0];
for (int i = 1; i < vsize; i++)
{
gcd = __gcd(gcd, vec[i]);
}
return gcd;
}
To know more about this function visit this link .
Also refer to Dijkstra's GCD algorithm from the following link. It works without division. So it could be slightly faster (Please correct me if I am wrong.)

You should use Lehmer's GCD algorithm.

How about the following using Euclidean algorithm by subtraction:
function getGCD(arr){
let min = Math.min(...arr);
let max= Math.max(...arr);
if(min==max){
return min;
}else{
for(let i in arr){
if(arr[i]>min){
arr[i]=arr[i]-min;
}
}
return getGCD(arr);
}
}
console.log(getGCD([2,3,4,5,6]))
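Each pass over the array takes O(n) time, but the number of passes depends on the magnitudes of the values, not only on n, because subtraction-based Euclid can need many steps when the numbers differ greatly. There are improvements that could be implemented, but I didn't get around to trying them out for n numbers.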

If you have a lot of small numbers, factorization may be actually faster.
//Java
int[] array = {60, 90, 45};
int gcd = 1;
outer: for (int d = 2; true; d += 1 + (d % 2)) {
boolean any = false;
do {
boolean all = true;
any = false;
boolean ready = true;
for (int i = 0; i < array.length; i++) {
ready &= (array[i] == 1);
if (array[i] % d == 0) {
any = true;
array[i] /= d;
} else all = false;
}
if (all) gcd *= d;
if (ready) break outer;
} while (any);
}
System.out.println(gcd);
(works for some examples, but not really tested)

Use the Euclidean algorithm:
function gcd(a, b)
    while b ≠ 0
        t := b;
        b := a mod b;
        a := t;
    return a;
You apply it to the first two numbers, then to the result and the third number, and so on:
read(a);
read(b);
result := gcd(a, b);
i := 3;
while i ≤ n
    read(a);
    result := gcd(result, a);
    i := i + 1;
print(result);

Here below is the source code of the C program to find HCF of N numbers using Arrays.
#include<stdio.h>
int main()
{
int n,i,gcd;
printf("Enter how many no.s u want to find gcd : ");
scanf("%d",&n);
int arr[n];
printf("\nEnter your numbers below :- \n ");
for(i=0;i<n;i++)
{
printf("\nEnter your %d number = ",i+1);
scanf("%d",&arr[i]);
}
gcd=arr[0];
int j;
for(j=1;j<n;j++)
{
// Euclidean algorithm on the running gcd and the next element
int a=gcd,b=arr[j];
while(b!=0)
{
int t=a%b;
a=b;
b=t;
}
gcd=a;
}
printf("\nGCD of %d no.s = %d ",n,gcd);
return 0;
}
For more refer to this website for further clarification.......

You can use divide and conquer. To calculate gcdN(nums), split the list into a first half and a second half and recurse on each; once a half contains a single number, that number is its own gcd, and the two results are combined with gcd2(n1, n2).
I just wrote some quick sample code (assuming all numbers in the list are positive integers):
def gcdN(nums):
    n = len(nums)
    if n == 0: raise ValueError("gcdN needs at least one number")
    if n == 1: return nums[0]
    return gcd2(gcdN(nums[:n//2]), gcdN(nums[n//2:]))
def gcd2(n1, n2):
    # brute force: the largest number that divides both n1 and n2
    for num in range(min(n1, n2), 0, -1):
        if n1 % num == 0 and n2 % num == 0:
            return num

Here's a gcd method that uses the property that gcd(a, b, c) = gcd(a, gcd(b, c)).
It uses BigInteger's gcd method since it is already optimized.
public static BigInteger gcd(BigInteger[] parts){
BigInteger gcd = parts[0];
for(int i = 1; i < parts.length; i++)
gcd = parts[i].gcd(gcd);
return gcd;
}

//Recursive solution to get the GCD of Two Numbers
long long int gcd(long long int a,long long int b)
{
return b==0 ? a : gcd(b,a%b);
}
int main(){
long long int a,b;
cin>>a>>b;
if(a>b) cout<<gcd(a,b);
else cout<<gcd(b,a);
return 0;
}

import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
class GCDArray{
public static int [] extractLeftHalf(int [] numbers)
{
int l =numbers.length/2;
int arr[] = Arrays.copyOf(numbers, l+1);
return arr;
}
public static int [] extractRightHalf(int [] numbers)
{
int l =numbers.length/2;
int arr[] = Arrays.copyOfRange(numbers,l+1, numbers.length);
return arr;
}
public static int gcd(int[] numbers)
{
if(numbers.length==1)
return numbers[0];
else {
int x = numbers[0];
int y = numbers[1];
while(y%x!=0)
{
int rem = y%x;
y = x;
x = rem;
}
return x;
}
}
public static int gcd(int x,int y)
{
while(y%x!=0)
{
int rem = y%x;
y = x;
x = rem;
}
return x;
}
public static int calculateGCD(int[] numbers){
if(numbers.length <= 2){
return gcd(numbers);
}
else {
int left = calculateGCD(extractLeftHalf(numbers));
int right = calculateGCD(extractRightHalf(numbers));
return gcd(left,right);
}
}
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int n = sc.nextInt();
int arr[] = new int[n];
for(int i=0;i<n;i++){
arr[i]=sc.nextInt();
}
System.out.println(calculateGCD(arr));
}
}
The above is working Java code; its pseudocode was already given by https://stackoverflow.com/users/7412/dogbane

A recursive JavaScript (ES6) one-liner for any number of arguments.
const gcd = (a, b, ...c) => b ? gcd(b, a % b, ...c) : c.length ? gcd(a, ...c) : Math.abs(a);

This is what comes off the top of my head in Javascript.
function calculateGCD(arrSize, arr) {
if(!arrSize)
return 0;
var n = Math.min(...arr);
for (let i = n; i > 0; i--) {
let j = 0;
while(j < arrSize) {
if(arr[j] % i === 0) {
j++;
}else {
break;
}
if(j === arrSize) {
return i;
}
}
}
}
console.log(calculateGCD(4, [2, 6, 4, 8]));
// Output => 2

Here was the answer I was looking for.
The best way to find the gcd of n numbers is indeed using recursion, i.e. gcd(a,b,c) = gcd(gcd(a,b),c). But I was getting timeouts in certain programs when I did this.
The optimization that was needed here was that the recursion should be solved using fast matrix multiplication algorithm.

Print number of ways for non-consecutive one's

Given a positive integer N, print all integers between 1 and 2^N such that there is no consecutive 1’s in its Binary representation.
I have below code but it is printing duplicate sometimes. Is it possible to print without duplicates?
#include <stdio.h>
int a[100];
void foo(int i, int size)
{
if (i >= size) {
int i;
for (i=0;i<size;i++)
printf("%d\n", a[i]);
printf("----\n");
return;
}
if (a[i-1] == 1 || a[i-1] == 0)
a[i] = 0;
foo(i+1, size);
if (a[i-1] == 0)
a[i] = 1;
foo(i+1, size);
}
int main(void) {
int i = 0;
int size = 5;
a[i] = 1;
foo(1, size);
return 0;
}
I have this http://ideone.com/cT4Hco python program which uses hash maps to print the elements but I think we can do this without hashmaps also.

Couple of notes:
you shouldn't start the backtracking from index 1. Instead, start from 0 since your numbers would be in the range [0, n-1] in array a
you shouldn't initialize a[0] to 1 since a[0] = 0 is also a valid case.
if (a[i-1] == 1 || a[i-1] == 0) is redundant
Code:
#include <stdio.h>
int a[100];
void foo(int i, int size)
{
if (i >= size) {
int i;
for (i=0;i<size;i++)
printf("%d ", a[i]);
printf("\n----\n");
return;
}
a[i] = 0;
foo(i+1, size);
if ( i == 0 || a[i-1] == 0) {
a[i] = 1;
foo(i+1, size);
}
}
int main(void) {
int i = 0;
int size = 5;
foo(0, size);
return 0;
}
You might also want to filter the solution 0 0 0 ... 0 during the printing since you need only the numbers from 1 to 2^n. If 2^n is included you should also print it. The backtracking considers the numbers 0, ...., 2^n-1
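As an alternative to backtracking, the condition can also be tested directly with a bit trick: a number x has two adjacent 1 bits exactly when x & (x >> 1) is non-zero. The sketch below assumes the range is 1 to 2^N inclusive, as the remark above suggests; the function name no_consecutive_ones is hypothetical.

```python
def no_consecutive_ones(n):
    """Return the integers in [1, 2**n] whose binary form has no adjacent 1 bits."""
    limit = 1 << n
    # Shifting right by one lines each bit up with its left neighbour, so a
    # non-zero AND means that two 1 bits sit next to each other.
    return [x for x in range(1, limit + 1) if (x & (x >> 1)) == 0]


print(no_consecutive_ones(3))  # [1, 2, 4, 5, 8]
```

This visits every number up to 2^N, so it does more work than the backtracking version, which only explores valid prefixes; it is mainly useful as a cross-check that no value is printed twice.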
