I had an idea to create an Instagram username finder based on a first name and a last name.
I know it won't be very accurate, but I would like to build it anyway.
import random
first_name="abc"
last_name="xyz"
first_name_list=list(first_name)
last_name_list=list(last_name)
some_extra=[".","_","-","#"]
numbers = []
for i in range(10001):
    numbers.append(i)
genuser=[]
genuser_len = len(genuser)
for i in range(genuser_len):
    user_name=genuser[i]
I want to create a bunch of usernames from this information, but I have no idea how to generate the list.
Please help me.
I want to generate usernames like these: a_xyz123, abc_xyz, abc.xyz, a.xyz023
You can try this.
import random
first_name="abc"
last_name="xyz"
first_name_list=list(first_name)
first_name_list.append(first_name)
last_name_list=list(last_name)
last_name_list.append(last_name)
some_extra=[".","_","-","#"]
numbers = [i for i in range(10001)]
genuser=[]
genuser_len = len(genuser)
for i in numbers:
    list1=[]
    x=random.choice(first_name_list) #== Random choice from first_name_list
    y=random.choice(last_name_list) #== Random choice from last_name_list
    z=random.choice(some_extra) #== Random choice from some_extra
    x1=random.randint(1,10001)
    list1.append(x)
    list1.append(y)
    list1.append(str(x1))
    random.shuffle(list1) #=== Shuffle list
    k=random.randint(1,2)
    list1.insert(k,z)
    print(''.join(list1))
Sample output:
abc120-xyz
4132#xa
c5336-y
8355_cy
abcz#3168
a-z9931
a.783x
9323x.c
x1980.c
z_3948a
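The random approach above scatters the parts in any order. If only the shapes named in the question are wanted (a_xyz123, abc_xyz, abc.xyz, a.xyz023), a more targeted sketch might enumerate them directly. The function name username_candidates and the default separators and suffixes below are assumptions chosen for illustration, not part of the original answer:

```python
from itertools import product

def username_candidates(first_name, last_name,
                        separators=("_", "."),
                        suffixes=("", "123", "023")):
    """Yield names shaped like <first name or initial><separator><last name><suffix>."""
    first_parts = (first_name, first_name[0])  # full first name, then just the initial
    for first, sep, suffix in product(first_parts, separators, suffixes):
        yield f"{first}{sep}{last_name}{suffix}"

candidates = list(username_candidates("abc", "xyz"))
print(len(candidates))   # 2 first parts * 2 separators * 3 suffixes = 12
print(candidates[:4])
```

Because the function is a generator, a longer suffix list (for example every number up to 10000) can be consumed one name at a time instead of being held in memory.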
This is a permutations problem, and the output for the given requirement is going to be enormous. You can try something like this:
from itertools import permutations
list(map(lambda x: ''.join(map(str,x)),permutations(first_name_list+last_name_list+some_extra+numbers, 3)))
PS: To avoid long computation time and high memory usage in this example, I'm using range(11) for the numbers list and 3 as the width of the generated usernames, which is passed to the permutations function above.
SMALL SLICE OF OUTPUT:
['ab2', 'ab3', 'ab4', 'ab5', 'ab6', 'ab7', 'ab8', 'ab9', 'ab10', 'acb', 'acx', 'acy', 'acz', 'ac.', 'ac_', 'ac-', 'ac#', 'ac0', 'ac1', 'ac2']
I want to create random 16-byte strings that contain only the characters "A" and "2" to decrypt my AES-encrypted ciphertext. How do I achieve this with Python?
Using cryptographically-secure randomness
from random import SystemRandom
sr = SystemRandom()
l = (b"A", b"2")
result = b"".join(l[sr.randrange(0, 2)] for i in range(16))
print(result)
For modeling or simulations (or most use cases), you can just use randrange() without SystemRandom.
Example Output
b'2A22AA2222AAA222'
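As an alternative sketch, the standard-library secrets module draws from the same operating-system randomness source as SystemRandom. The variable names alphabet and key are illustrative only:

```python
import secrets

alphabet = b"A2"
# Indexing a bytes object yields an int, so bytes() can rebuild a bytes value from the picks.
key = bytes(secrets.choice(alphabet) for _ in range(16))
print(key)
```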
I am running the following python script:
import random
result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
with open('file_output.txt','a') as out:
    out.write(f'{result_str}\n')
Is there a way to automate this script so that it runs repeatedly, or to get multiple outputs at once?
For example, right now the output is stored in the file one entry at a time:
kmfd5s6s
But I would like to get 1,000,000 entries in the file with one click, with no duplicates.
Same logic as given by PangolinPaws, but since you need 1,000,000 entries, which is quite large, using numpy could be more efficient. I also replaced random.choice() with random.choices() with k=8, in order to avoid the for loop that generates the string.
import random
import numpy as np
a = np.array([])
for i in range(1000000):
    # named result_str rather than str, so the built-in str type is not shadowed
    result_str = ''.join((random.choices('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()', k = 8)))
    if result_str not in a:
        a = np.append(a,result_str)
np.savetxt("generate_strings.csv", a, fmt='%s')
You need to nest your out.write() in a loop, something like this, to make it happen multiple times:
import random
with open('file_output.txt','a') as out:
    for x in range(1000): # the number of lines you want in the output file
        result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
        out.write(f'{result_str}\n')
However, while unlikely, it is possible that you could end up with duplicate rows. To avoid this, you can generate and store your random strings in a loop and check for duplicates as you go. Once you have enough, write them all to the file outside the loop:
import random
results = []
while len(results) < 1000: # the number of lines you want in the output file
    result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
    if result_str not in results: # check if the generated result_str is a duplicate
        results.append(result_str)
with open('file_output.txt','a') as out:
    out.write( '\n'.join(results) )
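Checking membership in a list scans the whole list each time, which gets slow as the target approaches 1,000,000 entries. A possible variation, sketched here under the assumption that the order of the strings does not matter, collects them in a set so each duplicate check takes roughly constant time. The names ALPHABET and unique_random_strings are hypothetical, and the alphabet is built from the string module rather than copied from the question:

```python
import random
import string

# Assumed alphabet: letters, digits and a few symbols, similar to the question's.
ALPHABET = string.ascii_letters + string.digits + "!#$%^&*()"

def unique_random_strings(count, length=8, rng=random):
    """Return `count` distinct random strings of the given length."""
    seen = set()
    while len(seen) < count:
        # Adding a string that is already present leaves the set unchanged.
        seen.add("".join(rng.choices(ALPHABET, k=length)))
    return list(seen)

strings = unique_random_strings(1000)
print(len(strings), len(set(strings)))
```

The resulting list can then be written to the file with '\n'.join(...) exactly as in the answer above.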
Update: This cannot be solved 100%, since the number of merchants each user must receive is different, so some users might end up getting the same merchants as before. However, is it possible to let them keep the same merchants only when no other merchants are available?
I have the following excel file:
What I would like to do is to redistribute the merchants (Mer_id) so each user (Origin_pool) gets the same number of merchants as before, but a different set of merchants. For example, after the redistribution, Nick will receive 3 Mer_id's but not: 30303, 101020, 220340. Anna will receive 4 merchants but not 23401230,310231, 2030230, 2310505 and so on. Of course, one merchant can not be assigned to more than one person.
What I did so far is to find the total number of merchants each user must receive and randomly give them one mer_id that is not previously assigned to them. After I find a different mer_id I remove it from the list, so the other users won't receive the same merchant:
import random  # needed for random.randint below
import pandas as pd
import numpy as np
df=pd.read_excel('dup_check_origin.xlsx')
dfcounts=df.groupby(['Origin_pool']).size().reset_index(name='counts')
Origin_pool=list(dfcounts['Origin_pool'])
counts=list(dfcounts['counts'])
dict_counts = dict(zip(Origin_pool, counts))
dest_name=[]
dest_mer=[]
for pool in Origin_pool:
    pername=0
    #for j in range(df.shape[0]):
    while pername<dict_counts[pool]:  # '<' so each user gets exactly the previous count
        rn=random.randint(0,df.shape[0]-1)
        rid=df['Mer_id'].iloc[rn]
        if (pool!=df['Origin_pool'].iloc[rn]):
            #new_dict[pool]=rid
            pername+=1
            dest_name.append(pool)
            dest_mer.append(rid)
            df=df.drop(df.loc[df['Mer_id']==rid].index[0])
But it is not efficient at all, given the fact that in the future I might have more data than 18 rows.
Is there any library that does this or a way to make it more efficient?
This comes several days after your question, but I think this code is bulletproof.
You could wrap the entire code in a function or a class.
I only created one function, a recursive one, to handle the leftovers.
There are 3 lists, initialized at the beginning of the code:
pairs -> the final list of pool pairs
reshuffle -> the randomly generated pairs that already appeared as pool pairs in the excel file
still -> holds the repeated pool pairs inside the pullpush function
The pullpush function comes first, because it is called in several different situations.
The first part of the program is a random algorithm that makes pairs from mer_id (merchants) and origin_pool (poolers).
If a pair is not in the excel file then it goes to the pairs list; otherwise it goes to the reshuffle list.
Depending on the characteristics of reshuffle, either another random algorithm is called or the list is processed by the pullpush function.
If you execute the code once, as it is, and print(pairs), you may find a list with 15, 14, or some other number of pool pairs fewer than 18.
Then, if you print(reshuffle), you will see the rest of the pairs needed to make 18.
To get the full 18 matchings in the pairs variable you must run:
pullpush(reshuffle).
The output here was obtained running the code followed by:
pullpush(reshuffle)
If you want to ensure that a mer_id and origin_pool pairing does not repeat for 3 rounds, you can load 2 other excel files and split them into oldpair2 and oldpair3.
[[8348201, 'Anna'], [53256236, 'Anna'], [9295, 'Anna'], [54240, 'Anna'], [30303, 'Marios'], [101020, 'Marios'], [959295, 'Marios'], [2030230, 'George'], [310231, 'George'], [23401230, 'George'], [2341134, 'Nick'], [178345, 'Marios'], [220340, 'Marios'], [737635, 'George'], [[2030230, 'George'], [928958, 'Nick']], [[5560503, 'George'], [34646, 'Nick']]]
The code:
import pandas as pd
import random
df=pd.read_excel('dup_check_origin.xlsx')
oldpair = df.values.tolist() #check previous pooling pairs
merchants = df['Mer_id'].values.tolist() #convert mer_id in list
poolers = df['Origin_pool'].values.tolist() #convert origin_pool in list
random.shuffle(merchants) #1st step shuffle
pairs = [] #empty pairs list
reshuffle = [] #try again
still = [] #same as reshuffle for pullpush
def pullpush(repetition):
    replacement = repetition #reshuffle transfer
    for re in range(len(replacement)):
        replace = next(r for r in pairs if r not in replacement)
        repair = [[replace[0],replacement[re][1]],
                  [replacement[re][0],replace[1]]]
        if repair not in oldpair:
            iReplace = pairs.index(replace)#get index of pair
            pairs.append(repair)
            del pairs[iReplace] # remove from pairs
        else:
            still.append(repair)
    if still:
        pullpush(still) #recursive call
for p in range(len(poolers)):#avoid more merchants than poolers
    pair = [merchants[p],poolers[p]]
    if pair not in oldpair:
        pairs.append(pair)
    else:
        reshuffle.append(pair)
if reshuffle:
    merchants_bis = [x[0] for x in reshuffle]
    poolers_bis = [x[1] for x in reshuffle]
    if len(reshuffle) > 2: #shuffle needs 3 or more elements
        random.shuffle(merchants_bis)
        reshuffle = [] #clean before the loop
        for n in range(len(poolers_bis)):
            new_pair = [merchants_bis[n],poolers_bis[n]]
            if new_pair not in oldpair:
                pairs.append(new_pair)
            else:
                reshuffle.append(new_pair)
        if len(reshuffle) == len(poolers_bis):#infinite loop
            pullpush(reshuffle)
    # double pairs and different poolers
    elif (len(reshuffle) == 2 and not[i for i in reshuffle[0] if i in reshuffle[1]]):
        merchants_bis = [merchants_bis[1],merchants_bis[0]]
        new_pair = [[merchants_bis[1],poolers_bis[0]],
                    [merchants_bis[0],poolers_bis[1]]]
        if new_pair not in oldpair:
            pairs.append(new_pair)
        else:
            reshuffle.append(new_pair)
            pullpush(reshuffle)
    else: #one left or same poolers
        pullpush(reshuffle)
My solution uses dictionaries and lists. I print the result, but you can create a new dataframe from it.
from random import shuffle
import pandas as pd
df = pd.read_excel('dup_check_origin.xlsx')
dpool = {}
mers = list(df.Mer_id.unique())
shuffle(mers)
for pool in df.Origin_pool.unique():
    dpool[pool] = list(df.Mer_id[df.Origin_pool == pool])
for key in dpool.keys():
    inmers = dpool[key]
    cnt = len(inmers)
    new = [x for x in mers if x not in inmers][:cnt]
    mers = [x for x in mers if x not in new]
    print(key, new)
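A different way to approach the same goal, offered only as a rough sketch and not as either answerer's method, is rejection sampling: shuffle the merchants across the unchanged owner slots and retry until no old pairing reappears. The function name reassign and the max_attempts parameter are assumptions. The Nick and Anna ids come from the question, while the George rows are made up for illustration because the excel file is not shown:

```python
import random

def reassign(old_pairs, max_attempts=1000, rng=random):
    """Shuffle merchants over the same owner slots until no old pairing recurs.

    old_pairs is a list of (merchant_id, owner) tuples. Returns the new list of
    pairs, or None if no valid shuffle turned up within max_attempts.
    """
    merchants = [m for m, _ in old_pairs]
    owners = [o for _, o in old_pairs]   # owner slots stay fixed, so counts are preserved
    previous = set(old_pairs)
    for _ in range(max_attempts):
        rng.shuffle(merchants)
        candidate = list(zip(merchants, owners))
        if previous.isdisjoint(candidate):  # nobody kept one of their old merchants
            return candidate
    return None

old = [
    (30303, "Nick"), (101020, "Nick"), (220340, "Nick"),
    (23401230, "Anna"), (310231, "Anna"), (2030230, "Anna"), (2310505, "Anna"),
    (737635, "George"), (5560503, "George"), (928958, "George"),  # illustrative rows
]
new_pairs = reassign(old)
print(new_pairs)
```

Returning None makes the unsolvable case explicit (for example, one user owning more than half of all merchants), which ties in with the update in the question. For large inputs the number of retries needed can grow, so this is a starting point rather than a tuned solution.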
I want to make a list of elements where each element starts with 4 numbers and ends with 4 letters, covering every possible combination. This is my code:
import itertools
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`"""
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)
chars =list()
nums =list()
for combination in itertools.product(char_range('a','b'),repeat=4):
    chars.append(''.join(map(str, combination)))
for combination in itertools.product(range(10),repeat=4):
    nums.append(''.join(map(str, combination)))
c = [str(x)+y for x,y in itertools.product(nums,chars)]
for dd in c:
    print(dd)
This runs fine but when I use a bigger range of characters, such as (a-z) the program hogs the CPU and memory, and the PC becomes unresponsive. So how can I do this in a more efficient way?
The documentation of itertools says that "it is roughly equivalent to nested for-loops in a generator expression". So itertools.product is never an enemy of memory, but if you store its results in a list, that list is. Therefore:
for element in itertools.product(...):
    print(element)
is okay, but
myList = [element for element in itertools.product(...)]
or the equivalent loop of
for element in itertools.product(...):
    myList.append(element)
is not! So you want itertools to generate results for you, but you don't want to store them, rather use them as they are generated. Think about this line of your code:
c = [str(x)+y for x,y in itertools.product(nums,chars)]
Given that nums and chars can be huge lists, building another gigantic list of all combinations on top of them is definitely going to choke your system.
Now, as mentioned in the comments, if you replace all the lists that are too fat to fit into the memory with generators (functions that just yield), memory is not going to be a concern anymore.
Here is my full code. I basically changed your lists of chars and nums to generators, and got rid of the final list of c.
import itertools
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`"""
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)
def char(a):
    for combination in itertools.product(char_range(str(a[0]),str(a[1])),repeat=4):
        yield ''.join(map(str, combination))
def num(n):
    for combination in itertools.product(range(n),repeat=4):
        yield ''.join(map(str, combination))
def final(one,two):
    for foo in char(one):
        for bar in num(two):
            print(str(bar)+str(foo))
Now let's ask what every combination of ['a','b'] and range(2) is:
final(['a','b'],2)
Produces this:
0000aaaa
0001aaaa
0010aaaa
0011aaaa
0100aaaa
0101aaaa
0110aaaa
0111aaaa
1000aaaa
1001aaaa
1010aaaa
1011aaaa
1100aaaa
1101aaaa
1110aaaa
1111aaaa
0000aaab
0001aaab
0010aaab
0011aaab
0100aaab
0101aaab
0110aaab
0111aaab
1000aaab
1001aaab
1010aaab
1011aaab
1100aaab
1101aaab
1110aaab
1111aaab
0000aaba
0001aaba
0010aaba
0011aaba
0100aaba
0101aaba
0110aaba
0111aaba
1000aaba
1001aaba
1010aaba
1011aaba
1100aaba
1101aaba
1110aaba
1111aaba
0000aabb
0001aabb
0010aabb
0011aabb
0100aabb
0101aabb
0110aabb
0111aabb
1000aabb
1001aabb
1010aabb
1011aabb
1100aabb
1101aabb
1110aabb
1111aabb
0000abaa
0001abaa
0010abaa
0011abaa
0100abaa
0101abaa
0110abaa
0111abaa
1000abaa
1001abaa
1010abaa
1011abaa
1100abaa
1101abaa
1110abaa
1111abaa
0000abab
0001abab
0010abab
0011abab
0100abab
0101abab
0110abab
0111abab
1000abab
1001abab
1010abab
1011abab
1100abab
1101abab
1110abab
1111abab
0000abba
0001abba
0010abba
0011abba
0100abba
0101abba
0110abba
0111abba
1000abba
1001abba
1010abba
1011abba
1100abba
1101abba
1110abba
1111abba
0000abbb
0001abbb
0010abbb
0011abbb
0100abbb
0101abbb
0110abbb
0111abbb
1000abbb
1001abbb
1010abbb
1011abbb
1100abbb
1101abbb
1110abbb
1111abbb
0000baaa
0001baaa
0010baaa
0011baaa
0100baaa
0101baaa
0110baaa
0111baaa
1000baaa
1001baaa
1010baaa
1011baaa
1100baaa
1101baaa
1110baaa
1111baaa
0000baab
0001baab
0010baab
0011baab
0100baab
0101baab
0110baab
0111baab
1000baab
1001baab
1010baab
1011baab
1100baab
1101baab
1110baab
1111baab
0000baba
0001baba
0010baba
0011baba
0100baba
0101baba
0110baba
0111baba
1000baba
1001baba
1010baba
1011baba
1100baba
1101baba
1110baba
1111baba
0000babb
0001babb
0010babb
0011babb
0100babb
0101babb
0110babb
0111babb
1000babb
1001babb
1010babb
1011babb
1100babb
1101babb
1110babb
1111babb
0000bbaa
0001bbaa
0010bbaa
0011bbaa
0100bbaa
0101bbaa
0110bbaa
0111bbaa
1000bbaa
1001bbaa
1010bbaa
1011bbaa
1100bbaa
1101bbaa
1110bbaa
1111bbaa
0000bbab
0001bbab
0010bbab
0011bbab
0100bbab
0101bbab
0110bbab
0111bbab
1000bbab
1001bbab
1010bbab
1011bbab
1100bbab
1101bbab
1110bbab
1111bbab
0000bbba
0001bbba
0010bbba
0011bbba
0100bbba
0101bbba
0110bbba
0111bbba
1000bbba
1001bbba
1010bbba
1011bbba
1100bbba
1101bbba
1110bbba
1111bbba
0000bbbb
0001bbbb
0010bbbb
0011bbbb
0100bbbb
0101bbbb
0110bbbb
0111bbbb
1000bbbb
1001bbbb
1010bbbb
1011bbbb
1100bbbb
1101bbbb
1110bbbb
1111bbbb
Which is the exact result you are looking for. Each element of this result is generated on the fly, so it never creates a memory problem. You can now try much bigger operations such as final(['a','z'],10): they no longer exhaust memory, although printing all 26^4 * 10^4 (about 4.57 billion) lines will still take a long time.
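To see the laziness in action without printing billions of lines, one option is to take only the first few results with itertools.islice. This is a small sketch; the generator name codes and its default arguments are assumptions, not part of the answer above:

```python
import itertools
import string

def codes(letters=string.ascii_lowercase, digits=string.digits):
    """Lazily yield every 4-digit + 4-letter string, one at a time."""
    for letter_part in itertools.product(letters, repeat=4):
        for digit_part in itertools.product(digits, repeat=4):
            yield "".join(digit_part) + "".join(letter_part)

# Only three items are ever produced; the rest of the sequence is never built.
first_three = list(itertools.islice(codes(), 3))
print(first_three)  # ['0000aaaa', '0001aaaa', '0002aaaa']
```

The same generator can be written to a file line by line, which keeps memory use flat regardless of how many combinations exist.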
I'm new to python and coding (last night). I need to generate a very large number of itertools products with a specific format for the output. I can generate the combinations using,
import itertools
s=[ ['CPT1','OTHERCPT1','OTHERCPT2','OTHERCPT3','OTHERCPT4','OTHERCPT5','OTHERCPT6','OTHERCPT7','OTHERCPT8','OTHERCPT9','OTHERCPT10','CONCURR1','CONCURR2','CONCURR3','CONCURR4','CONCURR5','CONCURR6','CONCURR7','CONCURR8','CONCURR9','CONCURR10'], ['15756','15757','15758','43496','49006','20969','20955','20956','20957','20962','20970','20972','20973'],['CPT1','OTHERCPT1','OTHERCPT2','OTHERCPT3','OTHERCPT4','OTHERCPT5','OTHERCPT6','OTHERCPT7','OTHERCPT8','OTHERCPT9','OTHERCPT10','CONCURR1','CONCURR2','CONCURR3','CONCURR4','CONCURR5','CONCURR6','CONCURR7','CONCURR8','CONCURR9','CONCURR10'], ['15756','15757','15758','43496','49006','20969','20955','20956','20957','20962','20970','20972','20973']]
x=list(itertools.product(*s))
print(x)
however the output appears as such:
('CPT1', '15756', 'CPT1', '15756'), ... etc.
I would like it to appear:
SELECT IF(CPT1='15756' AND CPT1='15756').
SELECT IF(...).
etc.
Thanks for your help!
You should use string formatting (https://docs.python.org/2/library/string.html)
import itertools
s = [[...first list...],[...second list...]]
for p in itertools.product(*s):
    print("SELECT IF(CPT1='{}' AND CPT1='{}').".format(*p))
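Note that the template above hardcodes CPT1 and has only two placeholders. With four lists as in the question, each tuple carries four fields (variable, code, variable, code), so a four-placeholder template may be closer to what was asked. The sketch below uses shortened, illustrative lists named variables and codes rather than the full ones from the question:

```python
import itertools

# Shortened, illustrative versions of the question's lists.
variables = ["CPT1", "OTHERCPT1"]
codes = ["15756", "15757"]
s = [variables, codes, variables, codes]

statements = [
    "SELECT IF({}='{}' AND {}='{}').".format(*p)
    for p in itertools.product(*s)
]
print(statements[0])    # SELECT IF(CPT1='15756' AND CPT1='15756').
print(len(statements))  # 2 * 2 * 2 * 2 = 16
```

With the full lists from the question (21 variables and 13 codes, giving 21 * 13 * 21 * 13 = 74,529 statements), printing inside the for loop instead of building a list avoids holding every statement in memory.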