I am trying to export voucher codes for a flyer to Excel. I want to create 15,000 rows, each with a random voucher code. So far I have the code below, but how can I make it create 15,000 voucher codes, one per row?
Thanks a lot.
import random
import string
import pandas as pd
def random_string_generator(str_size, allowed_chars):
    return ''.join(random.choice(allowed_chars) for x in range(str_size))

chars = string.ascii_uppercase
size = 5

for i in range(15000):
    print(random_string_generator(size, chars) + "-" + random_string_generator(size, chars) + "-" + random_string_generator(size, chars) + "-" + random_string_generator(size, chars) + "-" + random_string_generator(size, chars))
Here's your original code with a few tweaks:
import random
import string
chars = string.ascii_uppercase
amount_of_vouchers = 10
segments_per_voucher = 5
chars_per_segment = 5
def random_string_generator(allowed_chars, str_size):
    return ''.join(random.choices(allowed_chars, k=str_size))

vouchers = []
for i in range(amount_of_vouchers):
    voucher = [random_string_generator(chars, chars_per_segment) for j in range(segments_per_voucher)]
    vouchers.append('-'.join(voucher))

print('\n'.join(vouchers))
What I've done:
Instead of concatenating the function calls on one line, I've changed this to a loop. This is easier to read, easier to change, and shorter.
Added a loop around the code generation, so that we can create multiple vouchers.
Vouchers are stored in the imaginatively named vouchers list.
Changed random.choice to random.choices, which generates an entire segment at once rather than one character at a time.
Example output:
RIRSE-BURXY-NTBFP-VZTBC-LNQYD
OWTSZ-AIUPS-POXMW-PQXJY-DUXUE
BFDJI-ASLPZ-XIRKR-ZKVLB-YGRCA
SQTHJ-DYJYL-IZQFD-EFBJO-OWPHO
OWPWW-PJGNY-BOCZM-ANNLJ-CFXKY
NHQUN-MMBQB-KHLYL-ZQVTD-TDUQC
MNOYT-WAVWV-QSUND-RYKHB-TNUCF
OAHOR-DPJFN-RQYHE-GUSVF-CPCBF
OFNHT-LCARH-EZDWT-YRLLI-IWJZW
NXLKI-GCJDM-QZGPU-MIZCC-XSOQD
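Since the original question imports pandas and asks about Excel, the generated vouchers can also be written out to a spreadsheet file. A minimal sketch (the filename vouchers.csv is my assumption; CSV opens directly in Excel, and you could use df.to_excel instead if you have openpyxl installed):

```python
import random
import string

import pandas as pd

chars = string.ascii_uppercase

def random_string_generator(allowed_chars, str_size):
    return ''.join(random.choices(allowed_chars, k=str_size))

# build 15000 vouchers, each made of five 5-character segments
vouchers = ['-'.join(random_string_generator(chars, 5) for _ in range(5))
            for _ in range(15000)]

# one voucher per row; Excel opens the CSV directly
df = pd.DataFrame({'voucher': vouchers})
df.to_csv('vouchers.csv', index=False)
```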
I am running the following python script:
import random
result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
with open('file_output.txt','a') as out:
out.write(f'{result_str}\n')
Is there a way I could automate this script to run repeatedly, or get multiple outputs at once?
E.g. right now the output is appended to the file one entry at a time:
kmfd5s6s
But I'd like to get 1,000,000 entries in the file in one run, with no duplicates.
Same logic as given by PangolinPaws, but since you require 1,000,000 entries, which is quite large, using numpy could be more efficient. Also, replacing random.choice() with random.choices() and k=8 avoids the inner for loop when generating each string.
import random
import numpy as np

a = np.array([])
for i in range(1000000):
    # use rand_str rather than str, to avoid shadowing the built-in
    rand_str = ''.join(random.choices('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()', k=8))
    if rand_str not in a:
        a = np.append(a, rand_str)
np.savetxt("generate_strings.csv", a, fmt='%s')
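A fully vectorized variant (my suggestion, not part of the original answer): NumPy can generate all the random characters in a single call and deduplicate with np.unique, avoiding the per-iteration np.append and the linear membership test. A sketch, with n reduced here for a quick run (set it to 1_000_000 for the real file):

```python
import numpy as np

alphabet = np.array(list('abcdefghijklmnopqrstuvwxyz'
                         'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                         '0123456789!##$%^&*()'))

rng = np.random.default_rng()
n, length = 100_000, 8

# one (n, length) array of random indices into the alphabet
idx = rng.integers(0, len(alphabet), size=(n, length))

# join each row of characters into one 8-character string
strings = np.array([''.join(row) for row in alphabet[idx]])

# np.unique both deduplicates and sorts
unique_strings = np.unique(strings)
np.savetxt("generate_strings.csv", unique_strings, fmt='%s')
```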
You need to nest your out.write() in a loop, something like this, to make it happen multiple times:
import random

with open('file_output.txt','a') as out:
    for x in range(1000): # the number of lines you want in the output file
        result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
        out.write(f'{result_str}\n')
However, while unlikely, it is possible that you could end up with duplicate rows. To avoid this, you can generate and store your random strings in a loop and check for duplicates as you go. Once you have enough, write them all to the file outside the loop:
import random

results = []
while len(results) < 1000: # the number of lines you want in the output file
    result_str = ''.join((random.choice('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()') for i in range(8)))
    if result_str not in results: # check if the generated result_str is a duplicate
        results.append(result_str)

with open('file_output.txt','a') as out:
    out.write('\n'.join(results))
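One refinement worth noting (my suggestion, not part of the answer): membership tests on a list are O(n), so for very large counts a set keeps the loop fast, and set.add silently skips duplicates so no explicit check is needed. A sketch:

```python
import random

alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!##$%^&*()'

results = set()
while len(results) < 1000:  # the number of unique lines you want
    result_str = ''.join(random.choice(alphabet) for i in range(8))
    results.add(result_str)  # a set silently ignores duplicates

with open('file_output.txt', 'a') as out:
    out.write('\n'.join(results) + '\n')
```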
The issue is with the loop.
I can't iterate over solu and check its values against the dgu list.
The script prints the output up to print(solu), but the loop that follows hangs and produces no output, and I'm clueless here.
Could someone explain how to compare strings from two different files that come from different sources?
from pandas import *
import pandas as pd
import csv
import re
import deepdiff
from pprint import pprint
import xlrd
from difflib import SequenceMatcher
import xlsxwriter
import tocamelcase
from spellchecker import SpellChecker
import numpy as np
xlsx = ExcelFile('WrongSpelling.xlsx')
df = xlsx.parse(xlsx.sheet_names[0])
dg = pd.read_csv("pfm.csv", usecols = ['Place Id','Name','Category'])
pla = dg['Place Id'].values.tolist()
nam = dg['Name'].values.tolist()
cat = dg['Category'].values.tolist()
print()
df2 = pd.DataFrame(df, columns = ['Spelling'])
bat= df2['Spelling'].values.tolist()
namo = [x.lower() for x in nam]
bato = [x.lower() for x in bat]
sol = set(namo) & set(bato)
solu = list(sol)
dgu= dg.values.tolist()
nam=list(nam)
print(solu)
print()
print("The Count of Matches with the incorrect data is" ,len(solu))
print(dg[:5])
print()
while i < len(dgu):
    while i < len(solu):
        # a = solu[i]
        # b = dgu[i]
        # c = nam[i]
        if solu[i] in dgu[i]:
            print(dgu[i])
        else:
            pass
    i+=1
Your inner while loop uses the variable i in its condition to stop once it passes the length of solu, but you never increment i inside that inner loop, so it will loop forever: i < len(solu) never evaluates to False once the loop is entered.
As @offeltoffel mentioned, a for loop seems to fit your need better here. Without a verifiable example I can't run your code, but here is what the for loops could look like:
for i in range(len(dgu)):
    for j in range(len(solu)):
        if solu[j] in dgu[i]:
            print(dgu[i])
# no else: pass is needed here, as it serves no purpose
# no need to increment i/j manually; a for loop iterates through the range created from the length of dgu/solu
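To make the fix concrete, here is the corrected nested loop run against small made-up stand-ins for dgu and solu (the data below is invented purely for illustration):

```python
# toy stand-ins: dgu is a list of rows, solu a list of matched lowercase names
dgu = [['id1', 'alpha', 'cafe'], ['id2', 'beta', 'bar'], ['id3', 'gamma', 'pub']]
solu = ['alpha', 'gamma']

matches = []
for i in range(len(dgu)):
    for j in range(len(solu)):
        if solu[j] in dgu[i]:   # is this matched name in the current row?
            matches.append(dgu[i])
print(matches)  # rows containing 'alpha' and 'gamma'
```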
I have a block of code that does several things. It loops through files saved in a folder, labeled 1-100. These files are all forecast files for a specific month (for example, June 2016). The function reads each file and goes back to previous files to evaluate the forecasts, and I store all the values by month, because I want to see the totals for how accurate predictions were "one month ago", "two months ago", etc. The portion that does not involve arrays/lists works, but I am having trouble extracting the individual values that contribute to each total. I want to use the appended numbers (the lists) for graphing later; the lines marked #???? are the list portions that do not seem to work.
import pandas as pd
import csv
def nmonthaccuracy(basefilenumber, n):
    basefileread = pd.read_csv(str(basefilenumber)+'.csv', encoding='Latin-1')
    basefilevalue = basefileread.loc[basefileread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    nmonthread = pd.read_csv(str(basefilenumber-n)+'.csv', encoding='Latin-1')
    nmonthvalue = nmonthread.loc[nmonthread['Customer'].str.contains('Customer A', na=False), 'Jun-16\nQty']
    return int(nmonthvalue)/int(basefilevalue)

N = 12
total_by_month = [0] * N
total_by_month_list [] * N #????
for basefilenumber in range(24,36):
    for n in range(N):
        total_by_month[n] += nmonthaccuracy(basefilenumber, n)
        total_by_month_list[n].append(nmonthaccuracy(basefilenumber,n)) #????
onetotal = total_by_month[1]
twototal = total_by_month[2]
#etc
Try running your code after initializing total_by_month_list as:
total_by_month_list = [[] for _ in range(N)]
Without your data this is somewhat speculative, but my understanding is that total_by_month_list should be a list of 12 sublists, one per month offset.
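The comprehension matters here: [[]] * N creates N references to the same sublist, whereas the comprehension creates N independent ones. A quick demonstration:

```python
N = 3

aliased = [[]] * N           # three references to ONE shared list
aliased[0].append(1)         # the append shows up in every "sublist"

independent = [[] for _ in range(N)]  # three separate lists
independent[0].append(1)

print(aliased)      # [[1], [1], [1]]
print(independent)  # [[1], [], []]
```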
I use python for teaching some of my science courses, where I use it to generate unique assignments and tests for students. I've run into an issue that I can't sort out on my own.
I'm trying to make a series of nested lists. For example, I would like to have a numbered question, and then sub parts to the question underneath. For example:
Use the Henderson-Hasselbalch equation to determine pH of the following solutions:
A. 250 mM Ammonium Chloride
B. 100 mM Acetic Acid
I've used style "List Number" to create the numbered list, but I can't figure out how to create a custom list that starts with the letters.
Here is what I've got so far:
import sys
import os
if os.uname()[1] == 'iMac':
    sys.path.append("/Users/mgreene3/Library/Python/2.7/lib/python/site-packages")
else:
    sys.path.append("/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python")
import numpy as np
import math
import random
import textwrap
from docx import Document
from docx.shared import Pt, Inches
from docx.enum.style import WD_STYLE_TYPE
from docx.text.tabstops import TabStop as ts
from docx.text.parfmt import ParagraphFormat
assignment = Document()
ordered = "a"
style = assignment.styles["Normal"]
font = style.font
font.name = "Calibri"
font.size = Pt(12)
style.paragraph_format.space_after = Pt(0)
LetteredList = style.paragraph_format._NumberingStyle(ordered)
sub_style = assignment.styles["ListBullet"]
sub_font = sub_style.font
sub_font.name = "Calibri"
###sub_style.paragraph_format.style("List")
sub_font.size = Pt(12)
sub_style.paragraph_format.left_indent = Inches(1)
sub_style.paragraph_format.space_before = Pt(0)
sub_style.paragraph_format.space_after = Pt(40)
doc_heading = assignment.add_paragraph("Name:_______________________")
doc_heading.add_run("\t" * 4)
doc_heading.add_run(" " * 12)
doc_heading.add_run("BIOL444: Biochemistry\t\t\t\t\t\t ")
doc_heading.add_run("\n")
doc_heading.add_run("Take Home 1, v.")
doc_heading.add_run((str(1).zfill(2)))
doc_heading.add_run("\n" * 2)
doc_heading.add_run("Instructions: Complete test (")
show_work = doc_heading.add_run("show work")
show_work.bold = True
show_work.underline = True
show_work
doc_heading.add_run("), submit ")
hard_copy = doc_heading.add_run("hard copy")
hard_copy.bold = True
hard_copy.underline = True
hard_copy
doc_heading.add_run(" by ")
doc_heading.add_run("11:59 pm, Friday, February 10").bold =True
doc_heading.add_run(". Late submissions will ")
doc_heading.add_run("NOT").bold=True
doc_heading.add_run(" be accepted.")
question1 = assignment.add_paragraph("Using the data for K", style = "List Number")
question1.add_run("a").font.subscript = True
question1.add_run(" and pK")
question1.add_run("a").font.subscript = True
question1.add_run(" of the following compounds, calculate the concentrations (M) of all ionic species as well as the pH of the following aqueous solutions: ")
question1.add_run("\n")
question1a = assignment.add_paragraph("100 mM Acetic acid", style = sub_style)
question1b = assignment.add_paragraph("250 mM NaOH", style = sub_style)
assignment.save("TestDocx.docx")
The short answer is that it's probably more trouble than it's worth. Creating numbered lists, especially nested numbered lists in Word is a complex operation, possibly for legacy reasons (we're on version 14 or something of Word). Partly because of this complexity, API support for this doesn't yet exist in python-docx.
If you really wanted to do it, it would entail manipulating numbering definitions that live in a separate package part from the document part (I believe it's numbering.xml). This would require low-level lxml calls.
For myself, I'd be strongly inclined to use reStructuredText for a job like this, rendering to PDF, perhaps using Sphinx. As a side effect, you could easily get an HTML version as well for posting assignments on the web. However, I'm too far away from your actual requirements to say it would really suit; you'll have to check it out and see for yourself :)
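For illustration, a nested lettered list like the one in the question is straightforward in reStructuredText (the question text here is taken from the example above):

```rst
1. Use the Henderson-Hasselbalch equation to determine pH of the
   following solutions:

   A. 250 mM Ammonium Chloride
   B. 100 mM Acetic Acid
```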
I have imported a data set for a Machine Learning project. I need each "Neuron" in my first input layer to contain one numerical piece of data. However, I have been unable to do this. Here is my code:
import math
import numpy as np
import pandas as pd

v = pd.read_csv('atestred.csv', error_bad_lines=False).values
rw = 1
print(v)
for x in range(0,10):
    rw += 1
    s = (v[rw])
    list(s)
    # s is one row of the dataset
    print(s)  # Just a debug.
    myvar = s

class l1neuron(object):
    def gi():
        for n in range(0, len(s)):
            x = (s[n])
            print(x)  # Just another debug

n11 = l1neuron
n11.gi()
What I would ideally like is a variant of this where the code creates a new variable for every new row it extracts from the data(what I try to do in the first loop) and a new variable for every piece of data extracted from each row (what I try to do in the class and second loop).
If I have been completely missing the point with my code then feel free to point me in the right direction for a complete re-write.
Here are the first few rows of my dataset:
fixed acidity;"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
Thanks in advance.
If I understand your problem correctly, you would like to convert each row in your csv table into a separate variable that, in turn, holds all the values of that row.
Here is an example of how you might approach this. There are many ways to that end, and others may be more efficient, faster, more pythonic, hipper, or whatever. But the code below was written to help you understand how to store tabular data in named variables.
Two remarks:
if reading the data is the only thing you need pandas for, you might look for a less complex solution
the L1Neuron class is not very transparent, since its members cannot be read from the code but are instead created at runtime from the list of variables in attr. You may want to have a look at namedtuple for better readability instead.
import pandas as pd
from io import StringIO
import numbers

# example data:
atestred = StringIO("""fixed acidity;volatile acidity;citric acid;\
residual sugar;chlorides;free sulfur dioxide;total sulfur dioxide;\
density;pH;sulphates;alcohol;quality
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
""")

# read example data into dataframe 'data'; extract values and column names:
data = pd.read_csv(atestred, error_bad_lines=False, sep=';')
colNames = list(data)

class L1Neuron(object):
    "neuron class that holds the variables of one data line"

    def __init__(self, **attr):
        """
        attr is a dict (like {'alcohol': 12, 'pH': 7.4});
        every pair in attr will result in a member variable
        of this object with that name and value"""
        for name, value in attr.items():
            setattr(self, name.replace(" ", "_"), value)

    def gi(self):
        "print all numeric member variables whose names don't start with an underscore"
        for v in sorted(dir(self)):
            if not v.startswith('_'):
                value = getattr(self, v)
                if isinstance(value, numbers.Number):
                    print("%-20s = %5.2f" % (v, value))
        print('-' * 50)

# read csv into variables (one for each line):
neuronVariables = []
for s in data.values:
    variables = dict(zip(colNames, s))
    neuron = L1Neuron(**variables)
    neuronVariables.append(neuron)

# now the variables in neuronVariables are ready to be used:
for n11 in neuronVariables:
    print("free sulphur dioxide in this variable:", n11.free_sulfur_dioxide, end=" of ")
    print(n11.total_sulfur_dioxide, "total sulphur dioxide")
    n11.gi()
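As the remarks above mention, namedtuple gives the same per-row attribute access with less runtime magic; a brief sketch, sanitizing the field names the same way (the three columns chosen here are just a subset for illustration):

```python
from collections import namedtuple

# a few column names as they appear in the csv header
colNames = ['fixed acidity', 'pH', 'alcohol']

# namedtuple fields cannot contain spaces, so sanitize the names first
Neuron = namedtuple('Neuron', [c.replace(' ', '_') for c in colNames])

row = Neuron(7.4, 3.51, 9.4)
print(row.fixed_acidity, row.pH, row.alcohol)  # attribute access per column
```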
If this is for a machine learning project, I would recommend loading your CSV into a numpy array for ease of manipulation. Storing every value in the table as its own variable will hurt performance by preventing vectorized operations, and it makes your data more difficult to work with. I'd suggest this:
from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')
If your machine learning problem is supervised, you'll also want to split your labels into a separate data structure. If you're doing unsupervised learning, a single data structure will suffice. If you provide additional context on the problem you're trying to solve, we can offer more specific guidance.
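Note that the sample data in the question is semicolon-separated with a header row, so for that file the call would need delimiter=';' and skip_header=1. A sketch using the question's own sample rows inline (the split assumes 'quality', the last column, is the label):

```python
import numpy as np
from io import StringIO

# the question's sample data: header line plus three data rows
sample = StringIO(
    'fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;'
    'free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality\n'
    '7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n'
    '7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n'
    '7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5\n')

# skip_header=1 drops the column-name line; delimiter matches the file
my_data = np.genfromtxt(sample, delimiter=';', skip_header=1)

# supervised split: last column ('quality') as labels, the rest as features
X, y = my_data[:, :-1], my_data[:, -1]
print(X.shape, y.shape)  # (3, 11) (3,)
```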