Python Basics - First Project/Challenge - python
I'm extremely new to Python (and software programming/development in general). I decided to use the scenario below as my first project. The project includes 5 main personal challenges. Some of the challenges I have been able to complete (although probably not in the most efficient way), and others I'm struggling with. Any feedback you have on my approach and recommendations for improvement is GREATLY appreciated.
Project Scenario = "If I doubled my money each day for 100 days, how much would I end up with at day #100? My starting amount on Day #1 is $1.00"
1.) Challenge 1 - What is the net TOTAL after day 100 - (COMPLETED, I think, please correct me if I'm wrong)
days = 100
compound_rate = 2
print(compound_rate ** days) # 2 raised to the 100th power
#==Result===
1267650600228229401496703205376
2.) Challenge 2 - Print to screen the DAYS in the first column, and corresponding Daily Total in the second column. - (COMPLETED, I think, please correct me if I'm wrong)
compound_rate = 2
days_range = list(range(101))
for x in days_range:
    print(str(x), (compound_rate ** int(x)))
# ===EXAMPLE Results
# 0 1
# 1 2
# 2 4
# 3 8
# 4 16
# 5 32
# 6 64
# 100 1267650600228229401496703205376
3.) Challenge 3 - Write TOTAL result (after the 100 days) to an external txt file - (COMPLETED, I think, please correct me if I'm wrong)
compound_rate = 2
days_range = list(range(101))
hundred_days = (compound_rate ** 100)
textFile = open("calctest.txt", "w")
textFile.write(str(hundred_days))
textFile.close()
#===Result====
string of 1267650600228229401496703205376 --> written to my file 'calctest.txt'
4.) Challenge 4 - Write the Calculated running DAILY Totals to an external txt file. Column 1 will be the Day, and Column 2 will be the Amount. So just like Challenge #2 but to an external file instead of screen
NEED HELP, I can't seem to figure this one out.
5.) Challenge 5 - Somehow plot or chart the Daily Results (based on #4) - NEED GUIDANCE.
I appreciate everyone's feedback as I start on my personal Python journey!
challenge 2
This will work fine, but there's no need to write list(range(101)), you can just write range(101). In fact, there's no need even to create a variable to store that, you can just do this:
for x in range(101):
    print("whatever you want to go here")
challenge 3
Again, this will work fine, but when writing to a file it is normally best to use a with statement; then you don't need to close the file at the end, because Python takes care of that for you. For example:
with open("calctest.txt", "w") as f:
    f.write(str(hundred_days))
challenge 4
Use a for loop as you did with challenge 2. Use "\n" to write a new line. Again do everything inside a with statement. e.g.
with open("calctest.txt", "w") as f:
    for x in range(101):
        f.write("something here \n")
(would write a file with 'something here ' written 101 times)
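Putting those hints together, a minimal sketch of challenge 4 might look like the following (the doubling scheme is the same as challenge 2; the output filename is just an example):

```python
# Sketch of challenge 4: write "day amount" pairs, one per line.
# The filename "daily_totals.txt" is a placeholder - use whatever you like.
compound_rate = 2

with open("daily_totals.txt", "w") as f:
    for day in range(101):
        f.write(str(day) + " " + str(compound_rate ** day) + "\n")
```

Each f.write() call ends with "\n", so every day lands on its own line, exactly like the on-screen output in challenge 2.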
challenge 5
There is a python library called matplotlib, which I have never used, but I would suggest that would be where to go to in order to solve this task.
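As a rough, unverified sketch (assuming matplotlib is installed, e.g. via pip install matplotlib), plotting the daily totals could look something like:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt

# Recompute the daily totals from the earlier challenges.
days = list(range(101))
totals = [2 ** d for d in days]

plt.plot(days, totals)
plt.yscale("log")  # the totals grow exponentially, so a log scale is easier to read
plt.xlabel("Day")
plt.ylabel("Total ($)")
plt.savefig("daily_totals.png")  # output filename is just an example
```

The log scale is a suggestion, not a requirement; without it the curve looks like a flat line followed by a vertical spike.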
I hope this is of some help :)
You can use what you did in challenge 3 to open and close the output file.
In between, you have to do what you did in challenge 2 to compute the data for each day.
Instead of printing the daily result to the screen, you will have to combine it into a string. After that, you can write that string to the file, exactly like you did in challenge 3.
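A minimal sketch of that combination, building one string and then writing it in a single call (the filename is just an example):

```python
compound_rate = 2

# Build one "day amount" line per day, as in challenge 2.
lines = []
for day in range(101):
    lines.append("{} {}".format(day, compound_rate ** day))
report = "\n".join(lines) + "\n"

# Write the whole report at once, as in challenge 3.
with open("daily_report.txt", "w") as f:
    f.write(report)
```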
Challenge One:
This is the correct way:
days = 100
compound_rate = 2
print("Result after 100 days:", compound_rate ** days)
(Note that the label and the number are passed to print as separate arguments; a string can't be concatenated to an int with +.)
Challenge Two
This is corrected.
compound_rate = 2
days_range = list(range(101))
for x in days_range:
    print(x, compound_rate ** x)
Challenge Three
This one is close, but note that the str() cast is actually required here: file.write() only accepts strings, so writing the integer directly raises a TypeError. Explicit casts matter whenever you hand a number to an API that expects text.
compound_rate = 2
days_range = list(range(101))
hundred_days = (compound_rate ** 100)
textFile = open("calctest.txt", "w")
textFile.write(str(hundred_days))
textFile.close()
Challenge Four
For this challenge, you will want to look into the python CSV module. You can write the data in two rows separated by commas very simply with this module.
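A hedged sketch of that idea with csv.writer (the filename and the header row are just examples):

```python
import csv

# Write day/amount pairs as comma-separated rows.
with open("daily_totals.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["day", "amount"])  # optional header row
    for day in range(101):
        writer.writerow([day, 2 ** day])
```

The newline="" argument is the form recommended by the csv module docs so that the writer controls line endings itself.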
Challenge Five
For this challenge, you will want to look into the python library matplotlib. This library will give you tools to work with the data in a graphical way.
Answer for challenge 1 is as follows:
l = []
for a in range(0, 101):
    l.append(2 ** a)
print("Total after 100 days", l[-1])
(Note: sum(l) would add every day's amount together, which is not what challenge 1 asks for; the day-100 amount is the last element.)
import os, sys
import datetime
import time
#to get the current work directory, we use below os.getcwd()
print(os.getcwd())
#to get the list of files and folders in a path, we use os.listdir
print(os.listdir())
#to list the files inside a folder using a path
spath = r'C:\Users\7char'
print(os.listdir(spath))
#converting a file format to other, ex: txt to py
path = r'C:\Users\7char'
print(os.listdir(path))
# after looking at the list of files, we choose to change 'rough.py' to 'rough.txt'
os.chdir(path)
os.rename('rough.py','rough.txt')
#check whether the file has changed to new format
print(os.listdir(path))
#yes now the file is changed to new format
print(os.stat('rough.txt').st_size)
# by using the os.stat function we can see the size of a file (os.stat(file).st_size)
path = r"C:\Users\7char\rough.txt"
mtime = os.path.getmtime(path)  # don't name this 'datetime', or it shadows the imported module
moddatetime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))
print("Last Modified Time : ", moddatetime)
#differentiating b/w files and folders by extension, using os.path.splitext
import os
path = r"C:\Users\7char\rough.txt"
dir(os.path)
files = os.listdir()
for file in files:
    print(os.path.splitext(file))
#moving a file from one folder to another (including moving with folders of a path or moving into subfolders)
import os
char_7 = r"C:\Users\7char"
cleardata = r"C:\Users\clearadata"
operating = os.listdir(r"C:\Users\7char")
print(operating)
for i in operating:
    movefrom = os.path.join(char_7, i)
    moveto = os.path.join(cleardata, i)
    print(movefrom, moveto)
    os.rename(movefrom, moveto)
#now moving files based on the length of each file name (even / odd) to a specified path
import os
origin_path = r"C:\Users\movefilehere"
fivechar_path= r"C:\Users\5char"
sevenchar_path = r"C:\Users\7char"
origin_pathlist = os.listdir(origin_path)  # keep origin_path itself for joining below
for file_name in origin_pathlist:
    l = len(file_name)
    if l % 2 == 0:
        evenfilepath = os.path.join(origin_path, file_name)
        newevenfilepath = os.path.join(fivechar_path, file_name)
        print(evenfilepath, newevenfilepath)
        os.rename(evenfilepath, newevenfilepath)
    else:
        oddfilepath = os.path.join(origin_path, file_name)
        newoddfilepath = os.path.join(sevenchar_path, file_name)
        print(oddfilepath, newoddfilepath)
        os.rename(oddfilepath, newoddfilepath)
#checking whether a path is a folder using os.path.isdir
import os
path = r"C:\Users\7char"
print(os.path.isdir(path))
#how many files of each type (.py, .txt, etc.) are in a folder
import os
from os.path import join, splitext
from glob import glob
from collections import Counter
path = r"C:\Users\7char"
c = Counter([splitext(i)[1][1:] for i in glob(join(path, '*'))])
for ext, count in c.most_common():
    print(ext, count)
#looking at the files and extensions, including a count per extension
import os
from os.path import join, splitext
from collections import defaultdict
path = r"C:\Users\7char"
c = defaultdict(int)
files = os.listdir(path)
for filenames in files:
    extension = os.path.splitext(filenames)[-1]
    c[extension] += 1
    print(os.path.splitext(filenames))
print(c, extension)
#getting list from range
list(range(4))
#break and continue statements and else clauses on loops
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n // x)
            break
    else:
        print(n, 'is a prime number')
#Dictionaries
#the dict() constructor builds dictionaries directly from sequences of key-value pairs
dict([('ad', 1212),('dasd', 2323),('grsfd',43324)])
#to loop over two or more sequences at the same time, the entries can be paired with the zip() function
questions = ['name', 'quest', 'favorite color']
answers = ['lancelot', 'the holy grail', 'blue']
for q, a in zip(questions, answers):
    print('What is your {0}? It is {1}.'.format(q, a))
#Using set()
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
for f in sorted(set(basket)):
    print(f)
Related
A Pythonic way to delete older logfiles
I'm just cleaning log files greater than 50 (by oldest first). This is the only thing I've been able to come up with and I feel like there is a better way to do this. I'm currently getting a pylint warning using the lambda on get_time.

def clean_logs():
    log_path = "Runtime/logs/"
    max_log_files = 50

    def sorted_log_list(path):
        get_time = lambda f: os.stat(os.path.join(path, f)).st_mtime
        return list(sorted(os.listdir(path), key=get_time))

    del_list = sorted_log_list(log_path)[0:(len(sorted_log_list(log_path)) - max_log_files)]
    for x in del_list:
        pathlib.Path(pathlib.Path(log_path).resolve() / x).unlink(missing_ok=True)

clean_logs()
The two simplified solutions below are used to accomplish different tasks, so both are included for flexibility. Obviously, you can wrap this in a function if you like. Both code examples break down into the following steps:

1. Set the date delta (as an epoch reference) for mtime comparison as N days prior to today.
2. Collect the full path to all files matching a given extension.
3. Create a generator (or list) to hold the files to be deleted, using mtime as a reference.
4. Iterate the results and delete all applicable files.

Removing log files older than (n) days:

import os
from datetime import datetime as dt
from glob import glob

# Setup
path = '/tmp/logs/'
days = 5
ndays = dt.now().timestamp() - days * 86400

# Collect all files.
files = glob(os.path.join(path, '*.sql.gz'))

# Choose files to be deleted.
to_delete = (f for f in files if os.stat(f).st_mtime < ndays)

# Delete files older than (n) days.
for f in to_delete:
    os.remove(f)

Keeping the (n) latest log files: to keep the (n) latest log files, simply replace the to_delete definition above with:

n = 50
to_delete = sorted(files, key=lambda x: os.stat(x).st_mtime)[:len(files) - n]
How to create a script that gives me every combination possible of a six digit code
Me and a friend want to create a script that gives us every possible permutation of a six digit code, comprised of 36 alphanumeric characters (0-9, and a-z), in alphabetical order, then be able to see them in a .txt file. And I want it to use all of the CPU and RAM it can, so that it takes less time to complete the task. So far, this is the code:

import random

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
file = open("codes.txt", "a")
for g in range(0, 36**6):
    key = ""
    base = ""
    print(str(g))
    for i in range(0, 6):
        char = random.choice(charset)
        key += char
    base += key
    file.write(base + "\n")
file.close()

This code randomly generates the combinations and immediately writes them in a .txt file, while printing the amount of codes it has already created, but it isn't in alphabetical order (I have to do that afterwards), and it takes too long. How can the code be improved to give the desired outcome? Thanks to #R0Best for providing the best answer
Although this post already has 6 answers, I'm not content with any of them, so I've decided to contribute a solution of my own. First, note that many of the answers provide the combinations or permutations of letters, however the post actually wants the Cartesian Product of the alphabet with itself (repeated N times, where N=6). There are (at this time) two answers that do this, however they both write an excessive number of times, resulting in subpar performance, and also concatenate their intermediate results in the hottest portion of the loop (also bringing down performance). In the interest of taking optimization to the absolute max, I present the following code:

from string import digits, ascii_lowercase
from itertools import chain

ALPHABET = (digits + ascii_lowercase).encode("ascii")

def fast_brute_force():
    # Define some constants to make the following sections more readable
    base_size = 6
    suffix_size = 4
    prefix_size = base_size - suffix_size
    word_size = base_size + 1

    # define two containers
    #   word_blob - placeholder words, with hyphens in the unpopulated characters (followed by newline)
    #   sleds - a tuple of repeated bytes, used for substituting a bunch of characters in a batch
    word_blob = bytearray(b"-" * base_size + b"\n")
    sleds = tuple(bytes([char]) for char in ALPHABET)

    # iteratively extend word_blob and sleds, filling in unpopulated characters using the sleds
    # in doing so, we construct a single "blob" that contains concatenated suffixes of the desired
    # output with placeholders so we can quickly substitute in the prefix, write, repeat, in batches
    for offset in range(prefix_size, base_size)[::-1]:
        word_blob *= len(ALPHABET)
        word_blob[offset::word_size] = chain.from_iterable(sleds)
        sleds = tuple(sled * len(ALPHABET) for sled in sleds)

    with open("output.txt", "wb") as f:
        # I've expanded out the logic for substituting in the prefixes into explicit nested for loops
        # to avoid both redundancy (reassigning the same value) and the overhead associated with
        # a recursive implementation
        # I assert this below, so any changes in suffix_size will fail loudly
        assert prefix_size == 2
        for sled1 in sleds:
            word_blob[0::word_size] = sled1
            for sled2 in sleds:
                word_blob[1::word_size] = sled2
                # we write to the raw FileIO since we know we don't need buffering or other fancy
                # bells and whistles, however in practice it doesn't seem that much faster
                f.raw.write(word_blob)

There's a lot of magic happening in that code block, but in a nutshell: I batch the writes, so that I'm writing 36**4 or 1679616 entries at once, so there's less context switching. I update all 1679616 entries per batch simultaneously with the new prefix, using bytearray slicing / assignment. I operate on bytes, write to the raw FileIO, expand the loops for the prefix assignments, and other small optimizations to avoid encoding/buffering/function call overhead/other performance hits. Note, unless you have a very fast disk and a slowish CPU, you won't see much benefit from the smaller optimizations, just the write batching probably. On my system, it takes about 45 seconds to produce + write the 14880348 file, and that's writing to my slowest disk. On my NVMe drive, it takes 6.868 seconds.
The fastest way I can think of is using pypy3 with this code:

import functools
import time
from string import digits, ascii_lowercase

#functools.lru_cache(maxsize=128)
def main():
    cl = []
    cs = digits + ascii_lowercase
    for letter in cs:
        cl.append(letter)
    ct = tuple(cl)
    with open("codes.txt", "w") as file:
        for p1 in ct:
            for p2 in ct:
                for p3 in ct:
                    for p4 in ct:
                        for p5 in ct:
                            for p6 in ct:
                                file.write(f"{p1}{p2}{p3}{p4}{p5}{p6}\n")

if __name__ == '__main__':
    start = time.time()
    main()
    print(f"Done!\nTook {time.time() - start} seconds!")

It writes at around 10-15MB/s. The total file is around 15GB I believe, so it would take like 990-1500 seconds to generate. The results are on a VM of unraid with 1 3.4 GHz core of a server CPU, with an old SATA3 SSD. You will probably get better results with an NVMe drive and a faster single-core CPU.
Random can be very inefficient. You can try:

from itertools import permutations
from pandas import Series

charset = list("0123456789abcdefghijklmnopqrstuvwxyz")
links = []
file = open("codes.txt", "a")
comb = permutations(charset, 6)
comb = list(comb)
comb = list(map(lambda x: ''.join(x), comb))  # note: 'return' is not valid inside a lambda
mySeries = Series(comb)
mySeries = mySeries.sort_values()
base = ""
for k in mySeries:
    base += k
    file.write(base + "\n")
file.close()
You could use itertools.permutations from the standard itertools library. You can also specify the number of characters in the combination.

from itertools import permutations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
c = permutations(charset, 6)
with open('code.txt', 'w') as f:
    for i in c:
        f.write("".join(i) + '\n')

Runs on my computer in about 200 milliseconds for creating the list of permutations, then spends a lot of time writing to the file.
For permutations, this would do the trick:

from itertools import permutations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
with open("codes.txt", "w") as f:
    for permutation in permutations(charset, 6):
        f.write(''.join(permutation) + '\n')

FYI, it would create a 7.8 gigabyte file. For combinations, this would do the trick:

from itertools import combinations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
with open("codes.txt", "w") as f:
    for comb in combinations(charset, 6):
        f.write(''.join(comb) + '\n')

FYI, it would create a 10.8 megabyte file.
First thing: there are better ways to do this, but I want to write something clear and understandable. Pseudo code:

base = "";
for (x1 = 0; x1 < charset.length(); x1++)
  for (x2 = 0; x2 < charset.length(); x2++)
    for (x3 = 0; x3 < charset.length(); x3++)
      ...
      {
        base = charset[x1] + charset[x2] + charset[x3] + ... + charset[x6];
        file.write(base + "\n");
      }
This is a combination problem where you are trying to get combinations of length 6 from a character set of length 36. This will produce an output of size 36!/(30!*6!). You can refer to itertools for solving a combination problem like yours; see the combinations() function in the itertools documentation. It is recommended not to perform such a performance-intensive computation using Python.
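For what it's worth, the count 36!/(30!*6!) can be checked directly with math.comb (Python 3.8+):

```python
import math

# Number of 6-element combinations drawn from a 36-character set.
print(math.comb(36, 6))  # 1947792
```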
How to read in multiple documents with same code?
So I have a couple of documents, each of which has an x and y coordinate (among other stuff). I wrote some code which is able to filter out said x and y coordinates and store them into float variables. Now ideally I'd want to find a way to run the same code on all documents I have (the number isn't fixed, but let's say 3 for now), extract the x and y coordinates of each document, and calculate an average of these 3 x-values and 3 y-values. How would I approach this? Never done it before. I successfully created the code to extract the relevant data from 1 file. Also note: in reality each file has more than just 1 set of x and y coordinates, but this does not matter for the problem discussed at hand; I'm just saying that so that the code does not confuse you.

with open('TestData.txt', 'r') as f:
    full_array = f.readlines()

del full_array[1:31]
del full_array[len(full_array)-4:len(full_array)]

single_line = full_array[1].split(", ")
x_coord = float(single_line[0].replace("1 Location: ", ""))
y_coord = float(single_line[1])
size = float(single_line[3].replace("Size: ", ""))
# Remove unnecessary stuff
category = single_line[6].replace(" Type: Class: 1D Descr: None", "")

In the end I'd like to not have to write the same code for each file another time, especially since the amount of files may vary. Right now I have 3 files, which equals 3 sets of coordinates, but on another day I might have 5, for example.
Use os.walk to find the files that you want. Then for each file do your calculation. https://docs.python.org/2/library/os.html#os.walk
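A minimal sketch of that approach (the root directory and the .txt extension here are assumptions; adjust them to your setup):

```python
import os

# Recursively collect the paths of all .txt files under the current directory.
txt_files = []
for dirpath, dirnames, filenames in os.walk("."):
    for name in filenames:
        if name.endswith(".txt"):
            txt_files.append(os.path.join(dirpath, name))

# Each collected path can then be opened and parsed with the existing code,
# accumulating the x and y values to average afterwards.
print(txt_files)
```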
First of all create a method to read a file via its file name and do the parsing your way. Now iterate through the directory; I guess the files are in the same directory. Here is the basic code:

import os

def readFile(filename):
    try:
        with open(filename, 'r') as file:
            data = file.read()
        return data
    except:
        return ""

for filename in os.listdir('C:\\Users\\UserName\\Documents'):
    #print(filename)
    data = readFile(filename)
    print(data)
    #parse here
    #do the calculation here
Python Autocomplete Variable Name
I am new to Python and want an auto-completing variable inside a while-loop. I'll try to give a minimal example. Let's assume I have the following files in my folder, each starting with the same letters and an increasing number, but the end of each filename consists of just random numbers:

a_i=1_404
a_i=2_383
a_i=3_180

I want a while loop like:

while n <= 3:
    old_timestep = 'a_i=n_*'
    actual_timestep = 'a_i=(n+1)_*'
    ... (some functions that need the filenames saved in the two above initialised variables)
    n = n + 1

So if I start the loop I want it to automatically process all files in my directory. As a result there are two questions: 1) How do I tell Python that (in my example I used the '*') I want the filename to be completed automatically? 2) How do I use a formula inside the filename (in my example the '(n+1)')? Many thanks in advance!
1) To my knowledge you can't do that automatically. I would store all filenames in the directory in a list and then search through that list:

from os import listdir
from os.path import isfile, join

dir_files = [f for f in listdir('.') if isfile(join('.', f))]
while i <= 3:
    old_timestep = "a_i=n_"
    for f in dir_files:
        if f.startswith(old_timestep):
            pass  # process file
    i += 1

2) You can use string concatenation:

f = open("a_i=" + str(n + 1) + "remainder of filename", 'w')
You can use the glob module to do * expansion (note that glob.glob returns a list, so use iglob to get an iterator you can call next() on):

import glob

old = next(glob.iglob('a_i={}_*'.format(n)))
actual = next(glob.iglob('a_i={}_*'.format(n + 1)))
I found a way to solve 1) myself, by first doing a) then b):

a) Truncate the file names (in the shell) so that only the first x characters are left:

for i in {1..3}; do
    cp a_i=$((i))_* a_i=$((i))
done

b)

n = 1
while n <= 3:
    old = 'a_i=' + str(n)
    new = 'a_i=' + str(n+1)

The str() is for converting the integer n into a string in order to concatenate. Thanks for your input!
Python: How do I iterate over several files with similar names (the variation in each name is the date)?
I wrote a program that filters tweet files to pull location and time from specific ones. Each file contains one day's worth of tweets. I would like to run this program over one year's worth of tweets, which would involve iterating over 365 files with names like this: 2011-*-*.tweets.dat.gz, with the stars representing numbers that complete the file name to make it a date for each day in the year. Basically, I'm looking for code that will loop over 2011-01-01.tweets.dat.gz, 2011-01-02.tweets.dat.gz, ..., all the way through 2011-12-31.tweets.dat.gz. What I'm imagining now is somehow telling the program to loop over all files with the name 2011-*.tweets.dat.gz, but I'm not sure exactly how that would work or how to structure it, or even if the * syntax is correct. Any tips?
Easiest way is indeed with a glob:

from glob import iglob

for pathname in iglob("/path/to/folder/2011-*.tweets.dat.gz"):
    print pathname  # or do whatever
Use the datetime module:

>>> from datetime import datetime, timedelta
>>> d = datetime(2011,1,1)
>>> while d < datetime(2012,1,1):
...     filename = "{}{}".format(d.strftime("%Y-%m-%d"), '.tweets.dat.gz')
...     print filename
...     d = d + timedelta(days=1)
...
2011-01-01.tweets.dat.gz
2011-01-02.tweets.dat.gz
2011-01-03.tweets.dat.gz
2011-01-04.tweets.dat.gz
2011-01-05.tweets.dat.gz
...
2011-12-27.tweets.dat.gz
2011-12-28.tweets.dat.gz
2011-12-29.tweets.dat.gz
2011-12-30.tweets.dat.gz
2011-12-31.tweets.dat.gz