Split words and creating new files with different names(python) - python

I need to write a program like this:
Write a program that reads a file .picasa.ini and copies pictures in new files, whose names are the same as identification numbers of person on these pictures (eg. 8ff985a43603dbf8.jpg). If there are more person on the picture it makes more copies. If a person is on more pictures, later override earlier copies of pictures; if a person 8ff985a43603dbf8 may appear in more pictures, only one file with this name will exist. You must presume that we have a simple file .picasa.ini.
I have an .ini, that consists:
[img_8538.jpg]
faces=rect64(4ac022d1820c8624),**d5a2d2f6f0d7ccbc**
backuphash=46512
[img_8551.jpg]
faces=rect64(acb64583d1eb84cb),**2623af3d8cb8e040**;rect64(58bf441388df9592),**d85d127e5c45cdc2**
backuphash=8108
...
Is this a good way to start this program?
for line in open('C:\Users\Admin\Desktop\podatki-picasa\.picasa.ini'):
if line.startswith('faces'):
line.split() # what must I do here to split the bolded words?
Is there a better way to do this? Remember the .jpg file must be created with a new name, so I think I should link the current .jpg file with the bolded one.

Consider using ConfigParser. Then you will have to split each value by hand, as you describe.

import ConfigParser
import string
config = ConfigParser.ConfigParser()
config.read('C:\Users\Admin\Desktop\podatki-picasa\.picasa.ini')
imgs = []
for item in config.sections():
imgs.append(config.get(item, 'faces'))
This is still work in progress. Just want to ask if it's correct.
edit:
Still don't know hot to split the bolded words out of there. This split function really is a pain for me.

Suggestions:
Your lines don't start with 'faces', so your second line won't work the way you want it to. Depending on how the rest of the file looks, you might only need to check whether the line is empty or not at that point.
To get the information you need, first split at ',' and work from there
Try at a solution: The elements you need seem to always have a ',' before them, so you can start by splitting at the ',' sign and taking everything from the 1-index elemnt onwards [1::] . Then if what I am thinking is correct, you split those elements twice again: at the ";" and take the 0-index element of that and at that " ", again taking the 0-index element.
for line in open('thingy.ini'):
if line != "\n":
personelements = line.split(",")[1::]
for person in personelements:
personstring = person.split(";")[0].split(" ")[0]
print personstring
works for me to get:
d5a2d2f6f0d7ccbc
2623af3d8cb8e040
d85d127e5c45cdc2

Related

Number of lines added and deleted in files using gitpython

How to get/extract number of lines added and deleted?
(Just like we do using git diff --numstat).
repo_ = Repo('git-repo-path')
git_ = repo_.git
log_ = g.diff('--numstat','HEAD~1')
print(log_)
prints the entire output (lines added/deleted and file-names) as a single string. Can this output format be modified or changed so as to extract useful information?
Output format: num(added) num(deleted) file-name
For all files modified.
If I understand you correctly, you want to extract data from your log_ variable and then re-format it and print it? If that's the case, then I think the simplest way to fix it, is with a regular expression:
import re
for line in log_.split('\n'):
m = re.match(r"(\d+)\s+(\d+)\s+(.+)", line)
if m:
print("{}: rows added {}, rows deleted {}".format(m[3], m[1], m[2]))
The exact output, you can of course modify any way you want, once you have the data in a match m. Getting the hang of regular expressions may take a while but it can be very helpful for small scripts.
However, be adviced, reg exps tend to be write-only code and can be very hard to debug. However, for extracting small parts like this, it is very helpful.

How to give a name for a file?

For every iteration in my loop for, I need to give 'the number of my iteration' as a name for the file, for example, the goal is to save:
my first iteration in the first file.
my second iteration in the second file.
....
I use for that the library numpy, but my code doesn't give me the solution that i need, in fact my actual code oblige me to enter the name of the file after each iteration, that is easy if I have 6 or 7 iteration, but i am in the case that I have 100 iteration, it doesn't make sense:
for line, a in enumerate(Plaintxt_file):
#instruction
#result
fileName = raw_input()
if(fileName!='end'):
fileName = r'C:\\Users\\My_resul\\Win_My_Scripts\\'+fileName
np.save(fileName+'.npy',Result)
ser.close()
I would be very grateful if you could help me.
Create your file name from the line number:
for line, a in enumerate(Plaintxt_file):
fileName = r'C:\Users\My_resul\Win_My_Scripts\file_{}.npy'.format(line)
np.save(fileName, Result)
This start with file name file_0.npy.
If you like to start with 1, specify the starting index in enumerate:
for line, a in enumerate(Plaintxt_file, 1):
Of course, this assumes you don't need line starting with 0 anywhere else.
I'm not 100% sure what your issue is, but as far as I can tell, you just need some string formatting for the filename.
So, you want, say 100 files, each one created after an iteration. The easiest way to do this would probably be to use something like the following:
for line, a in enumerate():
#do work
filename = "C:\\SaveDir\\OutputFile{0}.txt".format(line)
np.save(filename, Result)
That won't be 100% accurate to your needs, but hopefully that will give you the idea.
If you're just after, say, 100 blank files with the naming scheme "0.npy", "1.npy", all the way up to "n-1.npy", a simple for loop would do the job (no need for numpy!):
n = 100
for i in range(n):
open(str(i) + ".npy", 'a').close()
This loop runs for n iterations and spits out empty files with the filename corresponding to the current iteration
If you do not care about the sequence of the files and you do not want the files from multiple runs of the loop to overwrite each other, you can use random unique IDs.
from uuid import uuid4
# ...
for a in Plaintxt_file:
fileName = 'C:\\Users\\My_resul\\Win_My_Scripts\\file_{}.npy'.format(uuid4())
np.save(fileName, Result)
Sidenote:
Do not use raw strings and escaped backslashes together.
It's either r"C:\path" or "C:\\path" - unless you want double backslashes in the path. I do not know if Windows likes them.

Making a program which will read and write to text files

I'm pretty new to Python! I recently started coding a program which I want to write and read to and from text files, while compressing/decompressing sentences (sort of).
However, I've run into a couple problems which I can't seem to fix, basically, I've managed to code the compressing section. But when I go to read the contents of the text file, I'm not sure how to recreate the original sentence through the positions and unique words?!
###This section will compress the sentence(s)###
txt_file = open("User_sentences.txt","wt")
user_sntnce = input(str("\nPlease enter your sentences you would like compressed."))
user_sntnce_list = user_sntnce.split(" ")
print(user_sntnce_list)
for word in user_sntnce_list:
if word not in uq_words:
uq_words.append(word)
txt_file.write(str(uq_words) + "\n")
for i in user_sntnce_list:
positions = int(uq_words.index(i) + 1)
index.append(positions)
print(positions)
print(i)
txt_file.write(str(positions))
txt_file.close()
###This section will DECOMPRESS the sentence(s)###
if GuideChoice == "2":
txt_file = open("User_sentences.txt","r")
contents = txt_file.readline()
words = eval(contents)
print(words)
txt_file.close()
This is my code so far, it seems to work, however as I've said I'm really stuck, and I really don't know how to move on and recreate the original sentence from the text file.
From what understand you want to substitute each word in a text file with a word of your choice (a shorter one if you want to "compress"). Meanwhile you keep a "dictionary" (not in the python sense) uq_words where you associate each different word with an index.
So a sentence "today I like pizza, today is like yesterday" will become:
"12341536".
I tried your code removing if GuideChoice == "2": and defining uq_words=[] and index=[].
If that's what you intend to do then:
I imagine you are calling this compression from time to time, it's in a function. So doing what you do in the second line is to open a NEW file with the same name of the previous ones, meaning you will always have the last sentence compressed, loosing the previous.
Try to read every time the lines, rewrite all and add the new (kinda what you did in contents = txt_file.readline().
You are printing both the compressed translation (like "2345") AND the array whose component are the words of the splitted sentence. I do not think that is the "compressed" document you are aiming for. Just the "2345" part, right?
Since, I believe, you want to keep a dictionary, but this code is inside a function, you will loose the dictionary every time the function ends. So write 2 documents: one with the compressed text (every time refreshed and not rewritten!) and another file with 2 columns, where you write the dictionary. You pass the dictionary file name as a string to the function, so you can update it in case new words are added, and you read it as a NX2 array (N the number of words).

How to assign a single variable to a specific line in python?

I was not clear enough in my last question, and so I'll explain my question more this time.
I am creating 2 separate programs, where the first one will create a text file with 2 generated numbers, one on line 1 and the second on line 2.
Basically I saved it like this:
In this example I'm not generating numbers, just assigning them quickly.
a = 15
b = 16
saving = open('filename.txt', "w")
saving.write(a+"\n")
saving.write(b+"\n")
saving.close()
Then I opened it on the next one:
opening = open('filename.txt', "w")
a = opening.read()
opening.close()
print(a) #This will print the whole document, but I need each line to be differnet
Now I got the whole file loaded into 'a', but I need it split up, which is something that i have not got a clue on how to do. I don't believe creating a list will help, as I need each number (Variables a and b from program 1) to be different variables in program 2. The reason I need them as 2 separate variables is because I need to divide it by a different number. If I do need to do a list, please say. I tried finding an answer for about an hour in total, though I couldn't find anything.
The reason I can't post the whole program is because I haven't got access to it from here, and no, this is not cheating as we are free to research and ask questions outside the classroom, if someone wonders about that after looking at my previous question.
If you need more info please put it in a comment and I'll respond ASAP.
opening = open('filename.txt') # "w" is not necessary since you're opening it read-only
a = [b.split() for b in opening.readlines()] # create a list of each line and strip the newline "\n" character
print(a[0]) # print first line
print(a[1]) # print second line

Python - How to check if the name from file is used?

I have small scraping script. I have file with 2000 names and I use these names to search for Video IDs in YouTube. Because of the amount it takes pretty long time to get all the IDs so I can't do that in one time. What I want is to find where I ended my last scrape and then start from that position. What is the best way to do this? I was thinking about adding the used name to the list and then just check if it's in the list, if no - start scraping but maybe there's a better way to do this? (I hope yes).
Part that takes name from file and scraped IDs. What I want is when I quit scraping, next time when I start it, it would run not from beginning but from point where it ended last time:
index = 0
for name in itertools.islice(f, index, None):
parameters = {'key': api_key, 'q': name}
request_url = requests.get('https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=1&type=video&fields=items%2Fid', params = parameters)
videoid = json.loads(request_url.text)
if 'error' in videoid:
pass
else:
index += 1
id_file.write(videoid['items'][0]['id']['videoId'] + '\n')
print videoid['items'][0]['id']['videoId']
You could just remember the index number of the last scraped entry. Every time you finish scraping one entry, increment a counter, then assuming the entries in your text file don't change order, just pick up again at that number?
The simplest answer here is probably mitim's answer. Just keep a file that you rewrite with the last-processed index after each line. For example:
savepath = os.path.expanduser('~/.myprogram.lines')
skiplines = 0
try:
with open(savepath) as f:
skiplines = int(f.read())
except:
pass
with open('names.txt') as f:
for linenumber, line in itertools.islice(enumerate(f), skiplines, None):
do_stuff(line)
with open(savepath, 'w') as f:
f.write(str(linenumber))
However, there are other ways you could do this that might make more sense for your use case.
For example, you could rewrite the "names" file after each name is processed to remove the first line. Or, maybe better, preprocess the list into an anydbm (or even sqlite3) database, so you can more easily remove (or mark) names once they're done.
Or, if you might run against different files, and need to keep a progress for each one, you could store a separate .lines file for each one (probably in a ~/.myprogram directory, rather than flooding the top-level home directory), or use an anydbm mapping pathnames to lines done.

Categories