I need a shortcut - python

I'm trying to write a simple script that filters emails by domain. It works, but I don't want to write so many if and elif statements. Can anyone tell me how to write the script with a function so that it becomes shorter and easier? Thanks in advance. The script is below:
f_location = 'C:/Users/Jack The Reaper/Desktop/mix.txt'
text = open(f_location)
good = open('C:/Users/Jack The Reaper/Desktop/good.txt','w')
for line in text:
    if '#yahoo' in line:
        yahoo = None
    elif '#gmail' in line:
        gmail = None
    elif '#yahoo' in line:
        yahoo = None
    elif '#live' in line:
        live = None
    elif '#outlook' in line:
        outlook = None
    elif '#hotmail' in line:
        hotmail = None
    elif '#aol' in line:
        aol = None
    else:
        if ' ' in line:
            good.write(line.strip(' '))
        elif '' in line:
            good.write(line.strip(''))
        else:
            good.write(line)
text.close()
good.close()

I would suggest using a dict for this instead of having a separate variable for each case.
my_dict = {}
...
if '#yahoo' in line:
    my_dict['yahoo'] = None
But if you want to do it the way you described in the question, you can do it as shown below:
email_domains = ['#yahoo', '#gmail', '#live', '#outlook', '#hotmail', '#aol']
for e in email_domains:
    if e in line:
        locals()[e[1:]] = None
        # if you use a dict, use the line below instead
        # my_dict[e[1:]] = None
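locals() returns a dictionary of the current namespace. The keys in this dict are the variable names and the values are the values of those variables.
So at module level, locals()['gmail'] = None creates a variable named gmail (if it doesn't exist) and assigns it None. Inside a function, changes made through locals() are not guaranteed to affect the actual local variables, which is another reason to prefer the dict.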
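A minimal sketch of the shorter version the question asks for, assuming the goal is only to drop lines that mention one of the listed providers and keep everything else. The names BLOCKED and keep_line, and the sample lines, are made up for illustration.

```python
# Hypothetical helper: the tuple name and function name are illustrative only.
BLOCKED = ('#yahoo', '#gmail', '#live', '#outlook', '#hotmail', '#aol')

def keep_line(line):
    """Return True when the line mentions none of the blocked providers."""
    return not any(domain in line for domain in BLOCKED)

# Invented sample data standing in for the lines of mix.txt.
sample = ['bob#gmail.com\n', 'nannik#interia.pl\n', 'amy#aol.com\n']
kept = [line.strip() for line in sample if keep_line(line)]
print(kept)  # ['nannik#interia.pl']
```

Adding or removing a provider then means editing the tuple rather than adding another elif branch.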

Based on the problem as you stated it and the sample file you provided, I have two solutions: a detailed one and a one-line one.
First, let's import the re module and define the regex pattern:
import re
pattern=r'.+#(?!gmail|yahoo|aol|hotmail|live|outlook).+'
Now the detailed version:
emails=[]
with open('emails.txt','r') as f:
    for line in f:
        match=re.finditer(pattern,line)
        for find in match:
            emails.append(find.group())
with open('result.txt','w') as f:
    f.write('\n'.join(emails))
Output in the result.txt file:
nic-os9#gmx.de
angelique.charuel#sfr.fr
nannik#interia.pl
l.andrioli#freenet.de
kamil_sieminski8#o2.pl
hugo.lebrun.basket#orange.fr
One-line solution, if you want something shorter:
with open('results.txt','w') as file:
    file.write('\n'.join([find.group() for line in open('emails.txt','r') for find in re.finditer(pattern,line)]))
Output:
nic-os9#gmx.de
angelique.charuel#sfr.fr
nannik#interia.pl
l.andrioli#freenet.de
kamil_sieminski8#o2.pl
hugo.lebrun.basket#orange.fr
P.S.: With the one-line solution, the input file is not closed explicitly. Python usually cleans it up on its own, so it is rarely a big issue (but not never); if that worries you, you can use the detailed version instead.
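If the unclosed input file is a concern, both files can be opened in a single with statement. The sketch below assumes the same pattern as above; the temporary directory and the sample addresses are only there to make the example self-contained.

```python
import os
import re
import tempfile

pattern = r'.+#(?!gmail|yahoo|aol|hotmail|live|outlook).+'

# Build a throwaway input file so the sketch can run anywhere.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'emails.txt')
dst = os.path.join(workdir, 'result.txt')
with open(src, 'w') as f:
    f.write('bob#gmail.com\nnic-os9#gmx.de\namy#yahoo.com\nnannik#interia.pl\n')

# Both files are closed automatically when the with block exits.
with open(src) as fin, open(dst, 'w') as fout:
    matches = [m.group() for line in fin for m in re.finditer(pattern, line)]
    fout.write('\n'.join(matches))

with open(dst) as f:
    result = f.read().splitlines()
print(result)  # ['nic-os9#gmx.de', 'nannik#interia.pl']
```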

Related

XML to dictionary extraction

I wrote some code, but some values of ecd are missing from the data. I would like to record them as 'None' or 0000 so that I can create a dataframe. Unfortunately, the code runs until the first missing value and then crashes, and I cannot spot the mistake.
The error message:
File "extra.py", line 236, in <module>
if dic['mudLogs']['mudLog']['geologyInterval'][i]['ecdTdAv']['#text'] != None:
KeyError: 'ecdTdAv'
Code:
import xmltodict
xml_file = 'C:\\Users\\jtfra\\Desktop\\Thesis\\Volve_Real_Time_DData\\WITSML Realtime drilling data\\Norway-Statoil-NO 15_$47$_9-F-11\\1\\mudLog\\1.xml'
def convert(xml_file, xml_attribs=True):
    with open(xml_file, "rb") as f: # notice the "rb" mode
        d = xmltodict.parse(f, xml_attribs=xml_attribs)
        return d
dic = convert(xml_file)
mdTop, ecd = [], []
for i in range(len(dic['mudLogs']['mudLog']['geologyInterval'])):
    mdTop.append(dic['mudLogs']['mudLog']['geologyInterval'][i]['mdTop']['#text'])
    if dic['mudLogs']['mudLog']['geologyInterval'][i]['ecdTdAv']['#text'] != None:
        ecd.append(dic['mudLogs']['mudLog']['geologyInterval'][i]['ecdTdAv']['#text'])
    else:
        ecd.append('None')
print(ecd)
Instead of accessing it as:
dic['mudLogs']['mudLog']['geologyInterval'][0]['ecdTdAv']
do:
dic['mudLogs']['mudLog']['geologyInterval'][0].get('ecdTdAv', '0000')
or similar.
You can also check whether the key is present with:
if 'ecdTdAv' in dic['mudLogs']['mudLog']['geologyInterval'][i]:
    # do something with it, e.g.:
    print(dic['mudLogs']['mudLog']['geologyInterval'][i]['ecdTdAv']['#text'])
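As a rough sketch of how .get() could be applied inside the loop from the question: the list below is a hand-written stand-in that is only assumed to have the same shape as the parsed XML, and the values in it are invented.

```python
# Hand-written stand-in for dic['mudLogs']['mudLog']['geologyInterval'];
# the real xmltodict result is assumed to look like this.
intervals = [
    {'mdTop': {'#text': '100'}, 'ecdTdAv': {'#text': '1.25'}},
    {'mdTop': {'#text': '200'}},  # ecdTdAv is missing here
]

md_top, ecd = [], []
for interval in intervals:
    md_top.append(interval['mdTop']['#text'])
    # The first .get() falls back to an empty dict when the key is absent,
    # so the second .get() can safely supply the placeholder.
    ecd.append(interval.get('ecdTdAv', {}).get('#text', 'None'))

print(ecd)  # ['1.25', 'None']
```

Iterating over the intervals directly also avoids repeating the long chain of keys on every line.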

Try statement not running as I expect

I have three functions: readHeader, which reads the header of a txt file; readFile, which reads the contents of the file; and exceptionNH, which compares the file name with the header and raises an exception if the two are inconsistent (e.g. if the date in the name is not the same as in the header).
Here are the three functions and an example txt file:
def readHeader(fileName):
    fileIn = open(fileName, "r")
    fileIn.readline()
    day = fileIn.readline().replace("\n", "")
    fileIn.readline()
    time = fileIn.readline().replace("\n", "")
    fileIn.readline()
    company = fileIn.readline().replace("\n", "")
    scope = fileIn.readline().replace(":", "").replace("\n", "")
    fileIn.close()
    return (day, time, company, scope)
def readFile(fileName):
    expertsList = []
    expertsList.append(readHeader(fileName))
    fileIn = open(fileName, "r")
    # LNHEADER is assumed to be defined elsewhere (number of header lines)
    for line_counter in range(LNHEADER):
        fileIn.readline()
    fileIn.close()
    return expertsList
def exceptionNH(fileName):
    try:
        assert fileName[10:17] == readFile(fileName)[3][0].lower().replace(":", "")
    except AssertionError:
        print("Error in input file: inconsistent name and header in file", fileName,".")
        exit()
fileName = "file.txt"
exceptionNH("2018y03m28experts10h30.txt")
2018y03m28experts10h30.txt:
Day:
2018-03-28
Time:
10:30
Company:
XXX
Experts:
...
...
My problem is that I expect the assert in the try statement to see the comparison as True and skip the except clause, but this is not happening.
I suspect that .lower() is not working, but I can't understand why.
If you see other things that could be improved, feel free to share; I'm new to Python and want to get better.
I've found the error. I thought that to get an item from the first tuple inside a list, I needed to write list[position of item][position of tuple], when it is actually the inverse.
Following mkrieger1's advice, I printed fileName[10:17] and readFile(fileName)[3][0].lower().replace(":", ""). The first was correct, but the second was not returning the item at index 3 of the first tuple (the one from readHeader) but the first item of the list element at index 3.
I've changed readFile(fileName)[3][0].lower().replace(":", "") to readFile(fileName)[0][3].lower().replace(":", "") and it's working now. Thank you for the help.
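To make the index order concrete, here is a small sketch. The header values are taken from the example file above; the extra tuples in the list are invented so that both index orders have something to return.

```python
# Header tuple as readHeader would return it for the example file.
header = ('2018-03-28', '10:30', 'XXX', 'Experts')
# Invented extra entries, only here so that experts_list[3] exists.
experts_list = [header, ('expert 1',), ('expert 2',), ('expert 3',)]

# The first index picks the element of the list, the second picks inside that tuple.
scope = experts_list[0][3]   # item at index 3 of the first tuple
other = experts_list[3][0]   # first item of the list element at index 3

print(scope.lower())  # experts
print(other)          # expert 3
```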

Using a txt file to define multiple variables in python

Background Information
I have a program that pings a service and prints the results back to a window. I'm currently trying to extend this program by adding a kind of 'settings' file that users can edit to change a) the host that is pinged and b) the timeout.
What I've tried so far
import re
file = open("file.txt", "r")
print (file.read())
settings = file.read()
# looking for the value of 'host'
pattern = 'host = "(.*)'
variable = re.findall(pattern, settings)[0]
print(variable)
As for what is contained within the file.txt file:
host = "youtube.com"
pingTimeout = "1"
However, my attempts have been unsuccessful, as this comes up with the following error:
IndexError: list index out of range
And so, my question is:
Can anyone point me in the right direction? To recap, I am asking how I can take an input from a file (in this case host = "youtube.com") and save it as a variable 'host' within the Python file.
First, as Patrick Haugh pointed out, calling read() twice on the same file object does not do what you want: the first call consumes the whole file, so the second returns an empty string and findall() finds nothing. Second, using a regex to parse a simple key = value format is a bit overkill.
host, pingTimeout = None,None # Maybe initialize these to a default value
with open("settings.txt", "r") as f:
    for line in f:
        key,value = line.strip().split(" = ")
        if key == 'host':
            host = value
        if key == 'pingTimeout':
            pingTimeout = int(value)
print(host, pingTimeout)
Note that the expected input format would have no quotes for the example code above.
host = youtube.com
pingTimeout = 1
I tried this; it may help:
import re
filename = "<your text file with hostname>"
with open(filename) as f:
    lines = f.read().splitlines()
host = None
for line in lines:
    if re.search('host', line):
        # keep only the part after '=' and drop spaces and quotes
        key, val = line.split('=')
        host = val.strip().replace("\"", "")
        break
print(host)
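For the quoted format shown in the question, a possible sketch is to read every key = value pair into a dict and strip the optional double quotes. The function name read_settings is hypothetical, and io.StringIO stands in for the real settings file.

```python
import io

def read_settings(lines):
    """Parse 'key = value' lines into a dict, dropping optional double quotes."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or '=' not in line:
            continue  # skip blank or malformed lines
        key, value = line.split('=', 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings

# io.StringIO stands in for open("file.txt"); the contents match the question.
sample = io.StringIO('host = "youtube.com"\npingTimeout = "1"\n')
settings = read_settings(sample)
host = settings['host']
ping_timeout = int(settings['pingTimeout'])
print(host, ping_timeout)  # youtube.com 1
```

Keeping the values in a dict means new settings can be added to the file without touching the parsing code.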

Why does pycharm warn about "Redeclared variable defined above without usage"?

Why does PyCharm warn me about Redeclared 'do_once' defined above without usage in the code below? (The warning is on line 3.)
for filename in glob.glob(os.path.join(path, '*.'+filetype)):
    with open(filename, "r", encoding="utf-8") as file:
        do_once = 0
        for line in file:
            if 'this_text' in line:
                if do_once == 0:
                    # do stuff
                    do_once = 1
                # some other stuff because of 'this text'
            elif 'that_text' in line and do_once == 0:
                # do stuff
                do_once = 1
Since I want it to run once for each file, it seems appropriate to reset the flag every time a new file is opened, and it does work just the way I want. But I have not studied Python formally, only learned by doing and googling, so I want to know why it is giving me a warning and what I should do differently.
Edit:
Tried with a boolean instead and still got the warning:
Short code that reproduces the warning for me:
import os
import glob
path = 'path'
for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename, "r", encoding="utf-8") as ins:
        do_once = False
        for line in ins:
            if "this" in line:
                print("this")
            elif "something_else" in line and do_once == False:
                do_once = True
In order to solve the general case:
What you may be doing
from random import randrange
n = 5
v1 = []
for i in range(n):
    v1.append([randrange(10)])
v2 = []
for i in range(n): # <<< Redeclared i without usage
    v2.append([randrange(10)])
What you can do
v1 = [[randrange(10)] for _ in range(n)] # use dummy variable "_"
v2 = [[randrange(10)] for _ in range(n)]
My guess is that PyCharm is being confused by the use of integers as flags. There are several alternatives that could be used in your case.
Use a boolean flag instead of an integer:
file_processed = False
for line in file:
    if 'this' in line and not file_processed:
        # do stuff
        file_processed = True
    ...
A better approach would be to simply stop once you have processed something in the file, e.g.:
for filename in [...list...]:
    with open(filename) as f:
        for line in f:
            if 'this_text' in line:
                # Do stuff
                break # Break out of this for loop and go to the next file
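A runnable sketch of the break approach, assuming nothing else needs to be read from a file after the first match. The file names and contents are invented, and io.StringIO stands in for real files.

```python
import io

# In-memory stand-ins for real files; names and contents are invented.
files = {
    'a.txt': 'nothing\nthis_text here\nthis_text again\n',
    'b.txt': 'nothing at all\n',
}

processed = []
for name, content in files.items():
    with io.StringIO(content) as f:
        for line in f:
            if 'this_text' in line:
                processed.append(name)  # the "do stuff" step
                break  # stop reading this file and move on to the next one

print(processed)  # ['a.txt']
```

If later lines in the same file still need other handling, as in the question's first snippet, break would skip them, and a flag is still the simpler choice.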
Not really an answer, but maybe an explanation:
Apparently, PyCharm is trying to avoid code like
do_once = False
do_once = True
However, it's also flagging normal code like the OP's:
item_found = False
for item in items:
    if item == item_that_i_want:
        item_found = True
if item_found:
    # do something
or, something like
last_message = ''
try:
    # do something
    if success:
        last_message = 'successfully did something'
    else:
        last_message = 'did something without success'
    # do something else
    if success:
        last_message = '2nd something was successful'
    else:
        last_message = '2nd something was not successful'
    # and so on
finally:
    print(last_message)
The Redeclared 'last_message' defined above without usage warning will appear on every line where last_message is reassigned without having been used in between.
So, the workaround would be different for each case where this is happening:
ignore the warning(s)
print or log the value somewhere after setting it
perhaps make a function to call for setting/retrieving the value
determine if there's an alternate way to accomplish the desired outcome
My code was using the last_message example, and I just removed the code reassigning last_message in each case (though printing after each reassignment also removed the warnings). I was using it for testing to locate a problem, so it wasn't critical. Had I wanted to log the completed actions, I might've used a function to do so instead of reassigning the variable each time.
If I find a way to turn it off or avoid the warning in PyCharm, I'll update this answer.

How can I parse a formatted file into variables using Python?

I have a pre-formatted text file with some variables in it, like this:
header one
name = "this is my name"
last_name = "this is my last name"
addr = "somewhere"
addr_no = 35
header
header two
first_var = 1.002E-3
second_var = -2.002E-8
header
As you can see, each section starts with the string header followed by the name of the scope (one, two, etc.).
I can't figure out how to programmatically parse those options using Python so that they would be accessible to my script in this manner:
one.name = "this is my name"
one.last_name = "this is my last name"
two.first_var = 1.002E-3
Can anyone point me to a tutorial or a library or to a specific part of the docs that would help me achieve my goal?
I'd parse that with a generator, yielding sections as you parse the file. ast.literal_eval() takes care of interpreting the value as a Python literal:
import ast
def load_sections(filename):
    with open(filename, 'r') as infile:
        for line in infile:
            if not line.startswith('header'):
                continue # skip to the next line until we find a header
            sectionname = line.split(None, 1)[-1].strip()
            section = {}
            for line in infile:
                if line.startswith('header'):
                    break # end of section
                line = line.strip()
                key, value = line.split(' = ', 1)
                section[key] = ast.literal_eval(value)
            yield sectionname, section
Loop over the above function to receive (name, section_dict) tuples:
for name, section in load_sections(somefilename):
    print(name, section)
For your sample input data, that results in:
>>> for name, section in load_sections('/tmp/example'):
...     print(name, section)
...
one {'name': 'this is my name', 'last_name': 'this is my last name', 'addr': 'somewhere', 'addr_no': 35}
two {'first_var': 0.001002, 'second_var': -2.002e-08}
Martijn Pieters is correct in his answer given your preformatted file, but if you can format the file in a different way in the first place, you will avoid a lot of potential bugs. If I were you, I would look into getting the file formatted as JSON (or XML), because then you would be able to use python's json (or XML) libraries to do the work for you. http://docs.python.org/2/library/json.html . Unless you're working with really bad legacy code or a system that you don't have access to, you should be able to go into the code that spits out the file in the first place and make it give you a better file.
def get_section(f):
    section=[]
    for line in f:
        section += [ line.strip("\n ") ]
        if section[-1] == 'header': break
    return section
sections = dict()
with open('input') as f:
    while True:
        section = get_section(f)
        if not section: break
        section_dict = dict()
        section_dict['sname'] = section[0].split()[1]
        # skip the opening 'header <name>' line and the closing 'header' line
        for param in section[1:-1]:
            k,v = [ x.strip() for x in param.split('=')]
            section_dict[k] = v
        sections[section_dict['sname']] = section_dict
print(sections['one']['name'])
You can also access these sections as attributes:
class Section:
    def __init__(self, d):
        self.__dict__ = d
one = Section(sections['one'])
print(one.name)
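As an alternative sketch for attribute access, the standard library's types.SimpleNamespace can be used instead of a hand-written class. The raw dict below is an invented excerpt of what the string-valued parser above would produce; ast.literal_eval converts the text of each value into the Python object it spells.

```python
import ast
from types import SimpleNamespace

# Invented excerpt: values are still the raw text read from the file.
raw = {'name': '"this is my name"', 'addr_no': '35'}

# Convert each value to a real Python object, then expose the keys as attributes.
one = SimpleNamespace(**{k: ast.literal_eval(v) for k, v in raw.items()})
print(one.name, one.addr_no)  # this is my name 35
```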
