Python3: Split multiple variables dynamically

I'm trying to split multiple variables that were dynamically created off a for loop and then delete everything after the first space.
Minor back story: I'm using paramiko to SSH to a network switch to pull VLAN information. Trying to create a new variable for each VLAN name and then present all variables back into a list for the user to select from.
#VLANLine## variables were split from VLANList off \r\n. Variables created from a for loop
VLANLine1 = 'GGGGGGGGG 5 5/7'
VLANLine2 = 'HHHH 66 22/23'
VLANLine3 = 'SSSSSSS 33 3/4'
#HHHH and SSSSSSS are random names I put in place for this question. This is the data I need to keep.
#Length of VLANList = 14 in this demo
i = 0
while i < len(VLANList):
    VLANLine[i].split(" ")
    del VLAN[i][1:]
Error below:
Traceback (most recent call last):
  File "<pyshell#16>", line 2, in <module>
    VLANLine[i].split(" ")
IndexError: string index out of range
How can I dynamically split 'VLANLine##' and then delete out everything after the space? I may be going at this all wrong too. I just started working with python a few weeks ago.

This may work for you.
VLAN_clean = [v[0:v.find(' ')] for v in VLANList if v.find(' ') != -1]
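A quick check with the sample values from the question (note that this comprehension silently drops any entry with no space in it):

```python
# Sample data mirroring the question (names are placeholders).
VLANList = ['GGGGGGGGG 5 5/7', 'HHHH 66 22/23', 'SSSSSSS 33 3/4']

# Keep everything up to the first space; entries without a space are skipped.
VLAN_clean = [v[0:v.find(' ')] for v in VLANList if v.find(' ') != -1]
print(VLAN_clean)  # ['GGGGGGGGG', 'HHHH', 'SSSSSSS']
```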

str.split does what you need cleanly:
VLANList = [
    'GGGGGGGGG 5 5/7',
    'HHHH 66 22/23',
    'SSSSSSS 33 3/4',
]
VLAN_Clean = [v.split()[0] for v in VLANList]
print(VLAN_Clean)
Output:
['GGGGGGGGG', 'HHHH', 'SSSSSSS']
split with no arguments splits each string on any run of whitespace, returning a list of the pieces. If there is no whitespace in the string, it simply returns a list of length 1 containing the entire string. So, running split on each item, then selecting the first item from the resulting list, gives you the right thing.
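A small demonstration of that behavior; indexing with [0] is safe for any string that contains at least one non-whitespace character:

```python
# str.split() with no arguments splits on any run of whitespace.
print('GGGGGGGGG 5 5/7'.split())  # ['GGGGGGGGG', '5', '5/7']

# With no whitespace present, the whole string comes back as a one-element list.
print('GGGGGGGGG'.split())  # ['GGGGGGGGG']

# So [0] reliably picks out the leading name.
print('HHHH 66 22/23'.split()[0])  # HHHH
```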


I want python to skip the list if it doesn't have more than 3 parts (space separated) while reading a file line

I'm making Python read a file and turn each of its lines into a space-separated list. Currently I'm facing an issue where I want Python to read a list only if its content has more than 3 parts.
I tried `if int(len(list[3])) == 3:` and then reading the 3 parts of the list, but the program gives the error
`IndexError: list index out of range`
which is usually raised when you access something that doesn't exist, but the line of code that reads the 3rd part shouldn't run on a list without 3+ parts.
You are probably looking for:
if len(list) > 3:
    # read list
You don't need to convert len() to an int - it already is an int.
list[3] gives you back the fourth element of the list; you need to pass the whole list object to the len function.
== 3 will only catch numbers equal to 3, while you wanted all numbers above 3.
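Put together, the check might look like this; the sample lines here are made up for illustration:

```python
# Hypothetical lines from a space-separated file.
lines = ['a b c d', 'x y', 'one two three four five']

kept = []
for line in lines:
    parts = line.split()
    # Guard with len() on the whole list before indexing into it.
    if len(parts) > 3:
        kept.append(parts)

print(kept)  # [['a', 'b', 'c', 'd'], ['one', 'two', 'three', 'four', 'five']]
```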
I think it is this:
def get_file_matrix(file: str):
    with open(file, 'r') as arq:
        lines = arq.readlines()
    # sanitizing
    lines_clean = list(map(lambda line: line.strip(), lines))
    lines_split = list(map(lambda line: line.split(' '), lines_clean))
    # keep only lines with more than 3 parts
    lines_filtered = list(filter(lambda line: len(line) > 3, lines_split))
    return lines_filtered

r = get_file_matrix('test.txt')
print(r)

List index out of range - Python

I'm writing a short code (my first in python) to filter a large table.
import sys
gwas_annot = open('gwascatalog.txt').read()
gwas_entry_list = gwas_annot.split('\n')[1:-1]
# paste line if has value
for lines in gwas_entry_list:
    entry_notes = lines.split('\t')
    source_name = entry_notes[7]
    if 'omega-6' in source_name:
        print(entry_notes)
Basically I want to take the 'gwascatalog' table, parse it into lines and columns, search column 7 for a string ('omega-6' in this case) and if it contains it, print the entire line.
Right now it prints all the rows to the console but won't let me paste it into another file. It also gives me the error:
Traceback (most recent call last):
  File "gwas_parse.py", line 9, in <module>
    source_name = entry_notes[7]
IndexError: list index out of range
Unsure why there is an error. Anything obvious to fix?
Edit: Adding snippet from data.
You can secure yourself by checking the length of the list first.
if len(entry_notes) > 7:
    source_name = entry_notes[7]
The "list index out of range" error means you hit a row (line) with fewer than 8 columns, so there is no index 7.
# index          0      1      2        3       4       5      6    (... no 7)
columnsArray = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
So here, if you ask for columnsArray[7], you get a "list index out of range" error because the line the for loop is currently on only goes up to index 6.
The error tells you it happens at "line 9", which is "source_name = entry_notes[7]". I would suggest printing out the number of columns for each row of the table; you might notice that somewhere you have 7 columns instead of 8. Also note that you mean column 8 but index 7, since counting in Python starts at 0.
Maybe add another "if" to only look at lines that have a len() of 8 or more.
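Combining that length guard with the omega-6 filter might look like the sketch below; the rows here are invented stand-ins for the real tab-separated data:

```python
rows = [
    'a\tb\tc\td\te\tf\tg\tomega-6 source\textra',
    'short\trow',  # fewer than 8 columns; indexing [7] blindly would raise IndexError
    'a\tb\tc\td\te\tf\tg\tother source\textra',
]

matches = []
for line in rows:
    entry_notes = line.split('\t')
    # Skip rows that do not have a column at index 7.
    if len(entry_notes) > 7 and 'omega-6' in entry_notes[7]:
        matches.append(entry_notes)

print(len(matches))  # 1
```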

ValueError: y contains new labels: ['#']

I have a list of lists with every list containing 1 up to 5 tags. I have constructed a list containing the top 50 tags. My goal is to construct a new list of lists where every list contains only the top 50 tags. My approach went like this:
First I constructed a new list of lists with only the top 50 tags:
top_50 = list(np.array(pd.read_csv(os.path.join(dir,"Tags.csv")))[:,1])
train = pd.read_csv(os.path.join(dir,"Train.csv"),iterator = True)
top_50 = top_50[:51]
tags = list(np.array(train.get_chunk(50000))[:,3])
top_50_tags = [[tag for tag in list if tag in top_50] for list in tags]
Then I tried to encode the tags:
coder = preprocessing.LabelEncoder()
coder = coder.fit(top_50)
tags = [coder.transform(tag) for tag in list for list in top_50_tags]
This however gave me this error:
Traceback (most recent call last):
  File "C:\Users\Ano\workspace\final_submission\src\rf_test.py", line 69, in <module>
    main()
  File "C:\Users\Ano\workspace\final_submission\src\rf_test.py", line 33, in main
    labels = [coder.transform(tag) for tag in list for list in top_50_tags]
  File "C:\Python27\lib\site-packages\sklearn\preprocessing\label.py", line 120, in transform
    raise ValueError("y contains new labels: %s" % str(diff))
ValueError: y contains new labels: ['#']
I think this error rises because some of my lists are empty, since there were no top 50 tags in them. But the error specifically states that ["#"] is the newly seen label. Am I right with my hypothesis? And what should I do with the error message?
Edit:
For the people wondering why I am using list as a variable in list comprehension, I actually use a different word as a variable in my real program.
Update
I checked for differences in my top_50 and the tags:
print(len(top_50.difference(tags)))
which gave me a length of 0. This should mean that my empty lists are the problem?
Maybe you can check this issue: https://github.com/scikit-learn/scikit-learn/issues/3123
This bug was fixed in scikit-learn version 0.17.
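Independent of the scikit-learn version, one way to sidestep the error is to encode through a plain dict built from the fitted labels, so any unseen tag fails loudly at the offending value. A sketch with hypothetical labels (the names here are made up):

```python
top_50 = ['python', 'java', 'c++']  # hypothetical fitted labels
# Mimic LabelEncoder: assign indices in sorted label order.
encoding = {label: i for i, label in enumerate(sorted(top_50))}

tag_lists = [['python', 'java'], [], ['c++']]

# Empty inner lists simply encode to empty lists; an unseen tag like '#'
# would raise a KeyError here, pointing directly at the offending value.
encoded = [[encoding[t] for t in tags] for tags in tag_lists]
print(encoded)  # [[2, 1], [], [0]]
```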

Python: How to extract string from text file to use as data

this is my first time writing a python script and I'm having some trouble getting started. Let's say I have a txt file named Test.txt that contains this information.
x y z Type of atom
ATOM 1 C1 GLN D 10 26.395 3.904 4.923 C
ATOM 2 O1 GLN D 10 26.431 2.638 5.002 O
ATOM 3 O2 GLN D 10 26.085 4.471 3.796 O
ATOM 4 C2 GLN D 10 26.642 4.743 6.148 C
What I want to do is eventually write a script that will find the center of mass of these atoms. So basically I want to sum up all of the x values in that txt file, with each number multiplied by a given value depending on the type of atom.
I know I need to define the positions for each x-value, but I'm having trouble with figuring out how to make these x-values be represented as numbers instead of txt from a string. I have to keep in mind that I'll need to multiply these numbers by the type of atom, so I need a way to keep them defined for each atom type. Can anyone push me in the right direction?
mass_dictionary = {'C': 12.0107,
                   'O': 15.999
                   # Others...?
                   }

# If your files are this structured, you can just
# hardcode some column assumptions.
coords_idxs = [6, 7, 8]
type_idx = 9

# Open file, get lines, close file.
# Probably prudent to add try-except here for bad file names.
f_open = open("Test.txt", 'r')
lines = f_open.readlines()
f_open.close()

# Initialize an array to hold needed intermediate data.
output_coms = []
total_mass = 0.0

# Loop through the lines of the file.
for line in lines:
    # Split the line on whitespace.
    line_stuff = line.split()
    # If the line is empty or fails to start with 'ATOM', skip it.
    if (not line_stuff) or (not line_stuff[0] == 'ATOM'):
        pass
    # Otherwise, append the mass-weighted coordinates to a list and increment total mass.
    else:
        output_coms.append([mass_dictionary[line_stuff[type_idx]] * float(line_stuff[i])
                            for i in coords_idxs])
        total_mass = total_mass + mass_dictionary[line_stuff[type_idx]]

# After getting all the data, finish off the averages.
avg_x, avg_y, avg_z = tuple(map(lambda x: (1.0 / total_mass) * sum(x),
                                [[elem[i] for elem in output_coms] for i in [0, 1, 2]]))

# A lot of this will be better with NumPy arrays if you'll be using this often or on
# larger files. Python Pandas might be an even better option if you want to just
# store the file data and play with it in Python.
Basically, using the open function in Python you can open any file. So you can do something as follows (the following snippet is not a solution to the whole problem, but an approach):
def read_file():
    f = open("filename", 'r')
    for line in f:
        line_list = line.split()
        ....
        ....
    f.close()
From this point on you have a nice setup of what you can do with these values. The second line just opens the file for reading. The third line defines a for loop that reads the file one line at a time; each line goes into the line variable.
The last line in that snippet breaks the string, at every whitespace, into a list. So line_list[0] will be the value in your first column, and so forth. From this point, if you have any programming experience, you can just use if statements and such to get the logic you want.
Also keep in mind that the values stored in that list will all be strings, so if you want to perform any arithmetic operation such as addition, you have to convert them first.
If you have pandas installed, checkout the read_fwf function that imports a fixed-width file and creates a DataFrame (2-d tabular data structure). It'll save you lines of code on import and also give you a lot of data munging functionality if you want to do any additional data manipulations.
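A minimal sketch of that idea, using an inline string in place of the real Test.txt and made-up column names:

```python
import io

import pandas as pd

# Inline stand-in for Test.txt; in practice you would pass the file path.
data = io.StringIO(
    "ATOM  1  C1  GLN  D  10  26.395  3.904  4.923  C\n"
    "ATOM  2  O1  GLN  D  10  26.431  2.638  5.002  O\n"
    "ATOM  3  O2  GLN  D  10  26.085  4.471  3.796  O\n"
)

# read_fwf infers the fixed-width column boundaries from the data;
# the column names here are hypothetical labels for illustration.
df = pd.read_fwf(data, header=None,
                 names=['record', 'serial', 'name', 'res', 'chain',
                        'resnum', 'x', 'y', 'z', 'element'])

# The coordinate columns come back as floats, ready for arithmetic.
print(df[['x', 'y', 'z', 'element']])
```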

Comparing two lists items in python

I have two files which I loaded into lists. The content of the first file is something like this:
d.complex.1
23
34
56
58
68
76
.
.
.
etc
d.complex.179
43
34
59
69
76
.
.
.
etc
The content of the second file is also the same but with different numerical values. Please consider from one d.complex.* to another d.complex.* as one set.
Now I am interested in comparing each numerical value from one set of first file with each numerical value of the sets in the second file. I would like to record the number of times each numerical has appeared in the second file overall.
For example, the number 23 from d.complex.1 could have appeared 5 times in file 2 under different sets. All I want to do is record the number of occurrences of number 23 in file 2 including all sets of file 2.
My initial approach was to load them into a list and compare but I am not able to achieve this. I searched in google and came across sets but being a python noob, I need some guidance. Can anyone help me?
If you feel the question is not clear,please let me know. I have also pasted the complete file 1 and file 2 here:
http://pastebin.com/mwAWEcTa
http://pastebin.com/DuXDDRYT
Open the file using Python's open function, then iterate over all its lines. Check whether the line contains a number, if so, increase its count in a defaultdict instance as described here.
Repeat this for the other file and compare the resulting dicts.
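That defaultdict approach might be sketched like this, assuming each line is either a set header or a bare number:

```python
from collections import defaultdict

def count_numbers(lines):
    # Count every numeric line across all sets in the file.
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if line.isdigit():
            counts[int(line)] += 1
    return counts

# Hypothetical lines standing in for one of the files.
counts = count_numbers(['d.complex.1', '23', '34', 'd.complex.2', '23'])
print(counts[23])  # 2
```

Building one such dict per file lets you look up how often any number from file 1 appears in file 2.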
First create a function which can load a given file. Since you want to maintain the individual sets and also count the occurrences of each number, the best approach is a dict for the whole file whose keys are the set names (e.g. d.complex.1); each such set maps to another dict holding the numbers in that set. The code below explains it better:
def file_loader(f):
    file_dict = {}
    current_set = None
    for line in f:
        if line.startswith('d.complex'):
            file_dict[line] = current_set = {}
            continue
        if current_set is not None:
            current_set[line] = current_set.get(line, 0) + 1
    return file_dict
Now you can easily write a function which will count a number in given file_dict
def count_number(file_dict, num):
    count = 0
    for set_name, number_set in file_dict.items():
        count += number_set.get(num, 0)
    return count
E.g. here is a usage example:
s = """d.complex.1
10
11
12
10
11
12"""
file_dict = file_loader(s.split("\n"))
print(file_dict)
print(count_number(file_dict, '10'))
output is:
{'d.complex.1': {'10': 2, '11': 2, '12': 2}}
2
You may have to improve the file loader, e.g. skip empty lines, convert the values to int, etc.
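An improved loader along those lines (skipping blank lines and converting to int) might look like:

```python
def file_loader(f):
    file_dict = {}
    current_set = None
    for line in f:
        line = line.strip()
        if not line:          # skip empty lines
            continue
        if line.startswith('d.complex'):
            file_dict[line] = current_set = {}
        elif current_set is not None:
            num = int(line)   # convert to int
            current_set[num] = current_set.get(num, 0) + 1
    return file_dict

file_dict = file_loader("d.complex.1\n10\n\n11\n10\n".split("\n"))
print(file_dict)  # {'d.complex.1': {10: 2, 11: 1}}
```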
