Split method giving unexpected output

I read a line as below from a CSV file:
GsmUart: enabled 2015-08-13T16:57:14.558000 0.072651
Note that the entries in the line were delimited by '\t' when the CSV was written.
Problem: I want to extract the 0.072651.
I've tried:
print(str(line)) gives the entire line.
line.split('\t')[1].split('\t')[0] gives the timestamp in between.
line.split('\t')[1].split('\t')[1] gives IndexError: list index out of range.

The first line.split('\t') already splits on every tab, so the element you pick with [1] contains no tabs. The second .split('\t') therefore returns a list with only one element, and indexing [1] into it raises an IndexError because there is no second element.
What you want is the last element from the first split, which gives 0.072651:
line.split('\t')[-1]
You could also use the csv module to read your file passing a tab as the delimiter:
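import csv

with open("your_file", newline="") as f:
    r = csv.reader(f, delimiter="\t")
    for a, b, c in r:  # iterate over the reader; each row has three tab-separated fields
        ...  # work with a, b and c here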
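To make the difference concrete, here is a minimal sketch that runs both approaches on an in-memory copy of the line. It assumes the line has exactly three tab-separated fields (label, timestamp, value), which is what the question's output suggests; the variable names are only for illustration.

```python
import csv
import io

# Sample line, assuming three tab-separated fields as described in the question.
line = "GsmUart: enabled\t2015-08-13T16:57:14.558000\t0.072651\n"

# One split is enough. Strip the trailing newline first so it does not
# stay attached to the last field.
fields = line.rstrip("\n").split("\t")
last_value = fields[-1]

# The same result with csv.reader, reading from an in-memory "file".
rows = list(csv.reader(io.StringIO(line), delimiter="\t"))
csv_value = rows[0][-1]

print(last_value)  # 0.072651
print(csv_value)   # 0.072651
```

Splitting once and keeping the list around also avoids re-splitting the same string for every field you need.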

Related

Read and filter column names from file using Python

I am new to Python, and my requirement is to fetch the column names from a file.
The file may contain the following type of content:
OPTIONS ( SKIP=1)
LOAD DATA
TRAILING NULLCOLS
(
A_TEST NULLIF TEST=BLANKS,
B_TEST NULLIF TEST=BLANKS,
C_TEST NULLIF TEST=BLANKS,
CREATE_DT DATE 'YYYYMMDDHH24MISS' NULLIF OPENING_DT=BLANKS,
D_CST CONSTANT 'FNAMELOAD'
)
I need to fetch the data after the second opening bracket: the first non-empty string of each line, skipping any line whose next value is CONSTANT.
So for the file above, my expected output is:
A_TEST,B_TEST,C_TEST,CREATE_DT

You could do something like this:
with open("data.txt", "r") as f:
    data = f.read()
from_index = data.rfind('(')  # find the index of the last occurrence of the bracket
data_sel = data[from_index:]  # select just the chunk of data starting from that index
lst = data_sel.split('\n')  # split on newlines
for line in lst:
    # conditions; you may have to tweak them, but here is the basic logic
    if line != '(' and line != ')' and "CONSTANT" not in line:
        print(line.split(' ')[0])  # print the first element of the list, or append it to a result list
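The same idea can be wrapped in a function and tightened a little. This is only a sketch: the function name extract_column_names is made up for illustration, and it assumes that the last "(" in the file opens the column list (a column definition that itself contains a bracket would break that assumption).

```python
def extract_column_names(text):
    """Return the first token of each line after the last '(',
    skipping lines whose second token is CONSTANT."""
    start = text.rfind("(")
    if start == -1:
        return []
    names = []
    for raw in text[start + 1:].splitlines():
        tokens = raw.split()
        if not tokens or tokens[0] == ")":
            continue  # blank line or closing bracket
        if len(tokens) > 1 and tokens[1] == "CONSTANT":
            continue  # constant columns are not wanted
        names.append(tokens[0])
    return names


sample = """OPTIONS ( SKIP=1)
LOAD DATA
TRAILING NULLCOLS
(
A_TEST NULLIF TEST=BLANKS,
B_TEST NULLIF TEST=BLANKS,
C_TEST NULLIF TEST=BLANKS,
CREATE_DT DATE 'YYYYMMDDHH24MISS' NULLIF OPENING_DT=BLANKS,
D_CST CONSTANT 'FNAMELOAD'
)
"""

print(",".join(extract_column_names(sample)))  # A_TEST,B_TEST,C_TEST,CREATE_DT
```

Checking the second token rather than searching the whole line for "CONSTANT" avoids dropping a column whose name merely contains that word.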

Why am I getting an IndexError from a for loop?

I'm writing code that will take dates and numeric values from a csv file and compare them.
date_location = 3
numeric_location = 4
with open('file1.csv', 'r') as f1:
    next(f1)
    with open('file2.csv', 'r') as f2:
        next(f2)
        for i in (f1):
            f1_date = (i.split()[date_location])
            f1_number = (i.split()[numeric_location])
            for j in (f2):
                f2_date = (j.split()[date_location])
                f2_number = (j.split()[numeric_location])
                print(f1_date, f1_number)
                print(f2_date, f2_number)
                if f1_date == f2_date:
                    print(f1_date == f2_date)
                    if f2_number > f1_number:
                        print('WIN')
                        continue
                    elif f2_number <= f1_number:
                        print('lose')
            f2.seek(0, 0)
I get the error IndexError: list index out of range for f1_date = (i.split()[date_location]), which I assume will also affect:
f1_number = (i.split()[numeric_location])
f2_date = (j.split()[date_location])
f2_number = (j.split()[numeric_location])
Can anyone explain why? I haven't found a way to make it so this error doesn't show.
EDIT: I forgot to change the separator for .split() after messing around with the for loop using text files

Two main possibilities:
1) Your csv files are not whitespace delimited. With no argument, .split() splits on runs of whitespace, so a comma-separated line will not yield the 4 items needed for date_location (or the 5 needed for numeric_location).
2) Your csv is whitespace delimited but ragged, i.e. it has incomplete rows, so for some row there is no data at index 3 or 4.
I also highly suggest using a library for reading csvs. csv is in the standard library, and pandas has built-in handling of ragged lines.
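As a rough sketch of the csv-module route, the helper below reads rows with csv.reader and skips any row that is too short to index safely. The function name, the comma delimiter and the sample data are assumptions made for illustration; the question does not show the real files. It also converts the numeric column to float, since comparing the raw strings would compare them alphabetically.

```python
import csv
import io

DATE_LOCATION = 3
NUMERIC_LOCATION = 4


def read_date_and_number(lines, delimiter=","):
    """Yield (date, number) pairs, skipping rows that are too short."""
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) <= NUMERIC_LOCATION:
            continue  # ragged or empty row: indexing it would raise IndexError
        yield row[DATE_LOCATION], float(row[NUMERIC_LOCATION])


# Hypothetical file contents, including one ragged row.
sample = io.StringIO(
    "id,name,city,date,value\n"
    "1,a,x,2020-01-01,3.5\n"
    "2,b,y\n"
    "3,c,z,2020-01-02,10\n"
)
next(sample)  # skip the header line, as the question's code does
pairs = list(read_date_and_number(sample))
print(pairs)  # [('2020-01-01', 3.5), ('2020-01-02', 10.0)]
```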

f1_number = (i.split()[numeric_location])
This is doing a lot in a single line. I suggest you split the string once and then index the result in separate steps:
f1 = i.split()
f1_number = f1[numeric_location]
f1_date = f1[date_location]
Now you will see which of these causes the problem. You should add a print(f1) to see the value after the split. Most likely it doesn't have as many elements as you think it does. Or your indexes are off from what they should be.

The call to i.split() generates a new list containing each word from the string i. So
"this is an example".split() == ["this", "is", "an", "example"]
You are trying to access index 3, which is the fourth element of the resulting list, and the IndexError tells you that this list has fewer than four members. I suggest printing the result of i.split(). Very likely this is either an off-by-one error, or the first line of your file contains something different from what you are expecting.
Also, split() by default splits on whitespace; given that you have a csv, you may have wanted split(',').
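A quick way to see this is to compare the two calls on a made-up comma-separated row (the row contents here are invented for illustration):

```python
# A made-up comma-separated row with no whitespace in it.
row = "2020-01-01,12:00:00,north,2020-01-05,42.5"

by_whitespace = row.split()   # nothing to split on, so one item
by_comma = row.split(",")     # five fields

print(len(by_whitespace))  # 1
print(len(by_comma))       # 5

date_location = 3
try:
    by_whitespace[date_location]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

print(by_comma[date_location])  # 2020-01-05
```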

The error happens because i.split() returns a list with only one element, but date_location is equal to 3.
You need to pass the separator used in your csv file to the str.split method.
You can read more about it in the Python documentation for str.split.

Reading an nth line of a textfile in python determined from a list

I have a function gen_rand_index that generates a random group of numbers in list format, such as [3,1] or [3,2,1]
I also have a text file that reads something like this:
red $1
green $5
blue $6
How do I write a function so that once python generates this list of numbers, it automatically reads that # line in the text file? So if it generated [2,1], instead of printing [2,1] I would get "green $5, red $1" aka the second line in the text file and the first line in the text file?
I know that you can do print(line[2]) and commands like that, but this won't work in my case because each time I am getting a different random number of a line that I want to read, it is not a set line I want to read each time.
row = str(result[gen_rand_index])  # result[gen_rand_index] gives me the random list of numbers
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
I have this so far, but I am getting this
error: invalid literal for int() with base 10: '[4, 1]'
I also have gotten
TypeError: string indices must be integers
I have tried replacing str with int and many things like that, but I think the way I'm approaching this is just wrong. Can anyone help me? (I have only been coding for a couple of days now, so I apologize in advance if this question is really basic.)

Okay, let us first get some stuff out of the way
Whenever you access something from a list, the thing you put inside the square brackets [] should be an integer, e.g. [5]. This tells Python that you want the element at index 5 (the sixth element, since indexing starts at 0). It cannot be ["5"], because "5" would be treated as a string.
Therefore the line row = str(result[gen_rand_index]) should actually just be row = ... without the call to str. This is why you got the TypeError about string indices.
Secondly, as per your description gen_rand_index would return a list of numbers.
So going by that, why don't you try this:
indices_to_pull = gen_rand_index()
with open("Foodinventory.txt", 'r') as file_handle:
    file_contents = file_handle.readlines()  # If the file is small and simple this works fine
answer = []
for index in indices_to_pull:
    answer.append(file_contents[index - 1])
Explanation
We get the indices of the file lines from gen_rand_index.
We read the entire file into memory using readlines().
Then we get the lines we want. Remember to subtract 1, as the list is indexed from 0.

The error you are getting is because you're trying to index a string variable (line) with a string index (row). Presumably row will contain something like '[2,3,1]'.
However, even if row was a numerical index, you're not indexing what you think you're indexing. The variable line is a string, and it contains (on any given iteration) one line of the file. Indexing this variable will give you a single character. For example, if line contains green $5, then line[2] will yield 'e'.
It looks like your intent is to index into a list of strings, which represent all the lines of the file.
If your file is not overly large, you can read the entire file into a list of lines, and then just index that array:
with open('file.txt') as fp:
    lines = fp.readlines()
print(lines[2])
In this case, lines[2] will yield the string 'blue $6\n'.
To discard the trailing newline, use lines[2].strip() instead.
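The difference between indexing one line and indexing the list of lines can be checked without touching the disk. This sketch uses io.StringIO as a stand-in for the real file, with the three sample lines from the question:

```python
import io

# In-memory stand-in for the text file from the question.
fake_file = io.StringIO("red $1\ngreen $5\nblue $6\n")
lines = fake_file.readlines()

one_line = lines[1]            # 'green $5\n'
print(repr(one_line[2]))       # 'e'  (indexing a string gives one character)
print(repr(lines[2]))          # 'blue $6\n'  (indexing the list gives a whole line)
print(lines[2].strip())        # blue $6
```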

I'll go line by line and raise some issues.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
Are you sure it is gen_rand_index and not gen_rand_index()? If gen_rand_index is a function, you should call the function. In the code you have, you are not calling the function, instead you are using the function directly as an index.
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
The correct python idiom for opening a file and reading line by line is
with open("Foodinventory.txt", "r") as f:
    for line in f:
        ...
This way you do not have to close the file; the with clause does this for you automatically.
Now, what you want to do is to print the lines of the file that correspond to the elements in your variable row. So what you need is an if statement that checks if the line number you just read from the file corresponds to the line number in your array row.
with open("Foodinventory.txt", "r") as f:
    for i, line in enumerate(f):
        if i == row[i]:
            print(line)
But this is wrong: it would work only if your list's elements are ordered. That is not the case in your question. So let's think a little bit. You could iterate over your file multiple times, and each time you iterate over it, print out one line. But this will be inefficient: it will take time O(nm) where n==len(row) and m == number of lines in your file.
A better solution is to read all the lines of the file and save them to an array, then print the corresponding indices from this array:
arr = []
with open("Foodinventory.txt", "r") as f:
    arr = list(f)
for i in row:
    print(arr[i - 1])  # lists are zero-indexed, but the line numbers start at 1
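Putting the pieces together, a small helper could look like the sketch below. The name pick_lines is hypothetical, the line numbers are assumed to be 1-based as in the question, and a temporary file stands in for Foodinventory.txt so the example is self-contained.

```python
import os
import tempfile


def pick_lines(path, line_numbers):
    """Return the lines at the given 1-based line numbers, in that order."""
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f]
    return [lines[n - 1] for n in line_numbers]


# Temporary stand-in for the inventory file from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("red $1\ngreen $5\nblue $6\n")
    inventory_path = tmp.name

picked = pick_lines(inventory_path, [2, 1])
print(", ".join(picked))  # green $5, red $1

os.remove(inventory_path)
```

Reading the file once and indexing the list keeps the cost at one pass over the file, however many line numbers are requested.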

Python Value Error when attempting to remove an element from a list

This program is supposed to take a text file and convert its contents into a list where each element is a line from the original text file. From there I want to check whether certain websites, which are in another list, are contained within an element, and if so remove that element from the list. I keep getting a ValueError.
with open(hosts_temp, 'r+') as file1:
    content = file1.read()
    x = content.splitlines()  # convert contents of file1 into a list of strings.
    for element in x:
        for site in websites_list:
            if site in element:
                x.remove(element)
            else:
                pass
Here is the error I'm getting:
ValueError: list.remove(x): x not in list

The problem is that you remove the line from the list and then keep checking that same line against the remaining sites.
For example if you have a website list of
websites_list = ["google","facebook"]
and your x (list of lines) is
["First sentence","Second sentence containing google","Last sentence"]
Looking at this loop
for site in websites_list:
You would remove the second sentence from x because it matched google. However, the inner loop then goes on to check whether that same sentence contains "facebook". If a later site also matches the line, x.remove(element) is called a second time for an element that is no longer in x, and you get the ValueError.
I would recommend reading the file line by line instead of grabbing all the lines at once. If it is a line without a website name, then add it to a valid list collection.
Another pythonic way to solve this, if your input is not large, is to use a list comprehension:
with open(hosts_temp, 'r+') as file1:
    content = file1.read()
x = content.splitlines()
x = [line for line in x if all(w not in line for w in websites_list)]
It is good practice to be very careful when iterating over a collection and adding/deleting elements along the way.
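To see the failure and the fix side by side, here is a small sketch. The sample line is changed so that it mentions both sites, which is the situation that triggers a second remove() call; the helper name remove_in_place is made up and simply mimics the nested loops from the question on a copy of the list.

```python
websites_list = ["google", "facebook"]
lines = [
    "First sentence",
    "Second sentence containing google and facebook",
    "Last sentence",
]


def remove_in_place(lines, sites):
    """Mimic the question's nested loops on a copy of the list."""
    x = list(lines)
    for element in x:
        for site in sites:
            if site in element:
                x.remove(element)  # second match on the same line raises ValueError
    return x


try:
    remove_in_place(lines, websites_list)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)  # ValueError: list.remove(x): x not in list

# Building a new list avoids mutating the list while iterating over it.
kept = [line for line in lines if not any(site in line for site in websites_list)]
print(kept)  # ['First sentence', 'Last sentence']
```

Adding a break after x.remove(element) would stop the error as well, but removing items from a list while looping over it still skips elements, so building a new list is the safer habit.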

Truncate a column of a csv file?

I'm new to Python and I have the following csv file (let's call it out.csv):
DATE,TIME,PRICE1,PRICE2
2017-01-15,05:44:27.363000+00:00,0.9987,1.0113
2017-01-15,13:03:46.660000+00:00,0.9987,1.0113
2017-01-15,21:25:07.320000+00:00,0.9987,1.0113
2017-01-15,21:26:46.164000+00:00,0.9987,1.0113
2017-01-16,12:40:11.593000+00:00,,1.0154
2017-01-16,12:40:11.593000+00:00,1.0004,
2017-01-16,12:43:34.696000+00:00,,1.0095
and I want to truncate the second column so the csv looks like:
DATE,TIME,PRICE1,PRICE2
2017-01-15,05:44:27,0.9987,1.0113
2017-01-15,13:03:46,0.9987,1.0113
2017-01-15,21:25:07,0.9987,1.0113
2017-01-15,21:26:46,0.9987,1.0113
2017-01-16,12:40:11,,1.0154
2017-01-16,12:40:11,1.0004,
2017-01-16,12:43:34,,1.0095
This is what I have so far..
with open('out.csv','r+b') as nL, open('outy_3.csv','w+b') as nL3:
    new_csv = []
    reader = csv.reader(nL)
    for row in reader:
        time = row[1].split('.')
        new_row = []
        new_row.append(row[0])
        new_row.append(time[0])
        new_row.append(row[2])
        new_row.append(row[3])
        print new_row
        nL3.writelines(new_row)
I can't seem to get a new line in after writing each line to the new csv file.
This definitely doesn't look or feel pythonic.
Thanks

The missing newlines issue is because the file.writelines() method doesn't automatically add line separators to the elements of the argument it's passed, which it expects to be a sequence of strings. If these elements represent separate lines, then it's your responsibility to ensure each one ends in a newline.
However, your code tries to use it to output only a single line. To fix that you should use file.write() instead, because it expects its argument to be a single string, and if you want that string to be a separate line in the file, it must end with a newline or have one manually added to it.
Below is code that does what you want. It changes one of the elements of the list of strings that csv.reader returns in place, then writes the modified list to the output file as a single string by join()ing the elements back together (stored in new_row) and manually adding a newline to the end of the result.
import csv

# Python 3: open the input in text mode; newline='' lets the csv module handle line endings
with open('out.csv', 'r', newline='') as nL, open('outy_3.csv', 'w') as nL3:
    for row in csv.reader(nL):
        time_col = row[1]
        try:
            period_location = time_col.index('.')
            row[1] = time_col[:period_location]  # only keep characters in front of period
        except ValueError:  # no period character found
            pass  # leave row unchanged
        new_row = ','.join(row)
        print(new_row)
        nL3.write(new_row + '\n')
Printed (and file) output:
DATE,TIME,PRICE1,PRICE2
2017-01-15,05:44:27,0.9987,1.0113
2017-01-15,13:03:46,0.9987,1.0113
2017-01-15,21:25:07,0.9987,1.0113
2017-01-15,21:26:46,0.9987,1.0113
2017-01-16,12:40:11,,1.0154
2017-01-16,12:40:11,1.0004,
2017-01-16,12:43:34,,1.0095
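As an alternative sketch, csv.writer can take care of joining the fields and ending each line, so no manual newline handling is needed. This is not the code from the answer above; the function name truncate_time_column is made up, and in-memory io.StringIO objects stand in for out.csv and outy_3.csv.

```python
import csv
import io


def truncate_time_column(src, dst, column=1):
    """Copy CSV rows from src to dst, cutting the given column at the first '.'."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if len(row) > column:
            row[column] = row[column].split(".", 1)[0]
        writer.writerow(row)


# In-memory stand-ins for the input and output files.
source = io.StringIO(
    "DATE,TIME,PRICE1,PRICE2\n"
    "2017-01-15,05:44:27.363000+00:00,0.9987,1.0113\n"
    "2017-01-16,12:40:11.593000+00:00,,1.0154\n"
)
target = io.StringIO()
truncate_time_column(source, target)
print(target.getvalue(), end="")
```

With real files, both would be opened with newline='' as the csv module documentation recommends, and the writer would also quote any field that happens to contain a comma.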
