Python - Read in Comma Separated File, Create Two lists - python

New to Python here and I'm trying to learn/figure out the basics. I'm trying to read in a file in Python that has comma separated values, one to a line. Once read in, these values should be separated into two lists, one list containing the value before the "," on each line, and the other containing the value after it.
I've played around with it for quite a while, but I just can't seem to get it.
Here's what I have so far...
with open("mid.dat") as myfile:
    data = myfile.read().replace('\n',' ')
print(data)
list1 = [x.strip() for x in data.split(',')]
print(list1)
list2 = ?
List 1 creates a list, but it's not correct. List 2, I'm not even sure how to tackle.
PS - I have searched other similar threads on here, but none of them seem to address this properly. The file in question is not a CSV file, and needs to stay as a .dat file.
Here's a sample of the data in the .dat file:
113.64,889987.226
119.64,440987774.55
330.43,446.21
Thanks.

Use str.split on each line:
list1 = []
list2 = []
with open("mid.dat") as myfile:
    for line in myfile:
        # strip the trailing newline first, then split on the comma
        line = line.rstrip().split(",")
        list1.append(line[0])
        list2.append(line[1])
Python's rstrip() method strips all kinds of trailing whitespace by default, so it removes the newline character "\n" too.

If you want to use only builtin packages, you can use csv.
import csv
with open("mid.dat") as myfile:
    csv_records = csv.reader(myfile)
    list1 = []
    list2 = []
    for row in csv_records:
        list1.append(row[0])
        list2.append(row[1])

Could try this, which creates lists of floats rather than strings:
from ast import literal_eval
with open("mid.dat") as f:
    list1, list2 = map(list, zip(*map(literal_eval, f.readlines())))
Can be simplified if you don't mind list1 and list2 as tuples.
The list(zip(*my_2d_list)) pattern is a pretty common way of transposing 2D lists using only built-in functions. It's useful in this scenario because it's easy to obtain a list (call this result) of tuples, one per line in the file (where result[0] would be the first tuple, and result[n] would be the nth), and then transpose result (call this resultT) such that resultT[0] would be all the 'left values' and resultT[1] would be the 'right values'.
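A minimal sketch of the transposition described above, with the sample values from the question inlined in place of the file. The names rows, columns, list1 and list2 are illustrative only.

```python
# Rows as they would come out of parsing each line of the file.
rows = [(113.64, 889987.226), (119.64, 440987774.55), (330.43, 446.21)]

# zip(*rows) groups the first items together, then the second items (a transpose).
columns = list(zip(*rows))

# Each column is a tuple; convert to lists if they need to be mutable.
list1, list2 = map(list, columns)

print(list1)
print(list2)
```

If tuples are acceptable, `list1, list2 = zip(*rows)` is enough.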

let's keep it very simple.
list1 = []
list2 = []
with open("mid.dat") as myfile:
    for line in myfile:
        x1,x2 = map(float,line.split(','))
        list1.append(x1)
        list2.append(x2)
print(list1)
print(list2)

You could do this with pandas.
import pandas as pd
df = pd.read_csv('mid.dat', names=['List 1','List 2'])
read_csv reads any delimited text file, so the .dat extension is not a problem; passing names= labels the columns and tells pandas the file has no header row. Pandas is a very powerful tool for data such as yours.
After doing so you can split your data into two independent Series (call .tolist() on them if you need plain Python lists).
list1 = df['List 1']
list2 = df['List 2']
I would stick to a dataframe because data manipulation and analysis is much easier within the pandas framework.

Here is my suggestion to be short and readable, without any additional packages to install:
with open("mid.dat") as myfile:
    listOfLines = [line.rstrip().split(',') for line in myfile]
list1 = [line[0] for line in listOfLines]
list2 = [line[1] for line in listOfLines]
Note: I used rstrip() to remove the end of line character.

Following is a solution obtained by correcting your own attempt:
with open("test.csv", "r") as myfile:
    datastr = myfile.read().replace("\n",",")
datalist = datastr.split(",")
list1 = []; list2=[]
for i in range(len(datalist)-1): # ignore empty last item of list
    if i%2 ==0:
        list1.append(datalist[i])
    else:
        list2.append(datalist[i])
print(list1)
print(list2)
Output:
['113.64', '119.64', '330.43']
['889987.226', '440987774.55', '446.21']
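The even/odd index test above can also be written with stride slicing. This is only a sketch: the file contents are inlined as a string (an assumption, so it runs without the file), and it assumes exactly two values on every line.

```python
# Sample file contents, inlined so the sketch runs without the data file.
text = "113.64,889987.226\n119.64,440987774.55\n330.43,446.21\n"

# Turn newlines into commas, split, and drop the empty item a trailing newline leaves.
datalist = [item for item in text.replace("\n", ",").split(",") if item]

list1 = datalist[0::2]  # items at even positions: the values before each comma
list2 = datalist[1::2]  # items at odd positions: the values after each comma

print(list1)
print(list2)
```

Filtering out empty items, rather than always dropping the last one, also covers files that do not end with a newline.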

Related

Segment a txt file column to lists

I have a text file with multiple columns separated by commas.
I am trying to read it and put each column into its own separate list, but I can't seem to do it.
What I've done so far:
with open(file, 'r') as file_test:
    file_lines = file_test.readlines()
    file_strip = [line.strip("\n") for line in file_lines]
    #I've split big list into separate lists within `file_strip`
    file_columns= [file_strip [i:i + 1] for i in range(0, len(file_strip ), 1)][2:]
So now my data is as follows:
[['22AUG18 000000, 22AUG18 000149, 5.722, UOS2'], ['22JUL18 012703, 22JUL18 013810, 52.2811, UOS2']]
I don't know how to get rid of the ' in the beginning and end of each list too
I want the first element in each list to be in List1, 2nd element in each list to be in List2 etc...
Why not use the csv module? It was designed to do what you want to do!
import csv
with open(file, 'r') as file_test:
    csv_test = csv.reader(file_test)
    for row in csv_test:
        print(row)
Will print
['22AUG18 000000', '22AUG18 000149', '5.722', 'UOS2']
['22JUL18 012703', '22JUL18 013810', '52.2811', 'UOS2']
If you want to separate that in lists you can zip() it:
with open(file, 'r') as file_test:
    csv_test = csv.reader(file_test)
    list1, list2, list3, list4 = zip(*csv_test)
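One caveat with the sample rows in the question: they have a space after each comma, and csv.reader keeps that space in the field by default. A minimal sketch, with the sample rows inlined via io.StringIO instead of a real file (an assumption for illustration), using the reader's skipinitialspace option:

```python
import csv
import io

# Inlined sample rows; note the space after each comma.
sample = (
    "22AUG18 000000, 22AUG18 000149, 5.722, UOS2\n"
    "22JUL18 012703, 22JUL18 013810, 52.2811, UOS2\n"
)

# skipinitialspace=True drops the whitespace that follows each delimiter.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)

# zip(*reader) regroups the rows by column; each column comes back as a tuple.
list1, list2, list3, list4 = zip(*reader)

print(list1)
print(list3)
```

Wrap each result in list(...) if you need lists rather than tuples.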

How to convert txt file into 2d array of each char

I am trying to read a text file I created, which looks like this:
small.txt
%%%%%%%%%%%%%%%%%%%%%%%
%eeeeeee%eeeee%eeeee%G%
%%%e%e%%%%%e%e%%%e%e%e%
%e%e%eeeeeee%eee%e%eee%
%e%e%e%e%%%e%%%e%e%%%e%
%eeeee%eee%eeeeeeeee%e%
%e%%%e%e%e%e%e%e%%%%%e%
%e%e%eee%e%e%eeeeeee%e%
%e%e%e%%%e%%%%%e%e%%%e%
%Pee%eeeeeeeee%e%eeeee%
%%%%%%%%%%%%%%%%%%%%%%%
I want to create a 2D array board[21][11] in this specific situation.
I want each char to be in a cell, because I want to implement BFS and other algorithms to reach a specific path, it's a kind of Pacman game.
Here is my code:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = rec.split()
    print chars
inner_list = []
for each in chars:
    inner_list.append(each)
output_list.append(inner_list)
print output_list
As you see the output i get now is [[%%%%%%%%%%%%%%%%%%%%%%%]]
You can just do:
with open('small.txt') as f:
    board = f.readlines()
The file.readlines() method will return a list of strings, which you can then use as a 2D array:
board[1][5]
>>> 'e'
Note, that with this approach, the newline characters ('\n') will be put into each row at the last index. To get rid of them, you can use str.rstrip:
board = [row.rstrip('\n') for row in board]
As another answer noted, the line strings are already indexable by integer, but if you really want a list of lists:
array = [list(line.strip()) for line in f]
That removes the line endings and converts each string to a list.
There are a few problems with your code:
you try to split lines into lists of chars using split, but that will only split at spaces
assuming your indentation is correct, you are only ever treating the last value of chars in your second loop
that second loop just wraps each of the (unsplit) lines in chars (which due to the previous issue is only the last one) into a list
Instead, you can just convert str to list...
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']
... and put those into output_list directly. Also, don't forget to strip the \n:
f = open("small.txt", "r")
output_list = []
for rec in f:
    chars = list(rec.strip())
    output_list.append(chars)
Or using with for autoclosing and a list-comprehension:
with open("small.txt") as f:
    output_list = [list(line.strip()) for line in f]
Note, however, that if you do not want to change the values in that grid, you do not have to convert to a list of lists of chars at all; a list of strings will work just as well.
output_list = list(map(str.strip, f))
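Once the grid is a list of lists, a cell is addressed as board[row][col]. As a rough sketch of the first step a search such as BFS needs, the code below locates the start and goal cells. The shortened maze and the helper name find are made up for illustration and are not part of the original question.

```python
# A shortened maze, inlined so the sketch runs without small.txt.
maze_text = "%%%%%\n%PeG%\n%%%%%\n"

# One inner list per line, one cell per character.
board = [list(line) for line in maze_text.splitlines()]

def find(board, target):
    """Return (row, col) of the first cell equal to target, or None."""
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == target:
                return (r, c)
    return None

start = find(board, "P")
goal = find(board, "G")
print(start, goal)
```

With the full small.txt this layout gives 11 rows of 23 characters each, so the row index comes first.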

List of lists (not just list) in Python

I want to make a list of lists in python.
My code is below.
import csv
f = open('agGDPpct.csv','r')
inputfile = csv.DictReader(f)
list = []
next(f) ##Skip first line (column headers)
for line in f:
    array = line.rstrip().split(",")
    list.append(array[1])
    list.append(array[0])
    list.append(array[53])
    list.append(array[54])
    list.append(array[55])
    list.append(array[56])
    list.append(array[57])
print list
I'm pulling only select columns from every row. My code pops this all into one list, as such:
['ABW', 'Aruba', '0.506252445', '0.498384331', '0.512418427', '', '', 'AND', 'Andorra', '', '', '', '', '', 'AFG', 'Afghanistan', '30.20560247', '27.09154001', '24.50744042', '24.60324707', '23.96716227'...]
But what I want is a list in which each row is its own list: [[a,b,c][d,e,f][g,h,i]...] Any tips?
You are almost there. Make all your desired inputs into a list before appending. Try this:
import csv
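with open('agGDPpct.csv','r') as f:
    inputfile = csv.reader(f)  # csv.reader yields lists, so columns can be indexed by position
    next(inputfile)  # skip the header row
    list = []
    for line in inputfile:
        list.append([line[1], line[0], line[53], line[54], line[55], line[56], line[57]])
print(list)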
To end up with a list of lists, you have to make the inner lists with the columns from each row that you want, and then append that list to the outer one. Something like:
for line in f:
    array = line.rstrip().split(",")
    inner = []
    inner.append(array[1])
    # ...
    inner.append(array[57])
    list.append(inner)
Note that it's also not a good practice to use the name of the type ("list") as a variable name -- this is called "shadowing", and it means that if you later try to call list(...) to convert something to a list, you'll get an error because you're trying to call a particular instance of a list, not the list built-in.
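To make the shadowing problem concrete, here is a small sketch. The function name shadowing_demo and its contents are invented for illustration, and the shadowing is kept inside the function so the built-in stays usable elsewhere.

```python
def shadowing_demo():
    # Inside this function, "list" now names a list instance, not the built-in type.
    list = ["ABW", "Aruba"]
    try:
        list("abc")  # this tries to call the list instance
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)

# Outside the function the built-in is untouched and still works.
print(list("abc"))
```

Picking a different variable name, such as rows, avoids the problem entirely.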
To build on csv module capabilities, I'll do
import csv
f = csv.reader(open('your.csv'))
next(f)
list_of_lists = [items[1::-1]+items[53:58] for items in f]
Note that
items is a list of items, thanks to the intervention of a csv.reader() object;
using slice addressing returns sublists taken from items, so that the + operator in this context means concatenation of lists
the first slice expression 1::-1 means from 1 go to the beginning moving backwards, or [items[1], items[0]].
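The two slices can be checked on a stand-in row. This is a sketch only: the 60 placeholder column names are hypothetical and simply make the indices visible.

```python
# A stand-in row with 60 columns, named col0..col59 (hypothetical data).
items = ["col%d" % i for i in range(60)]

first_two_reversed = items[1::-1]  # start at index 1 and step backwards to index 0
tail = items[53:58]                # indices 53 to 57; the stop index is excluded

selected = first_two_reversed + tail  # + concatenates the two lists
print(selected)
```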
Referring to https://docs.python.org/2/library/csv.html#csv.DictReader
Instead of
for line in f:
Write
for line in inputfile:
so that the csv module parses each row. Note that csv.DictReader yields dicts keyed by column name and consumes the header row itself, so to index columns by position use csv.reader instead. Also use list.append([line[1],line[0],line[53],..]) to append a list to a list.
One more thing: skip the header on the reader rather than on the file, i.e. next(inputfile) instead of next(f) (see https://docs.python.org/2/library/stdtypes.html#iterator.next ).
After these changes, you get:
import csv
f = open('agGDPpct.csv','r')
inputfile = csv.reader(f)
list = []
next(inputfile) ##Skip first line (column headers)
for line in inputfile:
    list.append([line[1],line[0],line[53],line[54],line[55],line[56],line[57]])
print(list)
In addition to that, it is not good practice to use list as a variable name: it is not a reserved word, but it shadows the built-in type of the same name. Rename that too.
You can further improve the above code using with . I will leave that to you.
Try and see if it works.

How do I get my list to print out every line?

I'm a little confused as to why this is not working. I'm trying to get my program to read every line of a csv file, change it from a string to a list of floats, and then print it out line by line.
csv_list = open('example_data.csv','rb')
lists= csv_list.readlines()
csv_list.close()
for lines in lists:
    lists_1 = lists.strip().split()
    list_2 = [float(x) for x in lists_1]
print list_2
Any help would be appreciated.
First, don't use readlines. Simply iterate over the file:
for lines in csv_list:
    ...
Second, use the csv library for reading: http://docs.python.org/2/library/csv.html
In your example the file is CSV, so don't split on whitespace but on a comma or semicolon.
Try this:
import pprint
with open('example_data.csv','rb') as csv_list:
    lists = csv_list.readlines()
list_2 = []
for lines in lists:
    lists_1 = lines.strip().split()  # fields of the current line
    list_2.append([float(x) for x in lists_1])  # one list of floats per line
pprint.pprint(list_2)
for lines in lists:
    lists_1 = lines.strip().split() # 'lines' here
    list_2 = [float(x) for x in lists_1]
    print list_2 # print your list in a loop
print list_2 needs to be indented to the same level as the rest of the loop, and the call should be lines.strip().split().
print list_2 is outside of the for loop. You need to indent it.
Judging by the file name, I assume that the fields in your file are separated by commas. If that is the case, you need to split the line using the comma:
lists_1 = lines.strip().split(',')
Better yet, use the csv module. Here is an example:
import csv
with open('example_data.csv', 'rb') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        line = [float(x) for x in line] # line is now a list of floats
        print line
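The example above opens the file in 'rb' mode, which is what the Python 2 csv module expected. Under Python 3 (an assumption about your setup) csv.reader needs text rather than bytes. A minimal sketch of the same conversion, with made-up sample values inlined via io.StringIO so it runs without example_data.csv:

```python
import csv
import io

# Hypothetical sample data standing in for example_data.csv.
sample = "1.5,2.25,3\n4,5.5,6.75\n"

rows = []
for line in csv.reader(io.StringIO(sample)):
    # Each field arrives as a string; convert the whole row to floats.
    rows.append([float(x) for x in line])

for row in rows:
    print(row)
```

With a real file on Python 3, the csv documentation recommends opening it as open('example_data.csv', newline='').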

Create list of tuples (in a more elegant way)

I am writing a python script to read a file which consists of three columns separated by commas, create a tuple of each line, and make a list of these tuples. With the following script I achieve what I want; I was just wondering whether there is an easier / more elegant approach than writing each of the following steps on a separate line.
import sys
fin=open(sys.argv[1],'r')
list = []
for line1 in fin:
    line2 = line1[:-1]
    line3 = line2.split(',')
    line4 = tuple(line3)
    list.append(line4)
print(list)
Thank you for your answers.
Using a list comprehension:
lst = [tuple(line.rstrip().split(',')) for line in fin]
(Don't name your variables list; it shadows the built-in and can lead to unexpected bugs).
Python comes with batteries included! If you need to read csv files, just use the csv module:
import sys, csv
with open(sys.argv[1]) as f:
    lst = list(csv.reader(f))
Note that this creates a list of lists, if you want tuples for some reason, then
with open(sys.argv[1]) as f:
    lst = [tuple(row) for row in csv.reader(f)]
