Segment a txt file column to lists - python

I have a text file with multiple columns separated by commas.
I am trying to read it and put each column into its own separate list, but I can't seem to do it.
What I've done so far:
with open(file, 'r') as file_test:
    file_lines = file_test.readlines()
file_strip = [line.strip("\n") for line in file_lines]
# I've split the big list into separate lists within `file_strip`
file_columns = [file_strip[i:i + 1] for i in range(0, len(file_strip), 1)][2:]
So now my data is as follows:
[['22AUG18 000000, 22AUG18 000149, 5.722, UOS2'], ['22JUL18 012703, 22JUL18 013810, 52.2811, UOS2']]
I also don't know how to get rid of the ' at the beginning and end of each list.
I want the first element in each list to be in List1, the 2nd element in each list to be in List2, and so on.

Why not use the csv module? It was designed to do what you want to do!
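import csv
with open(file, 'r') as file_test:
    # skipinitialspace=True drops the blank that follows each comma in your data
    csv_test = csv.reader(file_test, skipinitialspace=True)
    for row in csv_test:
        print(row)
This will print: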
['22AUG18 000000', '22AUG18 000149', '5.722', 'UOS2']
['22JUL18 012703', '22JUL18 013810', '52.2811', 'UOS2']
If you want to separate that into one list per column, you can zip() it (note that each column comes back as a tuple):
with open(file, 'r') as file_test:
    csv_test = csv.reader(file_test, skipinitialspace=True)
    list1, list2, list3, list4 = zip(*csv_test)
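For reference, here is a minimal, self-contained sketch of that zip() transposition. It assumes the two sample rows from the question. io.StringIO stands in for the real file, and the names sample and columns are only illustrative.

```python
import csv
import io

# Stand-in for the real file: the two sample rows from the question.
sample = io.StringIO(
    "22AUG18 000000, 22AUG18 000149, 5.722, UOS2\n"
    "22JUL18 012703, 22JUL18 013810, 52.2811, UOS2\n"
)

# skipinitialspace=True drops the blank that follows each comma.
reader = csv.reader(sample, skipinitialspace=True)

# zip(*rows) transposes rows into columns. Each column arrives as a tuple,
# so convert it to a list if a mutable list is needed.
columns = [list(col) for col in zip(*reader)]
list1, list2, list3, list4 = columns

print(list1)
print(list3)
```

Keep in mind that zip() stops at the shortest row, so a row with a missing field silently truncates every column rather than raising an error.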

Related

Splitting a list until finding an element

I'm reading a file that has lines like these:
2SomethingHere
3Whatever
3Whatever
4foo
4bar
5baz
2SomethingHere
3Whatever
3Whatever
4foo
4bar
5baz
This is a test file, and I've been reading it like this:
file = open('data.txt', 'r')
contents = file.readlines()
That separates the lines and gets them into a list. But I want to split this list into a list of lists, like this:
main_list = [['2SomethingHere', '3Whatever', '3Whatever', '4foo', '4bar', '5baz'], ['2SomethingHere', '3Whatever', '3Whatever', '4foo', '4bar', '5baz']]
A 2 at the beginning of an element marks the start of a new list.
I've been trying this:
from itertools import groupby
result = [list(g) for k,g in groupby(contents,lambda x:x.startswith('2')) if k]
But the result shows only the elements starting with 2.
I want all the elements following each 2 until the next one is found.
If you know that the file will start with a 2 on the first line, then you can just do:
file = open('data.txt', 'r')
contents = file.readlines()
print(contents)
main_list = []
for el in contents:
    if el.startswith("2"):
        main_list.append([])  # add a new sub-list
    main_list[-1].append(el.strip())  # add line (without leading/trailing whitespace) to the last sub-list
print(main_list)
but if it might not, then you would have to do something like:
main_list = [[]]
for el in contents:
    if el.startswith("2") and main_list[-1]:
        main_list.append([])
    main_list[-1].append(el.strip())
so that the start is handled a little bit differently: an initial sublist is already present ready for the items, even if the first line does not start with "2", but if the first line does start with 2, then it does not immediately move onto a new sub-list (which would leave an empty sub-list at the start of the output).
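The same idea can be wrapped in a small function. This is only a sketch: the name split_on_marker, the marker parameter, and the decision to skip blank lines are my own assumptions, not something from the question.

```python
def split_on_marker(lines, marker="2"):
    """Group lines into sub-lists, starting a new one at each line that begins with marker."""
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no data and are skipped
        # Start a new sub-list at every marker line, and also for the very
        # first line so that lines before the first marker are not lost.
        if line.startswith(marker) or not groups:
            groups.append([])
        groups[-1].append(line)
    return groups


# Hypothetical input whose first line does not start with the marker.
contents = ["1header\n", "2SomethingHere\n", "3Whatever\n", "4foo\n",
            "2SomethingHere\n", "5baz\n"]
main_list = split_on_marker(contents)
print(main_list)
```

Because a sub-list is only created when there is a line to put in it, no empty sub-list appears at the start of the output, whichever way the file begins.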
If you're trying to group consecutive lines by their first character, then:
import itertools
with open("test.txt", "r") as fp:
    lines = fp.readlines()
# groupby starts a new group each time the first character changes
groups = itertools.groupby(lines, key=lambda line: line[:1])
results = [list(g) for k, g in groups if k]
print(results)

Python - Read in Comma Separated File, Create Two lists

New to Python here and I'm trying to learn/figure out the basics. I'm trying to read in a file in Python that has comma separated values, one to a line. Once read in, these values should be separated into two lists, one list containing the value before the "," on each line, and the other containing the value after it.
I've played around with it for quite a while, but I just can't seem to get it.
Here's what I have so far...
with open ("mid.dat") as myfile:
    data = myfile.read().replace('\n',' ')
print(data)
list1 = [x.strip() for x in data.split(',')]
print(list1)
list2 = ?
List 1 creates a list, but it's not correct. List 2, I'm not even sure how to tackle.
PS - I have searched other similar threads on here, but none of them seem to address this properly. The file in question is not a CSV file, and needs to stay as a .dat file.
Here's a sample of the data in the .dat file:
113.64,889987.226
119.64,440987774.55
330.43,446.21
Thanks.
Split each line on the comma:
list1 = []
list2 = []
with open ("mid.dat") as myfile:
    for line in myfile:
        # strip the trailing newline first, then split into the two fields
        line = line.rstrip().split(",")
        list1.append(line[0])
        list2.append(line[1])
Python's rstrip() method strips all kinds of trailing whitespace by default, so it removes the newline "\n" too.
If you want to use only standard-library modules, you can use csv.
import csv
with open("mid.dat") as myfile:
    csv_records = csv.reader(myfile)
    list1 = []
    list2 = []
    for row in csv_records:
        list1.append(row[0])
        list2.append(row[1])
You could try this, which creates lists of floats rather than strings:
from ast import literal_eval
with open("mid.dat") as f:
    list1, list2 = map(list, zip(*map(literal_eval, f.readlines())))
Can be simplified if you don't mind list1 and list2 as tuples.
The list(zip(*my_2d_list)) pattern is a common way of transposing 2D lists using only built-in functions. It's useful here because it's easy to build a list of tuples, one per line in the file (call it result, where result[0] is the tuple for the first line and result[n] the tuple for line n + 1), and then transpose it (call that resultT) so that resultT[0] holds all the 'left values' and resultT[1] all the 'right values'.
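A short sketch of that transposition, assuming the three sample lines from the question are already in memory (the lines list stands in for the file):

```python
# Stand-in for the lines read from mid.dat.
lines = ["113.64,889987.226\n", "119.64,440987774.55\n", "330.43,446.21\n"]

# One tuple per line: (left value, right value).
result = [tuple(line.strip().split(",")) for line in lines]

# Transpose: the first tuple collects all left values, the second all right values.
resultT = list(zip(*result))

# zip() yields tuples; convert them if real lists are wanted.
list1, list2 = map(list, resultT)

print(list1)
print(list2)
```

The values stay as strings here. Wrapping each field in float() while building result would give numbers instead.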
Let's keep it very simple.
list1 = []
list2 = []
with open ("mid.dat") as myfile:
    for line in myfile:
        x1, x2 = map(float, line.split(','))
        list1.append(x1)
        list2.append(x2)
print(list1)
print(list2)
You could do this with pandas.
import pandas as pd
# names= supplies the column labels; the file has no header row
df = pd.read_csv('mid.dat', names=['List 1', 'List 2'])
read_csv reads any comma-delimited text file regardless of its extension, so the .dat file works as-is. Pandas is a very powerful tool for data such as yours.
After doing so you can pull each column out as its own list.
list1 = df['List 1'].tolist()
list2 = df['List 2'].tolist()
I would stick to a dataframe because data manipulation and analysis is much easier within the pandas framework.
Here is my suggestion, which is short and readable and needs no additional packages:
with open ("mid.dat") as myfile:
    listOfLines = [line.rstrip().split(',') for line in myfile]
list1 = [line[0] for line in listOfLines]
list2 = [line[1] for line in listOfLines]
Note: I used rstrip() to remove the end-of-line character.
The following is a solution obtained by correcting your own attempt:
with open("test.csv", "r") as myfile:
    datastr = myfile.read().replace("\n", ",")
datalist = datastr.split(",")
list1 = []; list2 = []
for i in range(len(datalist)-1):  # ignore empty last item of list
    if i % 2 == 0:
        list1.append(datalist[i])
    else:
        list2.append(datalist[i])
print(list1)
print(list2)
Output:
['113.64', '119.64', '330.43']
['889987.226', '440987774.55', '446.21']

List of lists (not just list) in Python

I want to make a list of lists in python.
My code is below.
import csv
f = open('agGDPpct.csv','r')
inputfile = csv.DictReader(f)
list = []
next(f) ##Skip first line (column headers)
for line in f:
    array = line.rstrip().split(",")
    list.append(array[1])
    list.append(array[0])
    list.append(array[53])
    list.append(array[54])
    list.append(array[55])
    list.append(array[56])
    list.append(array[57])
print list
I'm pulling only select columns from every row. My code pops this all into one list, as such:
['ABW', 'Aruba', '0.506252445', '0.498384331', '0.512418427', '', '', 'AND', 'Andorra', '', '', '', '', '', 'AFG', 'Afghanistan', '30.20560247', '27.09154001', '24.50744042', '24.60324707', '23.96716227'...]
But what I want is a list in which each row is its own list: [[a,b,c],[d,e,f],[g,h,i]...] Any tips?
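You are almost there. Collect all your desired columns into a list before appending. Because you pick columns by position, read the rows with csv.reader (a DictReader row is keyed by header name, not by index). Try this:
import csv
with open('agGDPpct.csv','r') as f:
    inputfile = csv.reader(f)  # each row comes back as a list of strings
    next(inputfile)  # skip first line (column headers)
    list = []
    for line in inputfile:
        list.append([line[1], line[0], line[53], line[54], line[55], line[56], line[57]])
print(list)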
To end up with a list of lists, you have to make the inner lists with the columns from each row that you want, and then append that list to the outer one. Something like:
for line in f:
    array = line.rstrip().split(",")
    inner = []
    inner.append(array[1])
    # ...
    inner.append(array[57])
    list.append(inner)
Note that it's also not a good practice to use the name of the type ("list") as a variable name -- this is called "shadowing", and it means that if you later try to call list(...) to convert something to a list, you'll get an error because you're trying to call a particular instance of a list, not the list built-in.
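To see the shadowing problem in isolation, here is a tiny sketch. The sample values are made up, and error_message and letters are just illustrative names.

```python
import builtins

# Shadow the built-in by binding the name "list" to a list instance.
list = ["ABW", "Aruba"]

try:
    list("abc")  # now calls the list *instance*, not the built-in type
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# The built-in is still reachable through the builtins module.
letters = builtins.list("abc")
print(letters)
```

Picking a different name (rows, table, list_of_lists) avoids the problem altogether.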
To build on the csv module's capabilities, I'd do
import csv
f = csv.reader(open('your.csv'))
next(f)
list_of_lists = [items[1::-1]+items[53:58] for items in f]
Note that
- items is a list of items, thanks to the intervention of a csv.reader() object;
- using slice addressing returns sublists taken from items, so that the + operator in this context means concatenation of lists;
- the first slice expression 1::-1 means "from index 1, go to the beginning moving backwards", or [items[1], items[0]].
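A quick way to convince yourself of what those two slices select, using a made-up row of 60 placeholder fields (the col0, col1, ... names are purely illustrative):

```python
# Hypothetical row with 60 fields named after their own index.
items = ["col{}".format(i) for i in range(60)]

head = items[1::-1]   # start at index 1 and walk backwards: fields 1 and 0
tail = items[53:58]   # fields 53 to 57; the end index 58 is excluded

selected = head + tail  # + concatenates the two sublists
print(selected)
```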
Referring to https://docs.python.org/2/library/csv.html#csv.DictReader : a DictReader yields one dict per row, keyed by the header names, so its rows cannot be indexed by column number. Since you select columns by position, use csv.reader instead.
Instead of
for line in f:
Write
for line in inputfile:
And also use list.append([line[1],line[0],line[53],..]) to append a list to a list.
One more thing: skip the header by advancing the reader rather than the file, that is, use next(inputfile) instead of next(f). (inputfile.next() is the Python 2-only spelling, see https://docs.python.org/2/library/stdtypes.html#iterator.next .)
After these changes, you get:
import csv
f = open('agGDPpct.csv','r')
inputfile = csv.reader(f)
list = []
next(inputfile) ##Skip first line (column headers)
for line in inputfile:
    list.append([line[1],line[0],line[53],line[54],line[55],line[56],line[57]])
print(list)
In addition to that, it is not good practice to use list as a variable name: it is not a reserved word, but it shadows the built-in type of the same name. Rename that too.
You can further improve the above code using with . I will leave that to you.
Try and see if it works.

How to read data from text file in Python using a colon delimiter

Let's say I have a text file that contains data such as this:
data1:data2
data1:data2
data1:data2
data1:data2
I want to split this data into two separate arrays. One array containing the data from the left hand side of the colon, and the other containing the data from the right hand side.
What would be the most efficient way of going about it?
The easiest way is just to split each line on the colon and append to two separate arrays.
Example:
infile = open(listfile,'r')
filecontent = infile.readlines()
infile.close()
array1 = []
array2 = []
for line in filecontent:
    tmp = line.strip().split(':')
    array1.append(tmp[0])
    array2.append(tmp[1])
A few list comprehensions can do this pretty handily.
with open(filename) as f:
    lists = [line.strip().split(':') for line in f.readlines()]
listOne = [line[0] for line in lists]
listTwo = [line[1] for line in lists]
Storing lists and then separating it saves having to read through the whole file twice.
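One thing to watch for: if the right-hand side can itself contain a colon, a plain split(':') produces more than two pieces. A sketch that splits only at the first colon follows. The sample data is hypothetical, and io.StringIO stands in for the real file.

```python
import io

# Hypothetical data; the second value contains colons of its own.
sample = io.StringIO("host:example.org\ntime:12:30:00\nname:alice\n")

left, right = [], []
for line in sample:
    line = line.strip()
    if not line:
        continue  # ignore blank lines
    # partition() splits at the first colon only and always returns 3 parts.
    key, _, value = line.partition(":")
    left.append(key)
    right.append(value)

print(left)
print(right)
```

line.split(':', 1) does the same job, but it returns a single-element list when a line has no colon at all, so unpacking it into two names would fail on such a line.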

reading in csv lines and creating lists for each line

I have a text file with lines of X,Y coordinates. Like this:
0,23.345,-122.456
1,12.546,-118.987
2,67.435,-104.112
How can I bring these lines into Python so that each line is its own list when it comes in?
Each of those lines is a pair of coordinates, which equals one point. So I need to then compare line 0 to 1 and line 1 to 2 and so on. Wouldn't I want each of those lines to be a list so that I could access them?
This Python template reads each .csv row into its own list and collects them in a list of lists.
import csv
reader = csv.reader(open('mycsv.csv'))
mylines = list(reader)
import csv
with open("csvfile.csv", newline="") as f:  # Python 3; on Python 2 open the file with "rb" instead
    lines = list(csv.reader(f))
>>> lines
[['0', '23.345', '-122.456'], ['1', '12.546', '-118.987'], ['2', '67.435', '-104.112']]
matrix = []
# fileHandle is assumed to be an already-open file object
line = fileHandle.readline()
while line:
    currentList = line.strip().split(",")
    matrix.append(currentList)
    line = fileHandle.readline()
This ends with a list of lists, where each inner list holds the elements of one line. Each line's number is its index in the matrix (0-based).
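Once every line is its own list, comparing line 0 to 1, line 1 to 2 and so on comes down to pairing the list with itself shifted by one. A sketch follows, assuming the three sample lines from the question. The straight-line difference computed with math.hypot is only a placeholder for whatever comparison is actually needed, and io.StringIO stands in for the file.

```python
import csv
import io
import math

# Stand-in for the file: the three sample lines from the question.
sample = io.StringIO("0,23.345,-122.456\n1,12.546,-118.987\n2,67.435,-104.112\n")

# Convert each row to (id, x, y) with numeric types.
points = [(int(i), float(x), float(y)) for i, x, y in csv.reader(sample)]

pairs = []
distances = []
# zip(points, points[1:]) yields (line 0, line 1), (line 1, line 2), ...
for (i, x1, y1), (j, x2, y2) in zip(points, points[1:]):
    pairs.append((i, j))
    # Placeholder comparison: plain Euclidean difference in raw coordinate units.
    distances.append(math.hypot(x2 - x1, y2 - y1))

print(pairs)
print(distances)
```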
