Slicing each line from data CSV file - python

I have a movie dataset that looks like this:
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
I want to extract only the last part of each line (the genres, e.g. Adventure|Animation|Children|Comedy|Fantasy) and store it in a list such as ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy']. However, I am stuck at the slicing step: line[:-1] does not give me the last field. I am using Python 2.7. My two attempts so far:
with open(path + 'movie.csv') as f:
    for line in f:
        print line[:-1]

with open(path + 'movie.csv') as f:
    for line in f:
        print line.split(',')[:-1].rstrip('\n').split('|')

Your slice line[:-1] returns everything except the last character of each line (it only drops the trailing newline), because each line is read as one string and is never split into fields. Read the file with the csv module instead, which splits each line on the ',' delimiter for you. Then take the last field of each row and split it on '|'.
import csv
with open(path + 'movie.csv') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row[-1].split('|'))
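To check the idea without the real file, here is a minimal Python 3 sketch that runs the same csv.reader approach on two rows copied from the question. The helper name genres_per_row and the use of io.StringIO in place of movie.csv are assumptions made for illustration only.

```python
import csv
import io

# Two rows copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    "2,Jumanji (1995),Adventure|Children|Fantasy\n"
)


def genres_per_row(lines):
    """Return one list of genres per CSV row (hypothetical helper)."""
    # row[-1] is the last comma-separated field; split it on '|'.
    return [row[-1].split("|") for row in csv.reader(lines) if row]


genres = genres_per_row(sample)
print(genres[0])
```

Because csv.reader does the field splitting, a title that contains a quoted comma would still end up as a single field, which a plain line.split(',') would not guarantee.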

Related

Converting a Text .txt document to CSV .csv Using a Delimiter

I'd like to create a CSV from a TXT file. I have a text file with lines (300 lines+) separated by backslashes. I'd like each line to be a separate row, and each backslash to be a separate new column.
The text file looks like:
example 1\example 2\example 3\example 4
test 1\test 2\test 3\test 4
I'd like the CSV to look like:
Example 1 | Example 2 | Example 3 | Example 4
Test 1 | Test 2 | Test 3 | Test 4
So far I have:
import csv
with open('Report.txt') as report:
    report_txt = report.read()
with open('Report.csv','w',newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(report_txt)
I know I need to use \ as a delimiter, but I'm not sure how. Thanks for any help!
Define your delimiter like this (the backslash has to be escaped):
reader = csv.reader(open("Report.txt"), delimiter="\\")
Code:
import csv
with open('Report.txt') as report:
    reader = csv.reader(report, delimiter="\\")
    # Keep the input file open while the reader is being consumed
    with open('Report_output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for line in reader:
            writer.writerow(line)
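The same conversion can be tried in memory. This is only a sketch: io.StringIO replaces Report.txt and the output file, and lineterminator="\n" is set so the result is easy to inspect (csv.writer defaults to "\r\n").

```python
import csv
import io

# The two sample lines from the question, backslash-separated.
source = io.StringIO(
    "example 1\\example 2\\example 3\\example 4\n"
    "test 1\\test 2\\test 3\\test 4\n"
)
target = io.StringIO()

# Read with a backslash delimiter, write with the default comma delimiter.
reader = csv.reader(source, delimiter="\\")
writer = csv.writer(target, lineterminator="\n")
writer.writerows(reader)

converted = target.getvalue()
print(converted)
```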
First you have to split each line on the delimiter. You can do this with the string split() method or a regex.
import csv
with open('file.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split("\\") for line in stripped if line)
Then write the rows to the CSV. The generators above are lazy, so this part has to stay inside the with block that opened in_file:
    with open('report.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(lines)
Tweak your code accordingly; the concept is the same. Note that the backslash is doubled because it is the escape character in Python string literals.
If you are just trying to convert that text into CSV, you can replace every "\" character with ";" and you'll have a valid semicolon-delimited CSV file, provided no field already contains a ";" or a quote character.
Otherwise, if you want to do something with the parsed data before re-exporting it to CSV, you can read the file line by line, use the split() method with "\\", then rejoin and write line by line, like this:
with open('in.txt') as input_file:
    with open('out.csv','a') as output_file:
        txt_line = input_file.readline()
        while txt_line:
            cells = txt_line.split("\\")
            # Do something with each cell...
            csv_line = ";".join(cells)
            output_file.write(csv_line)
            txt_line = input_file.readline()
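Joining cells by hand is fine as long as no cell contains the output delimiter. The sketch below uses a made-up input line (not from the question) to show the difference between a manual join and csv.writer, which quotes such a cell automatically.

```python
import csv
import io

# Hypothetical input line whose first cell contains a comma.
line = "Smith, John\\42\\ok"
cells = line.split("\\")

# Manual join: the comma inside the first cell now looks like a separator.
naive = ",".join(cells)

# csv.writer quotes any cell that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(cells)
quoted = buffer.getvalue()

print(naive)
print(quoted)
```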

Add a line at position "n" of a loaded CSV

I have not managed to do this. What confuses me most is how to add an element at line "n" of my CSV; for example, I want to insert a new line at line 2 of my CSV.
mycsv.csv
name,last name
yeison, smith
lola, boa
elmo, spitia
anderson, exneider
juan, ortega
This is my code:
with open('mycsv.csv', 'w') as f:
    # I need to add "barney, cubides" at position [2] of my csv
    f.write("barney, cubides")  # does not work properly
How can I do it?
Opening the file with 'w' truncates it, so you have to read it first and write afterwards. Read the whole CSV into a list of lines, insert your new line where you want it, and then rewrite the whole file.
index_to_insert = 2
new_csv = []
new_line = "barney, cubides\n"
with open("mycsv.csv", "r") as f:
    new_csv = f.readlines()
new_csv.insert(index_to_insert, new_line)
with open("mycsv.csv", "w") as f:
    for line in new_csv:
        f.write(line)
P.S. You might want to get rid of the whitespace before and after the commas in your CSV file.
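If you also want to normalise the whitespace after the commas, the read, insert, rewrite steps can be done with the csv module. This is a sketch under assumptions: insert_row is a hypothetical helper, it works on strings rather than on mycsv.csv, and skipinitialspace=True is used to drop the space that follows each comma.

```python
import csv
import io

original = "name,last name\nyeison, smith\nlola, boa\n"


def insert_row(csv_text, index, new_row):
    """Return csv_text with new_row inserted at the given row index."""
    # skipinitialspace=True ignores whitespace right after each delimiter.
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    rows.insert(index, new_row)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


updated = insert_row(original, 2, ["barney", "cubides"])
print(updated)
```

Row 0 is the header here, so index 2 places the new row right after the first data row, as in the answer above.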

Python readlines() and append data to each line output to one line

I have a csv file with say 3 rows like this:
Dallas
Houston
Ft. Worth
What I want to do is be able to read those in and make links out of them but have all the lines output on one line. Example output would need to be like this:
Dallas Houston Ft. Worth
Here is the code I have so far. It reads the CSV file and writes the output, but it produces a separate row for each city. I want a single row, and I also need to add the HTML code for the hyperlinks.
f_in = open("data_files/states/major_cities.csv",'r')
for line in f_in.readlines():
    f_out.write(line.split(",")[0]+"")
f_in.close()
f_out.close()
That's because each line in f_in.readlines() comes with a newline tacked on to the end. (Try adding a print(repr(line)) in that loop.) What you need to do is remove that newline before writing to f_out:
for line in f_in.readlines():
    actual_line = line.rstrip('\n')
Your entire code would look like this:
import re
with open('data_files/states/major_cities.csv') as f_in:
    with open('output_file.csv', 'w') as f_out:
        for line in f_in:
            city = line.rstrip('\n')
            # Link markup assumed here: <a href="/slug/">City</a>
            f_out.write('<a href="/{}/">{}</a> '.format(
                re.sub(r'\W+', '-', city.lower()),
                city
            ))
The with statements take care of closing the files, so you don't need those last two lines.
UPDATE
As J.F. Sebastian pointed out, it's also necessary to slugify the city name to achieve the output you want.
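Alternatively, try the Python csv module for handling CSV files:
import csv
# Python 2 style: CSV files are opened in binary mode ('rb');
# on Python 3 use open('example.csv', newline='') instead.
with open('example.csv', 'rb') as csvfile, open('file.txt', 'w') as file_out:
    reader = csv.reader(csvfile)
    for row in reader:
        col = row[0]
        link = "<a href=/" + col.strip().lower()
        link += "/>" + col + "</a> "
        file_out.write(link)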
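Both answers boil down to: strip the newline, slugify the city, wrap it in a link, and join everything with spaces. A small Python 3 sketch of that pipeline follows; the helper names slugify and city_links and the exact link markup are assumptions, since the question does not show the target URLs.

```python
import re


def slugify(city):
    """Lower-case the name and replace runs of non-word characters with '-'."""
    return re.sub(r"\W+", "-", city.strip().lower()).strip("-")


def city_links(lines):
    """Build one space-separated line of links from an iterable of lines."""
    cities = [line.strip() for line in lines if line.strip()]
    # Assumed markup: <a href="/slug/">City</a>
    return " ".join(
        '<a href="/{}/">{}</a>'.format(slugify(city), city) for city in cities
    )


links = city_links(["Dallas\n", "Houston\n", "Ft. Worth\n"])
print(links)
```

Building the whole string with " ".join avoids the trailing space that writing each link followed by a space would leave.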

How can I speed up this really basic python script for offsetting lines of numbers

I have a simple text file which contains numbers in ASCII text separated by spaces as per this example.
150604849
319865.301865 5810822.964432 -96.425797 -1610
319734.172256 5810916.074753 -52.490280 -122
319730.912949 5810918.098465 -61.864395 -171
319688.240891 5810889.851608 -0.339890 -1790
*<continues like this for millions of lines>*
Basically, I want to copy the first line as is. Then, for every following line, I want to offset the first value (x), offset the second value (y), leave the third value unchanged, and offset and halve the last number.
I've cobbled together the following code as a Python learning experience (apologies if it is crude and offensive; truly I mean no offence) and it works OK. However, the input file I'm using it on is several GB in size, and I'm wondering whether there are ways to speed up the execution. Currently, a 740 MB file takes 2 minutes 21 seconds.
import glob
#offset values
offsetx = -306000
offsety = -5806000
files = glob.glob('*.pts')
for file in files:
    currentFile = open(file, "r")
    out = open(file[:-4]+"_RGB_moved.pts", "w")
    firstline = str(currentFile.readline())
    out.write(str(firstline.split()[0]))
    while 1:
        lines = currentFile.readlines(100000)
        if not lines:
            break
        for line in lines:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), str(float(words[2])), str((int(words[3])+2050)/2)]
            out.write(" ".join(newwords))
Many thanks
Don't use .readlines(). Use the file directly as an iterator:
for file in files:
    with open(file, "r") as currentfile, open(file[:-4]+"_RGB_moved.pts", "w") as out:
        firstline = next(currentfile)
        out.write(firstline.split(None, 1)[0])
        for line in currentfile:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), words[2], str((int(words[3]) + 2050) / 2)]
            out.write(" ".join(newwords))
I also applied a few Python best practices (with blocks to close the files, next() to read the first line), and you don't need to turn words[2] into a float and then back into a string again.
You could also look into using the csv module; it can handle splitting and rejoining lines in C code:
import csv
for file in files:
    with open(file, "rb") as currentfile, open(file[:-4]+"_RGB_moved.pts", "wb") as out:
        reader = csv.reader(currentfile, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer = csv.writer(out, delimiter=' ', quoting=csv.QUOTE_NONE)
        # writerow() expects a sequence of fields, so wrap the single value in a list
        writer.writerow([next(reader)[0]])
        for row in reader:
            newrow = [str(float(row[0])+offsetx), str(float(row[1])+offsety), row[2], str((int(row[3]) + 2050) / 2)]
            writer.writerow(newrow)
Use the csv module. It may be better optimized than your script, and it will simplify your code.
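Whichever reading strategy you pick, the per-line arithmetic is the same. The sketch below isolates it in a hypothetical shift_line function so it can be checked on one sample line from the question. It is written for Python 3 and assumes integer halving is intended, so it uses // (on Python 3 a plain / would produce a float such as 220.0).

```python
OFFSET_X = -306000
OFFSET_Y = -5806000


def shift_line(line, offset_x=OFFSET_X, offset_y=OFFSET_Y):
    """Offset x and y, keep z as text, and offset then halve the last value."""
    x, y, z, value = line.split()
    return " ".join([
        str(float(x) + offset_x),
        str(float(y) + offset_y),
        z,  # unchanged, so no float round trip is needed
        str((int(value) + 2050) // 2),  # assumed: integer halving
    ])


shifted = shift_line("319865.301865 5810822.964432 -96.425797 -1610")
print(shifted)
```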

Python- How to Remove Columns from a File

I'd like to remove the first column from a file. The file contains 3 columns separated by spaces, and the columns have the following titles:
'X', 'Displacement' and 'Force' (please see the image).
I have come up with the following code, but to my disappointment it doesn't work!
f = open("datafile.txt", 'w')
for line in f:
    line = line.split()
    del x[0]
f.close()
Any help is much appreciated !
Esan
First of all, you're attempting to read from a file (by iterating through the file contents) that is open for writing. This will give you an IOError, and opening the file with 'w' has already truncated it.
Second, there is no variable named x in existence (you have not declared/set one in the script). This will generate a NameError.
Thirdly and finally, once you have finished (correctly) reading and editing the columns in your file, you will need to write the data back into the file.
To avoid loading a (potentially large) file into memory all at once, it is probably a good idea to read from one file (line by line) and write to a new file simultaneously.
Something like this might work:
f = open("datafile.txt", "r")
g = open("datafile_fixed.txt", "w")
for line in f:
    if line.strip():
        g.write("\t".join(line.split()[1:]) + "\n")
f.close()
g.close()
Some reading about python i/o might be helpful, but something like the following should get you on your feet:
with open("datafile.txt", "r") as fin:
    with open("outputfile.txt", "w") as fout:
        for line in fin:
            line = line.split(' ')
            if len(line) == 3:
                del line[0]
                fout.write(line[0] + ' ' + line[1])
            else:
                fout.write('\n')
EDIT: fixed to work with blank lines
As a one-liner (Python 2); split() discards the newline, so the lines are rejoined with '\n':
print '\n'.join([' '.join(l.split()[1:]) for l in file('datafile.txt')])
or, if you want to preserve spaces and you know that the second column always starts at, say, the 10th character:
print ''.join([l[9:] for l in file('datafile.txt')])
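For Python 3, where the print statement and the file() builtin are gone, the same idea can be sketched as a small generator. The name drop_first_column and the sample rows are made up for illustration; blank lines are skipped here rather than preserved.

```python
def drop_first_column(lines):
    """Yield each non-blank line without its first whitespace-separated field."""
    for line in lines:
        fields = line.split()
        if fields:
            yield " ".join(fields[1:])


# Hypothetical sample rows in the three-column layout described in the question.
sample = ["X Displacement Force\n", "\n", "1 0.5 10\n"]
result = list(drop_first_column(sample))
print(result)
```

With real files, pass the open input file as lines and write each yielded string plus "\n" to a second file, so the whole input never has to be held in memory.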
