csv.reader without open() function - python

I want to read a CSV file without using the open() function.
file.txt
'xxr'|'wer'|'xxr'|'xxr'
'xxt'|'dse'|'xxt'|'xxt'
'xxv'|'cad'|'xxv'|'xxv'
'xxe'|'sdf'|'xxe'|'xxe'
'xxw'|'sder'|'xxw'|'xxw'
'xxz'|'csd'| 'xxz'| 'xxz'
I've tried this, but it doesn't open a file; it just treats 'file.txt' as a string:
file = ('file.txt')
reader = csv.reader(file,delimiter="|")
mylist = list(reader)
I cannot use the regular with open('file.txt', 'r') ... pattern.
Reason: the customer is deploying this data pipeline to a platform that doesn't support the open() function, due to directory access restrictions (not a permissions issue).
I also cannot read it as a DataFrame, because the rows are unstructured lists, and this template is much simpler.
This is a conversion from a Python script to Data Dream, with Spark. Kind of odd... but the platform can reproduce pandas and numpy; it just can't use the open() function or the with statement.
Any ideas?

You could use fileinput. I'm unsure how the module deals with opening files under the hood, or whether it is any different from open(), but it does allow multiple files to be read in order through one stream, and it seems to allow more flexibility in how the file is read:
import csv
import fileinput

with fileinput.input('file.txt') as f:
    reader = csv.reader(f, delimiter="|")
    mylist = list(reader)

There is nothing wrong with:
reader = csv.reader(open(file), delimiter="|")
Or with pandas (header=None because the sample data has no header row):
import pandas as pd
mylist = pd.read_csv(file, sep="|", header=None).to_numpy().tolist()
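
One more option, since csv.reader accepts any iterable of strings: obtain the lines without calling open() directly, for instance via pathlib. A minimal sketch, assuming pathlib is available on the platform (note Path.read_text still performs file I/O internally):
import csv
from pathlib import Path

# Feed csv.reader a list of lines instead of a file object.
lines = Path('file.txt').read_text().splitlines()
reader = csv.reader(lines, delimiter="|")
mylist = list(reader)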

Related

Need a push to start with a function about text files, I can't figure this out on my own

I don't need the entire code, but I want a push to help me on the way. I've been searching the internet for clues on how to start writing a function like this, but I haven't gotten any further than just the name of the function.
So I haven't got the slightest clue how to start with this; I don't know how to work with text files. Any tips?
These text files are CSV (Comma-Separated Values) files, a simple file format used to store tabular data.
You may explore Python's built-in module called csv.
The following code snippet is an example of loading a .csv file in Python:
import csv

filename = 'us_population.csv'
with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)  # each row is a list of strings

Access values outside with-block

Is there a way to access the variable utterances_dict outside of the with-block? The code below obviously returns the error ValueError: I/O operation on closed file.
from csv import DictReader

utterances_dict = {}
utterance_file = 'toy_utterances.csv'
with open(utterance_file, 'r') as utt_f:
    utterances_dict = DictReader(utt_f)
# iterating outside the with-block fails: the file is already closed
for line in utterances_dict:
    print(line)
I am not an expert on the DictReader implementation, but its documentation leaves it open for the reader to parse the file lazily, after construction. That means the underlying file may have to remain open until you are done with the reader. In that case, using utterances_dict outside the with block is problematic, because the underlying file will have been closed by then.
Even if the current implementation of DictReader did parse the whole csv on construction, that wouldn't mean the implementation can't change in the future.
DictReader returns a lazy view over the csv file rather than an in-memory copy.
Convert the result to a list of dictionaries:
from csv import DictReader

utterances = []
utterance_file = 'toy_utterances.csv'
with open(utterance_file, 'r') as utt_f:
    utterances = [dict(row) for row in DictReader(utt_f)]
for line in utterances:
    print(line)

How to parse jsonlines file using pandas

I am new to Python and trying to parse data from a file that contains millions of lines. I tried to go old school and parse it in Excel, but it fails. How can I parse the information efficiently and export it to an Excel file so that it is easier for other people to read?
I tried using this code provided by someone else, but no luck so far:
import re
import pandas as pd

def clean_data(filename):
    with open(filename, "r") as inputfile:
        for row in inputfile:
            if re.match("\[", row) is None:
                yield row

with open(clean_file, 'w') as outputfile:
    for row in clean_data(filename):
        outputfile.write(row)
NameError: name 'clean_file' is not defined
It looks like clean_file is not defined, which is probably a problem from copy/pasting code.
Did you mean to write to a file called "clean_file"? In that case you need to wrap it in quotes: with open("clean_file", 'w')
If you want to work with JSON, I suggest looking into the json package, which has lots of tools for loading and parsing JSON. Otherwise, if the JSON is flat, you can just use the built-in pandas function read_json, as sketched below.
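For example, a minimal sketch for a JSON Lines file (one JSON object per line); input.jsonl and output.xlsx are placeholder names, and writing .xlsx requires an engine such as openpyxl to be installed:
import pandas as pd

# lines=True parses one JSON object per line (the jsonlines format).
df = pd.read_json('input.jsonl', lines=True)
# Export for colleagues; index=False keeps the spreadsheet clean.
df.to_excel('output.xlsx', index=False)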

Python subprocess can't find the output of csv writer

I'm ripping some data from Mongo, sanitizing it via Python, and writing it to a text file to import into Vertica. Vertica can't parse the Python-written gzip (no idea why), so I'm trying to write the data to a csv and use bash to gzip the file instead.
csv_filename = '/home/deploy/tablecopy/{0}.csv'.format(vertica_table)
with open(csv_filename, 'wb') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',')
    for replacement in mongo_object.find():
        replacement_id = clean_value(replacement, "_id")
        csv_writer.writerow([replacement_id, booking_id, style, added_ts])
subprocess.call(['gzip', 'file', csv_filename])
When I run this code, I get "gzip: file: No such file or directory," despite the fact that 1) the file is getting created immediately beforehand and 2) there's already a copy of the csv in the directory prior to the run, since this is a script that gets run repeatedly.
These points make me think that python is tying up the file somehow and bash can't see/access it. Any ideas on how to get this conversion to run?
Thanks
Just pass csv_filename; gzip is looking for a file literally called "file", which does not exist, so the error refers to that, not to the csv_filename file:
subprocess.call(['gzip', csv_filename])
There is no file argument for gzip; you simply need to pass the filename.
You've already got the correct answer to your problem... but alternatively, you can use the gzip module to compress as you write, so there is no need to call the gzip program at all. This example assumes you use Python 3.x and have plain ASCII text.
import csv
import gzip

csv_filename = '/home/deploy/tablecopy/{0}.csv'.format(vertica_table)
with gzip.open(csv_filename + '.gz', 'wt', encoding='ascii', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',')
    for replacement in mongo_object.find():
        replacement_id = clean_value(replacement, "_id")
        csv_writer.writerow([replacement_id, booking_id, style, added_ts])

using CSV module with readline()

Yesterday I posted the below link:
Python CSV Module read and write simultaneously
Several people suggested: "If file b is not extremely large I would suggest using readlines() to get a list of all lines and then iterate over the list and change lines as needed."
I want to still be able to use the functionality of the csv module while doing what they suggested. I am new to Python and still don't quite understand how I should do this.
Could someone please provide me with an example of how I should do this?
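One direct way to follow that suggestion, as a minimal sketch assuming the file is named b.csv: read every line into a list first (csv.reader accepts any iterable of strings), then write the modified rows back.
import csv

# Read the whole file into memory, as suggested, then parse the lines.
with open('b.csv') as f:
    lines = f.readlines()
rows = list(csv.reader(lines))
# ... modify rows here as needed ...
# The file is no longer open, so it is safe to rewrite it in place.
with open('b.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)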
Here is a sample that reads a CSV file using a DictReader and uses a DictWriter to write to stdout. The file has a column named PERCENT_CORRECT_FLAG, and this modifies the CSV file to set this field to 0.
#!/usr/bin/env python
from __future__ import with_statement
from __future__ import print_function
from csv import DictReader, DictWriter
import sys

def modify_csv(filename):
    with open(filename) as f:
        reader = DictReader(f)
        writer = DictWriter(sys.stdout, fieldnames=reader.fieldnames)
        for i, s in enumerate(writer.fieldnames):
            print(i, s, file=sys.stdout)
        for row in reader:
            row['PERCENT_CORRECT_FLAG'] = '0'
            writer.writerow(row)

if __name__ == '__main__':
    for filename in sys.argv[1:]:
        modify_csv(filename)
If you do not want to write to stdout, you can open another file for writing and use that. Note that if you want to write back to the original file, you have to either:
Read the file into memory and close the file before opening it for writing, or
Open a file with a different name for writing and rename it after closing it, as sketched below.
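A minimal sketch of the second approach, using the placeholder filename data.csv; os.replace performs the final rename:
import csv
import os

src = 'data.csv'
tmp = src + '.tmp'
with open(src, newline='') as f_in, open(tmp, 'w', newline='') as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row['PERCENT_CORRECT_FLAG'] = '0'
        writer.writerow(row)
# Rename the temporary file over the original only after both are closed.
os.replace(tmp, src)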
