Reading in CSV Files with newline characters embedded - python

I am currently reading in data from a csv file and inputting tokens and their definitions into a dictionary. The code works fine until it hits a place where the data in the CSV file looks like this:
"Token000\nip address\ntesttestest"
Here is my code so far:
for line in f:
    if "Token" in line and re.search(r"Token\d", line):
        commaIndex = line.index(",", line.index("Token"))
        csvDict[line[line.index("Token"): commaIndex]] = line[commaIndex + 1: line.index(",", commaIndex + 1)]

Use this:
import csv

data = {}
with open('your_file.csv', newline='') as csv_file:
    # the default quotechar '"' lets the reader keep embedded newlines inside quoted fields
    reader = csv.reader(csv_file, skipinitialspace=True)
    for row in reader:
        data[row[0]] = row[1:]
print(data)
I recommend that you take a look at the csv module documentation.
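For reference, the reader's default settings already handle the question's embedded newlines. A minimal sketch (with made-up data) showing that a quoted field keeps its newlines instead of splitting the record:

```python
import csv
import io

# A quoted field containing literal newlines, as in the question's data
raw = 'Token000,"ip address\ntesttestest",definition\n'

# The default quotechar '"' keeps the embedded newlines inside a single field
rows = list(csv.reader(io.StringIO(raw)))
```

This is why parsing the file line by line with string indexing breaks: a single logical record can span several physical lines.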

Related

How to convert csv file into json in python so that the header of csv are keys of every json value

I have this use case
please create a function called “myfunccsvtojson” that takes in a filename path to a csv file (please refer to attached csv file) and generates a file that contains streamable line delimited JSON.
• Expected filename will be based on the csv filename, i.e. Myfilename.csv will produce Myfilename.json or File2.csv will produce File2.json. Please show this in your code and should not be hardcoded.
• csv file has 10000 lines including the header
• output JSON file should contain 9999 lines
• Sample JSON lines from the csv file below:
CSV:
nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
nm0000001,Fred Astaire,1899,1987,"soundtrack,actor,miscellaneous","tt0072308,tt0043044,tt0050419,tt0053137"
nm0000002,Lauren Bacall,1924,2014,"actress,soundtrack","tt0071877,tt0038355,tt0117057,tt0037382"
nm0000003,Brigitte Bardot,1934,\N,"actress,soundtrack,producer","tt0057345,tt0059956,tt0049189,tt0054452"
JSON lines:
{"nconst":"nm0000001","primaryName":"Fred Astaire","birthYear":1899,"deathYear":1987,"primaryProfession":"soundtrack,actor,miscellaneous","knownForTitles":"tt0072308,tt0043044,tt0050419,tt0053137"}
{"nconst":"nm0000002","primaryName":"Lauren Bacall","birthYear":1924,"deathYear":2014,"primaryProfession":"actress,soundtrack","knownForTitles":"tt0071877,tt0038355,tt0117057,tt0037382"}
{"nconst":"nm0000003","primaryName":"Brigitte Bardot","birthYear":1934,"deathYear":null,"primaryProfession":"actress,soundtrack,producer","knownForTitles":"tt0057345,tt0059956,tt0049189,tt0054452"}
What I am not able to understand is how the header can be used as the key for every value of the JSON.
Has anyone come across this scenario and can help me out?
Here is what I was trying. I know the loop is not correct, but I am still figuring it out:
with open(file_name, encoding='utf-8') as file:
    csv_data = csv.DictReader(file)
    csvreader = csv.reader(file)
    # print(csv_data)
    keys = next(csvreader)
    print(keys)
    for i, Value in range(len(keys)), csv_data:
        data[keys[i]] = Value
        print(data)
You can convert your csv to a pandas data frame and output it as JSON:
import pandas as pd

df = pd.read_csv('data.csv')
df.to_json(orient='records')
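Since the question asks for streamable line-delimited JSON specifically, `to_json` also accepts `lines=True` together with `orient='records'`. A small sketch with made-up data:

```python
import pandas as pd

# Toy frame standing in for the question's IMDb-style data
df = pd.DataFrame({"nconst": ["nm0000001", "nm0000002"],
                   "primaryName": ["Fred Astaire", "Lauren Bacall"]})

# orient='records' with lines=True emits one JSON object per line (NDJSON)
ndjson = df.to_json(orient="records", lines=True)
```

Passing a path as the first argument to `to_json` writes the result to a file instead of returning a string.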
import csv
import json

def csv_to_json(csv_file_path, json_file_path):
    data_dict = []
    with open(csv_file_path, encoding='utf-8') as csv_file_handler:
        csv_reader = csv.DictReader(csv_file_handler)
        for row in csv_reader:
            data_dict.append(row)
    with open(json_file_path, 'w', encoding='utf-8') as json_file_handler:
        json_file_handler.write(json.dumps(data_dict, indent=4))

csv_to_json("/home/devendra/Videos/stackoverflow/Names.csv", "/home/devendra/Videos/stackoverflow/Names.json")
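For the line-delimited output the question actually asks for, here is a hedged sketch of the requested function. Note that all values stay strings in this version; coercing birthYear to a number, as the sample output shows, would need an extra conversion step:

```python
import csv
import json
import os

def myfunccsvtojson(csv_path):
    # Derive the output name from the input name (Myfilename.csv -> Myfilename.json)
    json_path = os.path.splitext(csv_path)[0] + ".json"
    with open(csv_path, encoding="utf-8", newline="") as src, \
         open(json_path, "w", encoding="utf-8") as dst:
        # DictReader uses the header row as the keys for every record,
        # so a 10000-line csv yields 9999 JSON lines
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
    return json_path
```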

Extract strings from text file and write it to Excel

I was trying to write a Python script to extract text from a text file and write it into an Excel file.
The question is, I do not know how to extract the strings next to the equals sign.
I am new to Python; at this stage I have just managed to open the file.
The data looks like below:
ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x020a", RAC="0x68", RACRANGE="0x73", SUBRANGE=SPECIAL_IMSI_RANGE, BEGIMSI="511100001243", ENDIMSI="53110100270380", CTRLTYPE=REJECT, CAUSE=ROAMING_NOT_ALLOWED_IN_LA;
ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x01Fa", RAC="0x67", RACRANGE="0x63", SUBRANGE=SPECIAL_IMSI_RANGE, BEGIMSI="", ENDIMSI="", CTRLTYPE=REJECT, CAUSE=ROAMING_NOT_ALLOWED_IN_LA;
Output should be like below:
#!/usr/bin/python
import csv
import re

fieldnames = ['AREA', 'MCC', 'MNC']
# pattern corrected: the data has no spaces around '=', and values run up to the next comma
re_fields = re.compile(r'({})\s*=\s*([^,]+)'.format('|'.join(fieldnames)), re.I)

with open('input.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
    for line in f_input:
        row = dict(re_fields.findall(line))
        if row:
            csv_output.writerow(row)
I would break text like that into BLOCKS and then find the matches in each block:
import csv
import re

fieldnames = ['AREA', 'MCC', 'MNC']
re_fields = re.compile(r'({})\s*=\s*([^,]+),'.format('|'.join(fieldnames)))

with open(fn) as f_input:
    data = f_input.read()

for block in re.finditer(r'(?<=ADD IUACCAREALST:)(.*?)(?=ADD IUACCAREALST:|\Z)', data, flags=re.S | re.M):
    print(re_fields.findall(block.group(1)))
Prints:
[('AREA', 'RA'), ('MCC', '"510"'), ('MNC', '"28"')]
[('AREA', 'RA'), ('MCC', '"510"'), ('MNC', '"28"')]
At that point, use each list of tuples to create a dict forming that csv record; write it to the csv file. Done!
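That final step might look like the following sketch, reusing the names from the answer above and inlining a stand-in for the file contents (the output filename is an assumption):

```python
import csv
import re

fieldnames = ['AREA', 'MCC', 'MNC']
re_fields = re.compile(r'({})\s*=\s*([^,]+),'.format('|'.join(fieldnames)))

# Inline stand-in for the question's input file
data = ('ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x020a", RAC="0x68";\n'
        'ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x01Fa", RAC="0x67";\n')

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
    for block in re.finditer(r'(?<=ADD IUACCAREALST:)(.*?)(?=ADD IUACCAREALST:|\Z)',
                             data, flags=re.S):
        # Each list of (field, value) tuples becomes one csv record
        row = {k: v.strip('"') for k, v in re_fields.findall(block.group(1))}
        csv_output.writerow(row)
```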

Only outputting a few lines into a text file, instead of all of them

I've made a Python script that grabs information from a .csv archive, and outputs it into a text file as a list. The original csv file has over 200,000 fields to input and output from, yet when I run my program it only outputs 36 into the .txt file.
Here's the code:
import csv

with open('OriginalFile.csv', 'r') as csvfile:
    emailreader = csv.reader(csvfile)
    f = open('text.txt', 'a')
    for row in emailreader:
        f.write(row[1] + "\n")
And the text file only lists up to 36 strings. How can I fix this? Is maybe the original csv file too big?
After many comments, the original problem turned out to be the encoding of characters in the csv file. If you specify the encoding in pandas, it will read it just fine.
Any time you are dealing with a csv file (or Excel, SQL or R) I would use pandas DataFrames for this. The syntax is shorter and it is easier to tell what is going on.
import pandas as pd

csvframe = pd.read_csv('OriginalFile.csv', encoding='utf-8')
with open('text.txt', 'a') as output:
    # iloc[:, 1] selects every row of the second column (index 1),
    # which is what row[1] grabbed in the original loop
    output.write('\n'.join(csvframe.iloc[:, 1].astype(str)))
You might have luck with something like the following:
with open('OriginalFile.csv', 'r') as csvfile:
    emailreader = csv.reader(csvfile)
    with open('text.txt', 'w') as output:
        for line in emailreader:
            output.write(line[1] + '\n')

CSV file to JSON file in Python

I have read quite a lot of posts here and elsewhere, but I can't seem to find the solution. And I do not want to convert it online.
I would like to convert a CSV file to a JSON file (no nesting, even though I might need it in the future) with this code I found here:
import csv
import json

f = open('sample.csv', 'r')
reader = csv.DictReader(f, fieldnames=("id", "name", "lat", "lng"))
out = json.dumps([row for row in reader])
print(out)
Awesome, simple, and it works. But I do not get a .json file, only text output that, if I copy and paste it, is one long line.
I would need a .json that is readable and ideally saved to a .json file.
Is this possible?
To get more readable JSON, try the indent argument in dumps():
print(json.dumps(..., indent=4))
However - to look more like the original CSV file, what you probably want is to encode each line separately, and then join them all up using the JSON array syntax:
out = "[\n\t" + ",\n\t".join([json.dumps(row) for row in reader]) + "\n]"
That should give you something like:
[
{"id": 1, "name": "foo", ...},
{"id": 2, "name": "bar", ...},
...
]
If you need help writing the result to a file, try this tutorial.
If you want a more readable format for the JSON file, use it like this:
with open('filename', 'w') as f:
    json.dump(output_value, f, indent=4, sort_keys=False)
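Putting the pieces together, a minimal sketch that reads the question's CSV and saves readable JSON to a file (the tiny sample.csv written at the top is a hypothetical stand-in for the question's data):

```python
import csv
import json

# Tiny stand-in for the question's sample.csv
with open('sample.csv', 'w') as f:
    f.write('1,foo,48.1,11.5\n2,bar,52.5,13.4\n')

with open('sample.csv', newline='') as f:
    rows = list(csv.DictReader(f, fieldnames=("id", "name", "lat", "lng")))

with open('sample.json', 'w') as out:
    json.dump(rows, out, indent=4)  # readable: one key/value pair per line
```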
Here's a full script. This script uses the comma-separated values of the first line as the keys for the JSON output. The output JSON file will be automatically created or overwritten using the same file name as the input CSV file name just with the .csv file extension replaced with .json.
Example CSV file:
id,longitude,latitude
1,32.774,-124.401
2,32.748,-124.424
4,32.800,-124.427
5,32.771,-124.433
Python script:
import csv
import json

csvfile = open('sample.csv', 'r')
jsonfile = open('sample.csv'.replace('.csv', '.json'), 'w')
jsonfile.write('{"' + 'sample.csv'.replace('.csv', '') + '": [\n')  # Write JSON parent of data list
fieldnames = csvfile.readline().replace('\n', '').split(',')  # Get fieldnames from first line of csv
num_lines = sum(1 for line in open('sample.csv')) - 1  # Count total lines in csv minus header row

reader = csv.DictReader(csvfile, fieldnames)
i = 0
for row in reader:
    i += 1
    json.dump(row, jsonfile)
    if i < num_lines:
        jsonfile.write(',')
    jsonfile.write('\n')
jsonfile.write(']}')
jsonfile.close()
csvfile.close()

Delete blank rows from CSV?

I have a large csv file in which some rows are entirely blank. How do I use Python to delete all blank rows from the csv?
After all your suggestions, this is what I have so far
import csv
# open input csv for reading
inputCSV = open(r'C:\input.csv', 'rb')
# create output csv for writing
outputCSV = open(r'C:\OUTPUT.csv', 'wb')
# prepare output csv for appending
appendCSV = open(r'C:\OUTPUT.csv', 'ab')
# create reader object
cr = csv.reader(inputCSV, dialect = 'excel')
# create writer object
cw = csv.writer(outputCSV, dialect = 'excel')
# create writer object for append
ca = csv.writer(appendCSV, dialect = 'excel')
# add pre-defined fields
cw.writerow(['FIELD1_','FIELD2_','FIELD3_','FIELD4_'])
# delete existing field names in input CSV
# ???????????????????????????
# loop through input csv, check for blanks, and write all changes to append csv
for row in cr:
    if row or any(row) or any(field.strip() for field in row):
        ca.writerow(row)
# close files
inputCSV.close()
outputCSV.close()
appendCSV.close()
Is this ok or is there a better way to do this?
Use the csv module:
import csv
...
with open(in_fnam, newline='') as in_file:
    with open(out_fnam, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(in_file):
            if row:
                writer.writerow(row)
If you also need to remove rows where all of the fields are empty, change the if row: line to:
if any(row):
And if you also want to treat fields that consist of only whitespace as empty you can replace it with:
if any(field.strip() for field in row):
Note that in Python 2.x and earlier, the csv module expected binary files, and so you'd need to open your files with the 'b' flag. In 3.x, doing this will result in an error.
Surprised that nobody here mentioned pandas. Here is a possible solution.
import pandas as pd
df = pd.read_csv('input.csv')
df.to_csv('output.csv', index=False)
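Why this works: `read_csv` skips truly blank lines by default (`skip_blank_lines=True`), and adding `dropna(how='all')` also catches rows that are just commas. A self-contained sketch with inline data:

```python
import io
import pandas as pd

raw = "a,b\n1,2\n\n3,4\n,\n"          # one truly blank line, one row of empty fields

df = pd.read_csv(io.StringIO(raw))     # skip_blank_lines=True by default
df = df.dropna(how="all")              # also drop rows where every field is NaN
```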
Delete empty rows from a .csv file using Python:
import csv
...
with open('demo004.csv') as input, open('demo005.csv', 'w', newline='') as output:
    writer = csv.writer(output)
    for row in csv.reader(input):
        if any(field.strip() for field in row):
            writer.writerow(row)
Thank you.
You have to open a second file, write all non blank lines to it, delete the original file and rename the second file to the original name.
EDIT: a real blank line will be like '\n':
for line in f1.readlines():
    if line.strip() == '':
        continue
    f2.write(line)
a line with all blank fields would look like ',,,,,\n'. If you consider this a blank line:
for line in f1.readlines():
    if ''.join(line.split(',')).strip() == '':
        continue
    f2.write(line)
Opening, closing, deleting and renaming the files is left as an exercise for you. (Hint: import os, help(open), help(os.rename), help(os.unlink))
EDIT2: Laurence Gonsalves brought to my attention that a valid csv file could have blank lines embedded in quoted csv fields, like 1, 'this\n\nis tricky',123.45. In this case the csv module will take care of that for you. I'm sorry Laurence, your answer deserved to be accepted. The csv module will also address the concerns about a line like "","",""\n.
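To see both points at once, a small sketch: the blank line inside the quoted field survives as part of one record, while the genuinely blank line between records is dropped by the whitespace test from the accepted answer:

```python
import csv
import io

# A blank line inside a quoted field vs. a genuinely blank line between records
tricky = '1,"this\n\nis tricky",123.45\n\n2,plain,0\n'

rows = [r for r in csv.reader(io.StringIO(tricky))
        if any(field.strip() for field in r)]
```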
Doing it with pandas is very simple. Open your csv file with pandas:
import pandas as pd

df = pd.read_csv("example.csv")
# Checking the number of empty cells in each column of the csv file
print(df.isnull().sum())
# Dropping the rows that contain empty cells
modifiedDF = df.dropna()
# Saving the result to a csv file
modifiedDF.to_csv('modifiedExample.csv', index=False)
Python code to remove blank lines from a csv file in place, without creating another file:
import csv

def ReadWriteconfig_file(file):
    try:
        file_object = open(file, 'r')
        lines = csv.reader(file_object, delimiter=',', quotechar='"')
        flag = 0
        data = []
        for line in lines:
            if line == []:
                flag = 1
                continue
            else:
                data.append(line)
        file_object.close()
        if flag == 1:  # a blank line is present in the file
            file_object = open(file, 'w')
            for line in data:
                str1 = ','.join(line)
                file_object.write(str1 + "\n")
            file_object.close()
    except Exception as e:
        print(e)
Here is a solution using pandas that removes blank rows.
import pandas as pd
df = pd.read_csv('input.csv')
df.dropna(axis=0, how='all',inplace=True)
df.to_csv('output.csv', index=False)
I need to do this, but without having a blank row written at the end of the CSV file, which this code unfortunately does (and which is also what Excel does if you Save -> .csv). My (even simpler) code using the csv module does this too:
import csv
input = open("M51_csv_proc.csv", 'rb')
output = open("dumpFile.csv", 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
writer.writerow(row)
input.close()
output.close()
M51_csv_proc.csv has exactly 125 rows; the program always outputs 126 rows, the last one being blank.
I've been through all these threads and nothing seems to change this behaviour.
This script removes all the CR / CRLF characters from a CSV file that has lines like this:
"My name";mail#mail.com;"This is a comment.
Thanks!"
Execute the script https://github.com/eoconsulting/lr2excelcsv/blob/master/lr2excelcsv.py
Result (in Excel CSV format):
"My name",mail#mail.com,"This is a comment. Thanks!"
Replace the PATH_TO_YOUR_CSV with your file path:
import pandas as pd

df = pd.read_csv('PATH_TO_YOUR_CSV')
new_df = df.dropna()
new_df.to_csv('output.csv', index=False)
or in-line:
import pandas as pd
pd.read_csv('data.csv').dropna().to_csv('output.csv', index=False)
I had the same problem.
I converted the .csv file to a dataframe and then converted the dataframe back to a .csv file.
The initial .csv file with the blank lines was 'csv_file_logger2.csv'.
So I did the following:
import pandas as pd

df = pd.read_csv('csv_file_logger2.csv')
df.to_csv('out2.csv', index=False)