Python convert JSON to CSV ignoring "\n"

For my university project I have to collect some data from GitHub using the API. I save the result of my API call into a JSON file, and after that I have to convert the JSON file into a CSV file.
I use the following code to convert the JSON file to CSV:
with open ("data.json", "r") as f:
data = json.load(f)
with open('data.csv', 'w') as f:
fieldnames = data[0].keys()
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for res in range(len(data)):
writer.writerow(data[res])
My problem is that in the JSON file I have some key/value pairs like the following:
"title" : "Hello \n World"
The "\n" is taken as newline i think because it will split the row of my csv file. How solve this problem? Anyway to make my code to ignore the "\n"?
[screenshot: bad output]
[screenshot: output that I want]

Did you check the str.replace() method, e.g. mystring.replace('\n', ' ')?
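For example, applied to the code in the question, a minimal sketch (assuming every record is a flat JSON object; file names taken from the question) could strip the newlines from each string value before the row is written:

import csv
import json

with open("data.json", "r") as f:
    data = json.load(f)

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    for row in data:
        # replace embedded newlines in string values with spaces
        cleaned = {k: v.replace("\n", " ") if isinstance(v, str) else v
                   for k, v in row.items()}
        writer.writerow(cleaned)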

pandas can handle this:
import pandas as pd
df = pd.read_json('data.json')
df.to_csv('data.csv')
Or since you are opening the file in Excel you could write to xlsx directly:
df.to_excel('data.xlsx')
If you still wish to remove the newlines, you can strip them from the dataframe before saving it.
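One possible way to do that (a sketch, not from the original answer) is a regex replace across the whole dataframe:

import pandas as pd

df = pd.read_json('data.json')
# replace embedded newlines in string cells with spaces before saving
df = df.replace(r'\n', ' ', regex=True)
df.to_csv('data.csv', index=False)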

Related

Saving output to a csv file

I'm trying to save the output to a CSV file. The code below prints the information to the screen fine, but when I try to save it to a CSV or text file, I get one letter at a time. I'm trying to understand why.
data = json.loads(response.text)
info = data['adapterInstancesInfoDto']
for x in range(len(info)):
    val = info[x]['resourceKey']['name']
    print(val)
I tried writing to a CSV and a text file with the same issue, and tried pandas with the same result. I am thinking I need to convert it into a tuple or dictionary to save it to a CSV file.
Use the built-in csv module for working with CSV files.
Here's an example of writing to a file:
import csv

with open('filename.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["SNo", "Name"])
    writer.writerow([1, "Python"])
    writer.writerow([2, "Csv"])
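Applied to the loop from the question, the "one letter at a time" symptom usually comes from passing a bare string to writerow(), which then iterates over its characters; wrapping each value in a list writes one name per row. A sketch under that assumption (the output file name is made up, and response comes from the question's API call):

import csv
import json

data = json.loads(response.text)            # 'response' is the API response from the question
info = data['adapterInstancesInfoDto']

with open('names.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["name"])               # header row
    for item in info:
        writer.writerow([item['resourceKey']['name']])   # one-element list -> one cell per row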

Store JSON data in a JSON file and save it to a CSV file

I tried this way, but it did not work:
with open("data.json", "a", encoding='utf-8') as f:
json.dump(data, f,ensure_ascii=False, indent=4 )
But this problem occurs: [screenshot of the problem]
#2: I want to convert from JSON to CSV. Here is an example of what I want: [screenshot of the desired output]
Please tell me if this is possible.
Both can be done with pandas
To store json data in a .json file, use pandas.DataFrame.to_json
To save json data in a .csv file, first use pandas.read_json to read the data into a dataframe and then use pandas.DataFrame.to_csv
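A short sketch of both steps (the records here are placeholders; the file names match the question):

import pandas as pd

records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]   # placeholder data

# store the data in a .json file
pd.DataFrame(records).to_json('data.json', orient='records', indent=4)

# read the JSON back into a dataframe and save it as CSV
pd.read_json('data.json').to_csv('data.csv', index=False)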
You can't append JSON files together into a new JSON, because of the nature of the JSON format.
Instead of writing each object individually to the JSON file, you should collect all of the objects into a list, and write the list to the JSON file:
lst = []
for data in ...:
    lst.append(data)

with open("data.json", "w", encoding='utf-8') as f:
    # ^ notice "a" was changed to "w" here
    json.dump(lst, f, ensure_ascii=False, indent=4)
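To then get the CSV the question asks for, one option (a sketch, assuming every object in lst is a flat dictionary with the same keys) is csv.DictWriter:

import csv

with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=lst[0].keys())
    writer.writeheader()
    writer.writerows(lst)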

Convert .xlsx to .txt with Python, or format the .txt file to fix column indentation?

I have an Excel file with many rows/columns, and when I convert the file directly from .xlsx to .txt with Excel, the file ends up with weird indentation (the columns are not perfectly aligned like in an Excel file), and due to some requirements I really need them to be.
So, is there a better way to write from Excel to txt using Python, or to format the txt file so the columns align perfectly?
I found this code in a previous question but I am getting the following error:
TypeError: a bytes-like object is required, not 'str'
Code:
import xlrd
import csv

# open the output csv
with open('my.csv', 'wb') as myCsvfile:
    # define a writer
    wr = csv.writer(myCsvfile, delimiter="\t")
    # open the xlsx file
    myfile = xlrd.open_workbook('myfile.xlsx')
    # get a sheet
    mysheet = myfile.sheet_by_index(0)
    # write the rows
    for rownum in range(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
is there a better way to write from excel to txt using python?
I'm not sure if it's a better way, but you could write the contents of the xlsx file to txt this way:
import pandas as pd
with open('test.txt', 'w') as file:
    pd.read_excel('test.xlsx').to_string(file, index=False)
Edit:
To convert the date column to a desired format, you could try the following:
with open('test.txt', 'w') as file:
    df = pd.read_excel('test.xlsx')
    df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y%m%d')
    df.to_string(file, index=False, na_rep='')
The problem lies in this line:
with open('my.csv', 'wb') as myCsvfile:
'wb' means you will be writing bytes, but in reality you will be writing regular characters (strings). Change it to 'w'. It would also be good practice to use a with block for the Excel file:
import xlrd
import csv

# open the output csv
with open('my.csv', 'w') as myCsvfile:
    # define a writer
    wr = csv.writer(myCsvfile, delimiter="\t")
    # open the xlsx file
    with xlrd.open_workbook('myfile.xlsx') as myXlsxfile:
        # get a sheet
        mysheet = myXlsxfile.sheet_by_index(0)
        # write the rows
        for rownum in range(mysheet.nrows):
            wr.writerow(mysheet.row_values(rownum))
import pandas as pd

read_file = pd.read_excel(r'your excel file name.xlsx', sheet_name='your sheet name')
read_file.to_csv(r'Path to store the txt file\File name.txt', index=None, header=True)
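Since the question asks for a .txt file with aligned columns, note that to_csv writes comma-separated values by default; passing sep='\t' gives a tab-delimited file like Excel's own "Text (Tab delimited)" export. A sketch (file and sheet names are placeholders):

import pandas as pd

read_file = pd.read_excel(r'your excel file name.xlsx', sheet_name='your sheet name')
read_file.to_csv(r'File name.txt', sep='\t', index=False)   # tab-delimited text output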

Extracting Rows of Data from a CSV-like File Using Python

I have a large file from a proprietary archive format. Unzipping this archive gives a file that has no extension, but the data inside is comma-delimited. Adding a .csv extension or simply opening the file with Excel will work.
I have about 375-400 of these files, and I'm trying to extract a chunk of rows (about 13,500 out of 1.2M+ rows) between a keyword "Point A" and another keyword "Point B".
I found some code on this site that I think is extracting the data correctly, but I'm getting an error:
AttributeError: 'list' object has no attribute 'rows'
when trying to save out the file. Can somebody help me get this data to save into a csv?
import re
import csv
import time

print(time.ctime())

file = open('C:/Users/User/Desktop/File with No Extension That\'s Very Similar to CSV', 'r')
data = file.read()

x = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)

name = "C:/Users/User/Desktop/testoutput.csv"
with open(name, 'w', newline='') as file2:
    savefile = csv.writer(file2)
    for i in x.rows:
        savefile.writerow([cell.value for cell in i])

print(time.ctime())
Thanks in advance, any help would be much appreciated.
The following should work nicely. As mentioned, your regex usage was almost correct. It is possible to still use the Python CSV library to do the CSV processing by converting the found text into a StringIO object and passing that to the CSV reader:
import re
import csv
import time
import StringIO

print(time.ctime())

input_name = "C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV"
output_name = "C:/Users/User/Desktop/testoutput.csv"

with open(input_name, 'r') as f_input, open(output_name, 'wb') as f_output:
    # Read the whole file in
    all_input = f_input.read()

    # Extract the interesting lines
    ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]

    # Convert into a file object and parse using the CSV reader
    fab_input = StringIO.StringIO(ab_input)
    csv_input = csv.reader(fab_input)
    csv_output = csv.writer(f_output)

    # Iterate a row at a time from the input
    for input_row in csv_input:
        # Skip any empty rows
        if input_row:
            # Write a row at a time to the output
            csv_output.writerow(input_row)

print(time.ctime())
You have not given us an example from your CSV file, so if there are problems, you might need to configure the CSV 'dialect' to process it better.
Tested using Python 2.7
You have 2 problems here: the first is related to the regular expression and the other is about the list syntax.
Getting what you want
The way you are using the regular expression will return a list with a single value (all the lines joined into one unique string).
There is probably a better way of doing this, but for now I would go with something like this:
with open('bla', 'r') as input:
    data = input.read()

x = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)[0]
x = x.splitlines(False)[1:]
That's not pretty but will return a list with all values between those two points.
Working with lists
Lists have no rows attribute. You just have to iterate over the list:
for i in x:
    # do whatever you need with each line here
I'm not familiar with the csv library, but it looks like you will have to perform some manipulation on the i value before writing it out.
IMHO, I would avoid the CSV format, since it is somewhat "locale dependent", so it may not work as expected depending on the OS settings your end users have.
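Putting the two fixes together, a rough sketch (keeping the 'bla' placeholder file name from above, and using a naive comma split; a csv.reader would handle quoted fields better):

import re
import csv

with open('bla', 'r') as f:
    data = f.read()

# text between the two keywords, split into lines, dropping the partial first line
lines = re.findall(r'Point A(.*?)Point B', data, re.DOTALL)[0].splitlines(False)[1:]

with open('testoutput.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for line in lines:
        if line.strip():                        # skip empty lines
            writer.writerow(line.split(','))    # naive split; quoted commas are not handled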
Updating the code so that @Martin Evans' answer works on the latest Python version:
import re
import csv
import time
import io

print(time.ctime())

input_name = "C:/Users/User/Desktop/File with No Extension That's Very Similar to CSV"
output_name = "C:/Users/User/Desktop/testoutput.csv"

with open(input_name, 'r') as f_input, open(output_name, 'wt') as f_output:
    # Read the whole file in
    all_input = f_input.read()

    # Extract the interesting lines
    ab_input = re.findall(r'Point A(.*?)Point B', all_input, re.DOTALL)[0]

    # Convert into a file object and parse using the CSV reader
    fab_input = io.StringIO(ab_input)
    csv_input = csv.reader(fab_input)
    csv_output = csv.writer(f_output)

    # Iterate a row at a time from the input
    for input_row in csv_input:
        # Skip any empty rows
        if input_row:
            # Write a row at a time to the output
            csv_output.writerow(input_row)

print(time.ctime())
Also, by using 'wt' instead of 'wb' one can avoid the "TypeError: a bytes-like object is required, not 'str'" error.

How to add a header to a csv file in Python?

I've tried many solutions to add a header to my CSV file, but nothing's working properly. Here they are:
I used the writerow method, but my data end up overwriting the first row.
I used the DictWriter method, but I don't know how to fill it correctly. Here is my code:
csv = csv.DictWriter(open(directory +'/csv.csv', 'wt'), fieldnames = ["stuff1", "stuff2", "stuff3"], delimiter = ';')
csv.writeheader(["stuff1", "stuff2", "stuff3"])
I got a "2 arguments instead of one" error and I really don't know why.
Any advice?
All you need to do is call DictWriter.writeheader() without arguments:
with open(os.path.join(directory, 'csv.csv'), 'wb') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["stuff1", "stuff2", "stuff3"], delimiter=';')
    writer.writeheader()
You already told DictWriter() what your headers are.
I encountered a similar problem when writing a CSV file.
I had to read the CSV file and modify some of the fields in it.
To write the header in the CSV file, I used the following code:
reader = csv.DictReader(open(infile))
headers = reader.fieldnames

with open('ChartData.csv', 'wb') as outcsv:
    writer1 = csv.writer(outcsv)
    writer1.writerow(headers)
and when you write the data rows, you can use a DictWriter in the following way:
writer = csv.DictWriter(open("ChartData.csv", 'a'), headers)
In the above code, "a" stands for appending.
In conclusion -> use 'a' to append data to the CSV after you have written your header to the same file.
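An end-to-end sketch of that pattern on Python 3 (the field names and the sample row are placeholders, not from the original post):

import csv

headers = ["stuff1", "stuff2", "stuff3"]

# write the header once, in 'w' mode
with open('ChartData.csv', 'w', newline='') as f:
    csv.writer(f).writerow(headers)

# later, append data rows with a DictWriter in 'a' mode
with open('ChartData.csv', 'a', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=headers)
    writer.writerow({"stuff1": 1, "stuff2": 2, "stuff3": 3})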
