I have a script that converts .jsonl files in a selected directory to .csv files in another specified location. However, the resulting CSV file keeps the .jsonl extension before the .csv (think file.jsonl.csv). Any ideas on how to remove the .jsonl extension before adding the .csv extension? I'd like to get rid of it, as it may be confusing in the future. Thank you!
Sample CSV file created:
20210531_CCXT_FTX_DOGEPERP.jsonl.csv
My script:
import glob
import json
import csv
import time
start = time.time()
#import pandas as pd
from flatten_json import flatten
#Path of jsonl file
File_path = (r'C:\Users\Natthanon\Documents\Coding 101\Python\JSONL')
#reading all jsonl files
files = [f for f in glob.glob( File_path + "**/*.jsonl", recursive=True)]
i = 0
for f in files:
    with open(f, 'r') as F:
        #creating csv files
        with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV\\' + f.split("\\")[-1] + ".csv", 'w' , newline='') as csv_file:
            thewriter = csv.writer(csv_file)
            thewriter.writerow(["symbol", "timestamp", "datetime","high","low","bid","bidVolume","ask","askVolume","vwap","open","close","last","previousClose","change","percentage","average","baseVolume","quoteVolume"])
            for line in F:
                #flatten json files
                data = json.loads(line)
                data_1 = flatten(data)
                #headers should be the Key values from json files that make Column header
                thewriter.writerow([data_1['symbol'],data_1['timestamp'],data_1['datetime'],data_1['high'],data_1['low'],data_1['bid'],data_1['bidVolume'],data_1['ask'],data_1['askVolume'],data_1['vwap'],data_1['open'],data_1['close'],data_1['last'],data_1['previousClose'],data_1['change'],data_1['percentage'],data_1['average'],data_1['baseVolume'],data_1['quoteVolume']])
The problem is that you are not removing the extension when building the new file name. Replacing your creation of the csv file with something like this should fix it:
file_name = f.rsplit("\\", 1)[-1].replace('.jsonl', '')
with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV\\' + file_name + ".csv", 'w' , newline='') as csv_file:
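An alternative that avoids string replacement is to let the standard library split off the extension. The sketch below uses pathlib; the helper name csv_name_for is hypothetical, and it assumes the path uses the separators of the operating system the script runs on.

```python
from pathlib import Path


def csv_name_for(jsonl_path):
    """Return the bare file name with its last extension swapped for .csv."""
    # Path.stem drops the directory part and only the final suffix,
    # so "file.jsonl" becomes "file" and "archive.2021.jsonl" becomes "archive.2021".
    return Path(jsonl_path).stem + ".csv"


print(csv_name_for("20210531_CCXT_FTX_DOGEPERP.jsonl"))
# 20210531_CCXT_FTX_DOGEPERP.csv
```

Unlike .replace('.jsonl', ''), this only touches the trailing extension, so a name that happens to contain ".jsonl" in the middle is left alone.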
Related
I have a lot of JSON files in a folder, and I want to convert them to CSV format.
Should I use import glob? I am a novice; how can I modify my code?
#-*-coding:utf-8-*-
import csv
import json
import sys
import codecs
def trans(path):
    jsonData = codecs.open('C:/Users/jeri/Desktop/1', '*.json', 'r', 'utf-8')
    # csvfile = open(path+'.csv', 'w')
    # csvfile = open(path+'.csv', 'wb')
    csvfile = open('C:/Users/jeri/Desktop/1.csv', 'w', encoding='utf-8', newline='')
    writer = csv.writer(csvfile, delimiter=',')
    flag = True
    for line in jsonData:
        dic = json.loads(line)
        if flag:
            keys = list(dic.keys())
            print(keys)
            flag = False
        writer.writerow(list(dic.values()))
    jsonData.close()
    csvfile.close()
if __name__ == '__main__':
    path=str(sys.argv[0])
    print(path)
    trans(path)
Yes using glob would be a good way to iterate through the .json files in your folder! But glob doesn't have anything to do with the reading/writing of files. After importing glob, you can use it like this:
for curr_file in glob.glob("*.json"):
    # Process each file here
I see that you've used the json module to read in your code snippet. I'd say the better way to go about it is to use pandas.
df = pd.read_json()
I say this because with the pandas library, you can simply convert from .json to .csv using
df.to_csv('file_name.csv')
Combining the three together, it would look like this:
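import glob
import os
import pandas as pd
for curr_file in glob.glob("*.json"):
    # Name each CSV after its source file so the outputs don't overwrite each other
    df = pd.read_json(curr_file)
    df.to_csv(os.path.splitext(curr_file)[0] + '.csv')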
Also note that if your JSON has nested objects, it can't be converted directly to CSV; you'll have to flatten or otherwise reorganize the data before the conversion.
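One way to settle nested data before writing CSV is to flatten it into a single level of keys. The following is a minimal standard-library sketch, not part of pandas; the function name flatten_record, the "_" separator, and the sample record are all assumptions made for illustration.

```python
def flatten_record(obj, prefix="", sep="_"):
    """Flatten nested dicts and lists into a single-level dict."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list positions become part of the key
    else:
        # Scalar value: store it under the key built so far.
        return {prefix: obj}
    flat = {}
    for key, value in items:
        new_key = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_record(value, new_key, sep))
    return flat


# Made-up record with one nested object and one list.
record = {"symbol": "DOGE-PERP", "info": {"bid": 0.3}, "tags": ["a", "b"]}
flat = flatten_record(record)
print(flat)
```

Each flattened key can then serve as a CSV column header. Empty dicts and lists produce no columns in this sketch.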
I'm new to Python. My task is to extract specific key values from a set of .iris files (each containing nested dictionaries) in a specific directory.
I want to extract those values, save them as a new .csv file, and repeat this for all the other files.
Below is a sample .iris file, from which I should extract only these keys ('uid','enabled','login','name').
{"streamType":"user",
"uid":17182,
"enabled":true,
"login":"xyz",
"name":"abcdef",
"comment":"",
"authSms":"",
"email":"",
"phone":"",
"location":"",
"extraLdapOu":"",
"mand":997,
"global":{
"userAccount":"View",
"uid":"",
"retention":"No",
"enabled":"",
"messages":"Change"},
"grants":[{"mand":997,"role":1051,"passOnToSubMand":true}],
I am trying to convert the .iris files to .json and read them one by one, but unfortunately I am not getting the desired output.
Please, could anyone help me?
My code (added from comments):
import os
import csv
path = ''
os.chdir(path)
# Read iris File
def read_iris_file(file_path):
    with open(file_path, 'r') as f:
        print(f.read())
# iterate through all files
for file in os.listdir():
    # Check whether file is in iris format or not
    if file.endswith(".iris"):
        file_path = f"{path}\{file}"
        # call read iris file function
        print(read_iris_file(file_path))
Your files contain data in JSON format, so we can use the built-in json module to parse them. To iterate over files with a certain extension, you can use pathlib.Path.glob() with the pattern "*.iris". Then we can use csv.DictWriter() and pass "ignore" as the extrasaction argument, which makes DictWriter ignore the keys we don't need and write only those passed in the fieldnames argument.
Code:
import csv
import json
from pathlib import Path
path = Path(r"path/to/folder")
keys = "uid", "enabled", "login", "name"
with open(path / "result.csv", "w", newline="") as out_f:
    writer = csv.DictWriter(out_f, fieldnames=keys, extrasaction='ignore')
    writer.writeheader()
    for file in path.glob("*.iris"):
        with open(file) as inp_f:
            data = json.load(inp_f)
            writer.writerow(data)
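To see what extrasaction='ignore' does in isolation, here is a small sketch that writes to an in-memory buffer instead of a file. The record is a made-up subset of the sample above, and io.StringIO is used only so the result can be inspected directly.

```python
import csv
import io

keys = ("uid", "enabled", "login", "name")
# Hypothetical record with more keys than we want to export.
record = {"streamType": "user", "uid": 17182, "enabled": True,
          "login": "xyz", "name": "abcdef", "mand": 997}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=keys, extrasaction="ignore")
writer.writeheader()
writer.writerow(record)  # "streamType" and "mand" are silently skipped
output = buffer.getvalue()
print(output)
```

Without extrasaction="ignore", the same writerow call would raise a ValueError because of the extra keys. Note that a JSON true is loaded as the Python value True and is written to the CSV as "True".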
Try the below (the key point here is loading the .iris file with the json module, since ast.literal_eval cannot parse JSON literals such as true)
import json
fields = ('uid','enabled','login','name')
with open('my.iris') as f1:
    data = json.load(f1)
with open('my.csv','w') as f2:
    f2.write(','.join(fields) + '\n')
    # the values are not all strings, so convert each one before joining
    f2.write(','.join(str(data[f]) for f in fields) + '\n')
my.csv
uid,enabled,login,name
17182,True,xyz,abcdef
I'm currently working on a script that converts JSONL to CSV format. However, when I run the code in Visual Studio Code's terminal, I get the following error:
Traceback (most recent call last):
File "C:\Users\Natthanon\Documents\Coding 101\Python\test.py", line 24, in <module>
with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV', 'a' , newline='') as f:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Natthanon\\Documents\\Coding 101\\Python\\CSV'
This is my python script below. If anyone has any clue on why I am receiving the permission error as shown above, do let me know if there are any solutions to this. I'm quite new to Python and I hope someone experienced will be able to help me out with this issue. Thanks!
import glob
import json
import csv
import time
start = time.time()
#import pandas as pd
from flatten_json import flatten
#Path of jsonl file
File_path = (r'C:\Users\Natthanon\Documents\Coding 101\Python\JSONL')
#reading all jsonl files
files = [f for f in glob.glob( File_path + "**/*.jsonl", recursive=True)]
i=0
for f in files:
    with open(f, 'r') as F:
        for line in F:
            #flatten json files
            data = json.loads(line)
            data_1=flatten(data)
            #creating csv files
            with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV', 'a' , newline='') as f:
                thewriter = csv.writer(f)
                #headers should be the Key values from json files that make Column header
                thewriter.writerow([data_1['header1'],data_1['header2']])
This seems to be a duplicate of PermissionError: [Errno 13] in python.
You are trying to open a directory as if it were a file, which fails.
You could try something like this instead: create a new CSV file in the CSV folder for every .jsonl file in the JSONL folder.
import glob
import json
import csv
import time
start = time.time()
#import pandas as pd
from flatten_json import flatten
#Path of jsonl file
File_path = (r'C:\Users\Natthanon\Documents\Coding 101\Python\JSONL')
#reading all jsonl files
files = [f for f in glob.glob( File_path + "**/*.jsonl", recursive=True)]
i=0
for f in files:
    with open(f, 'r') as F:
        for line in F:
            #flatten json files
            data = json.loads(line)
            data_1=flatten(data)
            #creating csv files
            with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV\\' + f.split("\\")[-1] +".csv", 'a' , newline='') as csv_file:
                thewriter = csv.writer(csv_file)
                #headers should be the Key values from json files that make Column header
                thewriter.writerow([data_1['header1'],data_1['header2']])
On the line
with open(r'C:\Users\Natthanon\Documents\Coding 101\Python\CSV\\' + f.split("\\")[-1] +".csv", 'a' , newline='') as csv_file:
you take the name of the .jsonl file (the split strips the path and leaves just the file name) and create a matching file with a .csv extension in the "CSV" folder.
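The same idea can be sketched with pathlib so that the output path always points at a file inside the folder rather than at the folder itself. The function name jsonl_to_csv and its parameters are hypothetical, and the sketch assumes each line is an already flat JSON object (the flatten step from the script above is left out).

```python
import csv
import json
from pathlib import Path


def jsonl_to_csv(jsonl_path, out_dir, fields):
    """Write one CSV file into out_dir for the given .jsonl file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    # Build a path to a file inside the folder, not the folder itself.
    csv_path = out_dir / (Path(jsonl_path).stem + ".csv")
    with open(jsonl_path, "r") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(fields)
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Missing keys become empty cells instead of raising KeyError.
            writer.writerow([record.get(name, "") for name in fields])
    return csv_path
```

Opening the CSV once per input file, instead of once per line in append mode, also means that rerunning the script replaces the output rather than appending duplicate rows to it.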
New user here.
I am doing some work on (Twitter) JSON data using Python.
I want to read each line from multiple JSON files in a directory and copy only the lines I want into a new JSON file. I want to keep the records that have a "created_at" time and discard the records marked as "deleted".
import json
import os
path = ''
filelist = os.listdir(path)
towrite = ''
for filename in filelist:
    if filename.endswith(".json"):
        with open(path + filename, 'r') as file:
            lines = file.readlines()
            for line in lines:
                try:
                    if line.startswith('{"created_at":'):
                        towrite += json.dumps(json.loads(line)) + '\n'
                        with open('01_00_clean.json', 'w') as file:
                            file.write(towrite)
                except ValueError:
                    pass
The code runs but doesn't copy the data into the new file. Can anyone please help me with the program?
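The filtering step described above can be sketched on its own, separately from the file handling. This is only an illustration under assumptions: the helper name keep_created is hypothetical, the sample lines are made up, and it checks for the "created_at" key after parsing instead of relying on the line starting with that key.

```python
import json


def keep_created(lines):
    """Yield re-serialized JSON lines that carry a "created_at" key."""
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict) and "created_at" in record:
            yield json.dumps(record) + "\n"


# Made-up input: one record to keep, one to drop, one invalid line.
sample = [
    '{"created_at": "Mon May 31 00:00:00 +0000 2021", "text": "hi"}\n',
    '{"delete": {"status": {"id": 1}}}\n',
    'not json\n',
]
cleaned = list(keep_created(sample))
print(cleaned)
```

With a helper like this, the output file can be opened once, outside the loops, and each input file passed through it with out_file.writelines(keep_created(in_file)); building the input path with os.path.join(path, filename) avoids a missing separator between folder and file name.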
I want to take a PDF file as input and produce a CSV file as output, so that all the textual data in the PDF ends up in the CSV file. I don't understand how to do this; I've tried but couldn't get it to work.
What I've done is use a library called tabula-py, which converts PDF to CSV. It does create a CSV file, but no contents are copied into it from the PDF.
Here's the code:
from tabula import convert_into,read_pdf
import tabula
df = tabula.read_pdf("crimestory.pdf", spreadsheet=True,
pages='all',output_format="csv")
df.to_csv('crimestoryy.csv', index=False)
The output should be a CSV file containing the data.
What I am getting is a blank CSV file.
I found the answer to this question on my own.
To tackle this issue, I first converted the PDF file into a text file, and then converted that text file to a CSV file. Here's my code.
conversion.py
import os.path
import csv
import pdftotext
#Load your PDF
with open("crimestory.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)
# Save all text to a txt file.
with open('crimestory.txt', 'w') as f:
    f.write("\n\n".join(pdf))
save_path = "/home/mayureshk/PycharmProjects/NLP/"
completeName_in = os.path.join(save_path, 'crimestory' + '.txt')
completeName_out = os.path.join(save_path, 'crimestoryycsv' + '.csv')
file1 = open(completeName_in)
In_text = csv.reader(file1, delimiter=',')
file2 = open(completeName_out, 'w')
out_csv = csv.writer(file2)
file3 = out_csv.writerows(In_text)
file1.close()
file2.close()
Try this; I hope it works:
import tabula
# convert PDF into CSV
tabula.convert_into("crimestory.pdf", "crimestory.csv", output_format="csv", pages='all')
or
df = tabula.read_pdf("crimestory.pdf", encoding='utf-8', spreadsheet=True, pages='all')
df.to_csv('crimestory.csv', encoding='utf-8')
or
from tabula import read_pdf
df = read_pdf("crimestory.pdf")
df
#make sure df displays your pdf contents in the output
from tabula import convert_into
convert_into("crimestory.pdf", "crimestory.csv", output_format="csv")
!cat crimestory.csv