I have a text file with data of 6000 records in this format
{"id":"1001","user":"AB1001","first_name":"David ","name":"Shai","amount":"100","email":"me#no.mail","phone":"9999444"}
{"id":"1002","user":"AB1002","first_name":"jone ","name":"Miraai","amount":"500","email":"some1#no.mail","phone":"98894004"}
I want to export all data to excel file as shown bellow example
I would recommend reading in the text file, then converting to a dictionary with json, and using pandas to save a .csv file that can be opened with excel.
In the example below, I copied your text into a text file, called "myfile.txt", and I saved the data as "myfile2.csv".
import pandas as pd
import json
# read lines of text file
with open('myfile.txt') as f:
lines=f.readlines()
# remove empty lines
lines2 = [line for line in lines if not(line == "\n")]
# convert to dictionaries
dicts = [json.loads(line) for line in lines2]
# save to .csv
pd.DataFrame(dicts ).to_csv("myfile2.csv", index = False)
You can use VBA and a json-parser
Your two lines are not a valid JSON. However, it is easy to convert it to a valid JSON as shown in the code below. Then it is a relatively simple matter to parse it and write it to a worksheet.
The code assumes no blank lines in your text file, but it is easy to fix if that is not the case.
Using your data on two separate lines in a windows text file (if not windows, you may have to change the replacement of the newline token with a comma depending on what the generating system uses for newline.
I used the JSON Converter by Tim Hall
'Set reference to Microsoft Scripting Runtime or
' use late binding
Option Explicit
Sub parseData()
Dim JSON As Object
Dim strJSON As String
Dim FSO As FileSystemObject, TS As TextStream
Dim I As Long, J As Long
Dim vRes As Variant, v As Variant, O As Object
Dim wsRes As Worksheet, rRes As Range
Set FSO = New FileSystemObject
Set TS = FSO.OpenTextFile("D:\Users\Ron\Desktop\New Text Document.txt", ForReading, False, TristateUseDefault)
'Convert to valid JSON
strJSON = "[" & TS.ReadAll & "]"
strJSON = Replace(strJSON, vbLf, ",")
Set JSON = parsejson(strJSON)
ReDim vRes(0 To JSON.Count, 1 To JSON(1).Count)
'Header row
J = 0
For Each v In JSON(1).Keys
J = J + 1
vRes(0, J) = v
Next v
'populate the data
I = 0
For Each O In JSON
I = I + 1
J = 0
For Each v In O.Keys
J = J + 1
vRes(I, J) = O(v)
Next v
Next O
'write to a worksheet
Set wsRes = Worksheets("sheet6")
Set rRes = wsRes.Cells(1, 1)
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
Application.ScreenUpdating = False
With rRes
.EntireColumn.Clear
.Value = vRes
.Style = "Output"
.EntireColumn.AutoFit
End With
End Sub
Results from your posted data
Try using the pandas module in conjunction with the eval() function:
import pandas as pd
with open('textfile.txt', 'r') as f:
data = f.readlines()
df = pd.DataFrame(data=[eval(i) for i in data])
df.to_excel('filename.xlsx', index=False)
Related
i've done got my outputs for the csv file, but i dont know how to write it into csv file because output result is numpy array
def find_mode(np_array) :
vals,counts = np.unique(np_array, return_counts=True)
index = np.argmax(counts)
return(vals[index])
folder = ("C:/Users/ROG FLOW/Desktop/Untuk SIDANG TA/Sudah Aman/testbikincsv/folderdatacitra/*.jpg")
for file in glob.glob(folder):
a = cv2.imread(file)
rows = a.shape[0]
cols = a.shape[1]
middlex = cols/2
middley = rows/2
middle = [middlex,middley]
titikawalx = middlex - 10
titikawaly = middley - 10
titikakhirx = middlex + 10
titikakhiry = middley + 10
crop = a[int(titikawaly):int(titikakhiry), int(titikawalx):int(titikakhirx)]
c = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)
H,S,V = cv2.split(c)
hsv_split = np.concatenate((H,S,V),axis=1)
Modus_citra = (find_mode(H)) #how to put this in csv
my outputs is modus citra which is array np.uint8, im trying to put it on csv file but im still confused how to write it into csv because the result in loop.
can someone help me how to write it into csv file ? i appreciate every help
Run your loop, and put the data into lists
eg. mydata = [result1,result2,result3]
Then use csv.writerows(mydata) to write your list into csv rows
https://docs.python.org/3/library/csv.html#csv.csvwriter.writerows
You can save your NumPy arrays to CSV filesĀ using the savetxt() function. This function takes a filename and array as arguments and saves the array into CSV format. You must also specify the delimiter; this is the character used to separate each variable in the file, most commonly a comma. For example:
import numpy as np
my_array = np.array([1,2,3,4,5,6,7,8,9,10])
my_file = np.savetxt('randomtext.csv', my_array, delimiter = ',', fmt = '%d')
print(my_file)
So, I am quite new to python and have been googling a lot but have not found a good solution. What I am looking to do is automate text to columns using python in an excel document without headers.
Here is the excel sheet I have
it is a CSV file where all the data is in one column without headers
ex. hi ho loe time jobs barber
jim joan hello
009 00487 08234 0240 2.0348 20.34829
delimeter is space and comma
What I want to come out is saved in another excel with the first two rows deleted and seperated into columns
( this can be done using text to column in excel but i would like to automate this for several excel sheets)
009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829
the code i have written so far is like this:
import pandas as pd
import csv
path = 'C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel'
os.chdir(path)
for root, dirs, files in os.walk(path):
for f in files:
df = pd.read_csv(f, delimiter='\t' + ';', engine = 'python')
Original file with name as data.xlsx:
This means all the data we need is under the column Data.
Code to split data into multiple columns for a single file:
import pandas as pd
import numpy as np
f = 'data.xlsx'
# -- Insert the following code in your `for f in files` loop --
file_data = pd.read_excel(f)
# Since number of values to be split is not known, set the value of `num_cols` to
# number of columns you expect in the modified excel file
num_cols = 20
# Create a dataframe with twenty columns
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])
# Change the column name of the first column in new_file to "Data"
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})
# Add the value of the first cell in the original file to the first cell of the
# new excel file
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]
# Loop through all rows of original excel file
for index, row in file_data.iterrows():
# Skip the first row
if index == 0:
continue
# Split the row by `space`. This gives us a list of strings.
split_data = file_data.loc[index, "Data"].split(" ")
print(split_data)
# Convert each element to a float (a number) if we want numbers and not strings
# split_data = [float(i) for i in split_data]
# Make sure the size of the list matches to the number of columns in the `new_file`
# np.NaN represents no value.
split_data = [np.NaN] + split_data + [np.NaN] * (num_cols - len(split_data) - 1)
# Store the list at a given index using `.loc` method
new_file.loc[index] = split_data
# Drop all the columns where there is not a single number
new_file.dropna(axis=1, how='all', inplace=True)
# Get the original excel file name
new_file_name = f.split(".")[0]
# Save the new excel file at the same location where the original file is.
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
This creates a new excel file (with a single sheet) of name data_modified.xlsx:
Summary (code without comments):
import pandas as pd
import numpy as np
f = 'data.xlsx'
file_data = pd.read_excel(f)
num_cols = 20
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]
for index, row in file_data.iterrows():
if index == 0:
continue
split_data = file_data.loc[index, "Data"].split(" ")
split_data = [np.NaN] + split_data + [np.NaN] * (num_cols - len(split_data) - 1)
new_file.loc[index] = split_data
new_file.dropna(axis=1, how='all', inplace=True)
new_file_name = f.split(".")[0]
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
I have a file which I read in as a string. In sublime the file looks like this:
Filename
Dataset
Level
Duration
Accuracy
Speed Ratio
Completed
file_001.mp3
datasetname_here
value
00:09:29
0.00%
7.36x
2019-07-18
file_002.mp3
datasetname_here
value
00:22:01
...etc.
in Bash:
['Filename\n', 'Dataset\n', 'Level\n', 'Duration\n', 'Accuracy\n', 'Speed Ratio\n', 'Completed\n', 'file_001.mp3\n', 'datasetname_here\n', 'value\n', '00:09:29\n', '0.00%\n', '7.36x\n', '2019-07-18\n', 'file_002.mp3\n', 'datasetname_here\n', 'L1\n', '00:20:01\n', ...etc.
I want to split this into a 7 column csv. As you can see, the values repeat every 8th line. I know I can use a for loop and modulus to read each line. I have done this successfully before.
How can I use pandas to read things into columns?
I don't know how to approach the Pandas library. I have looked at other examples and all seem to start with csv.
import sys
parser = argparse.ArgumentParser()
parser.add_argument('file' , help = "this is the file you want to open")
args = parser.parse_args()
print("file name:" , args.file)
with open(args.file , 'r') as word:
print(word.readlines()) ###here is where i was making sure it read in properly
###here is where I will start to manipulate the data
This is the Bash output:
['Filename\n', 'Dataset\n', 'Level\n', 'Duration\n', 'Accuracy\n', 'Speed Ratio\n', 'Completed\n', 'file_001.mp3\n', 'datasetname_here\n', 'value\n', '00:09:29\n', '0.00%\n', '7.36x\n', '2019-07-18\n', 'file_002.mp3\n', 'datasetname_here\n', 'L1\n', '00:20:01\n', ...]
First remove '\n':
raw_data = ['Filename\n', 'Dataset\n', 'Level\n', 'Duration\n', 'Accuracy\n', 'Speed Ratio\n', 'Completed\n', 'file_001.mp3\n', 'datasetname_here\n', 'value\n', '00:09:29\n', '0.00%\n', '7.36x\n', '2019-07-18\n', 'file_002.mp3\n', 'datasetname_here\n', 'L1\n', '00:20:01\n', '0.01%\n', '7.39x\n', '2019-07-20\n']
raw_data = [string.replace('\n', '') for string in raw_data]
Then pack your data in 7-length arrays inside a big array:
data = [raw_data[x:x+7] for x in range(0, len(raw_data),7)]
Finally read your data as a DataFrame, the first row contains the name of the columns:
df = pd.DataFrame(data[1:], columns=data[0])
print(df.to_string())
Filename Dataset Level Duration Accuracy Speed Ratio Completed
0 file_001.mp3 datasetname_here value 00:09:29 0.00% 7.36x 2019-07-18
1 file_002.mp3 datasetname_here L1 00:20:01 0.01% 7.39x 2019-07-20
Try This
import numpy as np
import pandas as pd
with open ("data.txt") as f:
list_str = f.readlines()
list_str = map(lambda s: s.strip(), list_str) #Remove \n
n=7
list_str = [list_str[k:k+n] for k in range(0, len(list_str), n)]
df = pd.DataFrame(list_str[1:])
df.columns = list_str[0]
df.to_csv("Data_generated.csv",index=False)
Pandas is not a library to read into columns. It supports many formats to read and write (One of them is comma separated values) and mainly used as python based data analysis tool.
Best place to learn is see their documentation and practice.
Output of above code
I think you don't have to use pandas or any other library. My approach:
data = []
row = []
with open(args.file , 'r') as file:
for line in file:
row.append(line)
if len(row) == 7:
data.append(row)
row = []
How does it work?
The for loop reads the file line by line.
Add the line to row
When row's length is 7, it's completed and you can add the row to data
Create a new list for row
Repeat
I'm pretty new to python and coding in general, so sorry in advance for any dumb questions. My program needs to split an existing log file into several *.csv files (run1,.csv, run2.csv, ...) based on the keyword 'MYLOG'. If the keyword appears it should start copying the two desired columns into the new file till the keyword appears again. When finished there need to be as many csv files as there are keywords.
53.2436 EXP MYLOG: START RUN specs/run03_block_order.csv
53.2589 EXP TextStim: autoDraw = None
53.2589 EXP TextStim: autoDraw = None
55.2257 DATA Keypress: t
57.2412 DATA Keypress: t
59.2406 DATA Keypress: t
61.2400 DATA Keypress: t
63.2393 DATA Keypress: t
...
89.2314 EXP MYLOG: START BLOCK scene [specs/run03_block01.csv]
89.2336 EXP Imported specs/run03_block01.csv as conditions
89.2339 EXP Created sequence: sequential, trialTypes=9
...
[EDIT]: The output per file (run*.csv) should look like this:
onset type
53.2436 EXP
53.2589 EXP
53.2589 EXP
55.2257 DATA
57.2412 DATA
59.2406 DATA
61.2400 DATA
...
The program creates as much run*.csv as needed, but i can't store the desired columns in my new files. When finished, all I get are empty csv files. If I shift the counter variable to == 1 it creates just one big file with the desired columns.
Thanks again!
import csv
QUERY = 'MYLOG'
with open('localizer.log', 'rt') as log_input:
i = 0
for line in log_input:
if QUERY in line:
i = i + 1
with open('run' + str(i) + '.csv', 'w') as output:
reader = csv.reader(log_input, delimiter = ' ')
writer = csv.writer(output)
content_column_A = [0]
content_column_B = [1]
for row in reader:
content_A = list(row[j] for j in content_column_A)
content_B = list(row[k] for k in content_column_B)
writer.writerow(content_A)
writer.writerow(content_B)
Looking at the code there's a few things that are possibly wrong:
the csv reader should take a file handler, not a single line.
the reader delimiter should not be a single space character as it looks like the actual delimiter in your logs is a variable number of multiple space characters.
the looping logic seems to be a bit off, confusing files/lines/rows a bit.
You may be looking at something like the code below (pending clarification in the question):
import csv
NEW_LOG_DELIMITER = 'MYLOG'
def write_buffer(_index, buffer):
"""
This function takes an index and a buffer.
The buffer is just an iterable of iterables (ex a list of lists)
Each buffer item is a row of values.
"""
filename = 'run{}.csv'.format(_index)
with open(filename, 'w') as output:
writer = csv.writer(output)
writer.writerow(['onset', 'type']) # adding the heading
writer.writerows(buffer)
current_buffer = []
_index = 1
with open('localizer.log', 'rt') as log_input:
for line in log_input:
# will deal ok with multi-space as long as
# you don't care about the last column
fields = line.split()[:2]
if not NEW_LOG_DELIMITER in line or not current_buffer:
# If it's the first line (the current_buffer is empty)
# or the line does NOT contain "MYLOG" then
# collect it until it's time to write it to file.
current_buffer.append(fields)
else:
write_buffer(_index, current_buffer)
_index += 1
current_buffer = [fields] # EDIT: fixed bug, new buffer should not be empty
if current_buffer:
# We are now out of the loop,
# if there's an unwritten buffer then write it to file.
write_buffer(_index, current_buffer)
You can use pandas to simplify this problem.
Import pandas and read in log file.
import pandas as pd
df = pd.read_fwf('localizer2.log', header=None)
df.columns = ['onset', 'type', 'event']
df.set_index('onset', inplace=True)
Set Flag where third column == 'MYLOG'
df['flag'] = 0
df.loc[df.event.str[:5] == 'MYLOG', 'flag'] = 1
df.flag = df['flag'].cumsum()
Save each run as a separate run*.csv file
for i in range(1, df.flag.max()+1):
df.loc[df.flag == i, 'event'].to_csv('run{0}.csv'.format(i))
EDIT:
Looks like your format is different than I originally assumed. Changed to use pd.read_fwf. my localizer.log file was a copy and paste of your original data, hope this works for you. I assumed by the original post that it did not have headers. If it does have headers then remove header=None and df.columns = ['onset', 'type', 'event'].
I am trying to read in an Excel file using xlrd, and I am wondering if there is a way to ignore the cell formatting used in Excel file, and just import all data as text?
Here is the code I am using for far:
import xlrd
xls_file = 'xltest.xls'
xls_workbook = xlrd.open_workbook(xls_file)
xls_sheet = xls_workbook.sheet_by_index(0)
raw_data = [['']*xls_sheet.ncols for _ in range(xls_sheet.nrows)]
raw_str = ''
feild_delim = ','
text_delim = '"'
for rnum in range(xls_sheet.nrows):
for cnum in range(xls_sheet.ncols):
raw_data[rnum][cnum] = str(xls_sheet.cell(rnum,cnum).value)
for rnum in range(len(raw_data)):
for cnum in range(len(raw_data[rnum])):
if (cnum == len(raw_data[rnum]) - 1):
feild_delim = '\n'
else:
feild_delim = ','
raw_str += text_delim + raw_data[rnum][cnum] + text_delim + feild_delim
final_csv = open('FINAL.csv', 'w')
final_csv.write(raw_str)
final_csv.close()
This code is functional, but there are certain fields, such as a zip code, that are imported as numbers, so they have the decimal zero suffix. For example, is there is a zip code of '79854' in the Excel file, it will be imported as '79854.0'.
I have tried finding a solution in this xlrd spec, but was unsuccessful.
That's because integer values in Excel are imported as floats in Python. Thus, sheet.cell(r,c).value returns a float. Try converting the values to integers but first make sure those values were integers in Excel to begin with:
cell = sheet.cell(r,c)
cell_value = cell.value
if cell.ctype in (2,3) and int(cell_value) == cell_value:
cell_value = int(cell_value)
It is all in the xlrd spec.
I know this isn't part of the question, but I would get rid of raw_str and write directly to your csv. For a large file (10,000 rows) this will save loads of time.
You can also get rid of raw_data and just use one for loop.