I have data written in a csv file in the format below:
[(789,255,25,33.0),(855,275,25,33.0)............]
I want it to be converted into a format like:
1. 789,255,25,33.0
2. 855,275,25,33.0
..............
So all I want is to convert the tuples in the list into a new CSV file, with each tuple on a new line. The values in the list are strings, and I want to convert them to floats as well. How do I accomplish this?
Using the csv module and enumerate.
Ex:
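import csv
data = [(789,255,25,33.0),(855,275,25,33.0)]
filename = "output.csv"  # placeholder output path
# newline="" lets the csv module control the line endings
with open(filename, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    # enumerate(data, 1) numbers the rows starting at 1
    for i, line in enumerate(data, 1):
        writer.writerow([i] + list(line))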
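The question also asks about reading the list from a file and converting the values to floats, which the example above does not cover. Here is a minimal sketch, assuming the input text is a single Python-style list of tuples exactly as shown in the question. The helper names `parse_tuples` and `write_rows` are hypothetical.

```python
import ast
import csv


def parse_tuples(text):
    """Parse text such as "[(789,255,25,33.0),(855,275,25,33.0)]" into rows of floats."""
    # literal_eval only evaluates Python literals, so it is safer than eval.
    records = ast.literal_eval(text.strip())
    return [[float(value) for value in record] for record in records]


def write_rows(rows, path):
    """Write one row per line; newline="" lets the csv module control line endings."""
    with open(path, "w", newline="") as outfile:
        csv.writer(outfile).writerows(rows)


# Values may be numbers or numeric strings; both end up as floats.
rows = parse_tuples("[(789,255,25,33.0),('855','275','25','33.0')]")
print(rows)
```

To produce the file, call `write_rows(rows, "output.csv")`, where the output path is a placeholder.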
Using Pandas
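import pandas as pd
data = [(789,255,25,33.0),(855,275,25,33.0)]
filename = "output.csv"  # placeholder output path
df = pd.DataFrame(data)
# header=False omits the column names; the index (starting at 0) is written as the first column
df.to_csv(filename, header=False)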
Related
I have a .csv file which looks something like this:
-73.933087,40.6960679
-84.39591587,39.34949003
-111.2325173,47.49438049
How can I read that .csv file in Python to get a format like this (two numbers between quotes, separated by a comma):
numbers = ["-73.933087,40.6960679",
"-84.39591587,39.34949003",
"-111.2325173,47.49438049"]
I managed to load the .csv into a list, but the formatting is the problem.
import csv
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    my_list = list(reader)
print(my_list)
input("Press enter to exit.")
Where I get output like this:
[['-73.933087', '40.6960679'],
['-84.39591587', '39.34949003'],
['-111.2325173', '47.49438049']]
So I need to remove single quotes here, and to change square brackets for double quotes.
Just use join to combine each line. You were 95% there with your code already.
import csv
numbers = []
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        nums = ",".join(row)
        numbers.append(nums)
I think you should simply be able to store it in a pandas dataframe like this:
import pandas as pd
# header=None because the file has no header row
numbers = pd.read_csv(r'Path where the CSV file is stored\File name.csv', header=None)
print(numbers)
Then you can convert it to a numpy array or whatever you like.
I have a nested json file (100k rows), which looks like this:
{"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}
I am trying to create a csv, so that it can easily be loaded in a rdbms. I am trying to use json_normalize() in pandas but even before I get there I am getting below error.
with open('transactions.json') as data_file:
    data = json.load(data_file)
JSONDecodeError: Extra data: line 2 column 1 (char 466)
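If your problem originates in reading the JSON file itself, the "Extra data" error means the file holds more than one JSON document (one per line), so I would parse each line with:
json.loads()
and then build the DataFrame from the parsed records with
pd.json_normalize()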
If your problem originates in the conversion from your json dict to dataframe you can use this:
test = {"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}
import io
import json
import pandas as pd
# convert the dict to a JSON string and read it (wrapped in StringIO,
# since newer pandas versions warn about literal JSON strings)
df = pd.read_json(io.StringIO(json.dumps(test)), convert_axes=True)
# 'unpack' the dict as series and merge them with original df
df = pd.concat([df, df.Segment.apply(pd.Series)], axis=1)
# remove dict
df.drop('Segment', axis=1, inplace=True)
That would be my approach but there might be more convenient approaches.
Step one: loop over a file of records
Since your file has one JSON record per line, you need to loop over all the records in your file, which you can do like this:
with open('transactions.json', encoding="utf8") as data_file:
    for line in data_file:
        data = json.loads(line)
        # or
        df = pd.read_json(line, convert_axes=True)
        # do something with data or df
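To illustrate why json.load() fails on such a file while the per-line loop works, here is a small sketch that uses an in-memory stand-in for transactions.json. The two sample records are made up and much shorter than the real ones.

```python
import json

# Two JSON documents, one per line (a made-up stand-in for transactions.json).
sample_text = (
    '{"UniqueId": "a", "Segment": [{"LegNumber": "1"}]}\n'
    '{"UniqueId": "b", "Segment": []}\n'
)

# Parsing the whole text at once fails, because a second document follows the first.
try:
    json.loads(sample_text)
    error = None
except json.JSONDecodeError as exc:
    error = exc.msg

# Parsing line by line works, skipping any blank lines.
records = [json.loads(line) for line in sample_text.splitlines() if line.strip()]

print(error)
print([record["UniqueId"] for record in records])
```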
Step two: write the CSV file
Now, you can combine this with a csv.writer to convert the file into a CSV file.
with open('transactions.csv', "w", encoding="utf8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    # Loop over each record, somehow:
    #     row = build list with row contents
    #     writer.writerow(row)
Putting it all together
I'll read the first record once to get the keys to display and output them as a CSV header, and then I'll read the whole file and convert it one record at a time:
import copy
import csv
import json
# Read the first JSON record to get the keys that we'll use as headers for the CSV file
with open('transactions.json', encoding="utf8") as data_file:
    keys = list(json.loads(next(data_file)).keys())
# Our CSV headers are going to be the keys from the first row, except for
# segments, which we'll replace (arbitrarily) by three numbered segment column
# headings.
keys.pop()  # assumes "Segment" is the last key, as in the sample record
base_keys = copy.copy(keys)
keys.extend(["Segment1", "Segment2", "Segment3"])
# newline="" lets the csv module control the line endings
with open('transactions.csv', "w", encoding="utf8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(keys)  # Write the CSV headers
    with open('transactions.json', encoding="utf8") as data_file:
        for line in data_file:
            data = json.loads(line)
            row = [data[k] for k in base_keys] + data["Segment"]
            writer.writerow(row)
The resulting CSV file will still hold a whole segment record (written as the text of a Python dict) in each of the Segment1, Segment2 and Segment3 columns. If you want to format each segment differently, you could define a format_segment(segment) function and replace data["Segment"] by this list comprehension: [format_segment(segment) for segment in data["Segment"]]
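As one possible illustration of such a format_segment function, the sketch below renders each segment as a "departure-arrival" string. The output format is an arbitrary choice for the example, and the record and base_keys are shortened versions of the ones in the question.

```python
def format_segment(segment):
    """Render one segment dict as 'DEPARTURE-ARRIVAL', e.g. 'MUC-CPH' (format chosen for illustration)."""
    return f"{segment['DepartureAirportCode']}-{segment['ArrivalAirportCode']}"


# Shortened version of the record from the question.
data = {
    "UniqueId": "4224f3c9-323c-e911-a820-a7f2c9e35195",
    "Itinerary": "MUC-CPH-ARN-MUC",
    "Segment": [
        {"DepartureAirportCode": "MUC", "ArrivalAirportCode": "CPH"},
        {"DepartureAirportCode": "ARN", "ArrivalAirportCode": "MUC"},
    ],
}
base_keys = ["UniqueId", "Itinerary"]

# Same row construction as above, with each segment formatted as a short string.
row = [data[k] for k in base_keys] + [format_segment(s) for s in data["Segment"]]
print(row)
```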
I'd like to take a CSV file, sort it, and then save it as a CSV. This is what I have so far, but I can't figure out how to write it to a CSV file:
import csv
with open('test.csv','r') as f:
    sample = csv.reader(f)
    sort = sorted(sample)
for eachline in sort:
    print(eachline)
You don't need pandas for something simple like this:
import csv
# Read the input file and sort it
with open('input.csv', newline='') as f:
    data = sorted(csv.reader(f))
# Write to the output file (newline='' is what the csv docs recommend)
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data)
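The rows that csv.reader returns are lists of strings, and lists (like tuples) sort lexicographically in Python: by the first value, and if those are equal, by the second, and so on. Because the values are strings, "10" sorts before "9". You can supply a key function to sorted to sort by a specific column or to compare values as numbers.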
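A short sketch of the key-function idea, using made-up in-memory data in place of input.csv:

```python
import csv
import io

# Made-up sample data standing in for input.csv.
sample = io.StringIO("b,10\na,9\nc,100\n")
rows = list(csv.reader(sample))

# Default: compare whole rows, column by column, as strings.
by_first_column = sorted(rows)

# Sort by the second column as text: "10" < "100" < "9".
by_text = sorted(rows, key=lambda row: row[1])

# Sort by the second column as a number: 9 < 10 < 100.
by_number = sorted(rows, key=lambda row: float(row[1]))

print(by_first_column)
print(by_text)
print(by_number)
```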
I think something like this should do the trick:
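import pandas as pd
path = "C:/Your/file/path/file.csv"
df = pd.read_csv(path)
# use ascending=False for descending order
df = df.sort_values("variablename_by_which_to_sort", axis=0, ascending=True)
# index=False avoids writing the row index as an extra column
df.to_csv(path, index=False)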
How can I store the data of a specific column in a CSV file, for example Name, in a list using Python?
When I try to output it,
the output is repeated.
Please help
The easiest way is to use pandas:
import pandas as pd
df = pd.read_csv('names.csv')
names = df['Name'].tolist()
One possibility would be to use a list comprehension.
import csv
with open("names.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    # List comprehension
    csv_list = [line[0] for line in csv_reader]
The "0" in line[0] can be changed to be whichever column is desired.
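If the file has a header row, csv.DictReader lets you pick the column by name instead of by position, and it leaves the header itself out of the result. Here is a minimal sketch with made-up in-memory data in place of names.csv. The "Age" column is hypothetical.

```python
import csv
import io

# Made-up sample data standing in for names.csv.
sample = io.StringIO("Name,Age\nAlice,30\nBob,25\n")

# DictReader uses the first row as field names and yields one dict per data row.
reader = csv.DictReader(sample)
names = [row["Name"] for row in reader]

print(names)
```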
One way would be to use pandas.
import pandas as pd
df = pd.read_csv('filepath_here')
your_list = df['Name'].tolist()
I have a line of code in a script that imports data from a text file with lots of spaces between values into an array for use later.
textfile = open('file.txt')
data = []
for line in textfile:
    row_data = line.strip("\n").split()
    for i, item in enumerate(row_data):
        try:
            row_data[i] = float(item)
        except ValueError:
            pass
    data.append(row_data)
I need to change this from a text file to a csv file. I don't want to just change this text to split on commas (since some values can have commas if they're in quotes). Luckily I saw there is a csv library I can import that can handle this.
import csv
with open('file.csv', 'rb') as csvfile:
    ???
How can I load the csv file into the data array?
If it makes a difference, this is how the data will be used:
row = 0
for row_data in (data):
    worksheet.write_row(row, 0, row_data)
    row += 1
Assuming the CSV file is delimited with commas, the simplest way using the csv module in Python 3 would probably be:
import csv
with open('testfile.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))
print(data)
You can specify other delimiters, such as tab characters, by specifying them when creating the csv.reader:
data = list(csv.reader(csvfile, delimiter='\t'))
For Python 2, use open('testfile.csv', 'rb') to open the file.
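Note that csv.reader returns every value as a string. To keep the float conversion from the question's original loop, something along these lines could be used. This is a sketch with made-up in-memory data, and the helper name `to_number` is hypothetical.

```python
import csv
import io


def to_number(item):
    """Return item as a float when possible, otherwise leave it unchanged."""
    try:
        return float(item)
    except ValueError:
        return item


# Made-up sample data; note the quoted value that contains a comma.
sample = io.StringIO('1,2.5,"hello, world"\n3,abc,4\n')

data = [[to_number(item) for item in row] for row in csv.reader(sample)]
print(data)
```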
You can use the pandas library or NumPy to read the CSV file. If your file is tab-separated, use '\t' in place of the comma in the sep and delimiter arguments below.
import pandas as pd
myFile = pd.read_csv('filepath', sep=',')
Or
import numpy as np
myFile = np.genfromtxt('filepath', delimiter=',')
I think the simplest way to do this is via Pandas:
import pandas as pd
data = pd.read_csv(FILE).values
This returns a NumPy array of values from a DataFrame created from the CSV. See the pandas read_csv documentation.
This method also works for me.
Example: Having random data, and each data point starting on a newline like below:
'dog',5,2
'cat',5,7,1
'man',5,7,3,'banana'
'food',5,8,9,4,'girl'
import csv
with open('filePath.csv', 'r') as readData:
    readCsv = csv.reader(readData)
    data = list(readCsv)
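One detail worth knowing with data like the above: csv.reader treats the double quote as the quote character by default, so the single quotes stay part of the values. If that is not wanted, passing quotechar="'" strips them. A small sketch using the sample rows in memory:

```python
import csv
import io

# The first three sample rows from above, held in memory.
sample = io.StringIO("'dog',5,2\n'cat',5,7,1\n'man',5,7,3,'banana'\n")

# Default settings: single quotes are ordinary characters and are kept.
default_rows = list(csv.reader(sample))

# With quotechar="'", the single quotes are treated as quoting and removed.
sample.seek(0)
quoted_rows = list(csv.reader(sample, quotechar="'"))

print(default_rows[0])
print(quoted_rows[0])
```

Rows of different lengths are fine either way, since each row is simply returned as a list of whatever fields it contains.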