Sorting CSV file and saving result as a CSV - python

I'd like to take a CSV file, sort it, and then save the result as a CSV. This is what I have so far; I can't figure out how to write it to a CSV file:
import csv
with open('test.csv', 'r') as f:
    sample = csv.reader(f)
    sort = sorted(sample)
for eachline in sort:
    print(eachline)

You don't need pandas for something simple like this:
import csv
# Read the input file and sort it
with open('input.csv', newline='') as f:
    data = sorted(csv.reader(f))
# Write to the output file
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data)
Each row is a list of strings, and lists in Python sort lexicographically: by the first value and, if those are equal, by the second, and so on. You can pass a key function to sorted to sort by a specific column instead.
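As a minimal sketch of the key-function idea, the snippet below sorts rows by their second column. The sample data is made up for illustration and is read from an in-memory string rather than from input.csv.

```python
import csv
import io

# Hypothetical sample data standing in for input.csv
sample = io.StringIO("b,10\na,9\nc,2\n")
rows = list(csv.reader(sample))

# Default sort: compares the first field, then the second, as strings
by_first = sorted(rows)

# Sort by the second column, converted to int so that 9 sorts before 10
by_second = sorted(rows, key=lambda row: int(row[1]))

print(by_first)
print(by_second)
```

Without the int() conversion the second column would be compared as text, and "10" would sort before "2".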

I think something like this should do the trick:
import pandas as pd
path = "C:/Your/file/path/file.csv"
df = pd.read_csv(path)
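# Set ascending=False to sort in descending order
df = df.sort_values("variablename_by_which_to_sort", axis=0, ascending=True)
# index=False keeps the row index out of the saved file
df.to_csv(path, index=False)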

Related

Read CSV file in custom string format

I have a .csv file which looks something like this:
-73.933087,40.6960679
-84.39591587,39.34949003
-111.2325173,47.49438049
How can I read that .csv file in Python to get a format like this (two numbers between quotes, separated by a comma):
numbers = ["-73.933087,40.6960679",
"-84.39591587,39.34949003",
"-111.2325173,47.49438049"]
I managed to load the .csv into a list, but the formatting is the problem.
import csv
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    my_list = list(reader)
print(my_list)
input("Press enter to exit.")
Where I get output like this:
[['-73.933087', '40.6960679'],
['-84.39591587', '39.34949003'],
['-111.2325173', '47.49438049']]
So I need to remove single quotes here, and to change square brackets for double quotes.
Just use join to combine each line. You were 95% there with your code already.
import csv
numbers = []
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        nums = ",".join(row)
        numbers.append(nums)
I think you should simply be able to store it in a pandas dataframe like this:
import pandas as pd
numbers = pd.read_csv (r'Path where the CSV file is stored\File name.csv')
print (numbers)
Then you can convert it to a numpy array or whatever you like.

How to convert nested json in csv with pandas

I have a nested json file (100k rows), which looks like this:
{"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}
I am trying to create a CSV so that it can easily be loaded into an RDBMS. I am trying to use json_normalize() in pandas, but even before I get there I get the error below.
import json
with open('transactions.json') as data_file:
    data = json.load(data_file)
JSONDecodeError: Extra data: line 2 column 1 (char 466)
If your problem originates in reading the JSON file itself, then I would just use:
json.loads()
and then use
pd.read_csv()
If your problem originates in the conversion from your json dict to dataframe you can use this:
test = {"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}
import io
import json
import pandas as pd
# Convert the dict to a JSON string and read it (newer pandas expects a file-like object)
df = pd.read_json(io.StringIO(json.dumps(test)), convert_axes=True)
# 'Unpack' each segment dict into its own columns and merge them with the original df
df = pd.concat([df, df.Segment.apply(pd.Series)], axis=1)
# Remove the original column of dicts
df.drop('Segment', axis=1, inplace=True)
That would be my approach but there might be more convenient approaches.
Step one: loop over a file of records
Since your file has one JSON record per line, you need to loop over all the records in your file, which you can do like this:
import io
import json
import pandas as pd
with open('transactions.json', encoding="utf8") as data_file:
    for line in data_file:
        data = json.loads(line)
        # or
        df = pd.read_json(io.StringIO(line), convert_axes=True)
        # do something with data or df
Step two: write the CSV file
Now, you can combine this with a csv.writer to convert the file into a CSV file.
import csv
with open('transactions.csv', "w", encoding="utf8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    # Loop for each record, somehow:
    #     row = build list with row contents
    #     writer.writerow(row)
Putting it all together
I'll read the first record once to get the keys to display and output them as a CSV header, and then I'll read the whole file and convert it one record at a time:
import csv
import json
# Read the first JSON record to get the keys that we'll use as headers for the CSV file
with open('transactions.json', encoding="utf8") as data_file:
    keys = list(json.loads(next(data_file)).keys())
# Our CSV headers are going to be the keys from the first row, except for
# segments, which we'll replace (arbitrarily) by three numbered segment column
# headings.
keys.remove("Segment")
base_keys = list(keys)
keys.extend(["Segment1", "Segment2", "Segment3"])
with open('transactions.csv', "w", encoding="utf8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(keys)  # Write the CSV headers
    with open('transactions.json', encoding="utf8") as data_file:
        for line in data_file:
            data = json.loads(line)
            row = [data[k] for k in base_keys] + data["Segment"]
            writer.writerow(row)
The resulting CSV file will still have a whole segment record in each of the Segment1, Segment2 and Segment3 columns (written as the text form of a Python dict). If you want to format each segment differently, you could define a format_segment(segment) function and replace data["Segment"] with this list comprehension: [format_segment(segment) for segment in data["Segment"]]
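One possible shape for such a format_segment function is sketched below. The "MUC-CPH" output format is an arbitrary choice made for illustration, and the record is a shortened, hypothetical version of the one in the question.

```python
import json

def format_segment(segment):
    # Hypothetical format: departure and arrival codes joined by a dash
    return f'{segment["DepartureAirportCode"]}-{segment["ArrivalAirportCode"]}'

# Shortened sample record with the same Segment structure as in the question
line = (
    '{"UniqueId": "x", "Segment": ['
    '{"DepartureAirportCode": "MUC", "ArrivalAirportCode": "CPH"}, '
    '{"DepartureAirportCode": "ARN", "ArrivalAirportCode": "MUC"}]}'
)
data = json.loads(line)

# One short string per segment instead of a whole dict
segments = [format_segment(segment) for segment in data["Segment"]]
print(segments)
```

Each segment then occupies one readable cell in the CSV row instead of the text form of a dict.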

How can I open csv files and read them and sort them based on the data inside it?

So I'm trying to find how to open csv files and sort all the details in it...
so an example of data contained in a CSV file is...
2,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
1,668d39,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,doctor,phone2,9759544365415694736,in,53.001.135.54,weqlhrerreuert6f
and so I'm trying to let a function sortCSV(File) to open the CSV file and sort it based on the very first number, which is 0, 1 ....
so the output should be
1,668d39,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
2,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
3,622r49,arqek,doctor,phone2,9759544365415694736,in,53.001.135.54,weqlhrerreuert6f
Here is my code so far, which clearly doesn't work....
import csv
def CSV2List(csvFilename: str):
    f = open(csvFilename)
    q = list(f)
    return q.sort()
What changes should I make to my code to make sure my code works??
Using pandas, set the first column as the index and use sort_index to sort on it:
import pandas as pd
file_path = '/data.csv'
df = pd.read_csv(file_path,header=None,index_col=0)
df = df.sort_index()
print(df)
There are a number of ways you could handle this, but one of the easiest would be to install pandas (https://pandas.pydata.org/).
First off, you will most likely need titles for each column, which should be on the first row of your CSV file. When you've added the column titles and installed pandas:
With pandas:
import pandas as pd
dataframe = pd.read_csv(filepath, index_col=0).sort_index()
This sets the first column as the index column and sorts the rows on that index.
Another way I've handled CSVs with difficult formatting (e.g. exports from Excel) is by reading the file as a regular file and then iterating over the rows to handle them on my own.
def sort_rows(filepath):
    final_data = []
    with open(filepath, "r") as f:
        for row in f:
            # Split the row, dropping the trailing newline
            row_data = row.rstrip("\n").split(",")
            # Add to final data array
            final_data.append(row_data)
    # Sort on the first column (compared as text; use int(row[0]) for numeric order)
    final_data.sort(key=lambda row: row[0])
    # Return the sorted list of rows of your CSV
    return final_data
Try csv.reader(f) on the open file:
import csv
def CSV2List(csvFilename: str):
    with open(csvFilename, newline='') as f:
        q = csv.reader(f)
        # A reader has no sort method, so build a sorted list instead
        return sorted(q, key=lambda x: x[0])
Using the csv module:
import csv
def csv_to_list(filename: str):
    # use a context manager here
    with open(filename) as fh:
        reader = csv.reader(fh)
        # convert the first item to an int for sorting
        rows = [[int(num), *row] for num, *row in reader]
    # sort the rows based on that value
    return sorted(rows, key=lambda row: row[0])
This is not the best way to deal with CSV files but:
def CSV2List(csvFilename: str):
    l = []
    with open(csvFilename, 'r') as f:
        for line in f:
            # Strip the trailing newline before splitting into fields
            l.append(line.rstrip('\n').split(','))
    for item in l:
        item[0] = int(item[0])
    return sorted(l)
print(CSV2List('data.csv'))
However, I would probably use pandas instead; it is a great module.
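If the file does have a header row, as one of the answers above suggests adding, the header needs to be kept out of the sort. The following is a rough sketch using only the csv module; the column titles and the in-memory sample data are assumptions for illustration.

```python
import csv
import io

# Hypothetical file contents: a header row followed by unsorted records
source = io.StringIO("id,code\n2,8dac2b\n10,aaaaaa\n1,668d39\n")

reader = csv.reader(source)
header = next(reader)  # set the header aside so it is not sorted
rows = sorted(reader, key=lambda row: int(row[0]))  # numeric sort on column 0

# Write the header first, then the sorted rows
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

With real files, the two StringIO objects would be replaced by open(..., newline='') calls for the input and output paths.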

How to store data of a specific column from a csv file to a list in Python

How can I store the data of a specific column in a CSV file, for example Name, in a list using Python?
When I try to output it, the output is repeated.
Please help.
The easiest way is to use pandas:
import pandas as pd
df = pd.read_csv('names.csv')
names = df['Name'].tolist()
One possibility would be to use a list comprehension.
import csv
with open("names.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    # List comprehension
    csv_list = [line[0] for line in csv_reader]
The "0" in line[0] can be changed to be whichever column is desired.
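If the file has a header row, csv.DictReader lets you pick the column by name rather than by position, and the header itself does not end up in the list. A small sketch follows; the file contents are invented for illustration and read from an in-memory string instead of names.csv.

```python
import csv
import io

# Hypothetical contents standing in for names.csv
csv_file = io.StringIO("Name,Age\nAlice,30\nBob,25\n")

# DictReader uses the first row as field names for every following row
reader = csv.DictReader(csv_file)
names = [row["Name"] for row in reader]
print(names)
```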
One way would be to use pandas.
import pandas as pd
df = pd.read_csv('filepath_here')
your_list = df['Name'].tolist()

Save tuple in a list into a new file [duplicate]

I have data written in a csv file in the format below:
[(789,255,25,33.0),(855,275,25,33.0)............]
I want it to be converted into a format like:
1. 789,255,25,33.0
2. 855,275,25,33.0
..............
So all I want is to convert the tuples in the list into a new CSV file, with each tuple on a new line. The values in the list are strings and I want to convert them to floats as well. How do I accomplish this?
Using the csv module and enumerate.
Ex:
import csv
data = [(789,255,25,33.0),(855,275,25,33.0)]
filename = "output.csv"  # example output path
with open(filename, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for i, line in enumerate(data, 1):
        writer.writerow([i] + list(line))
Using Pandas
import pandas as pd
data = [(789,255,25,33.0),(855,275,25,33.0)]
filename = "output.csv"  # example output path
df = pd.DataFrame(data)
df.to_csv(filename, header=False)
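Neither answer covers reading the original text or the float conversion the question asks about. A possible sketch is below, assuming the file holds the list as text in exactly the form shown in the question; ast.literal_eval parses that text into Python tuples, and the output goes to an in-memory buffer here instead of a real file.

```python
import ast
import csv
import io

# Assumed file contents, copied from the format shown in the question
text = "[(789,255,25,33.0),(855,275,25,33.0)]"

# Safely parse the text into a list of tuples
tuples = ast.literal_eval(text)

# Convert every value to float, one list per output row
rows = [[float(value) for value in tup] for tup in tuples]

# Write one tuple per line
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

If the values inside the tuples are quoted strings, float(value) still converts them, as long as each string is a valid number.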
