Read CSV file in custom string format - python

I have a .csv file which looks something like this:
-73.933087,40.6960679
-84.39591587,39.34949003
-111.2325173,47.49438049
How can I read that .csv file in Python to get a format like this (two numbers between quotes, separated by a comma):
numbers = ["-73.933087,40.6960679",
"-84.39591587,39.34949003",
"-111.2325173,47.49438049"]
I managed to load the .csv into a list, but the formatting is the problem.
import csv
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    my_list = list(reader)
print(my_list)
input("Press enter to exit.")
Where I get output like this:
[['-73.933087', '40.6960679'],
['-84.39591587', '39.34949003'],
['-111.2325173', '47.49438049']]
So I need to remove the single quotes here and replace the square brackets with double quotes.

Just use join to combine each line. You were 95% there with your code already.
import csv
numbers = []
with open('coordinates.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        nums = ",".join(row)
        numbers.append(nums)
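As a minimal, self-contained sketch of the same join idea, the snippet below runs the three sample rows through csv.reader from an in-memory io.StringIO buffer. The buffer is only an assumption so that the example runs without coordinates.csv on disk.

```python
import csv
import io

# In-memory stand-in for coordinates.csv (assumption, for illustration only)
sample = io.StringIO(
    "-73.933087,40.6960679\n"
    "-84.39591587,39.34949003\n"
    "-111.2325173,47.49438049\n"
)

# Join the fields of each parsed row back into one comma-separated string
numbers = [",".join(row) for row in csv.reader(sample)]
print(numbers)
```

Python prints the strings with single quotes, but that is only how a list is displayed; the strings themselves contain no quote characters.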

I think you should simply be able to store it in a pandas DataFrame like this:
import pandas as pd
# header=None keeps the first coordinate row as data instead of using it as column names
numbers = pd.read_csv(r'Path where the CSV file is stored\File name.csv', header=None)
print(numbers)
Then you can convert it to a NumPy array or whatever you like.

Related

Sorting CSV file and saving result as a CSV

I'd like to take a csv file, sort it and then save it as a csv. This is what I have so far, and I can't figure out how to write it to a csv file.
import csv
with open('test.csv','r') as f:
    sample = csv.reader(f)
    sort = sorted(sample)
for eachline in sort:
    print (eachline)
You don't need pandas for something simple like this:
import csv
# Read the input file and sort it
with open('input.csv', newline='') as f:
    data = sorted(csv.reader(f))
# Write to the output file (newline='' lets the csv module control line endings)
with open('output.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data)
The rows returned by csv.reader are lists of strings, and lists (like tuples) sort lexicographically in Python: they sort by the first value and, if those are equal, by the second, and so on. You can supply a key function to sorted to sort by a specific value.
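To illustrate the key function mentioned above, here is a small sketch. The sample data and the choice of sorting on the second column as an integer are assumptions made for the example.

```python
import csv
import io

# Hypothetical sample data standing in for input.csv
sample = io.StringIO("b,10\na,9\nc,2\n")
rows = list(csv.reader(sample))

# Default sort: rows are compared field by field, as strings
by_first = sorted(rows)

# Sort by the second column, converted to int so that 2 < 9 < 10
by_number = sorted(rows, key=lambda row: int(row[1]))

print(by_first)
print(by_number)
```

The conversion inside the key function matters because csv.reader returns every field as a string, and the string '10' sorts before '2'.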
I think something like this should do the trick:
import pandas as pd
path = "C:/Your/file/path/file.csv"
df = pd.read_csv(path)
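df = df.sort_values("variablename_by_which_to_sort", axis=0, ascending=True)  # use ascending=False for descending order
df.to_csv(path, index=False)  # index=False avoids writing the row index as an extra column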

convert items in csv column to list using python

I have been reading answers on StackOverflow and haven't been able to find one that covers this specific question.
I have a csv with a single column with values as follows:
**Values**
abc
xyz
bcd,fgh
tew,skdh,fsh
As you can see above, some cells have more than one value separated by commas.
I used the following code:
with open('dat.csv', 'rb') as inputfile:
    reader = csv.reader(inputfile)
colnames=['Keywords']
data = pandas.read_csv('dat.csv', names=colnames)
lkn=data.values.tolist()
print lkn
The output I got was: [['abc'],['xyz'],['bcd,fgh'],['tew,skdh,fsh']]
I would like to have the output as:
[['abc'],['xyz'],['bcd','fgh'],['tew','skdh','fsh']]
which I believe is a proper list-of-lists format (I am fairly new to lists of lists). Please point me in the right direction.
Thanks!
NB:csv file with how cells are arranged (image)
Looking at your attached image, I'd bet that the cells have been quoted (although, to be sure, open the CSV file in a text editor, not in Excel) so you have to do the manual splitting yourself:
import csv
with open("file.csv", "r") as f:
    reader = csv.reader(f)
    your_list = [e[0].strip().split(",") for e in reader if e]
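A runnable sketch of that manual split, assuming the multi-value cells really are wrapped in double quotes in the raw file (the in-memory sample below is a guess at its contents, without the header row):

```python
import csv
import io

# Assumed file content: multi-value cells are wrapped in double quotes
sample = io.StringIO('abc\nxyz\n"bcd,fgh"\n"tew,skdh,fsh"\n')

# csv.reader removes the quotes and returns one field per row;
# splitting that single field on commas yields the nested list
your_list = [row[0].strip().split(",") for row in csv.reader(sample) if row]
print(your_list)
```

If the cells turn out not to be quoted, csv.reader already splits them on the commas and the extra split is unnecessary.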
Try something like this:
import csv
with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
# each row returned by csv.reader is already a list, so this loop does not change your_list
for item in your_list:
    item = list(item)
print(your_list)
Credit : Python import csv to list

Better way to parse CSV into list or array

Is there a better way to create a list or a numpy array from this csv file? What I'm asking is how to do it and parse more gracefully than I did in the code below.
fname = open("Computers discovered recently by discovery method.csv").readlines()
lst = [elt.strip().split(",")[8:] for elt in fname if elt != "\n"][4:]
lst2 = []
for row in lst:
    print(row)
    if row[0].startswith("SMZ-") or row[0].startswith("MTR-"):
        lst2.append(row)
print(*lst2, sep = "\n")
You can always use Pandas. As an example,
import pandas as pd
import numpy as np
df = pd.read_csv('pandas_dataframe_importing_csv/example.csv')
To get a numeric array, convert the values to your preferred numeric type. I guess you can write the whole thing in one line:
result = df.to_numpy().astype("float")  # converts the DataFrame values (not the column labels) to floats
You can also do the following:
from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')
You can use pandas and specify the header row to make it work correctly on your sample file:
import pandas as pd
df = pd.read_csv('Computers discovered recently by discovery method.csv', header=2)
You can check your content using:
>>> df.head()
You can check headers using
>>> df.columns
And to convert it to numpy array you can use
>>> np_arr = df.values
It comes with a lot of options to parse and read csv files. For more information please check the docs
I am not sure what you want, but try this:
import csv
with open("Computers discovered recently by discovery method.csv", 'r') as f:
    reader = csv.reader(f)
    ll = list(reader)
print (ll)
This should read the CSV line by line and store it as a list of rows.
You should never parse CSV structures manually unless you want to tackle all possible exceptions and CSV format oddities. Python has you covered in that regard with its csv module.
The main problem, in your case, stems from your data: there seem to be two different CSV structures in a single file, so you first need to find where your second structure begins. Plus, from your code, it seems you want to filter out all columns before Details_Table0_Netbios_Name0 and include only rows whose Details_Table0_Netbios_Name0 starts with SMZ- or MTR-. So something like:
import csv
with open("Computers discovered recently by discovery method.csv") as f:
    reader = csv.reader(f)  # create a CSV reader
    for row in reader:  # skip the lines until we encounter the second CSV structure/header
        if row and row[0] == "Header_Table0_Netbios_Name0":
            break
    index = row.index("Details_Table0_Netbios_Name0")  # find where your columns begin
    result = []  # storage for the rows we're interested in
    for row in reader:  # read the rest of the CSV row by row
        if row and row[index][:4] in {"SMZ-", "MTR-"}:  # only include these rows
            result.append(row[index:])  # trim and append to the `result` list
print(result[10])  # etc.
# ['MTR-PC0BXQE6-LB', 'PR2', 'anisita', 'VALUEADDCO', 'VALUEADDCO', 'Heartbeat Discovery',
#  '07.12.2017 17:47:51', '13']
should do the trick.
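The advice against parsing CSV by hand can be seen in a short comparison. The line below is hypothetical (it is not taken from the asker's file) and only shows what happens when a quoted field contains a comma.

```python
import csv
import io

# Hypothetical line with a comma inside a quoted field
line = 'MTR-PC1,"Smith, John",13\n'

# Manual parsing: splits on every comma, including the one inside the quotes
naive = line.strip().split(",")

# csv module: respects the quoting and keeps the field in one piece
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The manual split produces four pieces with stray quote characters, while csv.reader returns the three intended fields.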
Sample Code
import csv
csv_file = 'sample.csv'
with open(csv_file) as fh:
    reader = csv.reader(fh)
    for row in reader:
        print(row)
sample.csv
name,age,salary
clado,20,25000
student,30,34000
sam,34,32000

How to import a csv-file into a data array?

I have a line of code in a script that imports data from a text file with lots of spaces between values into an array for use later.
textfile = open('file.txt')
data = []
for line in textfile:
    row_data = line.strip("\n").split()
    for i, item in enumerate(row_data):
        try:
            row_data[i] = float(item)
        except ValueError:
            pass
    data.append(row_data)
I need to change this from a text file to a csv file. I don't want to just change this text to split on commas (since some values can have commas if they're in quotes). Luckily I saw there is a csv library I can import that can handle this.
import csv
with open('file.csv', 'rb') as csvfile:
    ???
How can I load the csv file into the data array?
If it makes a difference, this is how the data will be used:
row = 0
for row_data in (data):
    worksheet.write_row(row, 0, row_data)
    row += 1
Assuming the CSV file is delimited with commas, the simplest way using the csv module in Python 3 would probably be:
import csv
with open('testfile.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))
print(data)
You can specify other delimiters, such as tab characters, by specifying them when creating the csv.reader:
data = list(csv.reader(csvfile, delimiter='\t'))
For Python 2, use open('testfile.csv', 'rb') to open the file.
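To tie this back to the question's original loop, the sketch below combines csv.reader with the same "convert to float when possible" step. The helper name to_number and the tab-separated sample content are assumptions for illustration.

```python
import csv
import io

def to_number(item):
    """Return item as a float when possible, otherwise unchanged (hypothetical helper)."""
    try:
        return float(item)
    except ValueError:
        return item

# Hypothetical tab-separated content standing in for a real file;
# the first field of the second row is quoted and contains a tab
sample = io.StringIO('name\t1.5\t2\n"a\tb"\t3\tx\n')

data = [[to_number(item) for item in row]
        for row in csv.reader(sample, delimiter="\t")]
print(data)
```

The quoted field survives as a single value because the csv module handles the quoting, which is the reason given in the question for not splitting on the delimiter by hand.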
You can use pandas library or numpy to read the CSV file. If your file is tab-separated then use '\t' in place of comma in both sep and delimiter arguments below.
import pandas as pd
myFile = pd.read_csv('filepath', sep=',')
Or
import numpy as np
myFile = np.genfromtxt('filepath', delimiter=',')
I think the simplest way to do this is via Pandas:
import pandas as pd
data = pd.read_csv(FILE).values
This returns a Numpy array of values from a DataFrame created from the CSV. See the documentation here.
This method also works for me.
Example: Having random data, and each data point starting on a newline like below:
'dog',5,2
'cat',5,7,1
'man',5,7,3,'banana'
'food',5,8,9,4,'girl'
import csv
with open('filePath.csv', 'r') as readData:
    readCsv = csv.reader(readData)
    data = list(readCsv)
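The sample rows wrap their strings in single quotes, and with the default settings csv.reader keeps those quote characters as part of each value. A possible variation, assuming the quotes should be dropped, is to pass quotechar (shown here on an in-memory copy of the first three rows):

```python
import csv
import io

# First three sample rows from above, held in memory for illustration
sample = io.StringIO("'dog',5,2\n'cat',5,7,1\n'man',5,7,3,'banana'\n")

# quotechar="'" tells the reader that single quotes delimit fields
data = list(csv.reader(sample, quotechar="'"))
print(data)
```

Rows of different lengths are fine here, since each row simply becomes a list with as many items as it has fields.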

Only outputting a few lines into a text file, instead of all of them

I've made a Python script that grabs information from a .csv archive, and outputs it into a text file as a list. The original csv file has over 200,000 fields to input and output from, yet when I run my program it only outputs 36 into the .txt file.
Here's the code:
import csv
with open('OriginalFile.csv', 'r') as csvfile:
    emailreader = csv.reader(csvfile)
    f = open('text.txt', 'a')
    for row in emailreader:
        f.write(row[1] + "\n")
And the text file only lists up to 36 strings. How can I fix this? Is maybe the original csv file too big?
After many comments, it turned out that the original problem was the character encoding of the csv file. If you specify the encoding in pandas it will read it just fine.
Any time you are dealing with a csv file (or excel, sql or R) I would use Pandas DataFrames for this. The syntax is shorter and it is easier to see what is going on.
import pandas as pd
csvframe = pd.read_csv('OriginalFile.csv', encoding='utf-8')
with open('text.txt', 'a') as output:
    # I think what you wanted was the 2nd column from each row
    output.write('\n'.join(csvframe.iloc[:, 1].astype(str)))
    # iloc selects by position: ":" means all rows and 1 is the second column (positions start at 0)
You might have luck with something like the following:
import csv
with open('OriginalFile.csv', 'r') as csvfile:
    emailreader = csv.reader(csvfile)
    with open('text.txt','w') as output:
        for line in emailreader:
            output.write(line[1]+'\n')
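Since the diagnosis above was a character-encoding problem, the same fix can be sketched with the csv module alone by passing encoding to open. The Latin-1 encoding, the temporary directory, and the sample addresses below are assumptions for illustration; the real file's encoding has to be determined separately.

```python
import csv
import os
import tempfile

# Create a small Latin-1 encoded file to stand in for OriginalFile.csv
# (encoding and contents are assumptions for illustration)
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "OriginalFile.csv")
dst = os.path.join(tmp_dir, "text.txt")
with open(src, "w", newline="", encoding="latin-1") as f:
    csv.writer(f).writerows([["1", "jos\xe9@example.com"], ["2", "ann@example.com"]])

# Read with the matching encoding and write the second column as UTF-8
with open(src, newline="", encoding="latin-1") as csvfile, \
        open(dst, "w", encoding="utf-8") as output:
    for row in csv.reader(csvfile):
        if len(row) > 1:  # skip blank or short rows
            output.write(row[1] + "\n")

# Read the result back to check that every row was written
with open(dst, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

Using with for the output file also guarantees that it is closed and flushed, which the question's original code did not do explicitly.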
