How to convert list elements in column - python

import csv
import pandas as pd
imp=[]
feature1 = []
issued = []
used = []
with open("lmutil_lmstat.txt", "r") as input:
    f=open("lmutil_lmstat.txt","r")
    found = False
    for x in f.readlines():
        if ("Users" in x):
            found = True
            feature1.append(x.split(" ")[2][:-1])
            issued.append(x.split(" ")[6][:])
            used.append(x.split(" ")[12][:])
            #print(x)
data = pd.DataFrame({'Feature_Name': [feature1], 'Licesence_Issued': [issued], 'Licesence_Used':[used]})
data_frame = pd.DataFrame(data)
with open('license_summery1.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(data_frame)
This is my code. I am taking specific data from a file and storing it in lists. When I create the data frame, I get a shape of (1, 3), but I want a table with shape (57, 3).
Please check the above code and give suggestions. Any help will be appreciated.

I have actually just answered the same question here: how can change numpy array to single value?
You can use explode on your DataFrame. If an element holds multiple values (for example a list), explode expands it into one row per value.
data_frame = data_frame.apply(pd.Series.explode)
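As a rough illustration of why the shape comes out as (1, 3), here is a minimal sketch. The three short lists are hypothetical stand-ins for the values parsed from the license file, and the variable names wrapped, exploded and direct are made up for this example.

```python
import pandas as pd

# Hypothetical stand-ins for the three lists built from the license file
feature1 = ["feat_a", "feat_b", "feat_c"]
issued = ["10", "5", "2"]
used = ["3", "0", "1"]

# Wrapping each list in another list gives ONE row whose cells are lists
wrapped = pd.DataFrame({
    "Feature_Name": [feature1],
    "Licesence_Issued": [issued],
    "Licesence_Used": [used],
})

# explode turns each list cell into one row per element
exploded = wrapped.apply(pd.Series.explode).reset_index(drop=True)

# Passing the lists directly avoids the nesting in the first place
direct = pd.DataFrame({
    "Feature_Name": feature1,
    "Licesence_Issued": issued,
    "Licesence_Used": used,
})

print(wrapped.shape, exploded.shape, direct.shape)
```

If the lists are built as in the question, dropping the extra square brackets is probably the simpler fix, and data_frame.to_csv('license_summery1.csv', index=False) would then write the whole table.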

Related

How do I add values from a csv file to a list?

x,y
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
How do I put each value for x in one list and each value for y in another?
I'm basically trying to create a plot.
You could do:
import csv
from collections import defaultdict

columns = defaultdict(list)
with open("my.csv") as fin:
    dr = csv.DictReader(fin)
    for row in dr:
        for key, val in row.items():
            columns[key].append(float(val))
print(columns["x"])
print(columns["y"])
Gives:
[6.1101, 5.5277, 8.5186, 7.0032, 5.8598, 8.3829, 7.4764, 8.5781, 6.4862, 5.0546, 5.7107, 14.164]
[17.592, 9.1302, 13.662, 11.854, 6.8233, 11.886, 4.3483, 12.0, 6.5987, 3.8166, 3.2522, 15.505]
Obviously this is assuming that the contents will be numeric data that needs to be converted to float (as the question says that you are trying to create a plot). If there were non-numeric values, this would raise a ValueError, so if this might be the case then you would need to test for this or handle the exception.
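One way to handle the exception could look like the sketch below. It reads from an in-memory sample (an assumption made so the example runs on its own, with a deliberately bad "n/a" cell) and skips any row that cannot be converted; the skipped list is a hypothetical helper for reporting.

```python
import csv
import io
from collections import defaultdict

# In-memory sample with one non-numeric cell (assumed data for illustration)
sample = io.StringIO("x,y\n6.1101,17.592\nn/a,9.1302\n8.5186,13.662\n")

columns = defaultdict(list)
skipped = []  # file line numbers of rows that could not be converted

# The header is line 1, so the first data row is line 2
for line_no, row in enumerate(csv.DictReader(sample), start=2):
    try:
        values = {key: float(val) for key, val in row.items()}
    except ValueError:
        skipped.append(line_no)
        continue
    for key, val in values.items():
        columns[key].append(val)

print(columns["x"], columns["y"], skipped)
```

Converting the whole row before appending keeps the x and y lists the same length, which matters for plotting.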
Use pandas:
import pandas as pd
# The file starts with a header row (x,y), so let pandas read it as column names
df = pd.read_csv('myfile.csv', sep=',')
Use pandas, an example below.
import pandas as pd
df = pd.read_csv('data.csv')
x = df.x.tolist()
y = df.y.tolist()
x and y variables will contain values from column x and y in your CSV as lists respectively.
You can use pandas to do this:
import pandas as pd
df = pd.read_csv('cord.csv', sep=',')
x = df['x'].tolist()
y = df['y'].tolist()
output:
[6.1101, 5.5277, 8.5186, 7.0032, 5.8598, 8.3829, 7.4764, 8.5781, 6.4862, 5.0546, 5.7107, 14.164]
[17.592, 9.1302, 13.662, 11.854, 6.8233, 11.886, 4.3483, 12.0, 6.5987, 3.8166, 3.2522, 15.505]
import pandas as pd
df = pd.read_csv('data.csv', header = None)
print(type(df.columns))
print(type(df.index))
Then, once you know the types of the column and index objects, you can convert them to lists:
df.columns.tolist()
df.index.tolist()
print(type(df.columns.tolist()))
print(type(df.index.tolist()))
The easy way is to use the csv module's functionality.
First create a csv reader function:
import csv

def csv_dict_reader(file, has_header=False, skip_comment_char=None, **kwargs):
    """
    Lazily reads a CSV file, yielding one row (a dict) at a time
    :param file: (str) path to csv file to read
    :param has_header: (bool) skip first line (use together with fieldnames; without fieldnames, DictReader already consumes the first line as the header)
    :param skip_comment_char: (str) optional character which, if found at the start of a row's first field, will skip the row
    :param delimiter: (char) CSV delimiter char (passed through kwargs)
    :param fieldnames: (list) CSV field names for dictionary creation (passed through kwargs)
    :param kwargs: extra keyword arguments forwarded to csv.DictReader
    :return: generator yielding the rows
    """
    with open(file) as fp:
        csv_data = csv.DictReader(fp, **kwargs)
        # Skip header
        if has_header:
            next(csv_data)
        fieldnames = kwargs.get('fieldnames')
        for row in csv_data:
            # Skip commented out entries
            if fieldnames is not None:
                if skip_comment_char is not None:
                    if not row[fieldnames[0]].startswith(skip_comment_char):
                        yield row
                else:
                    yield row
            else:
                # list(row)[0] is key from row, works with Python 3.7+
                if skip_comment_char is not None:
                    if not row[list(row)[0]].startswith(skip_comment_char):
                        yield row
                else:
                    yield row
The above function is a generator, which is useful if your csv file is very large, because the whole file does not have to fit in memory at once.
Then use that function to read your data and iterate over the rows:
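fieldnames = ('x', 'y')
# Pass the field names explicitly and skip the header line of the file
data = csv_dict_reader('/path/to/my/file', has_header=True, fieldnames=fieldnames)
x_list = []
y_list = []
for row in data:
    # Values are read as strings; wrap them in float() if you need numbers
    x_list.append(row['x'])
    y_list.append(row['y'])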
Btw, perhaps using two separate lists isn't the most optimized way.
You could remove both lists and simply use row['x'] directly.

How to return a specific data structure with inner dictionary of lists

I have a CSV file (image attached) and I need to read it and create a dictionary of lists from rows with the format "{method},{number},{orbital_period},{mass},{distance},{year}".
So far I have code :
import csv
with open('exoplanets.csv') as inputfile :
    reader = csv.reader(inputfile)
    inputm = list(reader)
print(inputm)
but my output is coming out like ['Radial Velocity', '1', '269.3', '7.1', '77.4', '2006']
when I want it to look like :
"Radial Velocity" : {"number":[1,1,1], "orbital_period":[269.3, 874.774, 763.0], "mass":[7.1, 2.21, 2.6], "distance":[77.4, 56.95, 19.84], "year":[2006.0, 2008.0, 2011.0] } , "Transit" : {"number":[1,1,1], "orbital_period":[1.5089557, 1.7429935, 4.2568], "mass":[], "distance":[200.0, 680.0], "year":[2008.0, 2008.0, 2008.0] }
Any ideas on how I can alter my code?
Hey SKR01 welcome to Stackoverflow!
I would suggest working with the pandas library. It is meant for table-like content such as yours. What you are then looking for is a groupby on your #method column.
import pandas as pd

def remove_index(row):
    # itertuples() yields namedtuples; drop the group key stored under "Index"
    d = row._asdict()
    del d["Index"]
    return d

df = pd.read_csv("https://docs.google.com/uc?export=download&id=1PnQzoefx-IiB3D5BKVOrcawoVFLIPVXQ")
{row.Index : remove_index(row) for row in df.groupby('#method').aggregate(list).itertuples()}
The only thing that remains is removing the nan values from the resulting dict.
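A possible way to do that last step is sketched below. The function name drop_nans and the small sample dict are assumptions made for illustration; the real input would be the dict produced by the groupby expression.

```python
import math

def drop_nans(grouped):
    """Return a copy of the nested dict with NaN entries removed from every list."""
    return {
        method: {
            field: [v for v in values
                    if not (isinstance(v, float) and math.isnan(v))]
            for field, values in fields.items()
        }
        for method, fields in grouped.items()
    }

# Hypothetical sample shaped like the groupby result
sample = {
    "Transit": {
        "mass": [float("nan"), float("nan")],
        "distance": [200.0, float("nan"), 680.0],
    }
}
cleaned = drop_nans(sample)
print(cleaned)
```

The isinstance check is there so that non-float values (such as integers or strings) pass through without math.isnan raising a TypeError.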
If you don't want to use Pandas, maybe something like this is what you're looking for:
import csv
with open('exoplanets.csv') as inputfile :
    reader = csv.reader(inputfile)
    inputm = list(reader)
header = inputm.pop(0)
del header[0] # probably you don't want "#method"
# create and populate the final dictionary
data = {}
for row in inputm:
    if row[0] not in data:
        data[row[0]] = {h:[] for h in header}
    for i, h in enumerate(header):
        data[row[0]][h].append(row[i+1])
print(data)
This is a bit complex, and I'm questioning why you want the data this way, but this should get you the output format you want without requiring any external libraries like Pandas.
import csv
with open('exoplanets.csv') as input_file:
    rows = list(csv.DictReader(input_file))
# Create the data structure
methods = {d["#method"]: {} for d in rows}
# Get a list of fields, trimming off the method column
fields = list(rows[1])[1:]
# Fill in the data structure
for method in methods:
    methods[method] = {
        # Null-trimmed version of listcomp
        # f: [r[f] for r in rows if r["#method"] == method and r[f]]
        f: [r[f] for r in rows if r["#method"] == method]
        for f
        in fields
    }
Note: This could be one multi-tiered list/dict comprehension, but I've broken it apart for clarity.
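Both csv-only versions leave the values as strings, while the desired output in the question holds numbers and omits empty cells. A single-pass sketch that does both might look like this. The sample rows are assumed from the values quoted in the question, and the name grouped is made up for this example.

```python
import csv
import io

# Sample rows assumed from the values quoted in the question
sample = io.StringIO(
    "#method,number,orbital_period,mass,distance,year\n"
    "Radial Velocity,1,269.3,7.1,77.4,2006\n"
    "Radial Velocity,1,874.774,2.21,56.95,2008\n"
    "Transit,1,1.5089557,,200.0,2008\n"
)

grouped = {}
for row in csv.DictReader(sample):
    method = row.pop("#method")
    # Create the inner dict of empty lists the first time a method is seen
    fields = grouped.setdefault(method, {name: [] for name in row})
    for name, text in row.items():
        if text:  # skip empty cells instead of storing ''
            fields[name].append(float(text))

print(grouped)
```

Note that every value becomes a float here, so number comes out as 1.0 rather than 1; converting that column with int instead would be a small extension.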

Sort a column in Worksheet using Python

In the program below I have created a workbook which contains a worksheet named sort,
where I have placed words in one column and numbers in another column.
I have successfully written the .xlsx file,
but I need the numbers to be sorted from DESCENDING TO ASCENDING ORDER.
I don't know where to place the code for that.
Code
=====
import csv
import xlsxwriter
import re
workbook = xlsxwriter.Workbook('wordsandnumbers.xlsx')
worksheet = workbook.add_worksheet('sort')
with open('sort.csv') as f:
    reader = csv.reader(f)
    alist = list(reader)
worksheet.write(2,0,'words')
worksheet.write(2,1,'Numbers')
newlist = []
for values in alist:
    convstr = str(values)
    convstr = convstr.split(",")
    newlist.extend(convstr)
a=3
for i in range(3,10):
    newlist[a] = re.sub('[^a-zA-Z]','',newlist[a])
    worksheet.write(i,0,newlist[a].strip('['))
    a=a+1
    newlist[a] = re.sub('[^0-9]','',newlist[a])
    int(newlist[a])
    worksheet.write(i,1,newlist[a])
    a=a+1
workbook.close()
The output I'm getting in the .xlsx sheet is:
Needed output:
(The corresponding words which is in the same row of number should also be sorted)
I would recommend loading your original csv as a dataframe and then sorting it by a particular column. I've provided a fully reproducible example below that illustrates this.
I make my own version of sort.csv for demonstration purposes, then read it in as a dataframe using pandas.read_csv, and then sort using pandas.DataFrame.sort_values.
import pandas as pd

# Build a small sort.csv for demonstration purposes
with open('sort.csv', 'w') as sort:
    sort.write('May, 5227\n')
    sort.write('June, 417\n')
    sort.write('Jan, 4\n')
    sort.write('Feb, 424\n')
    sort.write('Dec, 36\n')
    sort.write('Mar, 4981\n')
    sort.write('Apr, 3460\n')

df = pd.read_csv('sort.csv', names = ['words', 'Numbers'])
df = df.sort_values(['Numbers'], ascending=[False])
# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('wordsandnumbers.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, index=False, startrow=2)
Outputted sort.csv:
Outputted wordsandnumbers.xlsx:
Once you get the data into a list, it's straightforward to sort it while keeping each word with its number. You can use the built-in sort and give it a key, which is the value you want the list sorted by.
import csv
import xlsxwriter

workbook = xlsxwriter.Workbook('wordsandnumbers.xlsx')
worksheet = workbook.add_worksheet('sort')
with open('./sort.csv') as f:
    reader = csv.reader(f)
    alist = list(reader)
worksheet.write(2,0,'words')
worksheet.write(2,1,'Numbers')
#Here convert the number to an integer (alist[1:] skips the first row of the file)
newerlist = [[x[0], int(x[1])] for x in alist[1:]]
print(newerlist)
#key is the function applied to the arguments to get the answer and lambda
#is just a 1 line way to write a function f(x) which returns x[1] (the number in the rows)
newerlist.sort(key = lambda x : x[1], reverse = True)
a=3
#Write one worksheet row per sorted entry, starting at row index 3
for i in range(a, a + len(newerlist)):
    for j in range(0,2):
        worksheet.write(i,j,str(newerlist[i-a][j]))
workbook.close()
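To see the key-based sort in isolation, without the spreadsheet code, here is a small sketch. The word and number pairs are borrowed from the demonstration file in the pandas answer above; the variable names are made up for this example.

```python
# Word/number pairs borrowed from the demonstration sort.csv above
rows = [
    ["May", 5227], ["June", 417], ["Jan", 4], ["Feb", 424],
    ["Dec", 36], ["Mar", 4981], ["Apr", 3460],
]

# Sort whole rows by their number, largest first; each word travels with its number
by_number_desc = sorted(rows, key=lambda row: row[1], reverse=True)
words = [row[0] for row in by_number_desc]
print(words)
```

Because the key only decides the ordering and whole rows are moved, the word in each row stays attached to its number.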

Why is the cdc_list getting updated after calling the function read_csv() in total_list?

# Program to combine data from 2 CSV files
The cdc_list gets updated after the second call of read_csv.
overall_list = []
def read_csv(filename):
    file_read = open(filename,"r").read()
    file_split = file_read.split("\n")
    string_list = file_split[1:len(file_split)]
    #final_list = []
    for item in string_list:
        int_fields = []
        string_fields = item.split(",")
        string_fields = [int(x) for x in string_fields]
        int_fields.append(string_fields)
        #final_list.append()
        overall_list.append(int_fields)
    return(overall_list)
cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")
print(len(cdc_list)) #3652
total_list = read_csv("US_births_2000-2014_SSA.csv")
print(len(total_list)) #9131
print(len(cdc_list)) #9131
I don't think the code you pasted explains the issue you've had, at least it's not anywhere I can determine. Seems like there's a lot of code you did not include in what you pasted above, that might be responsible.
However, if all you want to do is merge two CSVs (assuming they both have the same columns), you can use pandas' read_csv, concat and to_csv to achieve this with 3 lines of code (not including imports):
import pandas as pd
# Read CSV file into a pandas DataFrame object
df = pd.read_csv("first.csv")
# Read the 2nd CSV file and concatenate it onto the first (DataFrame.append was removed in pandas 2.0)
df = pd.concat([df, pd.read_csv("second.csv")])
# Write merged DataFrame object (with both CSV's data) to file
df.to_csv("merged.csv")
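Separately, going by the code as pasted in the question, one likely explanation for the changing length is that overall_list is created once at module level and every call appends to and returns that same list. The sketch below reproduces the effect with plain lists; the function names read_rows_shared and read_rows_fresh are hypothetical and stand in for the question's read_csv.

```python
overall_list = []  # module-level list, shared by every call

def read_rows_shared(rows):
    # Appends to the shared list and returns that same object each time
    for row in rows:
        overall_list.append(row)
    return overall_list

def read_rows_fresh(rows):
    # Builds a new list per call, so earlier results are left alone
    final_list = []
    for row in rows:
        final_list.append(row)
    return final_list

first = read_rows_shared([[1], [2]])
second = read_rows_shared([[3]])
print(first is second, len(first))

fresh_first = read_rows_fresh([[1], [2]])
fresh_second = read_rows_fresh([[3]])
print(len(fresh_first), len(fresh_second))
```

Under that assumption, cdc_list and total_list would be two names for one list, which matches both printing 9131. Creating the list inside the function (as the commented-out final_list hints) would keep the results separate.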

How to Perform Mathematical Operation on One Value of a CSV file?

I am dealing with a csv file that contains three columns and three rows containing numeric data. The csv data file simply looks like the following:
Colum1,Colum2,Colum3
1,2,3
1,2,3
1,2,3
My question is how to write Python code that takes a single value from one of the columns and performs a specific operation on it. For example, let's say I want to take the first value in 'Colum1' and subtract it from the sum of all the values in the column.
Here is my attempt:
import csv
f = open('columns.csv')
rows = csv.DictReader(f)
value_of_single_row = 0.0
for i in rows:
    value_of_single_Row += float(i) # trying to isolate a single value here!
print value_of_single_row - sum(float(r['Colum1']) for r in rows)
f.close()
Based on the code you provided, I suggest you take a look at the doc to see the preferred approach on how to read through a csv file. Take a look here:
How to use CsvReader
with that being said, you can modify the beginning of your code slightly to this:
import csv
# Python 3: open in text mode with newline='' ('rb' was the Python 2 idiom)
with open('data.csv', newline='') as f:
    rows = csv.DictReader(f)
    for row in rows:
        pass  # perform operation per row
From there you now have access to each row.
This should give you what you need to do proper row-by-row operations.
What I suggest you do is play around with printing out your rows to see what your data looks like. You will see that each row being outputted is a dictionary.
So if you were going through each row, you can just simply do something like this:
for row in rows:
    row['Colum1'] # or row.get('Colum1')
    # to do some math to add everything in Colum1
    s += float(row['Colum1'])
So all of that will look like this:
import csv
s = 0
# Python 3: open in text mode with newline='' ('rb' was the Python 2 idiom)
with open('data.csv', newline='') as f:
    rows = csv.DictReader(f)
    for row in rows:
        s += float(row['Colum1'])
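Putting this together for the operation asked about in the question (the sum of 'Colum1' minus its first value), a minimal sketch could look like the following. It reads the sample data from an in-memory string so it runs on its own, which is an assumption; with a real file you would pass the opened file to csv.DictReader instead.

```python
import csv
import io

# The sample data from the question, held in memory for this sketch
sample = io.StringIO("Colum1,Colum2,Colum3\n1,2,3\n1,2,3\n1,2,3\n")

# A DictReader can only be iterated once, so keep the rows in a list
rows = list(csv.DictReader(sample))

column = [float(row["Colum1"]) for row in rows]
first_value = column[0]
result = sum(column) - first_value
print(result)
```

Storing the rows in a list matters here: in the question's attempt the reader is already exhausted by the first loop, so a second pass over it would see no rows.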
You can do pretty much all of this with pandas:
import pandas as pd

Location = r'path/test.csv'
df = pd.read_csv(Location) # the first row of the file supplies the column names
print(df)
df.loc[0, 'Colum1'] = df.loc[0, 'Colum1'] + 5 # add 5 to the first value in Colum1
print(df)
You can write back to your csv using df.to_csv('File path', index=False, header=True). Having header=True (the default) writes the column names as the first line.
To do this more along the lines of what you have, you can do it like this:
Location = r'C:/Users/tnabrelsfo/Documents/Programs/Stack/test.csv'
data = []
with open(Location, 'r') as f:
    for line in f:
        data.append(line.replace('\n','').replace(' ','').split(','))
data = data[1:]
print(data)
data[1][1] = 5
print(data)
It will read in each row, cut out the column names, and then you can modify the values by index.
Here is my simple solution using the pandas library. Suppose we have a sample.csv file:
import pandas as pd
df = pd.read_csv('sample.csv') # df is now a DataFrame
df['Colum1'] = df['Colum1'] - df['Colum1'].sum() # here we replace the column by subtracting sum of value in the column
print(df)
df.to_csv('sample.csv', index=False) # save dataframe back to csv file
You can also use the map function to apply an operation to one column, for example:
import pandas as pd
df = pd.read_csv('sample.csv')
col_sum = df['Colum1'].sum() # sum of the first column
df['Colum1'] = df['Colum1'].map(lambda x: x - col_sum)
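For the single-value case described in the question (the first value of 'Colum1' subtracted from the column sum), a pandas sketch might look like this. The data is read from an in-memory string so the example is self-contained, which is an assumption; pd.read_csv('sample.csv') would be the file-based equivalent.

```python
import io

import pandas as pd

# The sample data from the question, held in memory for this sketch
sample = io.StringIO("Colum1,Colum2,Colum3\n1,2,3\n1,2,3\n1,2,3\n")
df = pd.read_csv(sample)

first_value = df["Colum1"].iloc[0]  # first value by position
result = df["Colum1"].sum() - first_value
print(result)
```

Using iloc[0] selects by position, so it still picks the first row if the index labels do not start at 0.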
