Starting at index 1 of a row when using writerow in Python

I have a script that creates a matrix of size n and writes it to a CSV file.
I want the matrix to have "borders" of size n.
My code:
a = []
firstRow = []
for i in range(n):
    row = []
    row.append(i+1)
    firstRow.append(i+1)
    for j in range(n):
        row.append(random.randint(x, y))
    a.append(row)
writer.writerow(firstRow)
writer.writerows(a)
Output when using n = 3:
1,2,3
1,74,82,68
2,87,70,72
3,68,71,74
I need the output to be like this:
, 1, 2, 3
1,74,82,68
2,87,70,72
3,68,71,74
with a blank cell at CSV index (0,0). I also need the whole matrix to start at row 1 instead of row 0.

Using pandas we can get the following valid CSV with a few lines of easy-to-understand code:
,1,2,3
1,91,66,70
2,82,24,79
3,57,56,73
Example code used:
import pandas as pd
import numpy as np
# Create random numbers 0-99, 3x3
data = np.random.randint(0,100, size=(3,3))
df = pd.DataFrame(data)
# Add 1 to index and columns
df.columns = df.columns + 1
df.index = df.index + 1
#df.to_csv('output.csv') # Uncomment this row to write to file.
print(df.to_csv())
And if you insist that you want to remove the leading ,:
with open('output.csv', 'w') as f:
    f.write(df.to_csv()[1:])
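If you'd rather stay with the plain csv module, the only change the question's code needs is a leading empty cell in the header row, plus the row label each data row already gets. A minimal sketch (the values of n, x, y and the file name are assumptions standing in for the question's own setup):
import csv
import random

n, x, y = 3, 10, 99  # assumed values for illustration

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # A leading empty string becomes the blank cell at index (0,0)
    writer.writerow([''] + [i + 1 for i in range(n)])
    for i in range(n):
        # Each data row starts with its 1-based row label
        writer.writerow([i + 1] + [random.randint(x, y) for _ in range(n)])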

Related

How to parse the log data which is in form of nested [key=value] format using python pandas

I have huge sensor log data in the form of [key=value] pairs, and I need to parse the data column-wise.
I found this code for my problem:
import pandas as pd

# Collect only the lines that actually contain comma-separated fields
lines = []
with open('/path/to/test.txt', 'r') as infile:
    for line in infile:
        if "," not in line:
            continue
        else:
            lines.append(line.strip().split(","))

row_names = []
column_data = {}
# Pad short lines with key=NaN so every line has the same length
max_length = max(*[len(line) for line in lines])
for line in lines:
    while len(line) < max_length:
        line.append(f'{len(line)-1}=NaN')

# First two fields name the row; the rest are key=value pairs
for line in lines:
    row_names.append(" ".join(line[:2]))
    for info in line[2:]:
        (k, v) = info.split("=")
        if k in column_data:
            column_data[k].append(v)
        else:
            column_data[k] = [v]

df = pd.DataFrame(column_data)
df.index = row_names
print(df)
df.to_csv('/path/to/test.csv')
The above code is suitable when the data is in the form "Priority=0, X=776517049", but my data looks like [Priority=0][X=776517049] and there is no separator between two columns. How can I do this in Python? I am sharing a link to the sample data here (the raw data, and below it the expected parsed data which I produced manually): https://docs.google.com/spreadsheets/d/1EVTVL8RAkrSHhZO48xV1uEGqOzChQVf4xt7mHkTcqzs/edit?usp=sharing. Kindly check this link.
I've downloaded it as CSV.
Since your file has multiple tables on one sheet, I've limited the read to 100 rows; you can remove that parameter.
import pandas as pd

raw = pd.read_csv(
    "logdata - Sheet1.csv",  # filename
    skiprows=1,              # skip the first row
    nrows=100,               # use 100 rows, remove in your example
    usecols=[0],             # only use the first column
    header=None,             # your dataset has no column names
)
Then you can use a regex to extract the values:
df = raw[0].str.extract(r"\[Priority=(\d*)\] \[GPS element=\[X=(\d*)\] \[Y=(\d*)\] \[Speed=(\d*)\]")
and set column names:
df.columns = ["Priority", "X", "Y", "Speed"]
Result:
   Priority          X          Y  Speed
0         0  776517049  128887449      4
1         0  776516816  128887733      0
2         0  776516816  128887733      0
3         0  776516833  128887166      0
4         0  776517200  128886133      0
5         0  776516883  128885933      8
..      ...        ...        ...    ...
99        0  776494483  128908783      0
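If the layout of the [key=value] pairs varies from line to line, a more generic option is to pull out every innermost pair with re.findall and build the frame from dicts. A sketch, assuming every value is numeric (the two sample lines below are illustrative, not taken from the real log):
import re
import pandas as pd

lines = [
    "[Priority=0][GPS element=[X=776517049][Y=128887449][Speed=4]]",
    "[Priority=0][GPS element=[X=776516816][Y=128887733][Speed=0]]",
]
# Match every innermost [key=value] pair with a numeric value;
# outer wrappers like "GPS element" fail the value pattern and are skipped.
rows = [dict(re.findall(r"\[([\w ]+)=(-?[\d.]+)\]", line)) for line in lines]
df = pd.DataFrame(rows)
print(df)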

How do you order row values from an Excel file in Python with a dictionary?

Let's say I have an Excel sheet with 2 rows:
0.296178 0.434362 0.033033 0.758968
0.559323 0.455792 0.780323 0.770423
How could I go about putting each row's values in order from highest to lowest with a dictionary?
For example, the dictionary entry for row 1 should look like {1: [4, 2, 1, 3]}, since the 4th value in row 1 is highest and the 3rd value in row 1 is lowest.
(Not indexing from 0, due to it being an Excel file.)
For that, first, you need a module to import Excel files. I recommend pandas, as it is widely used (install it with 'pip install pandas' if you haven't). After that, use this code:
import pandas as pd

path = r'C:\Users\tanuj\Desktop\temp.xlsx'  # replace it with your file path
df = pd.read_excel(path, header=None)
df.head()  # to visualise the file

# And then, use this simple logic to get the required dictionary
# (note: using cell values as dict keys assumes they are unique within a row)
d = {}
for x in range(df.shape[0]):
    temp = {}
    values = list(df.iloc[x])
    for y in range(len(values)):
        temp[df.loc[x][y]] = y + 1
    l = []
    for t in sorted(temp):
        l.append(temp[t])
    l.reverse()
    d[x + 1] = l
print(d)
The argsort function in NumPy will do the trick. Consider this code:
import numpy as np
import pandas as pd

df = pd.read_csv('excel.csv', delimiter=',', header=None)
result = {}  # a plain dict; avoids shadowing the built-in name dict
for i, row in enumerate(df.values, start=1):  # 1-based keys, as in the question
    arg = np.argsort(row)            # positions in ascending value order
    iarg = [x + 1 for x in arg]      # shift to 1-based positions
    iarg.reverse()                   # highest value first
    result[i] = iarg
print(result)
It reads the input data as formatted CSV and gives you the desired output.
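The same idea also fits in a single dict comprehension, reusing the df loaded above; a sketch with the question's 1-based row keys:
import numpy as np

result = {
    i: list(np.argsort(row)[::-1] + 1)  # reversed argsort: 1-based positions, highest first
    for i, row in enumerate(df.values, start=1)
}
print(result)  # {1: [4, 2, 1, 3], 2: [3, 4, 1, 2]}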
After reading your question, I think you want to read row values from an Excel sheet, store them in a dictionary, and then sort the values in the dictionary from highest to lowest.
So first you have to read the Excel file that stores those values; for that you can use the
openpyxl module:
from openpyxl import load_workbook

wb = load_workbook("values.xlsx")
ws = wb['Sheet1']
for row in ws.iter_rows():
    print([cell.value for cell in row])
The above code will print the list of values in each row of the Excel file.
In your case:
[0.296178, 0.434362, 0.033033, 0.758968]
[0.559323, 0.455792, 0.780323, 0.770423]
Now you have to store them in a dictionary and sort them:
from openpyxl import load_workbook

wb = load_workbook("values.xlsx")
ws = wb['Sheet1']
value_dict = {}
n = 1
# extracting values from the Excel sheet
for row in ws.iter_rows():
    values = [cell.value for cell in row]
    value_dict[n] = values
    n = n + 1
print(value_dict)

# sorting values
for keys, values in value_dict.items():
    values.sort(reverse=True)
    print("Row " + str(keys), values)
The above code performs the task you want.
For each row in a df you can compare each element to a sorted version of that row and get the indexes.
import pandas as pd

a = [0.296178, 0.434362, 0.033033, 0.758968]
b = [0.559323, 0.455792, 0.780323, 0.770423]
df = pd.DataFrame(columns=['1', '2'], data=zip(a, b)).T

def compare_sort(x):
    x = list(x)
    y = sorted(x.copy())[::-1]
    return [x.index(y[count]) + 1 for count, _ in enumerate(y)]

print(df.apply(compare_sort, axis=1))  # apply func. to each row of df
1    [4, 2, 1, 3]
2    [3, 4, 1, 2]

# Get data by row name
df = df.apply(compare_sort, axis=1)
print(df['1'])
# [4, 2, 1, 3]
Useful links.
get-indices-of-items
reverse-a-list

Python 3: what is the best way to iterate over each value of a column?

I am new to Python and would like some advice on the simplest way to iterate over a given column of data.
My input file looks like this:
Col1,Col2,Col3
593457863416,959345934754,9456968345233455
487593748734,485834896965,4958558475345
694568245543,34857495345,494589589209
...
What I would like to do is add 100 to all items in column 2, so the output would look like this:
Col1,Col2,Col3
593457863416,959345934854,9456968345233455
487593748734,485834897065,4958558475345
694568245543,34857495445,494589589209
...
Here is my code so far:
import csv

with open("C:/Users/r00t/Desktop/test/sample.txt") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    output_list = []
    for row in csv_reader:
        if line_count == 0:
            print(f'{", ".join(row)}')
            line_count += 1
        else:
            temp_list = []
            output_row = int(row[1])
            output_row = output_row + 100
            temp_list = [row[0], row[1], row[2]]
            output_list = [[row[0], output_row, row[2]]]
            print(output_list)
            line_count += 1
The code doesn't seem optimal. Is there a way to avoid hard-coding the column index? What happens when my file has more than 3 columns?
Thank you!
-r
I suggest using csv.DictReader(). Each row will now be in a dictionary, with keys being the column name, and the value being the row value.
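A minimal sketch of that suggestion (the path and the Col2 name come from the question; output.txt is an assumed output filename):
import csv

with open("sample.txt", newline="") as csv_file:
    reader = csv.DictReader(csv_file)
    rows = []
    for row in reader:
        # Keys are the column names, so no positional index is needed
        row["Col2"] = str(int(row["Col2"]) + 100)
        rows.append(row)

with open("output.txt", "w", newline="") as out_file:
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)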
You can use Series-based addition, positional indexing, or an in-place update without pandas.
Simplest way (in pandas):
df["Col2"] = df["Col2"] + 100
By position (in pandas):
df.iloc[:, 1] = df.iloc[:, 1] + 100
Without pandas:
import csv

file_read = csv.reader(open('/tmp/test.csv'))
file_data_in_list = list(file_read)
# Since you have three columns, you can simply go through
# index 1 of each row and add 100 there
for index in range(len(file_data_in_list)):
    if index > 0:  # skip the header row
        # csv gives strings, so convert before adding
        file_data_in_list[index][1] = str(int(file_data_in_list[index][1]) + 100)
# Now you can use file_data_in_list; the replacement is in place
# and needs no extra variables.
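To persist the result, a small follow-up sketch (the output path is an assumption):
import csv

with open('/tmp/test_out.csv', 'w', newline='') as f:
    csv.writer(f).writerows(file_data_in_list)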
It is better to use a column-based data structure for these operations.
Here I have used pandas:
import pandas as pd

df = pd.read_csv('C:/Users/r00t/Desktop/test/sample.txt')
# df1 = df + 100
Edit 1:
df['Col2'] = df['Col2'] + 100
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add.html
Here is a suggestion on how to do it. Use pandas, which is very handy for dealing with data.
import pandas as pd
df = pd.read_csv("sample.txt")
print(df)
# I am basically getting all the rows of column index 1 (which is Col2)
df.iloc[:, 1] = df.iloc[:, 1] + 100
print(df)
# I could also use the column name
df["Col3"] = df["Col3"] + 1

Comparing 2 lists and returning mismatches

I'm struggling with 2 CSV files which I have imported.
The CSV files look like this:
csv1
planet,diameter,discovered,color
sceptri,33.41685587,28-11-1611 05:15, black
...
csv2
planet,diameter,discovered,color
sceptri,33.41685587,28-11-1611 05:15, blue
...
In both CSV files there are the same planets, but in a different order and sometimes with different values (a mismatch).
The data for each planet (diameter, discovered, and color) has been entered independently. I want to cross-check the two sheets and find all the fields that are mismatched, then generate a new file that contains one line per error with a description of the error.
For example:
sceptri: mismatch (black/blue)
Here is my code so far:
with open('planets1.csv') as csvfile:
    a = csv.reader(csvfile, delimiter=',')
    data_a = list(a)
    for row in a:
        print(row)

with open('planets2.csv') as csvfile:
    b = csv.reader(csvfile, delimiter=',')
    data_b = list(b)
    for row in b:
        print(row)

print(data_a)
print(data_b)
c = [data_a]
d = [data_b]
Thank you in advance for your help!
Assuming the names of the planets are correct in both files, here is my proposal:
# Working with lists of lists, which could be obtained from csv file reading:
csv1 = [["sceptri", 33.41685587, "28-11-1611 05:15", "black"],
        ["foo", 35.41685587, "29-11-1611 05:15", "black"],
        ["bar", 38.7, "29-11-1611 05:15", "black"]]
csv2 = [["foo", 35.41685587, "29-11-1611 05:15", "black"],
        ["bar", 38.17, "29-11-1611 05:15", "black"],
        ["sceptri", 33.41685587, "28-11-1611 05:15", "blue"]]

# A list to contain the errors:
new_file = []
# A dict to check whether a planet has already been processed:
a_dict = {}

# Let's read all planet data:
for planet in csv1 + csv2:
    # Check if the planet is already a key in a_dict:
    if planet[0] in a_dict:
        # Yes, sir, we need to check for discrepancies.
        if a_dict[planet[0]] != planet[1:]:
            # We have differences in some values.
            # Put both sets of values into python sets to get the differences:
            error = set(planet[1:]) ^ set(a_dict[planet[0]])
            # Append [planet_name, diff_param1, diff_param2] to new_file:
            new_file.append([planet[0]] + list(error))
    else:
        # The planet name becomes a dict key; the other params are its value:
        a_dict[planet[0]] = planet[1:]

print(new_file)
# [['bar', 38.17, 38.7], ['sceptri', 'black', 'blue']]
The list new_file may then be saved as a new file; see Writing a list to file.
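To get the exact one-line-per-error format the question asks for, a small sketch over new_file (it assumes one mismatched field per planet, as in the sample data; the filename is an assumption):
# Each entry of new_file is [planet, value_from_one_file, value_from_other]
with open('errors.txt', 'w') as f:
    for name, val_a, val_b in new_file:
        f.write(f"{name}: mismatch ({val_a}/{val_b})\n")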
I'd suggest using Pandas for a task like this.
Firstly, you'll need to read the csv contents into dataframe objects. This can be done as follows:
import pandas as pd
# make a dataframe from each csv file
df1 = pd.read_csv('planets1.csv')
df2 = pd.read_csv('planets2.csv')
You may want to declare names for each column if your CSV file doesn't have them.
colnames = ['col1', 'col2', ..., 'coln']
df1 = pd.read_csv('planets1.csv', names=colnames, index_col=0)
df2 = pd.read_csv('planets2.csv', names=colnames, index_col=0)
# use index_col=0 if csv already has an index column
For the sake of reproducible code, I will define dataframe objects without a csv below:
import pandas as pd
# example column names
colnames = ['A','B','C']
# example dataframes
df1 = pd.DataFrame([[0,3,6], [4,5,6], [3,2,5]], columns=colnames)
df2 = pd.DataFrame([[1,3,1], [4,3,6], [3,6,5]], columns=colnames)
Note that df1 looks like this:
   A  B  C
0  0  3  6
1  4  5  6
2  3  2  5
And df2 looks like this:
   A  B  C
0  1  3  1
1  4  3  6
2  3  6  5
The following code compares the dataframes, concatenates the comparison into a new dataframe, and then saves the result to a CSV:
# define the condition you want to check for (i.e., mismatches)
mask = (df1 != df2)
# df1[mask], df2[mask] will replace matched values with NaN (Not a Number), and leave mismatches
# dropna(how='all') will remove rows filled entirely with NaNs
errors_1 = df1[mask].dropna(how='all')
errors_2 = df2[mask].dropna(how='all')
# add labels to column names
errors_1.columns += '_1' # for planets 1
errors_2.columns += '_2' # for planets 2
# you can now combine horizontally into one big dataframe
errors = pd.concat([errors_1,errors_2],axis=1)
# if you want, reorder the columns of `errors` so compared columns are next to each other
errors = errors.reindex(sorted(errors.columns), axis=1)
# if you don't like the clutter of NaN values, you can replace them with fillna()
errors = errors.fillna('_')
# save to a csv
errors.to_csv('mismatches.csv')
The final result looks something like this:
  A_1 A_2 B_1 B_2 C_1 C_2
0   0   1   _   _   6   1
1   _   _   5   3   _   _
2   _   _   2   6   _   _
Hope this helps.
This kind of problem can be solved by sorting the rows from the csv files, and then comparing the corresponding rows to see if there are differences.
This approach uses a functional style to perform the comparisons and will compare any number of csv files.
It assumes that the csvs contain the same number of records, and that the columns are in the same order.
import contextlib
import csv

def compare_files(readers):
    colnames = [next(reader) for reader in readers][0]
    sorted_readers = [sorted(r) for r in readers]
    for gen in [compare_rows(colnames, rows) for rows in zip(*sorted_readers)]:
        yield from gen

def compare_rows(colnames, rows):
    col_iter = zip(*rows)
    # Be sure we're comparing the same planets.
    planets = set(next(col_iter))
    assert len(planets) == 1, planets
    planet = planets.pop()
    # Skip the planet column's name, since that column was consumed above.
    for (colname, *vals) in zip(colnames[1:], col_iter):
        if len(set(*vals)) > 1:
            yield f"{planet} mismatch {colname} ({'/'.join(*vals)})"

def main(outfile, *infiles):
    with contextlib.ExitStack() as stack:
        csvs = [stack.enter_context(open(fname)) for fname in infiles]
        readers = [csv.reader(f) for f in csvs]
        with open(outfile, 'w') as out:
            for result in compare_files(readers):
                out.write(result + '\n')

if __name__ == "__main__":
    main('mismatches.txt', 'planets1.csv', 'planets2.csv')

Finding same values in a row of csv in python

I have code that looks for numbers within a CSV file that are within 1.0 of each other in the same row. When I run it, though, it prints everything, not just the rows that meet my condition, i.e. that the values in the 2nd and 3rd columns be within 1.0 of each other. I want to run the code and have it display the first column (the time at which the row was recorded), plus the 2nd and 3rd columns, because those two values should be within 1.0 of each other. This is what the data file looks like:
Time Chan1 Chan2
04:07.0 52.31515503 16.49450684
04:07.1 23.55230713 62.48802185
04:08.0 46.06217957 24.94955444
04:08.0 41.72077942 31.32516479
04:08.0 19.80723572 25.73182678
Here's my code:
import numpy as np
from matplotlib import *
from pylab import *

filename = raw_input("Enter file name: ") + '.csv'
filepath = '/home/david/Desktop/' + filename
data = np.genfromtxt(filepath, delimiter=',', dtype=float)
first = [row[0] for row in data]
rownum1 = [row[1] for row in data]
rownum2 = [row[2] for row in data]
for row in data:
    if abs(row[1] - row[2]) <= 1.0:
        print("The values in row 0 are 1 and 2, are within 1.0 of each other.", first, rownum1, rownum2)
This is my output:
26.3460998535, 44.587371826199998, 42.610519409200002, 24.7272491455, 89.397918701199998, 25.479614257800002, 30.991180419900001, 25.676086425800001
But I want this as an output:
4:09.0, 23.456, 22.5
You can do that like this:
data = np.genfromtxt(filepath, names=True, dtype=None)
idx = np.abs(data['Chan1'] - data['Chan2'])<1
print data[idx]
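If you're on Python 3 (the question's raw_input suggests Python 2), the same idea works with print() and lets you show just the time and the two channels. A sketch, assuming a whitespace-delimited file like the sample above:
import numpy as np

# names=True takes the column names from the header row
data = np.genfromtxt('data.csv', names=True, dtype=None, encoding=None)
idx = np.abs(data['Chan1'] - data['Chan2']) <= 1.0
for rec in data[idx]:
    print(rec['Time'], rec['Chan1'], rec['Chan2'])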
