I am looking to take a CSV file and sort it using Python 2.7 to get an individual value based on two columns, a block and a lot. My data currently looks like this:
Beginning
Based on the lot value, I want to use Python to create extra lines automatically and write them to a new CSV, so that the values look like this when drawn out in the new CSV:
End Result
So I know that I need to read each row and, based on the cell value in the lot column, handle it like this: if the cell contains a ",", the row is copied to the next row of the other CSV, with all the values before the lot column copied as-is and only the first comma-separated value kept, then the second, third, and so on.
After the commas are separated out, the ranges will be handled in a similar way in a third CSV. If there is a single value, the whole row will be copied as-is.
Thank you for the help in advance.
This should work.
On Windows, open files in binary mode or else you get doubled newlines.
I assumed rows are separated by ; because the cells contain ,
First split by ,, then check for ranges.
print line is for debugging.
Error checking is left as an exercise for the reader.
Code:
import csv

file_in = csv.reader(open('input.csv', 'rb'), delimiter=';')
file_out = csv.writer(open('output.csv', 'wb'), delimiter=';')

for i, line in enumerate(file_in):
    if i == 0:
        # write header
        file_out.writerow(line)
        print line
        continue
    for j in line[1].split(','):
        if len(j.split('-')) > 1:
            # lines with -
            start = int(j.split('-')[0])
            end = int(j.split('-')[1])
            for k in xrange(start, end + 1):
                line[1] = k
                file_out.writerow(line)
                print line
        else:
            # lines with ,
            line[1] = j
            file_out.writerow(line)
            print line
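For reference, the comma-and-range expansion above can be isolated into a small helper. This is a Python 3 sketch; the function name and sample value are made up for illustration:

```python
def expand_lots(lot_field):
    """Expand a lot field such as '1,3,5-7' into individual lot numbers."""
    lots = []
    for part in lot_field.split(','):
        if '-' in part:
            # ranges such as '5-7' become 5, 6, 7
            start, end = (int(x) for x in part.split('-'))
            lots.extend(range(start, end + 1))
        else:
            # single values are kept as-is
            lots.append(int(part))
    return lots

print(expand_lots('1,3,5-7'))  # [1, 3, 5, 6, 7]
```

Each expanded lot number would then be written out as its own row, with the rest of the columns copied unchanged.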
Related
How to find a specific word in a column and write its position in the next column in Python?
For example, I have a CSV file
and I want to find the word "(correct)" and then put its position in the next column, continuing the loop through the whole file like this.
In this example, banana has the word "(correct)" and it is in the 3rd column, so G is set to 3.
In the same way, goat has the word "(correct)", so we add 4 in the G column.
Also, replace "(correct)" with an empty string after each loop.
Without more sample data to test, and if all you want is a really quick solution to get you started... see below.
with open(csvfile) as csvf:          # open csv file
    lines = csvf.read().split("\n")  # split contents into lines
for i, line in enumerate(lines):
    row = line.split(",")            # split line into columns
    for j, col in enumerate(row):
        if "(correct)" in col:       # check if keyword is in column
            row.append(str(j))       # append the column position as the last column
    lines[i] = ','.join(row)
with open(csvfile, 'wt') as csvfw:
    csvfw.write('\n'.join(lines))    # write lines back to file
There is also the csv module in the python stdlib you may want to check out.
https://docs.python.org/3/library/csv.html#module-csv
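As a rough Python 3 sketch of the same task with the csv module (the sample data and column layout here are made up, and this version also removes the "(correct)" marker, as the question asks):

```python
import csv
import io

# Made-up sample standing in for the real file; the actual layout may differ.
data = "apple,banana (correct),cherry\ncat,dog,goat (correct),horse\n"

rows = []
for row in csv.reader(io.StringIO(data)):
    for j, col in enumerate(row, start=1):
        if "(correct)" in col:
            row[j - 1] = col.replace("(correct)", "").strip()  # clear the marker
            row.append(str(j))  # record the 1-based column position
    rows.append(row)

out = io.StringIO()  # in the real code this would be the output file
csv.writer(out).writerows(rows)
print(out.getvalue())
```

The csv module also handles quoted cells that contain commas, which the plain split(",") approach above does not.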
I am trying to write a CSV file with Python's file.write, but some of the entries are so long that they are creating new lines in the CSV file. I am using the .format() method to fill my columns with their respective input in a loop. Ideally I would like the CSV to accept long entries and just widen the column instead of bumping the rest of the entries to a new row.
with tf.Session(config=sess_config) as sess:
    ...
    fclog = open(os.path.join(log_dir, args.fpred + '.csv'), 'w')  # initialize csv with column headers
    fclog.write("fname,itp,tp_prob,its,ts_prob\n")
    for i in range(len(fname_batch)):
        fclog.write(
            "{},{},{},{},{}\n".format(fname_batch[i].decode(), picks_batch[i][0][0], picks_batch[i][0][1],
                                      picks_batch[i][1][0], picks_batch[i][1][1]))
    ...
    fclog.close()
The image above is a sample of rows from the resultant csv file. Notice that the first row of entries is not overfull and works as expected. However the second row of entries contains an overfull entry in the tp_prob column and bumps the rest of the entries to a new line. The third row of entries works as expected again.
Thank you!
I discovered that, for some reason, line breaks \n were being added inside the actual row strings when the data was too long, instead of only at the end of the row. To fix this I used .replace to strip the line breaks from the row string and then added a single line break back at the end of the row.
with tf.Session(config=sess_config) as sess:
    ...
    fclog = open(os.path.join(log_dir, args.fpred + '.csv'), 'w')  # initialize csv with column headers
    fclog.write("fname,itp,tp_prob,its,ts_prob\n")
    for i in range(len(fname_batch)):
        my_str = "{},{},{},{},{}".format(fname_batch[i].decode(), picks_batch[i][0][0],
                                         picks_batch[i][0][1],
                                         picks_batch[i][1][0],
                                         picks_batch[i][1][1]).replace("\n", "")
        fclog.write(my_str + "\n")
    ...
    fclog.close()
I'm sure there is a cleaner way to do this so I welcome more solutions. I also still do not understand why these line breaks are being automatically inserted.
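One cleaner route (a sketch only, not tested against the original data) is to let csv.writer build each row: it converts the values with str() and quotes any cell that still contains a newline, so the row structure survives. The record values below are made-up stand-ins for fname_batch / picks_batch:

```python
import csv
import io

# Made-up stand-ins for the question's fname_batch / picks_batch values.
records = [
    ("trace1.mseed", 100, 0.95, 250, 0.87),
    ("trace2.mseed", 110, 0.91, 260, 0.83),
]

buf = io.StringIO()  # in the real code this would be the open log file
writer = csv.writer(buf)
writer.writerow(["fname", "itp", "tp_prob", "its", "ts_prob"])
for rec in records:
    # any embedded newline in a value is quoted instead of breaking the row
    writer.writerow(rec)

print(buf.getvalue())
```

As for the mystery breaks: if the picks are numpy arrays, their str() form can wrap long values across lines, which would explain newlines appearing inside the formatted string. That is a guess, not something confirmed by the question.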
I need to access a .txt file and add up the integers in all of the last columns using an accumulation pattern. I know I've accessed and opened the file correctly, however, it's the aggregation of the last column that's stumped me. The current code is giving me a 0 (while playing around with it, I've run into a few different errors.)
I'm aware that each line is a string and that I need to split the lines into a list of values in order to continue. Any suggestions/help would be extremely helpful.
the_File = open("DoT_Info.txt", "r")
num_accidents = 0
for char in the_File.readlines():
    new_splt = char.split(',')
    num_accidents += int(new_splt[-1])
print('Total Incidents: ', num_accidents)
the_File.close()
Something like this ought to work, assuming the last element in each row is always the number of accidents.
import csv

with open("DoT_Info.txt", "r") as f:
    reader = csv.reader(f)
    # next(reader)  # do this if there is a header row
    num_accidents = sum(int(row[-1]) for row in reader)
print('Total Incidents: ', num_accidents)
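A header row or blank lines would also explain the errors you ran into, since int() fails on non-numeric text. The same pattern can be hardened slightly; the sample data below is made up:

```python
import csv
import io

# Made-up sample standing in for DoT_Info.txt.
data = "date,road,accidents\n2020-01-01,I-70,3\n\n2020-01-02,US-36,5\n"

reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header row
num_accidents = sum(int(row[-1]) for row in reader if row)  # 'if row' skips blank lines
print('Total Incidents: ', num_accidents)  # Total Incidents:  8
```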
I'm new to Python, but I need help creating a script that will take in three different CSV files, combine them, remove duplicates based on the first column, remove any blank rows, and then change each revenue area to a number.
The three CSV files are setup the same.
The first column is a phone number and the second column is a revenue area (city).
The first column will need all duplicates & blank values removed.
The second column will have values like "Macon", "Marceline", "Brookfield", which will need to be changed to a specific value like:
Macon = 1
Marceline = 8
Brookfield = 4
And then if it doesn't match one of those values put a default value of 9.
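The city-to-number rule described here maps directly onto a dictionary lookup with a default; a minimal sketch using the values from the question:

```python
city_codes = {'Macon': 1, 'Marceline': 8, 'Brookfield': 4}

# dict.get takes a fallback value, so unknown cities become 9 automatically.
print(city_codes.get('Macon', 9))       # 1
print(city_codes.get('Kirksville', 9))  # 9 ('Kirksville' is a made-up example)
```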
Welcome to Stack Overflow!
Firstly, you'll want to be using the csv library for the "reader" and "writer" functions, so import the csv module.
Then, you'll want to open the new file to be written to, and use the csv.writer function on it.
After that, you'll want to define a set (I name it seen). This will be used to prevent duplicates from being written.
Write your headers (if you need them) to the new file using the writer.
Open your first old file using the csv module's "reader". Iterate through the rows with a for loop, and add each row's phone number to the "seen" set. If the number has already been seen, simply "continue" instead of writing the row to the file. Repeat this for the next two files.
To assign the values to the cities, you'll want to define a dictionary that holds the old names as the keys, and new values for the names as the values.
So, your code should look something like this:
import csv

myDict = {'Macon': 1, 'Marceline': 8, 'Brookfield': 4}
seen = set()

newFile = open('newFile.csv', 'w', newline='')  # newline='' prevents the writer from writing extra newlines, i.e. empty rows
writer = csv.writer(newFile)
writer.writerow(['Phone Number', 'City'])  # this will write a header row for you

# Open each file, read each row, skip empty rows, skip duplicate phone
# numbers, change the value of "City" (defaulting to 9), write to the new file.
for filename in ('firstFile.csv', 'secondFile.csv', 'thirdFile.csv'):
    with open(filename, newline='') as inFile:
        for row in csv.reader(inFile):
            if any(row):
                row[1] = myDict.get(row[1], 9)  # unknown cities get the default value 9
                if row[0] in seen:  # duplicates are detected on the first (phone number) column
                    continue
                seen.add(row[0])
                writer.writerow(row)

# Close the output file
newFile.close()
I have not tested this myself, but it is very similar to two different programs that I wrote, and I have attempted to combine them into one. Let me know if this helps, or if there is something wrong with it!
-JCoder96
I need to filter and do some math on data coming from CSV files.
I've written a simple Python script to isolate the rows I need (they should contain certain keywords like "Kite"), but my script does not work and I can't find out why. Can you tell me what is wrong with it? One more thing: once I get to the chosen rows, how can I point to each (comma-separated) column?
Thanks in advance.
R.
import csv

with open('sales-2013.csv', 'rb') as csvfile:
    sales = csv.reader(csvfile)
    for row in sales:
        if row == "Kite":
            print ",".join(row)
You are reading the file in bytes. Either change it to open('filepathAndName.csv', 'r') or encode your strings, e.g. "Kite".encode('UTF-8'). The second mistake is that you are comparing a whole row with the word "Kite": a row is a list of cells, so it will never be equal to the string even when "Kite" appears in it. In this case you have to use if "Kite" in row:.
with open('sales-2013.csv', 'rb') as csvfile:  # <- change 'rb' to 'r'
    sales = csv.reader(csvfile)
    for row in sales:
        if row == "Kite":  # <- this would be better: if "Kite" in row:
            print ",".join(row)
Read this:
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
To find the rows that contain the word "Kite", you should use
for row in sales:  # here you iterate over every row (a *list* of cells)
    if "Kite" in row:
        # do stuff
Now that you know how to find the required rows, you can access the desired cells by indexing the rows. For example, if you want to select the second cell of a row, you simply do
cell = row[1] # remember, indexes start with 0
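Putting the two pieces together, a Python 3 sketch with made-up sample rows (the real file layout may differ):

```python
import csv
import io

# Made-up sample standing in for sales-2013.csv.
data = "2013-01-05,Kite,12.50\n2013-01-06,Ball,3.00\n"

for row in csv.reader(io.StringIO(data)):
    if "Kite" in row:          # matches rows that have a cell equal to 'Kite'
        product = row[1]       # second cell
        price = float(row[2])  # third cell
        print(product, price)  # Kite 12.5
```

Note that "Kite" in row matches whole cells only; to match "Kite" as a substring of a cell, you would need any("Kite" in cell for cell in row).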