I have a problem with reading and processing input from a CSV file. Say, for example, I have a CSV file as follows:
product_name,category
canon camera,DSLR
hikvision camera,security
cp plus camera,security
The above is my CSV file. Now I have to write a Python script that groups the product_names according to their category.
I want to end up with lists named after each category with an 's' appended, such as securitys and DSLRs. The securitys list should contain the product names hikvision camera and cp plus camera, and the DSLRs list should contain canon camera. I also want this to be dynamic: if I add more categories along with their product_names in the future, I should automatically get more lists, each named category + 's' and containing the product names that fall in that category.
I have tried many times with different approaches, but it is not working. I seriously need help!
Thank you!
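A minimal sketch of one way to do this: rather than dynamically creating separately named variables, keep a dict whose keys are category + 's' (the file name products.csv is an assumption):

import csv
from collections import defaultdict

# Map each category + 's' to the list of product names in that category.
lists_by_category = defaultdict(list)

with open('products.csv', newline='') as f:
    for row in csv.DictReader(f):
        lists_by_category[row['category'] + 's'].append(row['product_name'])

print(lists_by_category['securitys'])  # ['hikvision camera', 'cp plus camera']
print(lists_by_category['DSLRs'])      # ['canon camera']

New categories added to the CSV later show up as new keys automatically, which gives the dynamic behaviour asked for.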
I am creating a text-based game in Python. In it, I will be using a CSV file to store the different tiles on the map. I would like to know what code I would need to essentially request the 'co-ordinates' of a map tile.
For example, if I were to create a tile with the co-ordinates x = 5, y = 6, it would store the information (GRASS1S2S1W, for example) in the 5th column and the 6th row.
I would also like to know how to access the specific cell in which the data is stored.
Any alternative ways of doing this (not CSV) will be ignored. This is for a school project and I am too far through to change from CSV (I would have to change a lot of words in my plan).
Note: GRASS1S2S1W means 'grass tile' (GRASS), 'stone' (1S), 'scrap' (2S) and 'wood' (1W).
Make a 2D list containing all the information. That way you can access the value at a specific coordinate like
list[x][y]
Then save the list with csv.writer.
You can similarly read the existing CSV file back into a list to access the info.
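A minimal sketch of that idea (the file name map.csv and the 10 x 10 map size are assumptions):

import csv

WIDTH, HEIGHT = 10, 10  # map size, an assumption

# 2D list of tile codes, accessed as grid[x][y].
grid = [['GRASS'] * HEIGHT for _ in range(WIDTH)]
grid[5][6] = 'GRASS1S2S1W'  # the tile at x = 5, y = 6

# Save the whole grid with csv.writer.
with open('map.csv', 'w', newline='') as f:
    csv.writer(f).writerows(grid)

# Read it back into a 2D list and look up the same cell.
with open('map.csv', newline='') as f:
    grid = list(csv.reader(f))
print(grid[5][6])  # GRASS1S2S1W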
I am attempting to write a Python program for a personalised recommendation service for books based on a similarity algorithm, where recommendations are made based on ratings of a number of books from other users/readers.
I want to write a section of the program that reads data from two input files:
books.txt which includes a list of 55 books in an author,title format, one entry per line. I want to convert this file into a list in the form:
[["Author", "title"], [...]]
The second file ratings.txt includes usernames to represent users of the service followed by a list of 55 integers, each representing a rating for each book from books.txt, in the same order. The file is structured using the following format:
user_a\n user_a_rating_1 ... user_a_rating_55 \n
And I want to convert the file into a dictionary in the form:
{"username":[0, 1, 2, 3], "user":[ratings...]}
Any suggestions or help would be greatly appreciated!
I would definitely check out pandas and the read_table function. That puts each file into a data frame, and from there you can call the to_dict method on a column if you need to.
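A rough sketch of that (file names and formats come from the question; books.txt is columnar so pandas reads it directly, while the alternating username/ratings lines of ratings.txt are parsed by hand here, a small deviation from the to_dict suggestion):

import pandas as pd

# books.txt: one "author,title" entry per line -> [["Author", "title"], ...]
books = pd.read_csv('books.txt', header=None, names=['author', 'title'])
books_list = books.values.tolist()

# ratings.txt: a username line followed by a line of 55 space-separated
# integers, repeating. Pair up the alternating lines into a dict.
ratings = {}
with open('ratings.txt') as f:
    lines = [line.strip() for line in f if line.strip()]
for username, rating_line in zip(lines[::2], lines[1::2]):
    ratings[username] = [int(r) for r in rating_line.split()]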
I have a csv file that is very big, containing a load of different people. Some of these people come up more than once. Something like this:
Name,Colour,Date
John,Red,2017
Dave,Blue,2017
Tom,Blue,2017
Amy,Green,2017
John,Red,2016
Dave,Green,2016
Tom,Blue,2016
John,Green,2015
Dave,Green,2015
Tom,Blue,2015
Rebecca,Blue,2015
I want a csv file that contains only the most recent colour for each person. For example, for John, Dave, Tom and Amy I am only interested in the row for 2017. For Rebecca I will need the value from 2015.
The csv file is huge, containing over 10 million records (all people have a unique ID so repeated names don't matter). I've tried something along the lines of the following:
Open the csv file.
Read line 1.
If the person is not in the "seen" list, add them to csv file 2.
Add the person to the "seen" list.
Read line 2...
The problem is the "seen" list gets massive and I run out of memory. The other issue is sometimes the dates are not in order so an old entry gets into the "seen" list and then the new entry won't overwrite it. This would be easy to solve if I could sort the data by descending date, but I'm struggling to sort it with the size of the file.
Any suggestions?
If the whole csv file can be stored in a list like:
csv_as_list = [
    (unique_id, color, year),
    …
]
then you can sort this list by:
import operator
# first sort by year descending
csv_as_list.sort(key=operator.itemgetter(2), reverse=True)
# then, since the Python sort is stable, by unique_id
csv_as_list.sort(key=operator.itemgetter(0))
and then you can:
from __future__ import print_function
import operator, itertools
for unique_id, group in itertools.groupby(csv_as_list, operator.itemgetter(0)):
latest_color = next(group)[1]
print(unique_id, latest_color)
(I just used print here, but you get the gist.)
If the csv file cannot be loaded in-memory as a list, you'll have to go through an intermediate step that uses disk (e.g. SQLite).
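For completeness, a minimal sketch of building csv_as_list from the file (the file name, the presence of a header row, and the column order are assumptions):

import csv

with open('people.csv', newline='') as f:  # file name is an assumption
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # int(year) so the descending sort above is numeric, not lexicographic
    csv_as_list = [(uid, color, int(year)) for uid, color, year in reader]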
Open your csv file to read.
Read it line by line, and append the user to final_list if their ID is not already found in there. If it is found, compare the year of your current row with the year stored for that user in final_list. If the current row is more recent, just change the date of that user in final_list, along with the colour associated with it.
Only then, when your final_list is done, will you write a new csv file.
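A minimal sketch of this approach; the one substitution is a dict keyed on the ID in place of final_list, so the "already found" check is a fast lookup (file names and the Name,Colour,Date layout come from the question):

import csv

latest = {}  # person ID -> (year, full row) for the most recent row seen

with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        person_id, year = row[0], int(row[2])
        if person_id not in latest or year > latest[person_id][0]:
            latest[person_id] = (year, row)

# Only once the scan is done, write the surviving rows out.
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for year, row in latest.values():
        writer.writerow(row)

This keeps one entry per person in memory rather than one per line, and the year comparison handles out-of-order dates.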
If you want this task to be faster, you want to:
Optimize your loops.
Use standard Python functions and/or libraries coded in C.
If this is still not optimized enough... learn C. Reading a CSV file in C, parsing it with a separator, and iterating through an array is not hard, even in C.
I see two obvious ways to solve this that don't involve keeping huge amounts of data in memory:
Use a database instead of CSV files
Reorganise your CSV files to facilitate sorting.
Using a database is fairly straightforward. I expect you could even use the SQLite that comes with Python. This would be my preferred option, I think. To get the best performance, create an index on (person, date).
The second involves letting the first column of your CSV file be the person ID and the second column be the date. Then you could sort the CSV file from the command line, e.g. sort myfile.csv. This will group all entries for a particular person together, and provided your date is in a proper format (e.g. YYYY-MM-DD), the entry of interest will be the last one. The Unix sort command is not known for its speed, but it's very robust.
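A rough sketch of the SQLite option (file and table names are assumptions; the final query leans on a SQLite-specific guarantee that, with a lone MAX() aggregate, the other selected columns come from the row holding that maximum):

import csv
import sqlite3

conn = sqlite3.connect('people.db')  # database file name is an assumption
conn.execute('CREATE TABLE people (person TEXT, colour TEXT, date TEXT)')
conn.execute('CREATE INDEX idx_person_date ON people (person, date)')

# Stream the big CSV into the table without holding it in memory.
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    conn.executemany('INSERT INTO people VALUES (?, ?, ?)', reader)
conn.commit()

# One row per person, taken from that person's most recent date.
for person, colour, date in conn.execute(
        'SELECT person, colour, MAX(date) FROM people GROUP BY person'):
    print(person, colour, date)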
I have 100 csv files, each contains publication data of different institutions and I would like to perform the same manipulation on all of them:
1. Get the Institution name from cell B1. This is always after 'at' or 'at the'. For example, 'Publications at Tohoku University'.
2. Vlookup the matching InstitutionCode from another csv file called 'Codes'. For example, '1286' (for Tohoku University).
3. Delete rows 1-14 (including the Institution name in cell B1).
4. Insert two extra columns (columns A and B) into the file with the following headers: 'Institution' and 'InstitutionCode', and fill them with the relevant information for all rows where I have data. (In the above example, Tohoku University and 1286.)
I am new to Python and find it hard to put together this script from the resources I have found.
Can anyone please help me?
(Images were attached showing the data in its original format and the required result.)
I could give you the code, but instead, I'll explain to you how you can write it yourself.
1. Read the Codes file and store the institutions and codes in a dictionary. You can read more about reading csv files here: https://pymotw.com/2/csv/ or here: https://pymotw.com/3/csv/. Each row will be represented as a list of strings, so you can access cell elements by their index. Make the Institution names the keys and the codes the values.
2. Read the csv files one by one in a for loop. I'll call these the input files. Open a new file for writing for each input file that you read. I'll call these the output files.
3. Loop over the rows in the csv file. You can keep track of the row numbers by using enumerate. You can find info on this here, for example: http://book.pythontips.com/en/latest/enumerate.html
4. Get the contents of cell B1 by taking element 1 from row 0.
5. Find the Institution name by using a regular expression. More info here, for example: http://dev.tutorialspoint.com/python/python_reg_expressions.htm
6. Get the Institution code from the dictionary you made in step 1.
7. Keep looping over the rows until the first element equals 'Title'. This row contains the headers. Write "Institution" and "InstitutionCode" to the output file, followed by the headers you just found. To do this, convert your row (a list of strings) to a tuple (http://www.tutorialspoint.com/python/python_tuples.htm) and give that as an argument to the writerow method of the csv writer object (see the links in step 1).
8. Then, for each row after the header row, make a tuple of the Institution name and code, followed by the information from the row of the input file you just read, and give that as an argument to the writerow method of the csv writer object.
9. Close the output file.
One thing to think about is whether you want quotes around the cell contents in the output files. You can read about this in the links in step 1. The same goes for the field delimiters. If you don't specify anything, they are assumed to be commas, but you can change this.
I hope this helps!
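Putting those steps together, a minimal sketch (the glob pattern, the output file naming, and the institution,code layout of the Codes file are assumptions):

import csv
import glob
import re

# Step 1: institution name -> code, from the Codes file (layout assumed).
with open('Codes.csv', newline='') as f:
    codes = {row[0]: row[1] for row in csv.reader(f)}

for path in glob.glob('publications_*.csv'):  # hypothetical file pattern
    with open(path, newline='') as fin, \
         open('out_' + path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        rows = list(csv.reader(fin))

        # Cell B1 is element 1 of row 0; the name follows 'at' / 'at the'.
        institution = re.search(r'\bat (?:the )?(.+)', rows[0][1]).group(1)
        code = codes[institution]

        # Skip everything before the header row (first cell == 'Title'),
        # then prepend the two new columns to the header and each data row.
        header_seen = False
        for row in rows:
            if not header_seen:
                if row and row[0] == 'Title':
                    writer.writerow(['Institution', 'InstitutionCode'] + row)
                    header_seen = True
                continue
            writer.writerow([institution, code] + row)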
My preference would be for this to be in Python since I am working on learning more. If you can provide help in bash that would still be helpful, though.
I've looked around Stack Overflow and found some helpful things but not enough for me to finish this.
I have two CSV files with some shared fields. The data is not INT. I would like to join based on matching 3 specific fields and write it out to a new output.csv when all the processing is done.
sourceA.csv looks like this:
fieldname_1,fieldname_2,fieldname_3,fieldname_4,fieldname_5,fieldname_6,fieldname_7,fieldname_8,fieldname_9,fieldname_10,fieldname_11,fieldname_12,fieldname_13,fieldname_14,fieldname_15,fieldname_16
sourceB.csv looks like this:
fieldname_4,fieldname_5,fieldname_OTHER,fieldname_8,fieldname_16
As you can see, sourceB.csv has 4 field names that are also in sourceA.csv and one field name that does not. The data in fieldname_OTHER will need to replace the data in sourceA[fieldname_6].
The whole process should go like this:
Replace data in sourceA[fieldname_6] with data from sourceB[fieldname_OTHER] if all of the following criteria are met:
data in sourceA[fieldname_4]=sourceB[fieldname_4]
data in sourceA[fieldname_8]=sourceB[fieldname_8]
data in sourceA[fieldname_16]=sourceB[fieldname_16]
(The data in sourceB[fieldname_5] does not need to be evaluated.)
If the above criteria aren't met, just replace sourceA[fieldname_6] with the text ANY.
Write each processed line out to output.csv.
A sample of what I would like the output to be based on the input CSVs and processing outlined above:
dataA,dataB,dataC,dataD,dataE,dataOTHER,dataG,dataH,dataI,dataJ,dataK,dataL,dataM,dataN,dataO,dataP
I hope the details I've provided haven't made it more confusing than it needs to be. Thank you for all your help!
I'm not sure I'd bother with SQL for a one-off merger like this. It's straightforward in Python.
Read in both files with the csv module to get two lists. Index sourceA into a dictionary whose key is the tuple of fields that need to be matched. You can then loop over sourceB, find the matching sourceA row instantly, and merge the sourceB data into it.
When you're done, you can just output the list you read from sourceA: the dict and the list point to the same values, which you've now updated.
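A sketch of that approach with csv.DictReader (file and field names come from the question; defaulting every row to ANY first, then overwriting matches, implements the "otherwise ANY" rule):

import csv

KEY_FIELDS = ('fieldname_4', 'fieldname_8', 'fieldname_16')

# Read sourceA and index its rows by the tuple of match fields.
with open('sourceA.csv', newline='') as f:
    reader = csv.DictReader(f)
    a_fieldnames = reader.fieldnames
    a_rows = list(reader)
index = {tuple(row[k] for k in KEY_FIELDS): row for row in a_rows}

# Rows with no sourceB match keep this default.
for row in a_rows:
    row['fieldname_6'] = 'ANY'

# Merge fieldname_OTHER into each matching sourceA row.
with open('sourceB.csv', newline='') as f:
    for b_row in csv.DictReader(f):
        key = tuple(b_row[k] for k in KEY_FIELDS)
        if key in index:
            index[key]['fieldname_6'] = b_row['fieldname_OTHER']

# The dict and the list share the same row objects, so just write the list.
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=a_fieldnames)
    writer.writeheader()
    writer.writerows(a_rows)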