Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
I've been studying Python for data science for about 5 months now, but I get really stuck when it comes to matplotlib. There are always so many ways to do anything, and I can't see a well-defined path for any of them. Does anyone else have this problem and know how to deal with it?
I had the same problem some time back. I just picked the Boston Housing Prices dataset and kept practicing on that. If you work on it enough, you will be able to create all the types of plots needed for EDA and get good practice. Of course, after a certain point it can get boring; that's when you jump to a dataset in an area that interests you. In my case it was movie reviews.
Below is the link to the housing prices data.
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
I think your question is saying that you are bored and do not have any projects to work on. If that is correct, sites like Kaggle host many open datasets that programmers can practice on.
In programming in general, "there's always so many options to do anything".
I recommend that you skim the library's documentation to get an overview of its functions and classes, then go and solve some problems from websites, or take on a real project if you can. If your code works, do not worry and keep going.
After this trial and error, you will have a real feel for various problems, and you will recognize the differences between these options and their pros and cons. That is how it went for me three years ago.
Closed 2 years ago.
I just finished reading some books about pandas, numpy & matplotlib and thought about getting some practice.
Unfortunately, I am a bit lost on how to start.
Can anyone recommend a site, which provides csv-files to practice on?
I have found some sites like Kaggle or data.gov, but they only seem to have files that are already cleaned.
I'm also open to other ways on how to practice those libraries.
Grateful for every answer.
Best regards
Just search the web, e.g. use Google to search for "Pandas tutorial", "NumPy tutorial" and so on.
Even the home site of matplotlib contains a couple of introductory
tutorials. See e.g. https://matplotlib.org/tutorials/index.html
Stack Overflow itself is also a good source of knowledge.
You could try finding some interesting datasets on: icpsr.umich.edu
In my experience, these sets often contain missing values and values that need formatting, which could be good for practice.
Some datasets are public, but many require special credentials. Being a university student often qualifies.
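To give an idea of what practicing on that kind of data looks like, here is a minimal sketch with pandas. The CSV text and its column names are made up for illustration (they do not come from ICPSR or any real dataset), and the cleaning choices are just one reasonable option, not the only way to do it.

```python
import io

import pandas as pd

# Made-up messy data: stray spaces, a thousands separator,
# an "n/a" marker, an empty cell, and inconsistent capitalisation.
RAW = """id,age,income,city
1, 34 ,"52,000",Boston
2,n/a,48000, boston
3,29,,CHICAGO
4,41,61000,Chicago
"""

# Read everything as text first so nothing is silently converted.
df = pd.read_csv(io.StringIO(RAW), dtype=str, na_values=["n/a"])

# Strip whitespace, then convert to numbers; unparseable values become NaN.
df["age"] = pd.to_numeric(df["age"].str.strip(), errors="coerce")

# Remove the thousands separator before converting.
df["income"] = pd.to_numeric(
    df["income"].str.replace(",", "", regex=False), errors="coerce"
)

# Normalise the spelling of the category values.
df["city"] = df["city"].str.strip().str.title()

# Count the missing values that are left in each column.
missing = df.isna().sum()
print(df)
print(missing)
```

From here you can practice deciding what to do with the remaining gaps (drop the rows, fill them in, or leave them), which is exactly the part that pre-cleaned files skip.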
Closed 3 years ago.
Does anyone know of any useful resources to learn the structure of python code? When I say structure, I'm referring to the ingredients necessary to make the code correct, so a combination of syntax and the order of terms etc.
For example, to understand the type of a pandas dataframe, df, (I know, I've already identified that it's a dataframe, but just hypothetically), we type: type(df). For the shape, we write df.shape.
So, in the latter, why do we put df first, as in df.shape? Yet in the former, we put df last, and in parentheses, as in type(df). So, mixing these commands up, one may write df.type and expect to get an answer, but they wouldn't. So how do we learn these rules? I have searched high and low for some guidance, but I find very little; a lot of the guides assume you have an appreciation of these rules and don't explain them.
I'm not asking for help on this particular problem. Rather, I'm looking for a resource that would enable me to understand this, and the multitude of other rules better.
I don't have a Computer Science background, so my answer might be unorthodox or incomplete. There are things which come across as inconsistent when you use Python, and these things often trick me when I write new code. So you can easily find yourself looking up the same stuff time and time again.
However, I would say that the more you practice, the less likely you are to make mistakes or be confused. One good tip is to create a OneNote page with all the snippets that usually trip you up. By going back to the same snippet you may end up saving yourself time and fixing those concepts in memory. It will cost you nothing, so give it a try.
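On the type(df) versus df.shape example from the question, the difference is between a built-in function, which takes the object as an argument, and an attribute or method, which belongs to the object and is reached with a dot. The sketch below uses a small hypothetical Table class as a stand-in for a DataFrame (it is not the pandas class), so it runs without any extra libraries.

```python
class Table:
    """Hypothetical stand-in for a DataFrame; not the pandas class."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    @property
    def shape(self):
        # An attribute: data that belongs to the object.
        # Accessed as obj.shape, with no parentheses.
        n_cols = len(self._rows[0]) if self._rows else 0
        return (len(self._rows), n_cols)

    def head(self, n=5):
        # A method: a function that belongs to the object.
        # Called as obj.head(...), with parentheses.
        return self._rows[:n]


df = Table([[1, 2, 3], [4, 5, 6]])

# type() is a built-in function that works on any object,
# so the object goes inside the parentheses.
kind = type(df)

dims = df.shape      # attribute of this object
first = df.head(1)   # method of this object

# df.type fails because Table defines no attribute called "type".
try:
    df.type
except AttributeError as exc:
    error_message = str(exc)

print(kind, dims, first)
print(error_message)
```

When in doubt, dir(df) lists the names that are available after the dot, and help(type) shows how a built-in function expects to be called.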
Closed 8 years ago.
NLTK is something I've been interested in for a fair while, but so far I have been unaware of an efficient way to become (relatively) acquainted with it.
I have done some preliminary research, which has resulted in my awareness of two online courses:
statistics.com/nlp-using-nltk/
statistics.com/nlp-using-nltk/
and this ebook:
nltk.org/book/ch00.html
I would appreciate a recommendation of an online tutorial or course, preferably in video format, that would be a good tool for me to use. If you have used one of the things that I linked to, please give me your impressions, if possible.
I strongly recommend working your way through the NLTK book, chapter by chapter. That's by far the best way to learn how to use NLTK, and it doesn't require much Python knowledge to get started.
Earlier this year, I put together a very basic introduction to NLTK that some people have found useful: A Smattering of NLP in Python.
I'm not aware of any good video tutorials for NLTK.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
One of my web apps receives a lot of article submissions, some of them not very well written. How feasible is it to create a tool that recognizes "good" vs. "bad" writing just by providing it with a corpus of good and bad articles?
Note that these articles (at least the ones already processed) have been reviewed and graded, so in theory I could use these numbers to confirm output.
I don't have a background in creating "learning" algorithms, so even just a pointer to a foundational book on the subject would be helpful, particularly one written for the Python language.
I think that this would be a difficult learning algorithm to undertake. However, if you do want to have a go, or are just interested in learning about the subject, Coursera offers a number of free online courses that are worth checking out.
This course is not currently running for assignments etc., but you can watch the lectures in preview mode. From what I have seen, it is well suited to beginners:
https://class.coursera.org/machlearning-001/lecture/preview
If you want some practice, then I would highly recommend taking a look at Kaggle (http://www.kaggle.com/), which runs open competitions for data science / machine learning problems. Some of the competitions even have sample code to get you started. The Titanic competition has some sample code in Python, although the problem being worked on is considerably simpler than the problem you have proposed.
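For a sense of what the very simplest version of "learn from a corpus of good and bad articles" could look like, here is a toy sketch of a bag-of-words naive Bayes classifier in plain Python. The four training sentences are invented for illustration, and word counts alone are a crude proxy for writing quality, so treat this as a starting point for the learning material above rather than a working solution to the real problem.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count words per label from an iterable of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, counts in model["word_counts"].items():
        # Prior: how common this label is among the training documents.
        score = math.log(model["doc_counts"][label] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy corpus; a real one would be the reviewed articles.
TOY_CORPUS = [
    ("The report presents clear evidence and explains each result carefully.", "good"),
    ("Each section states its claim and supports it with a concise example.", "good"),
    ("omg this is sooo great u should totally buy it now now now", "bad"),
    ("click here click here best deal ever u wont believe it", "bad"),
]

model = train(TOY_CORPUS)
print(classify(model, "u should click here now"))
print(classify(model, "The section explains each result with evidence"))
```

Since the processed articles have already been reviewed and graded, part of them could be held back from training and used to check how often the predictions agree with the reviewers.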
Closed 8 years ago.
I stumbled upon the wikidump python library, which I think suits me just fine.
I could get by by looking at the source code, but I'm new at python and I don't want to write BS code as the project I need it for is kind of important to me.
I got the 'wiki-SPECIFICDATE-pages-articles.xml.bz2' file and I would need to use that as my source for single article fetching. Can anyone give me some pointers as to properly achieve this or, even better, point at some documentation? I couldn't find any!
(p.s. if you got any better and properly doc'd lib, please tell me)
Not sure if I understand the question, but if you have the Wikipedia dump and you need to parse the wikicode, I would suggest mwparserfromhell lib.
Another powerful framework is Pywikibot, which is the long-standing framework for bot users on Wikipedia (thus, many of its scripts are dedicated to writing pages rather than reading and parsing articles). It has a lot of documentation (though some of it is obsolete), and it uses the MediaWiki API.
You can use them both, of course: PWB for fetching articles and mwparserfromhell for parsing.
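If you would rather read the downloaded dump file directly, a single article can also be pulled out with only the standard library. The sketch below is not the wikidump library's API, nor Pywikibot's; it assumes the usual MediaWiki export layout (page elements containing a title and a revision with a text element), and find_article and local_name are names made up for this example. It streams the file, so the whole dump is never loaded into memory.

```python
import bz2
import os
import tempfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def find_article(dump_path, wanted_title):
    """Stream a .xml.bz2 dump and return the wikitext of one page, or None."""
    with bz2.open(dump_path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _event, root = next(context)  # the enclosing <mediawiki> element
        for event, elem in context:
            if event != "end" or local_name(elem.tag) != "page":
                continue
            title, text = None, None
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text
                elif name == "text":
                    text = child.text
            if title == wanted_title:
                return text
            # Forget pages that were already inspected to keep memory flat.
            root.clear()
    return None


# Demo on a tiny made-up dump, so the sketch runs without a real download.
SAMPLE = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    "<page><title>Alpha</title><revision>"
    "<text>First ''article''.</text></revision></page>"
    "<page><title>Beta</title><revision>"
    "<text>Second [[article]].</text></revision></page>"
    "</mediawiki>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample-pages-articles.xml.bz2")
    with bz2.open(path, "wt", encoding="utf-8") as out:
        out.write(SAMPLE)
    found = find_article(path, "Beta")
    missing = find_article(path, "Gamma")

print(found)
```

The returned string is raw wikicode, so it can then be handed to mwparserfromhell for parsing. On a full dump a single lookup has to scan the file from the start, so for many lookups it is worth building an index of titles first.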