Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more (by pandas-dev)

Pandas Alternatives

Similar projects and alternatives to Pandas

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number generally indicates a better Pandas alternative or higher similarity.

Pandas reviews and mentions

Posts with mentions or reviews of Pandas. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-24.
  • Estimation of text complexity
    3 projects | | 24 Nov 2022
    pandas: for faster data processing and manipulation
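    As a hedged illustration of that point (not from the post), vectorized pandas string methods can compute simple per-document features without Python-level loops; the column names here are invented:
      import pandas as pd

      df = pd.DataFrame({"text": ["Short sentence.", "A somewhat longer and more complex sentence appears here."]})

      # vectorized string operations run over the whole column at once
      df["n_chars"] = df["text"].str.len()
      df["n_words"] = df["text"].str.split().str.len()
      df["avg_word_len"] = df["n_chars"] / df["n_words"]
      print(df)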
  • Snowflake and SQLAlchemy tutorial: From installation to example queries
    2 projects | | 23 Nov 2022
    Nowadays, most data professionals working in Python use pandas, a data analysis and manipulation tool centered around the DataFrame. pandas has the built-in capability to translate the results of database queries to a DataFrame, but it requires a connector to a database. While most people are familiar with Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC), you might not know that Snowflake also has its own proprietary connector.
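    A minimal sketch of that pattern (not from the tutorial), assuming the snowflake-sqlalchemy connector is installed; every connection detail below is a placeholder:
      import pandas as pd
      from sqlalchemy import create_engine

      # the "snowflake://" dialect comes from the snowflake-sqlalchemy package;
      # user, account, database, schema and warehouse names are hypothetical
      engine = create_engine(
          "snowflake://<user>:<password>@<account>/<database>/<schema>?warehouse=<warehouse>"
      )

      # pandas sends the query through the connector and builds a DataFrame from the result set
      df = pd.read_sql("SELECT * FROM orders LIMIT 10", con=engine)
      print(df.head())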
  • Python Tools. Libraries for Data Analysis
    7 projects | | 10 Nov 2022
    - pandas;
  • In One Minute : Pandas
    6 projects | | 5 Nov 2022
    Pandas is a Python library for PANel DAta manipulation and analysis, for example multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is implemented primarily using NumPy and Cython; it is intended to integrate very easily with NumPy-based scientific libraries, such as statsmodels.
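    A small sketch of that NumPy integration (illustrative only, not from the post): a labeled time series built directly on top of a NumPy array:
      import numpy as np
      import pandas as pd

      dates = pd.date_range("2022-01-01", periods=5, freq="D")
      returns = np.random.default_rng(0).normal(size=(5, 2))

      df = pd.DataFrame(returns, index=dates, columns=["asset_a", "asset_b"])
      print(df.mean())      # column-wise statistics computed on the underlying NumPy data
      print(df.to_numpy())  # the raw ndarray stays one call away for NumPy-based libraries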
  • SerpApi Demo Project: Walmart Coffee Exploratory Data Analysis
    4 projects | | 25 Oct 2022
    # forward-fill missing values in the coffee_type column
    coffee_types = coffee_df['coffee_type'].fillna(method='ffill')

    coffee_types_new = []

    # split each value by comma and extend the new list with the split values
    for coffee_type in coffee_types:
        coffee_types_new.extend(coffee_type.split(','))
  • Machine Learning Pipelines with Spark: Introductory Guide (Part 1)
    5 projects | | 23 Oct 2022
    DataFrames provide a Pandas-like, intuitive, high-level API for working with data in Spark. They organize data in a structured, tabular format of rows and columns, similar to a spreadsheet or a table in a relational database management system. If you have worked with Pandas before, DataFrames will feel familiar.
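    A hedged sketch of that similarity (not from the guide), assuming a local PySpark session; the column names are invented for illustration:
      import pandas as pd
      from pyspark.sql import SparkSession, functions as F

      pdf = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "sales": [10, 20, 30]})
      print(pdf.groupby("city")["sales"].sum())        # pandas aggregation

      spark = SparkSession.builder.getOrCreate()
      sdf = spark.createDataFrame(pdf)                 # Spark builds a DataFrame straight from pandas
      sdf.groupBy("city").agg(F.sum("sales")).show()   # the same aggregation, executed by Spark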
  • PostgreSQL to DuckDB - There and Quack Again
    3 projects | | 23 Oct 2022
    I built my data pipeline to Extract some data from websites and CSV files, Load it into my database, and Transform it into a reporting-ready schema. I used Python and Pandas to extract and load some of the data and Meltano to load some additional supporting data. All of that data went into a PostgreSQL database hosted in the cloud on Azure where I then used dbt to create data models in the database optimized for reporting. Finally, I use Metabase to visualize the data. (whew! that's a lot of moving parts!)
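    A minimal sketch of the pandas extract-and-load step described above (an assumption, not the author's actual code), with a CSV source, a PostgreSQL target, and hypothetical file, schema, and table names:
      import pandas as pd
      from sqlalchemy import create_engine

      engine = create_engine("postgresql://user:password@host:5432/reporting")

      df = pd.read_csv("raw_listings.csv")     # Extract from a CSV file
      df.to_sql("raw_listings", engine,        # Load into the warehouse for dbt to transform
                schema="raw", if_exists="replace", index=False)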
  • Picking a first programming language
    3 projects | | 14 Oct 2022
    Javascript is used everywhere. It's actually a little frightening since I don't know it as well myself :). But if you're writing code for the web then you're writing Javascript at some point. And of all the different kinds of programming work you could do web development will be the easiest to break into. Python is pretty popular too and cool open-source projects like Django, Pandas, and SQLAlchemy demonstrate its versatility.
  • 13 ways to scrape any public data from any website
    6 projects | | 7 Oct 2022
    Scraping tables is an additional, separate task that can be done with either the parsel or bs4 web scraping libraries. However, pandas simplifies it a lot by providing a read_html() method that can parse tabular data straight from a page's table markup. Installation: $ pip install pandas
    A basic example of extracting table data from Wikipedia (the article URL is omitted in this excerpt):
      import pandas as pd

      table = pd.read_html('')[0]  # [0] = first table on the page; URL omitted in the excerpt
      df = pd.DataFrame(data=table[['Latest micro version', 'Release date']])  # grabs 2 columns
      # df.set_index('Latest micro version', inplace=True)  # drops the default DataFrame index, but then the for loop below can't be used
      print(df)

      for data in df['Latest micro version']:
          print(data)
    Outputs (truncated here):
          Latest micro version   Release date
      0   0.9.9[2]               1991-02-20[2]
      1   1.0.4[2]               1994-01-26[2]
      2   1.1.1[2]               1994-10-11[2]
      ...
      26  3.10.7[65]             2021-10-04[65]
      27  3.11.0rc2[66]          2022-10-24[66]
      28  NaN                    2023-10[64]
    for loop output (truncated here): 0.9.9[2] 1.0.4[2] 1.1.1[2] nan nan nan 1.5.2[42] ... 3.10.7[65] 3.11.0rc2[66] nan
    Keep in mind that those are just examples and additional data cleaning needs to be applied to make this data usable 🙂 Have a look at the gotchas that can happen with read_html().
    Scraping with Regular Expression
    Scraping data with regular expressions in Python is possible with the re module. Why scrape data with regular expressions in the first place?
    - if the HTML structure is very, very messy.
    - if there are no CSS selectors and XPath didn't work either.
    - if the data you want is directly in the text string.
    - similar reasons to the ones mentioned above.
    There are a few main methods that can be used:
      Method         Purpose
      re.findall()   Returns a list of matches. To find all occurrences.
      re.search()    Returns the first match. To find the first occurrence.
      re.match()     To find a match at the beginning of the string.
      group()        Returns one or more subgroups of the match.
    For the difference between search() and match(), see the re module documentation. A quick example:
      import re

      dummy_text = '''
      Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.
      '''

      dates = re.findall(r'\d{1}\.\d{2}\.\d{2}', dummy_text)  # section numbers such as 1.10.32
      years_bc = re.findall(r'\d+\s?\bBC\b', dummy_text)      # years such as "45 BC"

      print(dates)     # ['1.10.32', '1.10.33', '1.10.32']
      print(years_bc)  # ['45 BC', '45 BC']
    (The post includes a visualization of what these regular expressions match and how they read.)
    Python Web Scraping Tools
    In this section, we'll go over the most popular Python web scraping tools that can extract data from static and dynamic websites.
    Python Parsing Libraries
    There are a few Python web scraping packages/libraries for parsing data from websites that are not JavaScript-driven; such packages are designed to scrape data from static pages.
    Parsel
    Parsel is a library built to extract data from XML/HTML documents with XPath and CSS selector support, and it can be combined with regular expressions. It uses the lxml parser under the hood by default. The great thing I really like about parsel (apart from XPath support) is that it returns None if certain data is not present, so there's no need to write a lot of ugly try/except blocks for the same thing. Installation: $ pip install parsel
    A few examples of extraction methods:
      variable.css(".X5PpBb::text").get()                     # returns a text value
      variable.css(".gs_a").xpath("normalize-space()").get()  # returns whitespace-normalized text
      variable.css(".gSGphe img::attr(srcset)").get()         # returns an attribute value
      variable.css(".I9Jtec::text").getall()                  # returns a list of string values
      variable.xpath('th/text()').get()                       # returns a text value using XPath
    Code explanation:
    - css() parses data from the passed CSS selector(s). Every CSS query translates to XPath using the cssselect package under the hood.
    - ::text or ::attr() extracts textual or attribute data from the node.
    - get() gets the actual data returned from parsel.
    - getall() gets a list of all matches.
    - .xpath('th/text()') grabs textual data from the element.
    Practical example using parsel:
      import requests, json
      from parsel import Selector

      params = {
          "query": "minecraft",  # search query
          "where": "web"         # web results. works with nexearch as well
      }

      headers = {
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"
      }

      # the request URL is omitted in this excerpt
      html = requests.get("", params=params, headers=headers, timeout=30)
      selector = Selector(html.text)

      related_results = []

      for index, related_result in enumerate(selector.css(".related_srch .keyword"), start=1):
          keyword = related_result.css(".tit::text").get().strip()
          link = f'{related_result.css("a::attr(href)").get()}'
          related_results.append({
              "position": index,  # 1, 2, 3...
              "title": keyword,
              "link": link
          })

      print(json.dumps(related_results, indent=2, ensure_ascii=False))
    BeautifulSoup
    BeautifulSoup is also a library built to extract data from HTML/XML documents. It can likewise be combined with the lxml parser and used together with regular expressions. Unlike parsel, BeautifulSoup has no support for XPath, which would be pretty handy in some situations. Additionally, it lacks a getall() method that returns a list of matches (a shorthand for a list comprehension), and it needs a lot of try/except blocks to handle missing data. However, it can create new HTML nodes, for example using the wrap() method or similar methods. That is very handy if part of the data you want to extract is not properly structured, e.g. an HTML table without an enclosing table element: you can create this element and then easily parse the table data with pandas' read_html() method. Installation: $ pip install bs4
    A few examples of extraction methods using select() and select_one():
      variable.select('.gs_r.gs_or.gs_scl')      # returns a list of matches
      variable.select_one('.gs_rt').text         # returns a single text value match
      variable.select_one('.gs_rt a')['href']    # returns a single attribute value match
    Practical example using BeautifulSoup:
      from bs4 import BeautifulSoup
      import requests, lxml

      params = {
          "user": "VxOmZDgAAAAJ",  # user-id, in this case Masatoshi Nei
          "hl": "en",              # language
          "gl": "us",              # country to search from
          "cstart": 0,             # articles page. 0 is the first page
          "pagesize": "100"        # articles per page
      }

      headers = {
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"
      }

      all_articles = []

      # the request URL and HTTP method are omitted in this excerpt; requests.get is assumed
      html = requests.get("", params=params, headers=headers, timeout=30)
      soup = BeautifulSoup(html.text, "lxml")

      for index, article in enumerate(soup.select("#gsc_a_b .gsc_a_t"), start=1):
          article_title = article.select_one(".gsc_a_at").text
          article_link = f'{article.select_one(".gsc_a_at")["href"]}'
          article_authors = article.select_one(".gsc_a_at+ .gs_gray").text
          article_publication = article.select_one(".gs_gray+ .gs_gray").text

          all_articles.append({
              "title": article_title,
              "link": article_link,
              "authors": article_authors,
              "publication": article_publication
          })
    Python Browser Automation
    Browser automation is handy when you need to perform some sort of interaction with the website, for example scrolls, clicks and similar things. Such things can be done without browser automation (this is how we tend to do it at SerpApi), but it can be very complicated; on the flip side, the reward is much faster data extraction.
    Playwright
    playwright is a modern alternative to selenium. It can perform pretty much all the interactions a user would do, i.e. clicks, scrolls and many more. Installation: $ pip install playwright
    Practical example of website interaction using playwright and parsel to extract the data. The following script scrolls through all Google Play app reviews and then extracts the data:
      import time, json, re
      from parsel import Selector
      from playwright.sync_api import sync_playwright

      def run(playwright):
          page = playwright.chromium.launch(headless=True).new_page()
          page.goto("")  # the app reviews URL is omitted in this excerpt

          user_comments = []

          # if the "See all reviews" button is present
          if page.query_selector('.Jwxk6d .u4ICaf button'):
              print("the button is present.")
              print("clicking on the button.")
              page.query_selector('.Jwxk6d .u4ICaf button').click(force=True)
              print("waiting a few sec to load comments.")
              time.sleep(4)

          last_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')  # 2200

          while True:
              print("scrolling..")
              page.keyboard.press("End")
              time.sleep(3)
              new_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')

              if new_height == last_height:
                  break
              else:
                  last_height = new_height

          selector = Selector(text=page.content())
          page.close()
          print("done scrolling. Extracting comments...")

          for index, comment in enumerate(selector.css(".RHo1pe"), start=1):
              user_comments.append({
                  "position": index,
                  "user_name": comment.css(".X5PpBb::text").get(),
                  "app_rating": re.search(r"\d+", comment.css(".iXRFPc::attr(aria-label)").get()).group(),
                  "comment_date": comment.css(".bp9Aid::text").get(),
                  "developer_comment": {
                      "dev_title": comment.css(".I6j64d::text").get(),
                      "dev_comment": comment.css(".ras4vb div::text").get(),
                      "dev_comment_date": comment.css(".I9Jtec::text").get()
                  }
              })

          print(json.dumps(user_comments, indent=2, ensure_ascii=False))

      with sync_playwright() as playwright:
          run(playwright)
    Selenium
    selenium is very similar to playwright but a little older, with slightly different approaches to doing things.
    Installation: $ pip install selenium
    The following script scrolls until it hits the very bottom of the Google Play games search and then extracts each section with games:
      from selenium import webdriver
      from selenium.webdriver.chrome.service import Service
      from webdriver_manager.chrome import ChromeDriverManager
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
      from parsel import Selector
      import json, time

      google_play_games = {
          'Top charts': {
              'Top free': [],
              'Top grossing': [],
              'Top paid': []
          },
      }

      def scroll_page(url):
          service = Service(ChromeDriverManager().install())

          options = webdriver.ChromeOptions()
          options.add_argument("--headless")
          options.add_argument("--lang=en")
          options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36")

          driver = webdriver.Chrome(service=service, options=options)
          driver.get(url)

          while True:
              try:
                  scroll_button = driver.find_element(By.CSS_SELECTOR, '.snByac')
                  driver.execute_script("arguments[0].click();", scroll_button)
                  WebDriverWait(driver, 10000).until(EC.visibility_of_element_located((By.TAG_NAME, 'body')))
                  break
              except:
                  driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                  WebDriverWait(driver, 10000).until(EC.visibility_of_element_located((By.TAG_NAME, 'body')))

          selector = Selector(driver.page_source)
          driver.quit()

          return selector

      def scrape_all_sections(selector):
          for section in selector.css('.Ubi8Z section'):
              section_title = section.css('.kcen6d span::text').get()
              time.sleep(2)

              google_play_games[section_title] = []

              for game in section.css('.TAQqTe'):
                  title = game.css('.OnEJge::text').get()
                  link = '' + game.css('::attr(href)').get()  # the domain prefix is omitted in this excerpt
                  category = game.css('.ubGTjb .sT93pb.w2kbF:not(.K4Wkre)::text').get()
                  rating = game.css('.CKzsaf .w2kbF::text').get()
                  rating = float(rating) if rating else None

                  google_play_games[section_title].append({
                      'title': title,
                      'link': link,
                      'category': category,
                      'rating': rating,
                  })

          print(json.dumps(google_play_games, indent=2, ensure_ascii=False))

      def scrape_google_play_games():
          params = {
              'device': 'phone',
              'hl': 'en_GB',  # language
              'gl': 'US',     # country of the search
          }

          # the base URL is omitted in this excerpt
          URL = f"{params['device']}&hl={params['hl']}&gl={params['gl']}"

          result = scroll_page(URL)
          scrape_all_sections(result)

      if __name__ == "__main__":
          scrape_google_play_games()
    Python Web Scraping Frameworks
    Scrapy
    scrapy is a high-level web scraping framework designed to scrape data at scale, and it can be used to build a whole ETL pipeline. However, keep in mind that it's bulky and can be quite confusing, and while it provides a lot of things for you, you may not need most of them. Installation: $ pip install scrapy
    A very simple scrapy script:
      import scrapy

      class ScholarAuthorTitlesSpider(scrapy.Spider):
          name = 'google_scholar_author_titles'

          # start_requests is the entry point scrapy calls to issue the first request
          def start_requests(self):
              params = {
                  "user": "cp-8uaAAAAAJ",  # user-id
                  "hl": "en",              # language
                  "gl": "us",              # country to search from
                  "cstart": 0,             # articles page. 0 is the first page
                  "pagesize": "100"        # articles per page
              }

              headers = {
                  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"
              }

              # the request URL is omitted in this excerpt
              yield scrapy.Request(url="", method="GET", headers=headers, meta=params, callback=self.parse)

          def parse(self, response):
              for index, article in enumerate(response.css("#gsc_a_b .gsc_a_t"), start=1):
                  yield {
                      "position": index,
                      "title": article.css(".gsc_a_at::text").get(),
                      "link": f'{article.css(".gsc_a_at::attr(href)").get()}',
                      "authors": article.css(".gsc_a_at+ .gs_gray::text").get(),
                      "publication": article.css(".gs_gray+ .gs_gray::text").get()
                  }
    XHR Requests
    An XHR request lets the page talk to the server by making a request and getting data back in response. It's one of the first things you can check before writing actual code: those requests can be used to get data directly from a website's "source" without the need for parsing libraries/frameworks. To find a certain XHR request you need to:
    1. Open browser dev tools (F12).
    2. Go to Network -> Fetch/XHR.
    3. Refresh the page, as data may come in on page update.
    4. Click through every request and see if there's any data you want. If you find the request with the data you want, you can preview it (the post shows an example preview).
    How to extract data from an XHR request: when making an XHR request, we need to pass URL parameters that the server can understand and "reply" to. (The post includes an illustration of this process.) To find those headers and URL query parameters, go to the URL in question, look at the Headers and Payload tabs, and make sure you see which request method is used (GET, POST, etc). We can do it like so: copy the URL as cURL (Bash) and use it with an online cURL runner or a tool such as Insomnia, or copy the request URL under the Headers tab (in Insomnia, the URL copied from the XHR -> Headers tab).
    📌 Keep in mind that some of the passed URL parameters need to be scraped and passed to the URL beforehand (before making the request to the server/API). A URL can contain some sort of unique token or similar and won't work without it. If the response is successful and you want to make the exact same request in a script, those parameters can be generated automatically with tools such as Insomnia (or alternatives), where you can test different types of requests with different parameters and headers.
    A simple example, but the same approach works on other websites, with or without passing URL parameters and headers:
      import requests

      # the API URL is omitted in this excerpt
      html = requests.get('').json()
      print(html['value'])

      # Once, due to an engine stall of his F-22 Raptor during a Dessert Storm sorte', Chuck Norris had to ejaculate over the Red Sea.
    Page Source
    This is the next thing to check after Dev Tools -> XHR. It's about looking at the page source and trying to find data there that is either hidden in the rendered HTML or can't be scraped with selectors because it's rendered by JavaScript. One way to find out whether the data you want is in the inline JSON:
    1. Select and copy any piece of data you want to extract (title, name, etc.).
    2. Open the page source (CTRL + U).
    3. Find the data (CTRL + F); if some of the occurrences are inside a script element, the data is most likely rendered from inline JSON.
    The inline JSON can then be pulled out with a regular expression and converted to a dictionary:
      basic_app_info = json.loads(             # convert to `dict` using json.loads()
          re.findall(
              r"",                             # regular expression; the pattern is omitted in this excerpt
              str(soup.select("script")[11]),  # input from where to search the data
              re.DOTALL                        # match any character, including newlines
          )[0]                                 # access the `list` from re.findall()
      )
    After that, we can access it as a dictionary:
      app_data["basic_info"]["name"] = basic_app_info.get("name")
      app_data["basic_info"]["type"] = basic_app_info.get("@type")
      app_data["basic_info"]["url"] = basic_app_info.get("url")
    Full example:
      from bs4 import BeautifulSoup
      import requests, lxml, re, json

      headers = {
          "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"
      }

      params = {
          "id": "",        # app name (value omitted in this excerpt)
          "gl": "US",      # country of the search
          "hl": "en_GB"    # language of the search
      }

      # make a request and pass the response to BeautifulSoup
      # the request URL is omitted in this excerpt
      html = requests.get("", params=params, headers=headers, timeout=30)
      soup = BeautifulSoup(html.text, "lxml")

      # where all app data will be stored
      app_data = {
          "basic_info": {}
      }

      # 👇👇👇 data extraction
      # the [11] index is the basic app information
      # the regular expression pattern is omitted in this excerpt
      basic_app_info = json.loads(re.findall(r"", str(soup.select("script")[11]), re.DOTALL)[0])

      app_data["basic_info"]["name"] = basic_app_info.get("name")
      app_data["basic_info"]["type"] = basic_app_info.get("@type")
      app_data["basic_info"]["url"] = basic_app_info.get("url")
    Reverse engineering & Debugging
    Great examples of reverse engineering on our blog: scraping Walmart Search for a specific store, and reverse engineering Google Finance charts. Make sure to check them both, as we're not going to duplicate the same information here.
    📌 The information about the Sources and Application tabs is introductory, as these are big topics with a lot of steps to reproduce, and they are out of the scope of this blog post.
    Sources tab
    One approach, when something complex needs to be extracted, is to work under the Sources tab. This can be done by debugging the website's JS source code from certain files with debugger breakpoints (Dev tools -> Sources -> debugger), trying to trace what is going on in the code and how we can intercept or recreate the data ourselves and use it in the parser.
    Application tab
    A similar thing can be done in the Dev tools -> Application tab, where we can see, for example, cookie data and either intercept it or reverse engineer it by understanding how the cookie was built. Ilya, one of the engineers at SerpApi, has written in more detail about reverse engineering location cookies from Walmart and his approach to such a task.
    Links: parsel, BeautifulSoup, lxml, requests, scrapy, playwright, selenium, SelectorGadget Chrome Extension

