
Data Frames

eland.DataFrame wraps an Elasticsearch index in a pandas-like API and defers all processing and filtering of data to Elasticsearch rather than running it on your local machine. You can therefore process large amounts of data inside Elasticsearch from a Jupyter Notebook without overloading your machine.
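To make "deferring filtering to Elasticsearch" more concrete, here is a minimal sketch of how a pandas-style row filter could be expressed as an Elasticsearch Query DSL body. The helper `build_filter_query` is hypothetical and is not part of eland; the exact query eland sends may differ, and the `term` clause on `Carrier` assumes that field is mapped as a keyword.

```python
import json


# Hypothetical helper, NOT part of eland: it only illustrates the kind of
# Query DSL body a pandas-style row filter could be translated into.
def build_filter_query(carrier, min_price, cancelled):
    return {
        "query": {
            "bool": {
                # Every clause in "filter" must match; no relevance scoring.
                "filter": [
                    # Exact match; assumes 'Carrier' is a keyword field.
                    {"term": {"Carrier": carrier}},
                    # Strictly greater than min_price.
                    {"range": {"AvgTicketPrice": {"gt": min_price}}},
                    {"term": {"Cancelled": cancelled}},
                ]
            }
        }
    }


query = build_filter_query("Kibana Airlines", 900.0, True)
print(json.dumps(query, indent=2))
```

The point of the sketch is that only a small query description leaves your machine; the matching itself runs inside the cluster, and only the requested rows come back.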

>>> import eland as ed
>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('http://localhost:9200', 'flights')

# An eland.DataFrame instance exposes a pandas.DataFrame-style API,
# but all data stays in Elasticsearch. See the memory usage in .info().
>>> df.head()
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]

>>> df.info()
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   AvgTicketPrice      13059 non-null  float64
 1   Cancelled           13059 non-null  bool
 2   Carrier             13059 non-null  object
...
 24  OriginWeather       13059 non-null  object
 25  dayOfWeek           13059 non-null  int64
 26  timestamp           13059 non-null  datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
Elasticsearch storage usage: 5.043 MB

# Filtering rows using comparisons
>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
     AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
8        960.869736       True  ...         0 2018-01-01 12:09:35
26       975.812632       True  ...         0 2018-01-01 15:38:32
311      946.358410       True  ...         0 2018-01-01 11:51:12
651      975.383864       True  ...         2 2018-01-03 21:13:17
950      907.836523       True  ...         2 2018-01-03 05:14:51

[5 rows x 27 columns]

# Running aggregations across an index
>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
     DistanceKilometers  AvgTicketPrice
sum        9.261629e+07    8.204365e+06
min        0.000000e+00    1.000205e+02
std        4.578263e+03    2.663867e+02
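Because the filter and aggregation calls above use pandas syntax, you can try the same expressions on a small in-memory pandas.DataFrame when no cluster is at hand. The sketch below assumes only that pandas is installed; the rows in `local_df` are made up for illustration and are not taken from the 'flights' index.

```python
import pandas as pd

# Made-up rows for illustration only; NOT data from the 'flights' index.
local_df = pd.DataFrame(
    {
        "Carrier": ["Kibana Airlines", "Kibana Airlines", "Other Air", "Kibana Airlines"],
        "AvgTicketPrice": [950.0, 400.0, 990.0, 920.0],
        "Cancelled": [True, True, True, False],
        "DistanceKilometers": [1000.0, 2000.0, 3000.0, 4000.0],
    }
)

# Same boolean-mask expression as in the eland filtering example.
filtered = local_df[
    (local_df.Carrier == "Kibana Airlines")
    & (local_df.AvgTicketPrice > 900.0)
    & (local_df.Cancelled == True)
]

# Same aggregate call: one result row per aggregation, one column per field.
summary = local_df[["DistanceKilometers", "AvgTicketPrice"]].aggregate(["sum", "min", "std"])

print(filtered)
print(summary)
```

The difference is where the work happens: here pandas evaluates everything in local memory, whereas the eland calls above are evaluated by Elasticsearch.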
