71 Commits

Author SHA1 Message Date
Ben Eggers
317d547d7a Fix failing example terraform (#1175)
## Description of changes

`lifecycle` blocks don't allow variables. Right now our example
deployment for AWS doesn't work. @tazarov has a fix for this and a few
other things in https://github.com/chroma-core/chroma/pull/1173 but I'd
like to get the basic fix out before the weekend.

## Test plan
local `terraform init` failed before, now works.

## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
repository](https://github.com/chroma-core/docs)?*
2023-09-22 15:49:29 -07:00
Jeff Huber
7d412aef8c [ENH] initial CLI (#1032)
This proposes an initial CLI

The CLI is installed when you run `pip install chromadb`.

You then call the CLI with

`chroma run --path <persist_dir> --port <port>` where path and port are
optional.
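
As a rough illustration of the invocation above, the Python sketch below assembles the `chroma run` argument list with both options left optional. The helper name `build_chroma_run_command` is hypothetical and not part of chromadb; only the `chroma run`, `--path`, and `--port` spellings come from this description.

```python
from typing import List, Optional


def build_chroma_run_command(
    path: Optional[str] = None, port: Optional[int] = None
) -> List[str]:
    """Assemble an argv list for `chroma run` (hypothetical helper, not part of chromadb)."""
    command = ["chroma", "run"]
    # Both flags are optional, so each is appended only when a value is given.
    if path is not None:
        command += ["--path", path]
    if port is not None:
        command += ["--port", str(port)]
    return command


if __name__ == "__main__":
    # Print the command line that would be executed.
    print(" ".join(build_chroma_run_command(path="./chroma_data", port=8000)))
```

The resulting list can be handed to `subprocess.run` to start the server from a script, assuming the `chroma` executable is on the `PATH`.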

This also adds `chroma help` and `chroma docs` as convenience links -
but I'm open to removing those.

To make this easy - I added `typer` (by the author of FastAPI). I'm not
sure this is the tool that we want to commit to for a fuller featured
CLI, but given the extremely minimal footprint of this - I don't think
it's a one way door.

<img width="1477" alt="Screenshot 2023-08-23 at 4 59 54 PM"
src="https://github.com/chroma-core/chroma/assets/891664/30374228-d303-41e1-8e9e-188b7f8532d4">

***

#### TODO
- [x] test in fresh env - i think i need to add `typer` as a req
- [ ] consider expanding the test to make sure the service is actually
running
- [x] hide the test option from the typer UI
- [x] linking to a getting started guide could be interesting at the top
of the logs
2023-09-11 20:49:25 -07:00
Trayan Azarov
2dd5a15526 [ENH] Added auth and external volume support for GCP (#1107)
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Added external volume for Chroma data
	 - Bumped to the latest version (0.4.9)
	 - Added auth by default
	 - Made the template more configurable via variables with sensible defaults

## Test plan
*How are these changes tested?*

- Tested with terraform

## Documentation Changes
The update includes a README with docs.
2023-09-11 08:19:57 -07:00
Trayan Azarov
2ae7b70b9a [ENH]: AWS Deployment (#1086)
Refactored and Rebased version of #1059

## Description of changes

This template includes the following:

- Create a security group with required ports open (22 and 8000)
- Create EC2 instance with Ubuntu 22 and deploy Chroma using docker
compose
- Create a data volume (EBS) for Chroma data
- Mount the data volume to the EC2 instance
- Format the data volume with ext4
- Start Chroma
- Enable (by default) Token Auth with randomly generated token

## Test plan
*How are these changes tested?*

- Terraform tests performed

## Documentation Changes
The template contains a README with a tutorial on how to use it.
2023-09-05 19:24:28 +00:00
Trayan Azarov
6dd2d4af0b [ENH]: CIP-4: In and Not In Metadata Filters (#1081)
Cherry-picked from #1029

## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Added support for `$in` and `$nin` metadata filters

> Note: See CIP in `docs/` or example notebook for more info
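
To make the intended semantics concrete, here is a minimal pure-Python sketch of how `$in` and `$nin` conditions could be evaluated against one record's metadata. It is not Chroma's implementation: the function name `matches_where` is hypothetical, and treating a missing key as satisfying `$nin` is an assumption of this sketch.

```python
from typing import Any, Dict


def matches_where(metadata: Dict[str, Any], where: Dict[str, Dict[str, Any]]) -> bool:
    """Return True if `metadata` satisfies every `$in` / `$nin` condition in `where`."""
    for key, condition in where.items():
        for operator, values in condition.items():
            if operator not in ("$in", "$nin"):
                raise ValueError(f"unsupported operator: {operator}")
            # True when the key exists and its value is one of the listed values.
            present = key in metadata and metadata[key] in values
            if operator == "$in" and not present:
                return False
            # Assumption: a record without the key passes a `$nin` condition.
            if operator == "$nin" and present:
                return False
    return True


if __name__ == "__main__":
    record = {"author": "john", "page": 3}
    print(matches_where(record, {"author": {"$in": ["john", "jill"]}}))
    print(matches_where(record, {"page": {"$nin": [1, 2]}}))
```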

## Test plan
*How are these changes tested?*

- [x] Tests pass locally with `pytest` for python

## Documentation Changes
TBD

---------

Co-authored-by: Hammad Bashir <HammadB@users.noreply.github.com>
2023-09-05 10:42:01 -07:00
Trayan Azarov
35a367e02e [ENH]: Auth Providers - Static API Token (#1051)
Refs: #1027

## Description of changes

*Summarize the changes made by this PR.*
 - New functionality
	 - Baseline functionality and tests implemented
	 - Example notebook updated
	 - Minor refactor on the client creds provider to allow for user specific credentials fetching.
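
For readers unfamiliar with static token auth, the sketch below shows the general technique: the server holds one fixed token and compares it with whatever the client presents. This is a generic illustration, not the code added by this PR. The `Authorization: Bearer <token>` header shape and the function name `is_authorized` are assumptions.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check an `Authorization: Bearer <token>` header against a static token."""
    value = headers.get("Authorization", "")
    scheme, _, presented = value.partition(" ")
    if scheme != "Bearer" or not presented:
        return False
    # compare_digest runs in constant time, so it does not leak how much of the token matched.
    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))


if __name__ == "__main__":
    print(is_authorized({"Authorization": "Bearer test-token"}, "test-token"))
```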

## Test plan
*How are these changes tested?*

- [x] Tests pass locally with `pytest` for python (regression)
- [x] New fixtures added for token-based auth

## Documentation Changes
Docs should be updated to highlight the new supported auth method.
2023-08-28 16:51:41 -07:00
Trayan Azarov
48700dd07f [ENH] CIP-2: Auth Providers Proposal (#986)
## Description of changes

*Summarize the changes made by this PR.*
 - New functionality
	 - Auth Provider Client and Server Side Abstractions
	 - Basic Auth Provider
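
As background for the Basic Auth provider, this short sketch shows how HTTP Basic credentials are encoded into and decoded from an `Authorization` header value. It illustrates the standard scheme only; the helper names are hypothetical and are not taken from the Chroma code base.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic `Authorization` header value."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def parse_basic_auth_header(value: str) -> Tuple[str, str]:
    """Decode a Basic header value back into (username, password)."""
    scheme, _, encoded = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Only the first colon separates the fields; the password may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("admin", "admin")
    print(header)
    print(parse_basic_auth_header(header))
```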

## Test plan
Unit tests for authorized endpoints

## Documentation Changes
Docs should change to describe how to use auth providers on the client
and server. CIP added in `docs/`
2023-08-22 19:48:55 -07:00
Trayan Azarov
a014870f40 Where filtering with logical operators - updated examples (#966)
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Added enhanced examples of how to use `where` filtering with logical operators based on community questions
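
The following pure-Python sketch illustrates how a `where` filter combining `$and` and `$or` could be evaluated against one record's metadata. It is a simplified model for intuition, not Chroma's query engine; the function name `matches` is hypothetical, and only plain equality is supported besides the two logical operators.

```python
from typing import Any, Dict


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Evaluate a filter built from `$and`, `$or`, and plain equality conditions."""
    for key, expected in where.items():
        if key == "$and":
            # Every sub-filter must hold.
            if not all(matches(metadata, clause) for clause in expected):
                return False
        elif key == "$or":
            # At least one sub-filter must hold.
            if not any(matches(metadata, clause) for clause in expected):
                return False
        elif metadata.get(key) != expected:
            return False
    return True


if __name__ == "__main__":
    record = {"category": "news", "year": 2023}
    where = {"$and": [{"category": "news"}, {"$or": [{"year": 2022}, {"year": 2023}]}]}
    print(matches(record, where))
```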

## Test plan
Run the jupyter notebook
`examples/basic_functionality/where_filtering.ipynb`

## Documentation Changes
No document updates.
2023-08-13 05:18:46 +00:00
Trayan Azarov
68e8cda3b4 feat: Improved GCP deployment instructions (#951)
- The project will now clone the chroma repo and deploy docker-compose
from it
- Introduced a new TF var chroma_release to be able to deploy any (oh
well, almost any) chroma version; defaults to 0.4.5
- Added output instance_public_ip var to print out the IP address of the
VM

Refs: #950

## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Made it possible to deploy any version of chroma through TF var chroma_release (defaults to 0.4.5)
	 - Made it so that the VM would clone the repo and checkout the release tag then run docker compose from it
	 - Added output of the public IP of the VM after TF deployment
	 - Slightly improved the README.md

## Test plan
*How are these changes tested?*

```bash
gcloud auth application-default login
cd examples/deployments/google-cloud-compute
terraform init
export TF_VAR_project_id=<your_project_id> #take note of this as it must be present in all of the subsequent steps
export TF_VAR_chroma_release=0.4.5 #set the chroma release to deploy
terraform apply -auto-approve
terraform output instance_public_ip
export instance_public_ip=$(terraform output -raw instance_public_ip) #-raw prints the value without surrounding quotes
curl -v http://$instance_public_ip:8000/api/v1
terraform destroy -auto-approve
```

## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
repository](https://github.com/chroma-core/docs)?*
2023-08-08 19:40:47 +00:00
Anton Troynikov
c387e1d601 New starter notebook (#881)
## Description of changes

This PR creates a new starter notebook intended to familiarize people
with the very basic, core functionality of embedding retrieval with
Chroma. It's self-contained, and hopefully straightforward and easy to
understand.

There is also a minor fix to the experimental notebook. 

## Test plan

Ran the notebook, also via Colab. 

## Documentation Changes
None. 

## TODO
- [x] https://github.com/chroma-core/chroma/issues/880 Canonical 'chat
with your documents'
2023-08-03 22:36:06 +00:00
Hammad Bashir
46de47945a SQLite Release PR (#808)
## Description of changes
Base PR to release sqlite refactor, which spans many stacked PRs.

Remaining
- [x] Merge this to main
- [x] Layered Persistent Index #761 
- [x] Remove old impls (In #781 )
- [x] Remove persist() API (In #787)
- [x] Add telemetry to SegmentAPI, it was not included. (#788)
- [x] New clients #805 
- [x] locking and soak tests for thread-safety 
- [x] Migration tool
- [x] Fix #739 
- [x] Fix metadata None vs empty
- [x] Fix persist directory (addressed in #761)
- [x] Leave files open in #761 (merge stacked PR)

Post Release
- [ ] Un xfail cross version tests once we cut the release
- [x] Documentation updates for new silent ADD failure.
- [x] Update all documentation for new API instantiation
- [x] Update all documentation for settings changes
- [ ] Update terraform deployment
- [ ] Update cloudformation deployment

---------

Co-authored-by: Luke VanderHart <luke@vanderhart.net>
Co-authored-by: Jeffrey Huber <jeff@trychroma.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sebastian Sosa <37946988+CakeCrusher@users.noreply.github.com>
Co-authored-by: Russell Pollari <russell@sharpestminds.com>
Co-authored-by: russell-pollari <pollarir@mgail.com>
2023-07-17 14:21:34 -07:00
Jeff Huber
055d6cf6b2 cohere (#743)
Cohere Examples in JS and Python
2023-07-04 13:31:51 -07:00
Jeff Huber
666bfc40f3 Examples folder refactor (#736)
Reorganizes the examples folder and adds guidelines and a scaffold to
flesh it out
2023-06-28 10:26:36 -07:00
Charaf Rezrazi
9294cb4790 fix(gcp): fixed terraform definitions for gcp example (#632)
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Terraform definition was not correct, and Docker containers would not boot up
	 - Missing firewall definition to the outside world

## Test plan
*How are these changes tested?*
Run
```bash
terraform apply -var="project_id=<your_project_id>" -auto-approve
```


## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
repository](https://github.com/chroma-core/docs)?*
2023-06-23 09:33:13 -07:00
Marvin Yan
9cb8df94e9 Add task instruction pairing to InstructorEmbeddingFunction (#556)
## Description of changes

- Add an optional `instruction` constructor parameter to
InstructorEmbeddingFunction to allow `instruction` and Document pairs to
be encoded.
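
The pairing described above can be sketched in a few lines of plain Python. This is an illustration of the idea rather than the code merged in this PR: the helper name `pair_with_instruction` is hypothetical, and the `[instruction, document]` pair layout is assumed to be the input format the underlying INSTRUCTOR model accepts.

```python
from typing import List, Optional, Union


def pair_with_instruction(
    documents: List[str], instruction: Optional[str] = None
) -> Union[List[str], List[List[str]]]:
    """Pair each document with the instruction, or pass documents through unchanged."""
    if instruction is None:
        # No instruction configured: the documents are encoded as-is.
        return documents
    return [[instruction, document] for document in documents]


if __name__ == "__main__":
    docs = ["Chroma stores embeddings.", "Terraform provisions infrastructure."]
    print(pair_with_instruction(docs, "Represent the document for retrieval: "))
```

Because the instruction is fixed per function instance, storing and querying with different instructions would need two instances, which matches the limitation noted in this description.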

## Test plan


## Documentation Changes
Added examples to the Alternative Embedding notebook.

Not sure if this is a good implementation, since you'll need a separate
Collection for each instruction you want to use (or reassign
`self._instruction`), but at least the change is pretty minimal. For my
use case, two instructions are enough (one for storing, one for
retrieving). For a scenario where you need lots of different
instructions, perhaps "Represent the <Science|Financial|Political|etc.>
article: ", another solution is needed.

Feature Request #546

---------

Co-authored-by: Jeffrey Huber <jeff@trychroma.com>
2023-05-19 15:06:46 -07:00
Jeff Huber
4e5f0168e2 Merge pull request #321 from fr0th/alternative-embeddings-example
Add example demonstrating using openai & cohere embeddings
2023-05-11 16:45:37 -07:00
fr0th
83547738ff examples/alternative_embeddings.ipynb 2023-04-11 17:47:09 +12:00
fr0th
a480bd4dcf add the extra embeddings WITHOUT additional files 2023-04-11 17:31:15 +12:00
Alvaro Molina
1ed0f35a22 Remove sudo usage from Google Cloud deployment example (#259)
* remove usage of sudo

* upgrade example with new chroma version
2023-04-10 15:20:32 -07:00
Paul Kiage
1f7958bd85 Change spelling of Srart to Start (#310)
chroma/examples/local_persistence.ipynb
2023-04-10 12:03:27 -07:00
Shivankar Pilligundla
9f4178b090 updated n_results parameter in example query (#319) 2023-04-10 10:02:45 -07:00
fr0th
ab78b15b00 Add example demonstrating using openai & cohere embeddings 2023-04-10 19:48:37 +12:00
Alw3ys
84c72473fc WIP: Google Cloud Compute deployment example 2023-03-23 21:48:47 +01:00
Hammad Bashir
f6aa386164 Add filtering example (#159) 2023-02-22 07:49:36 -08:00
Anton Troynikov
d2d7590d9a Persistence example notebook (#154)
* Example notebook

* Title
2023-02-20 20:59:28 -08:00
Jeffrey Huber
54c9401f87 more cleanup 2023-02-10 12:56:46 -08:00
Jeffrey Huber
0e803a937f many changes 2023-02-08 06:23:23 -08:00
Jeffrey Huber
743d5a9807 get_api -> init 2023-02-07 06:21:08 -08:00
Anton Troynikov
9d8436159f API Grooming + Colab notebook (#108)
* Cleaner user method signatures

* Set model_space automatically on process if not supplied.

* New MNIST notebook

* Display MNIST digits we select

* Removed cruft, more docu

* Return 1k results by default

* Updated notebook working dir.

* cwd is not a real command
2022-12-13 12:25:02 -08:00
Jeffrey Huber
1a669d36b4 clean up old examples from pre-refactor 2022-12-02 14:34:59 -08:00
atroyn
4e38e78a75 Scoped index to only training set + minor fixes 2022-12-02 16:24:40 -06:00
Jeffrey Huber
55b007523b switch to arm, we will want to make this a flag 2022-12-01 22:14:26 -08:00
Jeffrey Huber
55240e1236 working with small batch size 2022-12-01 17:57:20 -08:00
Jeffrey Huber
6887c74817 i dont know what is happening 2022-12-01 16:34:02 -08:00
Jeffrey Huber
0678abc1bc bump 2022-12-01 15:37:56 -08:00
Jeffrey Huber
48f040bef2 working without boundary_uncertainty working 2022-12-01 15:36:49 -08:00
Jeffrey Huber
792e072812 add mnist notebook example 2022-11-30 23:47:11 -08:00
Jeffrey Huber
f1f914532e get running with docker 2022-11-30 23:36:03 -08:00
Luke VanderHart
d9dfd089fb major refactorings WIP 2022-11-24 00:17:24 -05:00
Jeffrey Huber
a553800bb1 add a few missing renames 2022-11-18 10:22:40 -08:00
Jeff Huber
bb563965d5 Merge pull request #80 from chroma-core/jeff/backup-restore
Clickhouse/Index backup and restore scripts
2022-11-18 10:10:32 -08:00
Jeffrey Huber
15915107f4 fix celery 2022-11-18 09:20:25 -08:00
Jeffrey Huber
48cab3be86 alert that some things aren't supported by in-memory yet 2022-11-16 12:22:49 -08:00
Jeffrey Huber
1a606340d6 rework how processing happens, other cleanup 2022-11-14 14:40:56 -08:00
Jeffrey Huber
d9c47f48ac category_name -> inference_class 2022-11-14 12:53:40 -08:00
Jeffrey Huber
5ab3d29e21 embedding_data -> embedding 2022-11-14 12:51:23 -08:00
Jeffrey Huber
5357608f9e log -> add 2022-11-14 12:43:42 -08:00
Jeffrey Huber
fd0e472c9a many small improvements 2022-11-14 11:13:21 -08:00
Jeffrey Huber
b579099e0d wip 2022-11-14 08:57:14 -08:00
Jeffrey Huber
18544c2d91 wip 2022-11-09 11:50:14 -08:00