index.find() tries to reshape and fails #1822

@nikhilmakan02

Initial Checks

  • I have read and followed the docs and still think this is a bug

Description

Apologies, the title of this is not the best. I have a very odd case and can't work out what is causing it. I have also failed to recreate the issue in a simpler example.

I have a DocList where each document has been built with the same process; only the data differs between documents. I am using the hnswlib backend.

The doc list builds without issues, but when I then run .find() on its individual elements, some calls succeed and some fail. The error raised by the failing calls is shown in the traceback below.

Code Snippet:

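import logging
import time

from docarray import BaseDoc, DocList
from docarray.index import HnswDocumentIndex
from docarray.typing import NdArray

logger = logging.getLogger(__name__)

class AddressDoc(BaseDoc):
    ELID: int
    FULL_ADDRESS: str
    EMBEDDINGS: NdArray[768]  # one 768-dimensional vector per document

def build_doc_list(data):
    # data: iterable of dicts, each expected to hold an array under "EMBEDDINGS"
    st = time.time()
    dl = DocList[AddressDoc](
        AddressDoc(
            ELID=0,           # placeholder value
            FULL_ADDRESS="",  # placeholder value
            EMBEDDINGS=d["EMBEDDINGS"],
        )
        for d in data
    )
    logger.info(f"Doc list created... {time.time()-st}")
    return dl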

doc_index = HnswDocumentIndex[AddressDoc](work_dir=db_path)
dl = build_doc_list(data)

# This works!
results = doc_index.find(dl[2], search_field="EMBEDDINGS", limit=1)

# This doesn't!
results = doc_index.find(dl[3], search_field="EMBEDDINGS", limit=1)

type(dl[2].EMBEDDINGS) == type(dl[3].EMBEDDINGS) # returns True
type(dl[2].EMBEDDINGS.shape) == type(dl[3].EMBEDDINGS.shape) # returns True

I have compared dl[2] and dl[3] left, right and center and can't see what the issue is. The embeddings arrays in both documents have the same shape, which I have checked with numpy (.shape, .ndim, .size). I can't work out what difference between the two causes the error below.
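Two arrays can agree on .shape, .ndim and .size and still differ in dtype, memory layout, or whether they are views into a larger buffer. The sketch below is a generic NumPy diagnostic, not part of the DocArray API; the helper names `array_fingerprint` and `fingerprint_diff` and the stand-in arrays are hypothetical, and nothing here is confirmed to be the cause of this issue.

```python
import numpy as np

def array_fingerprint(arr):
    """Collect properties that .shape, .ndim and .size do not reveal."""
    a = np.asanyarray(arr)  # keeps ndarray subclasses as they are
    return {
        "dtype": str(a.dtype),
        "shape": a.shape,
        "strides": a.strides,
        "c_contiguous": bool(a.flags["C_CONTIGUOUS"]),
        "is_view": a.base is not None,
        "nbytes": a.nbytes,
    }

def fingerprint_diff(a, b):
    """Return only the properties on which the two arrays disagree."""
    fa, fb = array_fingerprint(a), array_fingerprint(b)
    return {key: (fa[key], fb[key]) for key in fa if fa[key] != fb[key]}

# Hypothetical stand-ins for two embeddings that look identical by shape.
matrix = np.zeros((768, 768), dtype=np.float32)
standalone = np.zeros(768, dtype=np.float32)
column = matrix[:, 0]  # shape (768,), but a strided view into `matrix`

diff = fingerprint_diff(standalone, column)
print(sorted(diff))
```

Running the same comparison on `dl[2].EMBEDDINGS` and `dl[3].EMBEDDINGS` would show whether they differ in anything other than their values.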

Traceback below:

File /usr/local/lib/python3.11/site-packages/docarray/index/abstract.py:503, in BaseDocIndex.find(self, query, search_field, limit, **kwargs)
    501     query_vec = query
    502 query_vec_np = self._to_numpy(query_vec)
--> 503 docs, scores = self._find(
    504     query_vec_np, search_field=search_field, limit=limit, **kwargs
    505 )
    507 if isinstance(docs, List) and not isinstance(docs, DocList):
    508     docs = self._dict_list_to_docarray(docs)

File /usr/local/lib/python3.11/site-packages/docarray/index/backends/hnswlib.py:328, in HnswDocumentIndex._find(self, query, limit, search_field)
    324 def _find(
...
--> 197     return cls._docarray_from_native(x.reshape(source.shape))
    198 elif len(source.shape) > 0:
    199     return cls._docarray_from_native(np.zeros(source.shape))

ValueError: cannot reshape array of size 768 into shape (768,768)
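
For context on the message itself: a reshape only succeeds when the element count of the buffer equals the product of the target shape. The sketch below just works through that arithmetic for the numbers in the error; the helper name `reshape_is_possible` is hypothetical, and it does not explain where the (768, 768) shape comes from.

```python
import math

def reshape_is_possible(n_elements, target_shape):
    """A reshape needs exactly prod(target_shape) elements."""
    return n_elements == math.prod(target_shape)

n_elements = 768             # size of the buffer named in the error
target_shape = (768, 768)    # shape the reshape call was given

required = math.prod(target_shape)
print(required)                                       # 589824
print(reshape_is_possible(n_elements, target_shape))  # False
print(reshape_is_possible(n_elements, (768,)))        # True
```

So the failing call holds a single 768-element vector but was asked for a shape that needs 589824 elements, which suggests the shape metadata, not the data, is what differs for the failing documents.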

Example Code

No response

Python, DocArray & OS Version

0.39.0

Affected Components
